WO2022116207A1 - Coding method, decoding method, coding apparatus, and decoding apparatus - Google Patents

Coding method, decoding method, coding apparatus, and decoding apparatus

Info

Publication number
WO2022116207A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
image
communication link
network model
encoding
Prior art date
Application number
PCT/CN2020/134085
Other languages
French (fr)
Chinese (zh)
Inventor
周焰
郑萧桢
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/134085 priority Critical patent/WO2022116207A1/en
Priority to CN202080078054.3A priority patent/CN114731406A/en
Publication of WO2022116207A1 publication Critical patent/WO2022116207A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to the field of image processing, and more particularly, to an encoding method, a decoding method, an encoding device, and a decoding device.
  • One approach is to fix the model parameters of the neural network model in a common library file available to both the encoder and the decoder; another is to send the model parameters of the neural network model to the decoder through the code stream, so that the model parameters can be flexibly adjusted according to the needs of the encoder.
  • However, the amount of model parameter data is generally large; if it is put directly into the code stream for transmission, it increases the bit consumption and reduces the video compression rate.
  • In addition, because of the management and transmission requirements of the model parameters, coding based on neural network coding technology makes low-latency transmission in image transmission scenarios more challenging.
  • In view of this, the present application provides an encoding method, a decoding method, an encoding device, and a decoding device, which can well address the management and transmission of the model parameters of the neural network model, alleviate the bit consumption of the code stream, and reduce the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
  • A first aspect provides an encoding method, comprising: encoding an image to be encoded by using a neural-network-based encoding technology to obtain a code stream of the image to be encoded; transmitting the code stream of the image to be encoded through a first communication link; and transmitting model parameters of the neural network model included in the neural-network-based encoding technology through a second communication link.
  • A second aspect provides a decoding method, comprising: receiving a code stream of an image to be decoded through a first communication link; receiving model parameters of a neural network model through a second communication link; and decoding the code stream by using the model parameters of the neural network model to obtain a decoded image.
  • Another aspect provides an encoding device, comprising a processor configured to: encode an image to be encoded by using a neural-network-based encoding technology to obtain a code stream of the image to be encoded; transmit the code stream of the image to be encoded through a first communication link; and transmit model parameters of the neural network model included in the neural-network-based encoding technology through a second communication link.
  • Another aspect provides a decoding device, comprising a processor configured to: receive a code stream of an image to be decoded through a first communication link; receive model parameters of a neural network model through a second communication link; and decode the code stream by using the model parameters of the neural network model to obtain a decoded image.
  • an encoding apparatus including a processor and a memory.
  • the memory is used for storing a computer program
  • the processor is used for calling and running the computer program stored in the memory to execute the method in the above-mentioned first aspect or each implementation manner thereof.
  • a decoding apparatus including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the second aspect or each of its implementations.
  • a chip is provided for implementing the method in the above-mentioned first aspect or each of its implementation manners.
  • the chip includes: a processor for invoking and running a computer program from a memory, so that a device installed with the chip executes the method in the first aspect or each of its implementations.
  • a chip is provided for implementing the method in the second aspect or each of its implementation manners.
  • the chip includes: a processor for invoking and running a computer program from a memory, so that a device on which the chip is installed executes the method in the second aspect or each of its implementations.
  • a computer-readable storage medium for storing a computer program, the computer program comprising instructions for performing the method in the first aspect or any possible implementation manner of the first aspect.
  • a computer-readable storage medium for storing a computer program, the computer program comprising instructions for performing the method of the second aspect or any possible implementation of the second aspect.
  • a computer program product comprising computer program instructions, the computer program instructions causing a computer to execute the method in the first aspect or each implementation manner of the first aspect.
  • a twelfth aspect provides a computer program product, comprising computer program instructions, the computer program instructions causing a computer to execute the method in the second aspect or each implementation manner of the second aspect.
  • In the embodiments of the present application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model through the first communication link and the second communication link, respectively, and the decoding end receives them through the same two communication links.
  • This arrangement well addresses the management and transmission of the model parameters of the neural network model; because the model parameters are carried on the second communication link, the bit consumption of the code stream is alleviated; in addition, the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission is reduced.
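  • To make the dual-link split concrete, the following is a minimal Python sketch of the encoder-side behaviour described above; the Link class, its send() method, and the example payloads and link figures are hypothetical illustrations rather than part of the embodiments.

```python
from dataclasses import dataclass

@dataclass
class Link:
    """Hypothetical transport abstraction; delay in ms, bandwidth in Mbit/s."""
    name: str
    delay_ms: float
    bandwidth_mbps: float

    def send(self, payload: bytes) -> None:
        # A real system would hand the payload to the radio / network stack here.
        print(f"{self.name}: sent {len(payload)} bytes")

def transmit(bitstream: bytes, model_params: bytes,
             first_link: Link, second_link: Link) -> None:
    """Code stream over the low-delay first link; model parameters over the
    high-bandwidth second link, as described in the embodiments above."""
    first_link.send(bitstream)      # latency-critical video code stream
    second_link.send(model_params)  # bulky, less latency-critical parameters

# Illustrative values only.
transmit(b"\x00" * 1000, b"\x01" * 500_000,
         Link("private image-transmission link (e.g. SDR)", 5, 20),
         Link("public-network link (e.g. 4G/5G)", 50, 100))
```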
  • FIG. 1 is an architecture diagram of a system to which the technical solutions of the embodiments of the present application are applied.
  • FIG. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of image transmission using the intelligent coding technology provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a coding framework 2 provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a training neural network model provided by an embodiment of the present application.
  • FIG. 8a is a schematic flowchart of a video encoder applying an intelligent encoding technology provided by an embodiment of the present application.
  • FIG. 8b is a schematic flowchart of a video decoder applying an intelligent coding technology provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a decoding apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an encoding apparatus provided by another embodiment of the present application.
  • FIG. 12 is a schematic diagram of a decoding apparatus provided by another embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • FIG. 1 is an architectural diagram of applying the technical solution of the embodiment of the present application.
  • the system 100 may receive data 102 to be processed, process the data 102 to be processed, and generate processed data 108 .
  • system 100 may receive data to be encoded and encode the data to be encoded to generate encoded data, or system 100 may receive data to be decoded and decode the data to be decoded to generate decoded data.
  • components in system 100 may be implemented by one or more processors, which may be processors in computing devices or processors in mobile devices (e.g., drones).
  • the processor may be any type of processor, which is not limited in this embodiment of the present invention.
  • the processor may include an encoder, a decoder, or a codec, among others.
  • One or more memories may also be included in system 100 .
  • the memory may be used to store instructions and data, for example, computer-executable instructions, data to be processed 102 , processed data 108 , etc. that implement the technical solutions of the embodiments of the present invention.
  • the memory may be any type of memory, which is also not limited in this embodiment of the present invention.
  • the data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded.
  • the data to be encoded may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, etc.
  • the data to be encoded may include information from the user, eg, biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA sampling, and the like.
  • FIG. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application.
  • each frame of the video to be encoded is encoded in sequence.
  • The current frame to be encoded mainly undergoes prediction (Prediction), transform (Transform), quantization (Quantization), and entropy coding (Entropy Coding), and finally the code stream of the current frame is output.
  • The decoding process usually decodes the received code stream according to the inverse of the above process, so as to recover the video frame information.
  • the video coding framework 2 includes a coding control module 201 for performing decision control actions and parameter selection in the coding process.
  • For example, the encoding control module 201 controls the parameters used in transform, quantization, inverse quantization, and inverse transform, controls the selection of the intra-frame or inter-frame mode, and controls the parameters of motion estimation and filtering; the control parameters of the encoding control module 201 are also fed into the entropy encoding module and encoded to form part of the code stream.
  • The frame to be encoded is partitioned 202: specifically, it is first divided into slices, which are then divided into blocks.
  • Specifically, the frame to be encoded is divided into a plurality of non-overlapping coding tree units (Coding Tree Units, CTUs), and each CTU may be further divided, in the form of a quadtree, a binary tree, or a ternary tree, into smaller coding units (CUs).
  • a CU may also include a prediction unit (Prediction Unit, PU) and a transform unit (Transform Unit, TU) associated with it, where PU is the basic unit of prediction, and TU is the basic unit of transform and quantization.
  • PUs and TUs are obtained by dividing into one or more blocks on the basis of CUs, wherein one PU includes multiple prediction blocks (Prediction Blocks, PBs) and related syntax elements.
  • the PU and the TU may be the same, or may be obtained by the CU through different partitioning methods.
  • In some cases, at least two of the CU, PU, and TU are the same, e.g., the CU, PU, and TU are not distinguished, and prediction, quantization, and transform are all performed in units of CUs.
  • the CTU, CU or other formed data units are referred to as coding blocks.
  • a data unit targeted for video coding may be a frame, a slice, a coding tree unit, a coding unit, a coding block, or a group of any of the above.
  • the size of the data unit may vary.
  • a prediction process is performed to remove redundant information in the spatial and temporal domains of the current coded frame.
  • The commonly used predictive coding methods include intra-frame prediction and inter-frame prediction. Intra-frame prediction uses only the reconstructed information within the current frame to predict the current coding block, while inter-frame prediction uses information in other, previously reconstructed frames (also called reference frames) to predict the current coding block.
  • the encoding control module 201 is configured to decide whether to select intra-frame prediction or inter-frame prediction.
  • The process of intra-frame prediction 203 includes obtaining the reconstructed blocks of the coded adjacent blocks around the current coding block as reference blocks, and calculating a prediction value based on the pixel values of the reference blocks according to the prediction mode to generate the prediction block; the corresponding pixel values of the prediction block are then subtracted from those of the current coding block to obtain the residual of the current coding block.
  • the residual of the current coding block is transformed 204 , quantized 205 and entropy encoded 210 to form the code stream of the current coding block. Further, all the encoded blocks of the current encoded frame form a part of the encoded code stream of the encoded frame after undergoing the above encoding process.
  • the control and reference data generated in the intra prediction 203 are also encoded by entropy encoding 210 to form part of the encoded code stream.
  • The transform 204 is used to de-correlate the residual of an image block in order to improve coding efficiency; commonly used transforms include the two-dimensional Discrete Cosine Transform (DCT) and the two-dimensional Discrete Sine Transform (DST).
  • quantization 205 is used to further improve the compression efficiency.
  • After quantization, the quantized coefficients are obtained, and the quantized coefficients are then entropy encoded 210 to obtain the residual code stream of the current coding block.
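  • For intuition, the block-level pipeline just described (prediction residual, separable 2-D transform, uniform quantisation) can be sketched as follows in numpy; the 8x8 block size, the quantisation step, and the trivial mean prediction are illustrative assumptions, and entropy coding 210 is omitted.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def encode_block(block: np.ndarray, prediction: np.ndarray, qstep: float) -> np.ndarray:
    """Residual -> 2-D transform (204) -> uniform quantisation (205)."""
    t = dct_matrix(block.shape[0])
    residual = block.astype(float) - prediction.astype(float)
    coeffs = t @ residual @ t.T          # separable 2-D DCT of the residual
    return np.round(coeffs / qstep)      # quantised coefficients for entropy coding

block = np.random.randint(0, 256, (8, 8))
pred = np.full((8, 8), int(block.mean()))   # trivial stand-in for intra prediction
print(encode_block(block, pred, qstep=16.0))
```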
  • The entropy encoding methods include, but are not limited to, Context Adaptive Binary Arithmetic Coding (CABAC).
  • the bit stream obtained by entropy encoding and the encoded encoding mode information are stored or sent to the decoding end.
  • inverse quantization 206 is also performed on the quantized result, and inverse transformation 207 is performed on the inverse quantization result.
  • The reconstructed pixels are obtained using the inverse transform result and the motion compensation result. The reconstructed pixels are then filtered (i.e., loop filtered) 211, after which the filtered reconstructed image (belonging to the reconstructed video frame) is output. Subsequently, the reconstructed image can be used as a reference frame image for inter-frame prediction of other frames. In the embodiments of the present application, the filtered reconstructed image may also simply be referred to as the reconstructed image.
  • The coded adjacent blocks used in intra-frame prediction 203 are adjacent blocks that were encoded before the current coding block; the residual generated when encoding such an adjacent block is transformed 204, quantized 205, inverse quantized 206, and inverse transformed 207, and the result is added to the prediction block of that adjacent block to obtain its reconstructed block.
  • Inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to restore the residual data before quantization and transform.
  • When the inter-frame prediction mode is selected, the inter prediction process includes motion estimation (Motion Estimation, ME) 208 and motion compensation (Motion Compensation, MC) 209.
  • The encoder can perform motion estimation 208 according to the reference frame images in the reconstructed video frames, searching one or more reference frame images, according to certain matching criteria, for the image block most similar to the current coding block to serve as the prediction block.
  • The relative displacement between the prediction block and the current coding block is the motion vector (Motion Vector, MV) of the current coding block.
  • the original value of the pixel of the coding block is subtracted from the pixel value of the corresponding prediction block to obtain the residual of the coding block.
  • the residual of the current coded block is transformed 204, quantized 205 and entropy coded 210 to form a part of the coded code stream of the coded frame.
  • motion compensation 209 may be performed based on the motion vector and the prediction block determined above to obtain the current coding block.
  • the reconstructed video frame is a video frame obtained after filtering 211 .
  • the reconstructed video frame includes one or more reconstructed images.
  • Filtering 211 is used to reduce compression distortions such as blocking and ringing effects during the encoding process.
  • the reconstructed video frame is used to provide reference frames for inter-frame prediction during the encoding process.
  • The reconstructed video frame is post-processed and output as the final decoded video.
  • the inter prediction mode may include an advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode, a merge (Merge) mode, or a skip (skip) mode.
  • In the AMVP mode, the motion vector prediction (MVP) can be determined first. After the MVP is obtained, the starting point of motion estimation is determined according to the MVP, and a motion search is performed near the starting point. After the search is completed, the optimal MV is obtained; the MV determines the position of the reference block in the reference image; the reference block is subtracted from the current block to obtain the residual block; and the MVP is subtracted from the MV to obtain the Motion Vector Difference (MVD).
  • The MVD and the index of the MVP are transmitted to the decoder through the code stream.
  • In the Merge mode, the MVP can be determined first, and the MVP is directly used as the MV of the current block.
  • Specifically, an MVP candidate list (merge candidate list) can be constructed first.
  • The MVP candidate list may include at least one candidate MVP, and each candidate MVP may have an index corresponding to it in the list.
  • the MVP index can be written into the code stream, and the decoder can find the MVP corresponding to the index from the MVP candidate list according to the index, so as to decode the image block.
  • Skip mode is a special case of Merge mode. After obtaining the MV according to the Merge mode, if the encoder determines that the current block is basically the same as the reference block, it does not need to transmit the residual data; only the index of the MVP needs to be passed, and additionally a flag can be passed indicating that the current block can be obtained directly from the reference block.
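  • The difference between the three inter modes can be summarised in a short sketch; the tuple-based "stream" below is a hypothetical simplification of the signalling described above, not the actual bitstream syntax.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # motion vector as (dx, dy) in pixels

def signal_amvp(mv: MV, mvp_list: List[MV], mvp_idx: int, stream: list) -> None:
    """AMVP: transmit the MVP index plus the motion vector difference (MVD)."""
    mvp = mvp_list[mvp_idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    stream.append(("amvp", mvp_idx, mvd))   # residual coefficients would follow

def signal_merge(mvp_idx: int, stream: list, skip: bool = False) -> None:
    """Merge: only the candidate index is sent; Skip additionally omits the residual."""
    stream.append(("skip" if skip else "merge", mvp_idx))

stream: list = []
signal_amvp(mv=(5, -3), mvp_list=[(4, -2), (0, 0)], mvp_idx=0, stream=stream)
signal_merge(mvp_idx=1, stream=stream)             # Merge: MV taken directly from MVP
signal_merge(mvp_idx=1, stream=stream, skip=True)  # Skip: block copied from reference
print(stream)  # [('amvp', 0, (1, -1)), ('merge', 1), ('skip', 1)]
```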
  • The decoding end performs operations corresponding to those of the encoding end.
  • First, the residual information is obtained through entropy decoding, inverse quantization, and inverse transform, and it is determined from the decoded code stream whether the current image block uses intra-frame prediction or inter-frame prediction. If intra-frame prediction is used, the prediction information is constructed from the reconstructed image blocks in the current frame according to the intra-frame prediction method; if inter-frame prediction is used, the motion information is parsed, and the reference block is determined in the reconstructed image using the parsed motion information to obtain the prediction information. The prediction information and the residual information are then superimposed, and the reconstructed information is obtained after the filtering operation.
  • This application mainly relates to the encoding and decoding of images or videos based on neural network encoding and decoding technology (also referred to as “intelligent encoding and decoding technology”). Therefore, the following briefly introduces the relevant content of neural networks.
  • A neural network (NN) is also known as an artificial neural network (Artificial Neural Network, ANN).
  • Neural network systems consist of many simple and highly interconnected processing components that process information through dynamic state responses to external inputs.
  • the processing components can be thought of as neurons in the human brain, where each perceptron accepts multiple inputs and computes a weighted sum of the inputs.
  • perceptrons are considered mathematical models of biological neurons.
  • these interconnected processing components are typically organized in layers.
  • the external input can correspond to a pattern presented to the network that communicates with one or more intermediate layers, also known as “hidden layers", where the actual processing is done through a system of weighted "connections".
  • ANNs can use different architectures to specify which variables are involved in the network and their topological relationships.
  • the variables involved in a neural network might be the weights of connections between neurons, and the activity of neurons.
  • Most artificial neural networks contain some form of "learning rule" that modifies the weights of connections based on the input patterns presented. In a sense, artificial neural networks learn by example just like their biological counterparts.
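  • As a concrete illustration of the weighted-sum behaviour of a single processing component described above, here is a minimal perceptron in numpy; the weights, bias, and sigmoid activation are arbitrary example values.

```python
import numpy as np

def perceptron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One neuron: a weighted sum of the inputs followed by a nonlinear activation."""
    z = float(np.dot(weights, inputs) + bias)
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([0.2, 0.7, 0.1])    # external inputs (a "pattern" presented to the network)
w = np.array([0.5, -1.2, 0.8])   # connection weights (what a learning rule adjusts)
print(perceptron(x, w, bias=0.1))
```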
  • Deep Neural Networks or deep multilayer neural networks correspond to neural networks with multiple levels of interconnected nodes, which allow them to compactly represent highly nonlinear and highly varying functions.
  • the computational complexity of DNNs grows rapidly with the number of nodes associated with a large number of layers.
  • one way is to fix the model parameters of the neural network model in a common library file available to both the encoder and the decoder.
  • The encoder and decoder can obtain the model parameters of the neural network model from the common library file when encoding or decoding; however, with this method, once some encoding tools of the encoder are changed, the effect is reduced.
  • Another way is to send the model parameters of the neural network model to the decoder through the code stream, so that the model parameters can be flexibly adjusted according to the needs of the encoder.
  • However, the amount of model parameter data is generally large; if it is put directly into the code stream for transmission, it increases the bit consumption and reduces the video compression rate.
  • In addition, because of the management and transmission requirements of the model parameters, coding based on neural network coding technology makes low-latency transmission in image transmission scenarios more challenging.
  • In view of this, the present application provides an encoding method, a decoding method, an encoding device, and a decoding device, which can well address the management and transmission of the model parameters of the neural network model, alleviate the bit consumption of the code stream, and reduce the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
  • the encoding method 300 provided by this embodiment of the present application will be described in detail below with reference to FIG. 3 .
  • FIG. 3 is a schematic diagram of an encoding method 300 provided by an embodiment of the present application, and the method 300 may include steps 310-330.
  • Using a neural-network-based encoding technology to encode an image to be encoded can be understood as: using an encoding module based on a neural network model to replace or supplement part of the encoding modules in the coding framework and encode the image to be encoded, or replacing the original coding framework with a coding framework based on a neural network model to encode the image to be encoded.
  • The neural-network-based coding technology mainly includes three aspects: hybrid neural network video coding (that is, neural network coding modules are embedded in the traditional video coding framework in place of traditional coding modules), neural network rate-distortion optimized coding, and end-to-end neural network video coding.
  • For example, a coding module based on a neural network model can be used to replace part of the coding modules in the coding framework to encode the image to be encoded.
  • a filtering module based on a neural network model may also be added to the existing filtering module to filter the reconstructed image.
  • For example, an intra-frame prediction method based on a neural network is compared with the original intra-frame prediction methods to determine the optimal intra-frame prediction method for prediction.
  • the image super-resolution technology based on neural network can be used for prediction, which can improve the performance of motion estimation, thereby improving the efficiency of inter prediction.
  • the context probability estimation method based on neural network technology can be used to replace the traditional rule-based context probability prediction model, which is applied to the entropy coding process of coefficient coding or some other syntax elements.
  • a Neural Network Filter (NNF) technology can be added after deblocking filtering.
  • End-to-end neural network video coding completely abandons the traditional hybrid video coding framework: the video images and related information are input directly, and neural-network-based prediction methods are used for prediction and compression coding.
  • the neural network model in this embodiment of the present application may include a neural network model trained offline or a neural network model trained online, which is not limited.
  • the neural network model in the embodiments of the present application may include DNN, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), or other neural network variants, etc., which is not specifically limited in the present application .
  • the model parameters of the neural network model in the embodiments of the present application include, but are not limited to, the number of layers of the neural network, the number of neurons, the weight of connections between neurons, and the like.
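  • To illustrate what these model parameters may amount to in practice, a minimal container is sketched below; the field names and the pickle-based serialisation are illustrative assumptions rather than a defined syntax for the second communication link.

```python
import pickle
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass
class ModelParameters:
    """Hypothetical bundle of the quantities listed above."""
    num_layers: int
    neurons_per_layer: List[int]
    weights: List[np.ndarray] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # What would actually be carried over the second communication link.
        return pickle.dumps(self)

layers = [16, 32, 16]
params = ModelParameters(
    num_layers=len(layers),
    neurons_per_layer=layers,
    weights=[np.zeros((a, b)) for a, b in zip(layers[:-1], layers[1:])],
)
print(f"serialised size: {len(params.to_bytes())} bytes")
```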
  • FIG. 4 is a schematic diagram of a decoding method 400 provided by an embodiment of the present application, and the method 400 may include steps 410-430.
  • Correspondingly, a decoding module based on a neural network model can be used to replace part of the decoding modules in the decoding framework to decode the image to be decoded.
  • a filtering module based on a neural network model may also be added to the existing filtering module to filter the predicted reconstructed image.
  • In the embodiments of the present application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model through the first communication link and the second communication link, respectively, and the decoding end receives them through the same two communication links.
  • This well addresses the management and transmission of the model parameters of the neural network model; because the model parameters are carried on the second communication link, the bit consumption of the code stream is alleviated; in addition, the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission is reduced.
  • the first communication link and the second communication link have different physical characteristics.
  • The physical characteristics in the embodiments of the present application may be determined by specific application requirements, and may include, for example, transmission delay and/or transmission bandwidth. However, it should be understood that the physical characteristics may also be other characteristics, which are not particularly limited in the present application.
  • The transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  • That is, compared with the second communication link, the first communication link has a lower transmission delay, or a lower transmission bandwidth, or both; in other words, the transmission delay and/or transmission bandwidth of the second communication link is higher than that of the first communication link.
  • The encoding end may transmit the encoded code stream in real time so that the decoding end can decode it in time; therefore, the first communication link, which has a lower delay, may be used to transmit the code stream of the image to be encoded.
  • the model parameters of the neural network model can be transmitted through the second communication link.
  • Since the model parameters of the neural network model are large in quantity and relatively complex, a second communication link with a higher bandwidth may be used to transmit them.
  • By transmitting the code stream of the image to be encoded through the first communication link with its lower transmission delay, and transmitting the model parameters of the neural network model through the second communication link with its higher transmission bandwidth, the solution provided by this application well addresses the management and transmission of the model parameters of the neural network model, and further reduces the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
  • the first communication link includes a link with a delay less than or equal to a first threshold
  • the second communication link includes a link with a bandwidth greater than or equal to a second threshold
  • The first threshold and/or the second threshold in the embodiments of the present application may be specified by a protocol or configured by a server; the first threshold and/or the second threshold may be a fixed value or a continuously adjusted value; this is not limited.
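  • A simple way to read these two definitions is as threshold tests over the available links; the sketch below is purely illustrative, and the Candidate structure and the threshold values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    delay_ms: float        # measured or configured transmission delay
    bandwidth_mbps: float  # available transmission bandwidth

def pick_first_link(links: List[Candidate], first_threshold_ms: float) -> Optional[Candidate]:
    """First communication link: delay less than or equal to the first threshold."""
    eligible = [l for l in links if l.delay_ms <= first_threshold_ms]
    return min(eligible, key=lambda l: l.delay_ms, default=None)

def pick_second_link(links: List[Candidate], second_threshold_mbps: float) -> Optional[Candidate]:
    """Second communication link: bandwidth greater than or equal to the second threshold."""
    eligible = [l for l in links if l.bandwidth_mbps >= second_threshold_mbps]
    return max(eligible, key=lambda l: l.bandwidth_mbps, default=None)

links = [Candidate("SDR", 5, 20), Candidate("WiFi", 15, 80), Candidate("5G", 30, 200)]
print(pick_first_link(links, first_threshold_ms=10))       # -> the SDR link
print(pick_second_link(links, second_threshold_mbps=100))  # -> the 5G link
```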
  • the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol
  • the second communication link includes a link based on a mobile communication protocol
  • The private image transmission protocol includes a software-defined radio (Software Defined Radio, SDR) protocol.
  • The wireless local area network protocol includes a wireless fidelity (Wireless Fidelity, WiFi) protocol.
  • The mobile communication protocol includes 4G or 5G protocols.
  • Although the SDR protocol is shown above as an example of the private image transmission protocol, the private image transmission protocol may also include other protocols, such as the Open Network Video Interface Forum (ONVIF) protocol, which is not limited.
  • The wireless local area network protocol in the embodiments of the present application may also include other protocols, such as Bluetooth or ZigBee, which is not limited.
  • the mobile communication protocol in the embodiments of the present application may also include other protocols, such as the future 6G protocol.
  • the first communication link may also be a link based on the SDR protocol
  • the second communication link may be a link based on the WiFi protocol.
  • the first communication link includes a private image transmission link
  • the second communication link includes a public network transmission link
  • FIG. 5 is a schematic diagram of image transmission using the intelligent coding technology provided by an embodiment of the present application.
  • The acquisition end (which can also be understood as the encoding end) collects the video.
  • The code stream and the corresponding model parameters of the neural network model are obtained after compression and encoding by the encoder and the neural network computing platform; the code stream and the corresponding model parameters of the neural network model are then transmitted to the display end through the wireless image transmission system.
  • After receiving the code stream and the model parameters of the corresponding neural network model, the display end (which can also be understood as the decoding end) decodes them through its decoder and neural network computing platform to obtain the reconstructed video, and displays the reconstructed video on its display.
  • the above-mentioned wireless image transmission system may be a wireless video transmission system, and may include a private image transmission link and a public network transmission link for respectively transmitting the code stream and model parameters of the neural network model.
  • the private image transmission link in the embodiment of the present application may include a link based on the SDR protocol, a link based on the WiFi protocol, or a link based on the ONVIF protocol;
  • The public network transmission link in the embodiments of the present application may include a link based on the 4G, 5G, or 6G protocol.
  • The solution provided by this application transmits the code stream of the image to be encoded through the private image transmission link and transmits the model parameters of the neural network model through the public network transmission link, which can well address the management and transmission of the model parameters of the neural network model.
  • The first communication link and/or the second communication link are selected from one or more candidate links, for example, links based on the protocols described above.
  • That is, when the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model, it can flexibly select from one or more links, thereby improving flexibility.
  • The neural network model includes an offline-trained neural network model or an online-trained neural network model.
  • The following uses these two kinds of neural network models as examples to introduce the content related to encoding an image to be encoded using a neural-network-based encoding technology.
  • Case 1: Encoding the image to be encoded using an encoding technology based on an online-trained neural network model.
  • the neural network model is an online training neural network model
  • the encoding of the to-be-encoded image using a neural network-based encoding technology includes:
  • The n-th target image is encoded by using the neural network model obtained by training on the (n-m)-th target image, where a target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the group-of-pictures level, and the picture level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
  • the target image in this embodiment of the present application may be an image after the image to be encoded is divided according to any one of a video sequence level, a Group of Picture (GOP) level, and a picture level (or frame level).
  • the length of the GOP can be defined independently.
  • the image from the current I frame to the next I frame in the video sequence can be used as a GOP.
  • GOP includes a group of continuous pictures, consisting of one I frame and several B frames and/or P frames, and is the basic unit accessed by video image encoders and decoders.
  • The I frame (also known as the key frame) is an intra-coded image frame, which can be understood as a complete preservation of that frame.
  • The B frame is a bidirectional reference frame or bidirectional difference frame, which can be understood as recording the difference between the current frame and the frames before and after it; when decoding, the previously cached image and the subsequently decoded image are obtained, and the final image is obtained by superimposing their data with the data of the current frame.
  • The P frame is a forward reference frame or forward prediction frame, which can be understood as the difference between this frame and the previous frame; when decoding, the difference defined in this frame is superimposed on the previously cached image to generate the final image.
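  • The GOP structure described above can be made concrete with a small helper that starts a new group at every I (key) frame; the frame-type string used here is purely illustrative.

```python
from typing import List

def split_into_gops(frame_types: List[str]) -> List[List[int]]:
    """Group frame indices into GOPs, each beginning at an I (key) frame."""
    gops: List[List[int]] = []
    for idx, ftype in enumerate(frame_types):
        if ftype == "I" or not gops:
            gops.append([])       # a new GOP begins at every I frame
        gops[-1].append(idx)
    return gops

# Example: two GOPs, each an I frame followed by B and P frames.
print(split_into_gops(list("IBBPBBPIBBP")))
# [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
```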
  • The target images in the embodiments of the present application differ according to the level at which the image to be encoded is divided.
  • When the division is at the video sequence level, the video sequence is used as a target image, which is encoded based on the pre-trained neural network model.
  • When the division is at the group-of-pictures level, encoding the n-th target image using the neural network model obtained by training on the (n-m)-th target image can be understood as: the neural network model obtained by training on the (n-m)-th GOP is used to encode each image included in the n-th GOP, where a GOP includes one I frame and several B frames and/or P frames.
  • When the division is at the picture level, encoding the n-th target image using the neural network model obtained by training on the (n-m)-th target image can be understood as: the neural network model obtained by training on the (n-m)-th image is used to encode the n-th image, where an image here can be understood as an image frame, such as the I frame, B frame, or P frame above.
  • It should be noted that n-m may be any value smaller than n, that is, the neural network model trained on any target image before the n-th target image can be used to encode it.
  • For example, for the third target image, the neural network model obtained by training on the second target image can be used to encode it, and the model parameters of the neural network model obtained by training on the second target image are transmitted to the decoding end at the same time to facilitate decoding. Alternatively, for the third target image, the neural network model obtained by training on the first target image can be used to encode it; because the model parameters of the neural network model obtained by training on the first target image were already transmitted when encoding the second target image, in this case the model parameters of the neural network model used by the third target image need not be transmitted additionally, and the previously transmitted model parameters are sufficient. Especially for a complex neural network model with a large number of layers, this approach can both meet the transmission delay requirements arising from the large number of model parameters and further save bandwidth.
  • With the solution provided by the present application, by using the neural network model obtained by training on the (n-m)-th target image to encode the n-th target image, the flexibility of encoding is improved and various scenarios can be accommodated; for a neural network model trained for a given scenario, the prediction based on that neural network model is more effective.
  • In addition, transmitting the model parameters of neural network models trained on spaced target images can not only meet the transmission delay requirements arising from the large number of model parameters, but also further save bandwidth.
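  • The scheduling idea of this case (encode a GOP with a model trained on an earlier GOP, and transmit model parameters on the second link only when the decoder does not yet have them) can be sketched as follows; the train/encode/send callables and the example plan are placeholders, not part of the embodiments.

```python
def encode_sequence(gops, model_index_for, second_link_send, train, encode):
    """Encode GOP n with the model trained on GOP model_index_for(n) (1-based)."""
    models = {}   # models trained so far, keyed by the GOP they were trained on
    sent = set()  # model parameters already transmitted over the second link
    for n, gop in enumerate(gops, start=1):
        src = model_index_for(n)
        model = models.get(src)                 # None for the very first GOP(s)
        if model is not None and src not in sent:
            second_link_send(models[src])       # transmit each model's parameters once
            sent.add(src)
        yield encode(gop, model)                # code stream goes over the first link
        models[n] = train(gop)                  # online training on the just-coded GOP

# Mirrors the example above: GOP 2 uses the model trained on GOP 1 (m = 1),
# GOP 3 reuses that same model (m = 2), so its parameters are not sent again.
plan = {1: 0, 2: 1, 3: 1, 4: 3}
out = list(encode_sequence(
    ["GOP1", "GOP2", "GOP3", "GOP4"],
    model_index_for=lambda n: plan[n],
    second_link_send=lambda p: print("link 2:", p),
    train=lambda g: f"model({g})",
    encode=lambda g, mdl: f"stream({g}, {mdl})"))
print(out)
```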
  • the neural network-based coding technology can be applied to any stage in the coding process, and the following takes the neural network-based filtering technology as an example for description.
  • the encoding of the n th target image by using a neural network model obtained by training the n-m th target images includes:
  • The n-th target image is filtered by using the neural network model obtained by training on the (n-m)-th target image, where the (n-m)-th target image is the image of the (n-m)-th encoded image that has not yet been filtered by the neural network model, and the n-th target image is the image of the n-th encoded image that has not yet been filtered by the neural network model.
  • FIG. 6 is a schematic diagram of an encoding framework 2 provided in an embodiment of the present application.
  • the encoding end may perform filtering 211 on the reconstructed pixel.
  • Specifically, the reconstructed pixels may be processed by any one or more of a deblocking filter (Deblocking Filter, DF), the NNF, sample adaptive offset (Sample Adaptive Offset, SAO), or an adaptive loop filter (Adaptive Loop Filter, ALF), and the filtered reconstructed image is output.
  • For example, the images of the first GOP are used as a training set to carry out the model training process for the neural-network-based filtering technology.
  • The whole training process can refer to the following: during encoding, the current image to be encoded is sent to the framework shown in FIG. 6 for encoding and training, and the reconstructed image is subjected to DF, NNF, SAO, and ALF. After the encoding of all images in the first GOP is completed, the code stream of all images in the first GOP and the neural network model obtained by training the neural network framework on all images in the first GOP are obtained.
  • When encoding the second GOP, the neural network model trained on the first GOP is used to perform the neural-network-based filtering.
  • The specific filtering process can refer to the following: when encoding an image in the second GOP, inverse quantization and inverse transform are performed to obtain a reconstructed image, and the reconstructed image is then sent to the filtering module in which the neural network model obtained by training on the first GOP is deployed to obtain the filtered reconstructed image; the filtering module here includes the DF module, the NNF module, the SAO module, and the ALF module, and the NNF module contains the neural network model obtained by training on the first GOP.
  • More generally, when encoding the n-th GOP, the neural network model trained on the (n-m)-th GOP is used to perform the neural-network-based filtering.
  • The specific filtering process can refer to the following: when encoding an image in the n-th GOP, inverse quantization and inverse transform are performed to obtain a reconstructed image, and the reconstructed image is then sent to the filtering module in which the neural network model obtained by training on the (n-m)-th GOP is deployed to obtain the filtered reconstructed image.
  • The filtering module here includes the DF module, the NNF module, the SAO module, and the ALF module, and the NNF module contains the neural network model obtained by training on the (n-m)-th GOP.
  • The filtering order shown in FIG. 6 is only an example; other orders, such as DF, SAO, NNF, and ALF, may also be used, which should not limit the present application.
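  • The in-loop filtering chain of FIG. 6 can be viewed as a configurable pipeline; the sketch below uses identity placeholders for DF, SAO, and ALF and a simple box blur as a stand-in for the NNF, all of which are assumptions for illustration only.

```python
import numpy as np

def df(img):  return img    # deblocking filter placeholder
def sao(img): return img    # sample adaptive offset placeholder
def alf(img): return img    # adaptive loop filter placeholder

def make_nnf(model=None):
    """Stand-in for the neural-network filter (here just a 3x3 box blur)."""
    def nnf(img):
        padded = np.pad(img, 1, mode="edge")
        return sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                   for dy in range(3) for dx in range(3)) / 9.0
    return nnf

def loop_filter(reconstructed, stages):
    """Apply the in-loop filters in the configured order."""
    for stage in stages:
        reconstructed = stage(reconstructed)
    return reconstructed

rec = np.random.rand(16, 16)
nnf = make_nnf(model=None)                        # e.g. trained on an earlier GOP
filtered = loop_filter(rec, [df, nnf, sao, alf])  # order shown in FIG. 6
# Alternative order mentioned in the text: [df, sao, nnf, alf]
print(filtered.shape)
```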
  • It should be noted that m may be any positive integer smaller than n.
  • For example, the third GOP can be encoded by using the neural network model trained on the second GOP, or by using the neural network model trained on the first GOP, which is not limited.
  • In this way, the flexibility of filtering can be improved.
  • Optionally, m is a parameter that is solidified or preset at the encoding end before encoding; or, m is a parameter determined during encoding.
  • the neural network model is an online training neural network model
  • the encoding of the to-be-encoded image using a neural network-based encoding technology includes:
  • The neural network model obtained by training on the already-encoded second target image is used to encode the first target image, where the first target image and the second target image are images obtained by dividing the image to be encoded according to any one of the video sequence level, the group-of-pictures level, and the picture level.
  • The transmitting, through the second communication link, of the model parameters of the neural network model included in the neural-network-based encoding technology includes: transmitting, through the second communication link, the model parameters of the neural network model obtained by training on the second target image.
  • The first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
  • the first target image and the second target image in this embodiment of the present application may be images obtained by dividing the image to be encoded according to any one of video sequence level, GOP level, and image level.
  • Specifically, when encoding the first target image, a neural network model obtained by training on the already-encoded second target image may be used to encode it.
  • For example, assuming that the first target image is the second GOP, the neural network model obtained by training on the already-encoded first GOP can be used to encode it; assuming that the first target image is the third GOP, the neural network model obtained by training on the already-encoded second GOP can be used to encode it; this application does not specifically limit this.
  • q may be an integer greater than or equal to 0.
  • Taking the target image as a GOP as an example: if q is 0, the second target image is the previous GOP adjacent to the first target image; if q is 1, the second target image is the GOP separated from the first target image by one GOP; if q is 2, the second target image is the GOP separated from the first target image by two GOPs.
  • the q is a parameter that is fixed or preset at the encoding end before encoding.
  • In this way, the encoding end uses the neural network model obtained by training on the already-encoded second target image to encode the first target image, which improves the flexibility of encoding and can accommodate various scenarios; for a neural network model trained for a given scenario, the prediction based on that neural network model is more effective.
  • the code stream of the first target image is decoded by using the received model parameters of the neural network model obtained by training the second target image to obtain the first target image.
  • The first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
  • the q is a parameter that is solidified or preset at the decoding end before decoding.
  • the decoding end may use the received model parameters of the neural network model obtained by training the second target image to decode the code stream of the first target image.
  • the decoding end can determine the target image to be decoded corresponding to the model parameter of the neural network model obtained by training the second target image according to the parameter q that is solidified or preset at the decoding end.
  • For example, taking the target image as a GOP: if q is 0, when decoding the first target image, the decoding end can use the model parameters of the neural network model obtained by training on the previous GOP adjacent to the first target image to decode the code stream of the first target image; if q is 1, the decoding end can use the model parameters of the neural network model obtained by training on the GOP separated from the first target image by one GOP to decode the code stream of the first target image; if q is 2, the decoding end can use the model parameters of the neural network model obtained by training on the GOP separated from the first target image by two GOPs to decode the code stream of the first target image.
  • In this way, the decoding end uses the received model parameters of the neural network model obtained by training on the second target image to decode the code stream of the first target image, which improves flexibility and can accommodate various scenarios; for a neural network model trained for a given scenario, the prediction and reconstruction based on that neural network model are more effective.
  • Case 2: Encoding the image to be encoded using an encoding technology based on an offline-trained neural network model.
  • the neural network model is an offline training neural network model
  • the encoding of the to-be-encoded image using a neural network-based encoding technology includes:
  • The offline-trained neural network model is used to encode the p-th target image, where a target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the group-of-pictures level, and the picture level, and p is an integer greater than or equal to 0.
  • Using the offline-trained neural network model to encode the p-th target image may include: using the offline-trained neural network model to perform at least one of prediction, transform, quantization, entropy encoding, and filtering on the p-th target image.
  • For example, the neural network model is trained offline in advance to obtain the trained neural network model.
  • The acquisition end deploys the trained neural network model on its neural network computing platform and, together with the encoder, encodes the p-th GOP to obtain a code stream; the code stream and the model parameters of the neural network model are then transmitted to the display end through the wireless image transmission system (using the two communication links).
  • After receiving the model parameters of the neural network model and the code stream, the display end deploys the neural network model based on the model parameters on its neural network computing platform, decodes the code stream with the decoder to obtain the reconstructed video, and then displays it.
  • In this way, the encoding end uses the offline-trained neural network model to encode the p-th target image and transmits the code stream and the model parameters of the neural network model through the first communication link and the second communication link, respectively, which well addresses the management and transmission of the model parameters of the neural network model.
  • the model parameters of the neural network model included in the neural network-based encoding technology can be transmitted through the second communication link.
  • For the model parameters of an online-trained neural network model, since the model parameters are continuously updated as encoding proceeds, the model parameters of multiple neural network models are transmitted to the decoding end, and the decoding end needs to determine the model parameters corresponding to the code stream to be decoded.
  • Similarly, in other cases where multiple neural network models are used, the model parameters of multiple neural network models are also transmitted to the decoding end, and the decoding end likewise needs to determine the model parameters of the neural network model corresponding to the code stream to be decoded.
  • the following will introduce the relevant content of the decoding end determining the model parameters of the neural network model corresponding to the code stream to be decoded.
  • The code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate an identifier of the model parameters of the neural network model used when encoding the image to be encoded.
  • For example, when encoding the n-th GOP, the neural network model trained on the (n-m)-th GOP is used to perform the neural-network-based filtering.
  • The specific filtering process can refer to the following: when encoding an image in the n-th GOP, the image is sent to the HM platform for encoding to obtain a reconstructed image; the reconstructed image is then sent to the filtering module in which the neural network model trained on the (n-m)-th GOP is deployed to obtain the filtered reconstructed image, and the subsequent SAO and ALF processes are then performed on the reconstructed image to obtain the final reconstructed image.
  • When the encoding end encodes the first GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the first GOP; when encoding the second GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the second GOP; and so on, when the encoding end encodes the n-th GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the n-th GOP.
  • After receiving the code stream and the model parameters of the neural network model, the decoding end can identify, according to the first indication information in the code stream, the identifier of the model parameters of the neural network model corresponding to the code stream of the image to be decoded.
  • Based on this identifier, the model parameters of the neural network model used when encoding the code stream of the image to be decoded are determined, and the code stream is decoded according to those model parameters to obtain the final reconstructed image.
  • when the encoding end trains neural network models for the neural-network-based filtering technology, different neural network models may be used to filter different target images in the image to be encoded.
  • when encoding the first GOP, neural network model 1 is used to filter it, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the first GOP; when encoding the second GOP, neural network model 2 is used to filter it, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the second GOP; when encoding the third GOP, neural network model 3 is used to filter it, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the third GOP.
  • the decoding end can identify, according to the first indication information in the code stream, the identifier of the model parameters of the neural network model corresponding to the code stream of the image to be decoded; based on this, the model parameters of the neural network model used when the code stream was encoded can be determined, and the code stream is decoded according to those model parameters to obtain the final reconstructed image.
  • the encoding end can use any trained neural network model to filter the target image.
  • for example, when encoding the first GOP and the second GOP, neural network model 1 can be used to filter them; when encoding the third GOP, neural network model 3 can be used to filter it; this is not limited.
  • by adding the first indication information to the code stream to indicate the identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded, the decoding end can determine the model parameters corresponding to the image to be decoded during decoding, which can improve the accuracy of decoding (a sketch of carrying such an identifier follows below).
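  • A minimal sketch of carrying such an identifier is shown below; the 16-bit field, its position in front of each GOP's payload, and the lookup on the decoding side are assumptions for illustration, not the actual bitstream syntax.

```python
# Hypothetical header layout: 2-byte model-parameter identifier + GOP payload.
import struct

def write_gop_header(payload: bytes, model_param_id: int) -> bytes:
    return struct.pack(">H", model_param_id) + payload

def read_gop_header(stream: bytes):
    (model_param_id,) = struct.unpack(">H", stream[:2])
    return model_param_id, stream[2:]

def select_model_params(stream: bytes, received_params: dict):
    """Decoder side: match the code stream (first link) with the model
    parameters received on the second link via the identifier."""
    model_param_id, payload = read_gop_header(stream)
    return received_params[model_param_id], payload
```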
  • when the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate the identifier of the model parameters of the neural network model used when the image to be encoded is encoded; if the image to be encoded is a non-key frame, the first indication information may not be included in the code stream.
  • the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when encoding other frames between the current key frame and the next key frame.
  • the key frame in the embodiments of the present application is the I frame mentioned above. It can be understood that if the current frame to be encoded is an I frame, the encoding end may include the first indication information in the code stream after encoding the current I frame, indicating the identifier of the model parameters of the neural network model used when encoding the current I frame, so as to facilitate correct decoding at the decoding end.
  • in addition to indicating the model parameters of the neural network model used when encoding the current I frame, the above-mentioned first indication information may also indicate the identifier of the model parameters of the neural network model used when encoding other frames (e.g., B frames and/or P frames) between the current I frame and the next I frame.
  • in other words, the encoding end adds the first indication information to the encoded code stream of the current I frame, and the first indication information can be used to indicate the model parameters used for the current I frame and for the other frames between the current I frame and the next I frame.
  • when the decoding end decodes an I frame, if the frames following the I frame are B frames and/or P frames, the model parameters used for decoding the I frame can continue to be used to decode the adjacent B and/or P frames, until the next I frame is decoded.
  • the first indication information included in the code stream indicates the identifier of the model parameters of the neural network model used when encoding the current I frame and the other frames between the current I frame and the next I frame, which makes it convenient for the decoding end to determine the model parameters corresponding to the image to be decoded during decoding and can improve the accuracy of decoding.
  • the first indication information may be carried only in the I frame and not in the B frames or P frames, which further reduces bit consumption.
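  • Under the assumption that the identifier is carried only with I frames, a decoder loop could reuse the most recently indicated model parameters for the following B/P frames, as in the sketch below; the 2-byte field and the frame representation are illustrative, not the described bitstream syntax.

```python
# Illustrative decoder-side reuse of I-frame-indicated model parameters.
import struct

def decode_sequence(frames, received_params, decode_frame):
    """frames: iterable of (frame_type, payload) pairs; the hypothetical 2-byte
    identifier is present only in I-frame payloads."""
    current_params = None
    for frame_type, payload in frames:
        if frame_type == "I":
            (model_param_id,) = struct.unpack(">H", payload[:2])
            payload = payload[2:]
            current_params = received_params[model_param_id]
        # B/P frames keep using current_params until the next I frame arrives.
        yield decode_frame(payload, current_params)
```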
  • the transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
  • the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are transmitted through the second communication link.
  • the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are received through the second communication link.
  • the encoding end may also transmit the model parameters of the neural network model and the identifier corresponding to those model parameters through the second communication link, and the decoding end receives the model parameters of the neural network model and the corresponding identifier through the second communication link.
  • an available neural network model can be obtained by offline training based on the existing video scene training set, as a basic neural network model implemented by the coding end.
  • the neural network model can be retrained based on different application scenarios.
  • the encoding end can choose to use the retrained neural network model or the existing basic neural network model for encoding, and syntax elements indicating whether the retrained neural network model is used are encoded and written into the code stream.
  • when the encoding end selects the retrained neural network model for encoding, the model parameters of the retrained neural network model can be transmitted through the second communication link, and syntax elements indicating that the retrained neural network model is used are encoded and written into the code stream; when the encoding end selects the existing basic neural network model for encoding, it is not necessary to transmit the model parameters of the retrained neural network model to the decoding end through the second communication link, and syntax elements indicating that the retrained neural network model is not used are written into the code stream.
  • the decoding end decodes, from the received code stream, the identifier indicating whether the retrained neural network model is used, and accordingly uses the existing basic neural network model for decoding or uses the retrained neural network model transmitted through the second communication link for decoding.
  • the encoder obtains a usable neural network model 1 through offline training of the existing video scene training set.
  • if the encoder determines that neural network model 1 cannot be used to encode the first GOP, the first GOP can be used as a training set to retrain the neural network framework used by the neural-network-based encoding technology.
  • this yields the encoded code stream of all the images in the first GOP and the neural network model obtained by training the neural network framework on all the images in the first GOP.
  • the model parameters of the neural network model can be transmitted to the decoding end through the second communication link, and the syntax elements using the retrained neural network model can be written in the code stream.
  • for the other GOPs of the image to be encoded, a similar method can also be used for coding, which will not be repeated here.
  • the encoder obtains a usable neural network model 1 through offline training of the existing video scene training set.
  • if the encoder determines that neural network model 1 can be used to encode the first GOP, neural network model 1 is used to encode the first GOP to obtain the code stream of all the images in the first GOP, and syntax elements indicating that the retrained neural network model is not used are written into the code stream.
  • the model parameters of the neural network model 1 may be preset at the decoding end, or may be transmitted to the decoding end through the second communication link, which is not limited. For other GOPs of the image to be coded, a similar method can also be used for coding, and details are not repeated here.
  • the encoding end can choose to use the existing basic neural network model for encoding, or choose to use the retrained neural network model for encoding, which can improve the flexibility of encoding.
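  • The choice between the basic and the retrained model could look like the sketch below, assuming a cost function (for example, a rate-distortion style measure) is available to judge suitability; the flag name and the use of a torch-style state_dict are assumptions for illustration only.

```python
# Illustrative encoder-side selection between the basic and the retrained model.
def encode_gop(gop, basic_model, train_model, encode_with, cost_of):
    candidate = train_model(gop)                       # retrain on this GOP
    use_retrained = cost_of(gop, candidate) < cost_of(gop, basic_model)
    model = candidate if use_retrained else basic_model
    bitstream = encode_with(gop, model)
    syntax = {"use_retrained_nn_flag": int(use_retrained)}   # written into the code stream
    # Only the retrained parameters would need to travel over the second link.
    params_to_send = candidate.state_dict() if use_retrained else None
    return syntax, bitstream, params_to_send
```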
  • the code stream further includes second indication information, and the second indication information is used to indicate that the neural network model is trained based on one of an image group, a frame, or a sequence.
  • the code stream transmitted through the first communication link may include second indication information, indicating that the neural network model is trained based on one of image groups, frames, or sequences.
  • after receiving the code stream transmitted by the encoding end, the decoding end can determine the training manner of the neural network model through the second indication information in the code stream, determine the neural network model in combination with the received model parameters of the neural network model, and decode the code stream based on the neural network model.
  • when the encoding end transmits the model parameters of the neural network model through the second communication link, the model parameters of the neural network model may first be processed, and the processed model parameters are transmitted to the decoding end; see below for details.
  • the method 300 further includes:
  • the transmitting through the second communication link the model parameters of the neural network model included in the neural network-based coding technology includes: transmitting the compressed model parameters through the second communication link.
  • receiving the model parameters of the neural network model through the second communication link in the above-mentioned step 420 includes: receiving the compressed model parameters through the second communication link;
  • decoding the code stream using the model parameters of the neural network model in the above-mentioned step 430 includes: decompressing the compressed model parameters to obtain the target format; converting the target format; and decoding the code stream according to the model parameters in the converted format.
  • the encoding end can perform format conversion on the model parameters of the neural network model to obtain a target format that can be compressed.
  • the compressed model parameters are obtained by compression processing, and the compressed model parameters are transmitted to the decoding end through the second communication link.
  • after receiving the compressed model parameters, the decoding end can first decompress them to obtain the target format, then convert the target format to obtain the model parameters of the neural network model corresponding to the code stream, and use those model parameters to decode the code stream.
  • the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  • the NNEF and ONNX formats in the embodiments of this application are two similar open formats used to represent and exchange neural networks between deep learning frameworks and inference engines. At their core, both formats are based on a set of common operations from which networks can be built.
  • NNEF reduces the fragmentation of machine learning deployments by enabling applications on multiple devices and platforms to use a rich combination of neural network training tools and inference engines; its main goal is to allow networks to be exported from deep learning frameworks and imported into a hardware vendor's inference engine.
  • ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format, enabling artificial intelligence (AI) developers to use models with a variety of frameworks, tools, runtimes, and compilers.
  • Both formats can store neural network models generated by commonly used deep learning frameworks.
  • the purpose is to realize the interaction and commonality of neural network models between different deep learning frameworks.
  • by converting the model parameters of the neural network model into the target format, interoperability and sharing of the neural network model between different deep learning frameworks can be realized; in addition, compressing the converted target format and transmitting the compressed model parameters can further save bandwidth.
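  • A minimal sketch of exporting the model parameters to an exchange format before compression is shown below, assuming a PyTorch model; torch.onnx.export is used for the ONNX conversion, and zlib merely stands in for the dedicated parameter-compression step discussed next.

```python
# Illustrative conversion to ONNX plus generic compression for the second link.
import io
import zlib
import torch

def pack_model_for_second_link(model: torch.nn.Module, example_input: torch.Tensor) -> bytes:
    buf = io.BytesIO()
    torch.onnx.export(model, example_input, buf)   # ONNX as the target format
    return zlib.compress(buf.getvalue())

def unpack_model_at_decoder(blob: bytes) -> bytes:
    # Returns the ONNX bytes, ready to be loaded into an inference engine.
    return zlib.decompress(blob)
```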
  • the compressing of the model parameters in the target format includes: using the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to compress the model parameters in the target format to obtain an NNR code stream;
  • the transmitting the compressed model parameters through the second communication link includes: transmitting the code stream of the NNR through the second communication link.
  • the code stream of the NNR is received through the second communication link; and the code stream of the NNR is decompressed.
  • NNR is a method for representing and compressing a neural network model in a manner similar to video compression coding.
  • the neural network model parameters are compressed into an NNR bitstream composed of multiple NNR units by adopting methods such as weight sparsification, network parameter pruning, quantization, low-rank approximation, predictive coding, and entropy coding.
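  • The kind of processing listed above can be pictured with the toy sketch below, which only combines uniform quantization with a general-purpose compressor; it is not the MPEG NNR bitstream syntax, and the step size is an arbitrary assumption.

```python
# Toy stand-in for NNR-style quantization followed by entropy coding.
import zlib
import numpy as np

def quantize_and_pack(weights: np.ndarray, step: float = 1e-3) -> bytes:
    q = np.clip(np.round(weights / step), -32768, 32767).astype(np.int16)
    return zlib.compress(q.tobytes())          # zlib stands in for entropy coding

def unpack_and_dequantize(blob: bytes, shape, step: float = 1e-3) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
    return q.astype(np.float32) * step
```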
  • the training of the neural network model may be performed in the following manner.
  • FIG. 7 it is a schematic flowchart of training a neural network model according to an embodiment of the present application.
  • during the compression and encoding of the image to be encoded by the encoder and the neural network computing platform, a neural network model is obtained by training.
  • the model parameters in NNEF or ONNX format are compressed to obtain the NNR code stream.
  • the encoding may be performed based on the process shown in FIG. 8a.
  • FIG. 8a it is a schematic flowchart of applying an intelligent coding technology to a video encoder according to an embodiment of the present application.
  • the MPEG NNR can be obtained by decompressing the NNR code stream, and the MPEG NNR is inversely converted into the NNEF or ONNX format to obtain the model parameters of the neural network model; the obtained model parameters of the neural network model are deployed on the neural network computing platform.
  • when the encoder encodes the image or video, it can encode in combination with the neural network computing platform on which the model parameters of the neural network model are deployed, to generate an encoded code stream.
  • the coding based on the neural network coding technology can be realized by modifying the traditional encoder.
  • coding of the intelligent-coding-related syntax elements can be added to the existing header information syntax parameter sets; for example, a syntax element that identifies the switch controlling whether the neural-network-based coding technology is enabled or disabled can be added to the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), or the Slice Header.
  • this syntax element can be added directly to the above syntax parameter sets, or it can be added to the User Extension Data of the above syntax parameter sets (a hypothetical sketch follows below).
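  • A hypothetical sketch of such a switch carried as one byte of user extension data is shown below; the field layout is an assumption for illustration, not an HEVC/VVC syntax table.

```python
# Hypothetical one-byte switch in the user extension data of an SPS/PPS/slice header.
def write_nn_switch_extension(nn_coding_enabled: bool) -> bytes:
    return bytes([0x01 if nn_coding_enabled else 0x00])

def read_nn_switch_extension(ext: bytes) -> bool:
    return bool(ext[0] & 0x01)
```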
  • decoding can be performed based on the process shown in Figure 8b.
  • FIG. 8b it is a schematic flowchart of applying an intelligent coding technology to a video decoder according to an embodiment of the present application.
  • after receiving the NNR code stream, the decoder can decompress it to obtain the MPEG NNR, which is inversely converted into the NNEF or ONNX format to obtain the model parameters of the neural network model; the obtained model parameters of the neural network model are deployed on the neural network computing platform.
  • when the decoder decodes the code stream of the image or video to be decoded, it can decode in combination with the neural network computing platform on which the model parameters of the neural network model are deployed, to obtain the decoded image or video.
  • the compressing of the model parameters in the target format includes: using a compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to compress the model parameters in the target format to obtain compressed data;
  • the transmitting the compressed model parameters through the second communication link includes: transmitting the compressed data through the second communication link.
  • the compressed data is received through the second communication link; and the compressed data is decompressed.
  • the encoding end uses the AITISA compression method to compress the model parameters in the target format and transmits the compressed data obtained after compression through the second communication link; the decoding end receives the compressed data and decompresses it.
  • the specific implementation process is similar to the above-mentioned process of compression encoding and decoding using the NNR of MPEG, which is not repeated here.
  • the compressed data in this embodiment of the present application may also be called a compressed code stream, which is not limited.
  • the code stream further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
  • the code stream transmitted through the first communication link may further include third indication information, indicating whether the encoding end has enabled encoding using the neural network encoding technology during encoding.
  • the decoding end can determine whether to use the decoding technology of the neural network to perform decoding according to the third indication information in the code stream.
  • if the third indication information indicates that encoding using the neural network coding technology is enabled, the decoding end uses the neural network decoding technology to decode; if the third indication information indicates that such encoding is not enabled, the decoding end does not use the neural network decoding technology to decode.
  • for example, if the third indication information in the code stream indicates "1", the decoding end determines, after receiving the indication information, to use the neural network decoding technology for decoding; if the third indication information indicates "0", the decoding end determines, after receiving the indication information, not to use the neural network decoding technology for decoding.
  • the third indication information is added to the code stream to indicate whether encoding using the neural network coding technology is enabled, so that the decoding end can determine whether to use the neural network decoding technology for decoding, which can further improve the accuracy of decoding.
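  • The decoder-side branch on this flag could look like the sketch below; the parsing helper and the decoding callables are assumptions used only to show the control flow.

```python
# Illustrative branch on the third indication information at the decoding end.
def decode_stream(bitstream, parse_nn_enabled_flag, nn_decode, conventional_decode, model_params):
    if parse_nn_enabled_flag(bitstream):            # e.g. the flag reads "1"
        return nn_decode(bitstream, model_params)   # neural-network-based decoding path
    return conventional_decode(bitstream)           # e.g. the flag reads "0"
```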
  • the transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
  • Some or all of the model parameters of the neural network model are transmitted over the second communication link.
  • part or all of the model parameters of the neural network model are received through the second communication link.
  • the model parameters of the neural network model transmitted through the second communication link may be some model parameters or all model parameters of the model parameters of the neural network model.
  • the part of the model parameters may include, but is not limited to, the number of layers of the neural network and the number of neurons, while other model parameters (including but not limited to the weights of the connections between neurons, etc.) can be preset at the encoding end and the decoding end.
  • after the decoding end receives part of the model parameters of the neural network model through the second communication link, it can decode the code stream of the image to be decoded in combination with the other model parameters preset at the decoding end; alternatively, it receives all of the model parameters of the neural network model through the second communication link and decodes the code stream of the image to be decoded with them.
  • when the encoding end transmits only part of the model parameters of the neural network model through the second communication link and the other model parameters are preset at the encoding end and the decoding end, transmission bandwidth can be saved while decoding accuracy is still guaranteed when the decoding end decodes the code stream; when the encoding end transmits all of the model parameters of the neural network model through the second communication link, the decoding end can likewise ensure decoding accuracy when decoding the code stream.
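  • A sketch of splitting the parameters is given below, assuming torch-style state dictionaries in which some entries (for example, those fixed by the preset architecture) are already known at both ends; the key filter is an assumption.

```python
# Illustrative split: only the non-preset parameters travel over the second link.
def split_params(state_dict: dict, preset_keys: set) -> dict:
    return {k: v for k, v in state_dict.items() if k not in preset_keys}

def rebuild_params(received: dict, preset_params: dict) -> dict:
    full = dict(preset_params)   # parameters preset at the decoding end
    full.update(received)        # plus those received on the second link
    return full
```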
  • FIG. 9 is an encoding apparatus 900 provided by an embodiment of the present application.
  • the encoding apparatus 900 may include a processor 910 .
  • a processor 910, configured to encode an image to be encoded by using a neural-network-based encoding technology to obtain a code stream of the image to be encoded; transmit the code stream of the image to be encoded through a first communication link;
  • the model parameters of the neural network model included in the neural network-based encoding technique are transmitted over the second communication link.
  • the first communication link may be provided by a first communication module, and the second communication link may be provided by a second communication module.
  • for example, the first communication module is an SDR module or a WiFi module;
  • the second communication module is a 4G/5G module.
  • the first communication link and the second communication link have different physical characteristics.
  • the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  • the first communication link includes a link with a delay less than or equal to a first threshold
  • the second communication link includes a link with a bandwidth greater than or equal to a second threshold
  • the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol
  • the second communication link includes a link based on a mobile communication protocol
  • the private image transmission protocol includes a software-defined radio SDR protocol
  • the wireless local area network protocol includes a wireless fidelity WiFi protocol
  • the mobile communication protocol includes a 4G or 5G protocol.
  • the first communication link includes a private image transmission link
  • the second communication link includes a public network transmission link
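  • As a sketch of how the two payloads could be routed, the fragment below picks a low-delay link for the code stream and a high-bandwidth link for the model parameters; the threshold values and the link dictionaries are assumptions for illustration.

```python
# Illustrative routing of code stream vs. model parameters to the two links.
def pick_links(links, delay_threshold_ms=50, bandwidth_threshold_mbps=10):
    first = next(l for l in links if l["delay_ms"] <= delay_threshold_ms)               # e.g. SDR/WiFi image link
    second = next(l for l in links if l["bandwidth_mbps"] >= bandwidth_threshold_mbps)  # e.g. 4G/5G link
    return first, second

def transmit(code_stream, model_params, links, send):
    first, second = pick_links(links)
    send(first, code_stream)     # latency-critical video code stream
    send(second, model_params)   # bulkier neural network model parameters
```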
  • the neural network model includes an offline trained neural network model or an online trained neural network model.
  • the neural network model is an online training neural network model
  • the processor 910 is further configured to:
  • the n-th target image is encoded by using the neural network model obtained by training on the (n-m)-th target image, and the target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the image group level, and the image level; the n is an integer greater than or equal to 1, and the m is an integer greater than or equal to 1.
  • the processor 910 is further configured to:
  • the n-th target image is filtered by using the neural network model obtained by training the n-mth target image, and the n-mth target image is an image of the n-mth encoded image that has not been filtered by the neural network model, The n-th target image is an image of the n-th coded image that has not been filtered by the neural network model.
  • the m is a parameter that is solidified or preset at the encoding end before encoding; or, the m is a parameter formed during the encoding process.
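  • The online schedule described above can be sketched as the loop below, where GOP n is encoded with the model trained on GOP n-m and an initial model is used until enough GOPs have been encoded; the training and encoding routines are placeholders supplied by the caller.

```python
# Illustrative online-training schedule with lag m between training and use.
def encode_with_online_training(gops, initial_model, train, encode, m=1):
    trained = {}                                    # trained[i]: model trained on GOP i
    streams, sent_params = [], []
    for n, gop in enumerate(gops):
        model = trained.get(n - m, initial_model)   # GOP n uses the model from GOP n-m
        streams.append(encode(gop, model))
        trained[n] = train(gop, model)              # becomes usable from GOP n+m onwards
        sent_params.append(trained[n])              # shipped over the second communication link
    return streams, sent_params
```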
  • the neural network model is an online training neural network model
  • the processor 910 is further configured to:
  • the neural network model obtained by training on the encoded second target image is used to encode the first target image, and the first target image and the second target image are images obtained by dividing the image to be encoded according to any one of the video sequence level, the picture group level, and the picture level;
  • the model parameters of the neural network model obtained by training the second target image are transmitted through the second communication link.
  • the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
  • the q is a parameter that is fixed or preset at the encoding end before encoding.
  • the neural network model is an offline training neural network model
  • the processor 910 is further configured to:
  • the offline-trained neural network model is used to encode the p-th target image, and the target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the image group level, and the image level; the p is an integer greater than or equal to 0.
  • the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate the identifier of the model parameters of the neural network model used when the image to be encoded is encoded.
  • when the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate the identifier of the model parameters of the neural network model used when the image to be encoded is encoded.
  • the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when encoding other frames between the current key frame and the next key frame.
  • the processor 910 is further configured to:
  • the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are transmitted through the second communication link.
  • the code stream of the image to be encoded further includes second indication information, where the second indication information is used to indicate that the neural network model is trained based on one of an image group, a frame, or a sequence.
  • the processor 910 is further configured to:
  • the compressed model parameters are transmitted over the second communication link.
  • the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  • the processor 910 is further configured to:
  • the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG) is used to compress the model parameters in the target format to obtain an NNR code stream;
  • the code stream of the NNR is transmitted through the second communication link.
  • the processor 910 is further configured to:
  • the compressed data is transmitted over the second communication link.
  • the processor 910 is further configured to:
  • At least one of prediction, transformation, quantization, entropy encoding or filtering is performed on the to-be-encoded image by using the neural network-based encoding technology.
  • the code stream of the image to be encoded further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
  • the processor 910 is further configured to:
  • Some or all of the model parameters of the neural network model are transmitted over the second communication link.
  • the first communication link and/or the second communication link is selected from one or more of the following links:
  • FIG. 10 is a decoding apparatus 1000 provided by an embodiment of the present application.
  • the decoding apparatus 1000 may include a processor 1010 .
  • a processor 1010, configured to receive the code stream of the image to be decoded through the first communication link; receive the model parameters of the neural network model through the second communication link;
  • the code stream is decoded using the model parameters of the neural network model to obtain a decoded image.
  • the first communication link and the second communication link have different physical characteristics.
  • the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  • the first communication link includes a link with a delay less than or equal to a first threshold
  • the second communication link includes a link with a bandwidth greater than or equal to a second threshold
  • the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol
  • the second communication link includes a link based on a mobile communication protocol
  • the private image transmission protocol includes a software-defined radio SDR protocol
  • the wireless local area network protocol includes a wireless fidelity WiFi protocol
  • the mobile communication protocol includes a 4G or 5G protocol.
  • the first communication link includes a private image transmission link
  • the second communication link includes a public network transmission link
  • the neural network model includes an offline trained neural network model or an online trained neural network model.
  • the code stream further includes first indication information, where the first indication information is used to indicate the identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
  • the processor 1010 is further configured to:
  • the code stream is decoded by using the model parameter of the neural network model corresponding to the identifier of the model parameter.
  • the code stream further includes first indication information, and the first indication information is used to indicate the identifier of the model parameters of the neural network model used by the encoding end when the image to be encoded is encoded.
  • the processor 1010 is further configured to:
  • the code stream is decoded using the model parameters of the neural network model and the identification of the model parameters.
  • the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when decoding other frames between the current key frame and the next key frame.
  • the neural network model is an online training neural network model
  • the processor 1010 is further configured to:
  • the target image is an image obtained by dividing the image to be decoded according to any one of the video sequence level, the picture group level, and the picture level.
  • the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
  • the q is a parameter that is solidified or preset at the decoding end before decoding.
  • the processor 1010 is further configured to:
  • the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are received through the second communication link.
  • the code stream further includes second indication information, and the second indication information is used to indicate that the neural network model is trained based on one of an image group, a frame, or a sequence.
  • the processor 1010 is further configured to:
  • the code stream is decoded using the model parameters in the converted format.
  • the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  • the processor 1010 is further configured to:
  • the processor 1010 is further configured to:
  • the compressed data is decompressed.
  • the processor 1010 is further configured to:
  • At least one of entropy decoding, inverse quantization, inverse transformation, predictive reconstruction or filtering is performed on the code stream using the model parameters of the neural network.
  • the code stream further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
  • the processor 1010 is further configured to:
  • Some or all of the model parameters of the neural network model are received over the second communication link.
  • the first communication link and/or the second communication link is selected from one or more of the following links:
  • FIG. 11 is a schematic structural diagram of an encoding apparatus provided by still another embodiment of the present application.
  • the encoding apparatus 1100 shown in FIG. 11 includes a processor 1110, and the processor 1110 can call and run a computer program from a memory to implement the encoding method described in the embodiments of the present application.
  • the encoding apparatus 1100 may further include a memory 1120 .
  • the processor 1110 may call and run a computer program from the memory 1120 to implement the encoding method in the embodiments of the present application.
  • the memory 1120 may be a separate device independent of the processor 1110, or may be integrated in the processor 1110.
  • the encoding apparatus 1100 may further include a transceiver 1130, and the processor 1110 may control the transceiver 1130 to communicate with other devices; specifically, it may send information or data to other devices, or receive information or data sent by other devices.
  • the encoding device can be, for example, an encoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the encoding device can implement the corresponding processes in the encoding methods of the embodiments of the present application, which will not be repeated here for the sake of brevity.
  • FIG. 12 is a schematic structural diagram of a decoding apparatus provided by still another embodiment of the present application.
  • the decoding apparatus 1200 shown in FIG. 12 includes a processor 1210, and the processor 1210 can call and run a computer program from a memory to implement the decoding method described in the embodiments of the present application.
  • the decoding apparatus 1200 may further include a memory 1220 .
  • the processor 1210 may call and run a computer program from the memory 1220 to implement the decoding method in the embodiment of the present application.
  • the memory 1220 may be a separate device independent of the processor 1210, or may be integrated in the processor 1210.
  • the decoding apparatus 1200 may further include a transceiver 1230, and the processor 1210 may control the transceiver 1230 to communicate with other apparatuses; specifically, it may send information or data to other apparatuses, or receive information or data sent by other apparatuses.
  • the decoding device can be, for example, a decoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the decoding device can implement the corresponding processes in the decoding methods of the embodiments of the present application, which will not be repeated here for the sake of brevity.
  • FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • the chip 1300 shown in FIG. 13 includes a processor 1310, and the processor 1310 can call and run a computer program from a memory to implement the encoding method or the decoding method in the embodiments of the present application.
  • the chip 1300 may further include a memory 1320 .
  • the processor 1310 may call and run a computer program from the memory 1320 to implement the encoding method or the decoding method in the embodiments of the present application.
  • the memory 1320 may be a separate device independent of the processor 1310, or may be integrated in the processor 1310.
  • the chip 1300 may further include an input interface 1330 .
  • the processor 1310 can control the input interface 1330 to communicate with other devices or chips, and specifically, can obtain information or data sent by other devices or chips.
  • the chip 1300 may further include an output interface 1340 .
  • the processor 1310 can control the output interface 1340 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a chip system, or a system-on-a-chip, or the like.
  • the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the above-mentioned processor can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache.
  • the RAM may be, for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct Rambus RAM (DR RAM).
  • the memory in the embodiments of the present application may provide instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory.
  • the memory may also store device type information.
  • the processor may be configured to execute the instruction stored in the memory, and when the processor executes the instruction, the processor may execute each step corresponding to the terminal device in the foregoing method embodiments.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor executes the instructions in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, detailed description is omitted here.
  • the pixels in the image may be located in different rows and/or columns, wherein the length of A may correspond to the number of pixels located in the same row included in A, and the height of A may correspond to the number of pixels located in the same column included in A.
  • the length and height of A may also be referred to as the width and depth of A, respectively, which are not limited in this embodiment of the present application.
  • "not distributed at the boundary of A" may refer to being at least one pixel away from the boundary of A, and may also be described as "not adjacent to the boundary of A" or "not located at the boundary of A".
  • Embodiments of the present application further provide a computer-readable storage medium for storing a computer program.
  • the computer-readable storage medium can be applied to the encoding device or the decoding device in the embodiments of the present application, and the computer program enables the computer to execute the corresponding processes implemented by the encoding device or the decoding device in each method of the embodiments of the present application, which, for brevity, will not be repeated here.
  • Embodiments of the present application also provide a computer program product, including computer program instructions.
  • the computer program product can be applied to the encoding device or the decoding device in the embodiments of the present application, and the computer program instructions cause the computer to execute the corresponding processes implemented by the encoding device or the decoding device in each method of the embodiments of the present application, For brevity, details are not repeated here.
  • the embodiments of the present application also provide a computer program.
  • the computer program can be applied to the encoding device or the decoding device in the embodiments of the present application.
  • when the computer program runs on a computer, the computer is caused to execute the corresponding processes implemented by the encoding device or the decoding device in each method of the embodiments of the present application, which, for the sake of brevity, will not be repeated here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.


Abstract

The present application provides a coding method, a decoding method, a coding apparatus, and a decoding apparatus. The coding method comprises: coding, by using a neural network-based coding technology, an image to be coded to obtain a code stream of said image; transmitting the code stream of said image by means of a first communication link; and transmitting, by means of a second communication link, model parameters of a neural network model comprised in the neural network-based coding technology. According to the solution provided by the present application, the management and transmission problems of model parameters of neural network models can be solved well, and the bit consumption of code streams is reduced; in addition, challenges of low-latency transmission caused by the presence of the management and transmission requirements of model parameters can be reduced.

Description

Encoding method, decoding method, encoding apparatus, and decoding apparatus
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and archives of the Patent and Trademark Office.
Technical field
The present application relates to the field of image processing, and more particularly, to an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus.
Background
At present, research on neural-network-based encoding and decoding technology has gradually attracted attention. For the management and transmission of the model parameters of a neural network model, one approach is to fix the model parameters of the neural network model and provide them as a common library file available to both the encoder and the decoder; however, once some encoding tools of the encoder change, the effectiveness of this approach decreases.
Another approach is to transmit the model parameters of the neural network model to the decoding end through the code stream, so that the model parameters can be flexibly adjusted according to the needs of the encoder. However, for deep learning neural network models with many layers, the number of parameters is generally large, and putting them directly into the code stream for transmission increases bit consumption and reduces the video compression rate. In addition, encoding based on the neural network coding technology poses a greater challenge to the low-latency transmission of image transmission scenarios because of the management and transmission requirements of the model parameters.
Summary of the invention
The present application provides an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus, which can well solve the management and transmission problems of the model parameters of a neural network model, alleviate the bit consumption of the code stream, and reduce the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
In a first aspect, an encoding method is provided, including: encoding an image to be encoded by using a neural-network-based encoding technology to obtain a code stream of the image to be encoded; transmitting the code stream of the image to be encoded through a first communication link; and transmitting, through a second communication link, model parameters of a neural network model included in the neural-network-based encoding technology.
In a second aspect, a decoding method is provided, including: receiving a code stream of an image to be decoded through a first communication link; receiving model parameters of a neural network model through a second communication link; and decoding the code stream by using the model parameters of the neural network model to obtain a decoded image.
In a third aspect, an encoding apparatus is provided, including a processor configured to: encode an image to be encoded by using a neural-network-based encoding technology to obtain a code stream of the image to be encoded; transmit the code stream of the image to be encoded through a first communication link; and transmit, through a second communication link, model parameters of a neural network model included in the neural-network-based encoding technology.
In a fourth aspect, a decoding apparatus is provided, configured to: receive a code stream of an image to be decoded through a first communication link; receive model parameters of a neural network model through a second communication link; and decode the code stream by using the model parameters of the neural network model to obtain a decoded image.
In a fifth aspect, an encoding apparatus is provided, including a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to execute the method in the first aspect or any of its implementations.
In a sixth aspect, a decoding apparatus is provided, including a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to execute the method in the second aspect or any of its implementations.
第七方面,提供一种芯片,用于实现上述第一方面或其各实现方式中的方法。In a seventh aspect, a chip is provided for implementing the method in the above-mentioned first aspect or each of its implementation manners.
具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的设备执行如上述第一方面或其各实现方式中的方法。Specifically, the chip includes: a processor for invoking and running a computer program from a memory, so that a device installed with the chip executes the method in the first aspect or each of its implementations.
第八方面,提供一种芯片,用于实现上述第二方面或其各实现方式中的方法。In an eighth aspect, a chip is provided for implementing the method in the second aspect or each of its implementation manners.
具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的设备执行如上述第二方面或其各实现方式中的方法。Specifically, the chip includes: a processor for invoking and running a computer program from a memory, so that a device on which the chip is installed executes the method in the second aspect or each of its implementations.
第九方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序包括用于执行第一方面或第一方面的任意可能的实现方式中的方法的指令。In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program comprising instructions for performing the method in the first aspect or any possible implementation manner of the first aspect.
第十方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序包括用于执行第二方面或第二方面的任意可能的实现方式中的方法的指令。In a tenth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program comprising instructions for performing the method of the second aspect or any possible implementation of the second aspect.
第十一方面,提供了一种计算机程序产品,包括计算机程序指令,该计 算机程序指令使得计算机执行上述第一方面或第一方面的各实现方式中的方法。In an eleventh aspect, a computer program product is provided, comprising computer program instructions, the computer program instructions causing a computer to execute the method in the first aspect or each implementation manner of the first aspect.
第十二方面,提供了一种计算机程序产品,包括计算机程序指令,该计算机程序指令使得计算机执行上述第二方面或第二方面的各实现方式中的方法。A twelfth aspect provides a computer program product, comprising computer program instructions, the computer program instructions causing a computer to execute the method in the second aspect or each implementation manner of the second aspect.
In the solution provided by this application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model through the first communication link and the second communication link respectively, and the decoding end correspondingly receives the code stream and the model parameters of the neural network model through these two communication links. This solves well the problem of managing and transmitting the model parameters of the neural network model; because the model parameters of the neural network model are transmitted through the second communication link, the bit consumption of the code stream is reduced; in addition, the challenge that the management and transmission of model parameters pose to low-latency transmission is also reduced.
Description of Drawings
FIG. 1 is an architectural diagram of applying the technical solutions of the embodiments of the present application;
FIG. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an encoding method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a decoding method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of image transmission using the intelligent coding technology provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a coding framework 2 provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of training a neural network model provided by an embodiment of the present application;
FIG. 8a is a schematic flowchart of a video encoder applying the intelligent coding technology provided by an embodiment of the present application;
FIG. 8b is a schematic flowchart of a video decoder applying the intelligent coding technology provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an encoding apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a decoding apparatus provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an encoding apparatus provided by another embodiment of the present application;
FIG. 12 is a schematic diagram of a decoding apparatus provided by another embodiment of the present application;
FIG. 13 is a schematic structural diagram of a chip provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below.
Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meanings as those commonly understood by those skilled in the technical field of the present application. The terms used in this application are only for the purpose of describing specific embodiments and are not intended to limit the scope of the application.
FIG. 1 is an architectural diagram of applying the technical solutions of the embodiments of the present application.
As shown in FIG. 1, the system 100 may receive data 102 to be processed, process the data 102 to be processed, and generate processed data 108. For example, the system 100 may receive data to be encoded and encode it to generate encoded data, or the system 100 may receive data to be decoded and decode it to generate decoded data. In some embodiments, the components in the system 100 may be implemented by one or more processors, which may be processors in a computing device or processors in a mobile device (for example, an unmanned aerial vehicle). The processor may be any type of processor, which is not limited in the embodiments of the present invention. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. The system 100 may further include one or more memories. The memory may be used to store instructions and data, for example, computer-executable instructions that implement the technical solutions of the embodiments of the present invention, the data 102 to be processed, the processed data 108, and the like. The memory may be any type of memory, which is also not limited in the embodiments of the present invention.
The data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensor data from a sensor, which may be a visual sensor (for example, a camera or an infrared sensor), a microphone, a near-field sensor (for example, an ultrasonic sensor or a radar), a position sensor, a temperature sensor, a touch sensor, or the like. In some cases, the data to be encoded may include information from a user, for example, biological information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA sampling, and the like.
FIG. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application. As shown in FIG. 2, after the video to be encoded is received, each frame of the video to be encoded is encoded in sequence, starting from the first frame. The current coded frame mainly goes through prediction, transform, quantization, and entropy coding, and finally the code stream of the current coded frame is output. Correspondingly, the decoding process usually decodes the received code stream according to the inverse of the above process, so as to recover the video frame information before decoding.
Specifically, as shown in FIG. 2, the video coding framework 2 includes a coding control module 201 for performing decision-making control actions and parameter selection in the coding process. For example, as shown in FIG. 2, the coding control module 201 controls the parameters used in transform, quantization, inverse quantization, and inverse transform, controls the selection of the intra mode or the inter mode, and controls the parameters of motion estimation and filtering. The control parameters of the coding control module 201 are also input to the entropy coding module and encoded to form part of the encoded code stream.
When encoding of the current coded frame starts, the coded frame is partitioned 202. Specifically, it is first divided into slices, which are then divided into blocks. Optionally, in an example, the coded frame is divided into a plurality of non-overlapping largest coding tree units (CTUs), and each CTU may be iteratively divided into a series of smaller coding units (CUs) in a quadtree, binary tree, or ternary tree manner. In some examples, a CU may further include a prediction unit (PU) and a transform unit (TU) associated with it, where the PU is the basic unit of prediction and the TU is the basic unit of transform and quantization. In some examples, the PU and the TU are obtained by dividing a CU into one or more blocks, where one PU includes a plurality of prediction blocks (PBs) and related syntax elements. In some examples, the PU and the TU may be the same, or may be obtained from the CU by different partitioning methods. In some examples, at least two of the CU, the PU, and the TU are the same; for example, the CU, the PU, and the TU are not distinguished, and prediction, quantization, and transform are all performed in units of CUs. For ease of description, a CTU, a CU, or another formed data unit is hereinafter referred to as a coding block.
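For illustration only, the following is a minimal sketch of the kind of recursive quadtree split described above. It is not part of the described embodiments: the 64×64 CTU size, the 8×8 minimum CU size, and the variance-based split decision are assumptions standing in for a real rate-distortion decision.

```python
# Minimal quadtree partition sketch. Assumptions: 64x64 CTU, 8x8 minimum CU,
# and a placeholder split criterion based on block variance.
import numpy as np

MIN_CU = 8

def should_split(block: np.ndarray) -> bool:
    # Placeholder for a rate-distortion decision: split "busy" blocks.
    return block.size > MIN_CU * MIN_CU and block.var() > 100.0

def partition_ctu(ctu: np.ndarray, x0: int = 0, y0: int = 0):
    """Recursively split a square CTU into CUs; return (x, y, size) leaves."""
    size = ctu.shape[0]
    if size <= MIN_CU or not should_split(ctu):
        return [(x0, y0, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            sub = ctu[dy:dy + half, dx:dx + half]
            leaves.extend(partition_ctu(sub, x0 + dx, y0 + dy))
    return leaves

ctu = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(partition_ctu(ctu)[:4])  # a few resulting CU positions and sizes
```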
It should be understood that, in the embodiments of the present application, the data unit targeted by video coding may be a frame, a slice, a coding tree unit, a coding unit, a coding block, or a group of any of the above. In different embodiments, the size of the data unit may vary.
Specifically, as shown in FIG. 2, after the coded frame is divided into a plurality of coding blocks, a prediction process is performed to remove spatial and temporal redundancy of the current coded frame. Commonly used predictive coding methods include intra prediction and inter prediction. Intra prediction uses only the reconstructed information in the current frame to predict the current coding block, while inter prediction uses information in other, previously reconstructed frames (also called reference frames) to predict the current coding block. Specifically, in the embodiments of the present application, the coding control module 201 is configured to decide whether to select intra prediction or inter prediction.
When the intra prediction mode is selected, the intra prediction 203 process includes: obtaining reconstructed blocks of coded neighboring blocks around the current coding block as reference blocks; calculating a prediction value based on the pixel values of the reference blocks by using a prediction mode method to generate a prediction block; and subtracting the corresponding pixel values of the prediction block from those of the current coding block to obtain the residual of the current coding block. The residual of the current coding block goes through transform 204, quantization 205, and entropy coding 210 to form the code stream of the current coding block. Further, after all coding blocks of the current coded frame have gone through the above coding process, they form part of the encoded code stream of the coded frame. In addition, the control and reference data generated in the intra prediction 203 are also encoded by the entropy coding 210 to form part of the encoded code stream.
Specifically, the transform 204 is used to remove the correlation of the residual of the image block, so as to improve coding efficiency. The transform of the residual data of the current coding block usually uses a two-dimensional discrete cosine transform (DCT) or a two-dimensional discrete sine transform (DST); for example, at the encoding end, the residual information of the coding block is multiplied by an N×M transform matrix and its transpose, and the transform coefficients of the current coding block are obtained after the multiplication.
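The "transform matrix times residual times transpose" step can be illustrated with the following sketch. It is an assumption-laden example rather than the matrices of any particular codec: a 4×4 block and an orthonormal DCT-II basis are used so that the inverse transform is simply the transpose.

```python
# Illustrative 2-D DCT of a residual block via matrix multiplication.
# Assumptions: 4x4 block size and an orthonormal DCT-II basis matrix.
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    t = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    t[0, :] *= 1 / np.sqrt(2)
    return t * np.sqrt(2 / n)

T = dct_matrix(4)
residual = np.random.randint(-16, 16, (4, 4)).astype(np.float64)
coeffs = T @ residual @ T.T          # forward transform of the residual block
reconstructed = T.T @ coeffs @ T     # inverse transform recovers the residual
assert np.allclose(reconstructed, residual)
```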
After the transform coefficients are generated, quantization 205 is used to further improve the compression efficiency. The transform coefficients are quantized to obtain quantized coefficients, and the quantized coefficients are then entropy coded 210 to obtain the residual code stream of the current coding block, where the entropy coding method includes, but is not limited to, context-adaptive binary arithmetic coding (CABAC). Finally, the bit stream obtained by entropy coding and the encoded coding mode information are stored or sent to the decoding end. At the encoding end, the quantized result is also inverse-quantized 206, and the inverse-quantization result is inverse-transformed 207. After the inverse transform 207, reconstructed pixels are obtained by using the inverse transform result and the motion compensation result. Then, the reconstructed pixels are filtered (that is, in-loop filtered) 211. After 211, the filtered reconstructed image (belonging to the reconstructed video frame) is output. Subsequently, the reconstructed image can be used as a reference frame image for inter prediction of other frames. In the embodiments of the present application, the reconstructed image may also be referred to as a reconstructed picture or a reconstruction image.
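The quantization and inverse-quantization round trip mentioned above can be sketched as simple scalar quantization. The step size and the sample values below are assumptions for illustration only, not the quantizer of any codec standard.

```python
# Minimal scalar quantization / dequantization sketch for transform coefficients.
# Assumption: a single uniform step size stands in for the codec's QP machinery.
import numpy as np

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, step: float) -> np.ndarray:
    return levels.astype(np.float64) * step

coeffs = np.array([[52.0, -3.2], [7.9, 0.4]])
step = 4.0                        # larger step -> fewer bits, more distortion
levels = quantize(coeffs, step)   # these integer levels go to entropy coding
recon = dequantize(levels, step)  # inverse quantization at encoder and decoder
print(levels, recon, sep="\n")
```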
Specifically, the coded neighboring blocks in the intra prediction 203 process are neighboring blocks that were coded before the current coding block; the residual generated in the coding process of such a neighboring block goes through transform 204, quantization 205, inverse quantization 206, and inverse transform 207, and is then added to the prediction block of the neighboring block to obtain the reconstructed block. Correspondingly, inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to restore the residual data before quantization and transform.
As shown in FIG. 2, when the inter prediction mode is selected, the inter prediction process includes motion estimation (ME) 208 and motion compensation (MC) 209. Specifically, the encoding end may perform motion estimation 208 according to reference frame images in the reconstructed video frames, and search, in one or more reference frame images according to a certain matching criterion, for the image block most similar to the current coding block as the prediction block; the relative displacement between the prediction block and the current coding block is the motion vector (MV) of the current coding block. The residual of the coding block is obtained by subtracting the corresponding pixel values of the prediction block from the original pixel values of the coding block. The residual of the current coding block goes through transform 204, quantization 205, and entropy coding 210 to form part of the encoded code stream of the coded frame. For the decoding end, motion compensation 209 may be performed based on the determined motion vector and prediction block to obtain the current coding block.
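A minimal block-matching sketch of the motion-estimation idea above follows. The block size, the ±4-pixel search range, and the use of the sum of absolute differences (SAD) as the matching criterion are assumptions for the example.

```python
# Illustrative full-search motion estimation using SAD as the matching criterion.
# Assumptions: 8x8 block, +/-4 search range, synthetic frames.
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_search(cur_block, ref_frame, bx, by, search=4):
    """Return the MV (dx, dy) minimizing SAD within +/- search pixels."""
    h, w = cur_block.shape
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cost = sad(cur_block, ref_frame[y:y + h, x:x + w])
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 1), axis=(0, 1))           # simulate global motion
mv, cost = motion_search(cur[16:24, 16:24], ref, 16, 16)
print("motion vector:", mv, "SAD:", cost)               # expect roughly (-1, -2)
```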
As shown in FIG. 2, the reconstructed video frame is a video frame obtained after filtering 211. The reconstructed video frame includes one or more reconstructed images. Filtering 211 is used to reduce compression distortion such as blocking artifacts and ringing artifacts generated in the encoding process. In the encoding process, the reconstructed video frame is used to provide reference frames for inter prediction; in the decoding process, the reconstructed video frame is post-processed and output as the final decoded video.
Specifically, the inter prediction mode may include an advanced motion vector prediction (AMVP) mode, a merge mode, or a skip mode.
For the AMVP mode, a motion vector prediction (MVP) may be determined first. After the MVP is obtained, the starting point of motion estimation may be determined according to the MVP, and a motion search is performed near the starting point. After the search is completed, the optimal MV is obtained. The MV determines the position of the reference block in the reference image; the residual block is obtained by subtracting the reference block from the current block; and the motion vector difference (MVD) is obtained by subtracting the MVP from the MV. The MVD and the index of the MVP are transmitted to the decoding end through the code stream.
For the merge mode, an MVP may be determined first, and the MVP is directly taken as the MV of the current block. To obtain the MVP, an MVP candidate list (merge candidate list) may be constructed first; the MVP candidate list may include at least one candidate MVP, and each candidate MVP may correspond to an index. After selecting an MVP from the MVP candidate list, the encoding end may write the index of that MVP into the code stream, and the decoding end can then find the MVP corresponding to the index in the MVP candidate list, so as to decode the image block.
It should be understood that the above process is only one specific implementation of the merge mode. The merge mode may also have other implementations.
For example, the skip mode is a special case of the merge mode. After the MV is obtained according to the merge mode, if the encoding end determines that the current block is basically the same as the reference block, there is no need to transmit residual data; only the index of the MVP needs to be transmitted, and further, a flag may be transmitted to indicate that the current block can be obtained directly from the reference block.
In other words, the merge mode is characterized by MV = MVP (MVD = 0); the skip mode has one additional characteristic, namely that the reconstructed value rec = the prediction value pred (the residual value resi = 0).
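The relations above (AMVP signals an MVP index plus an MVD; merge signals only an index; skip additionally omits the residual) can be summarized in a small sketch. The candidate list and the chosen values are purely illustrative assumptions.

```python
# Sketch of what is signalled per inter mode, following the relations above:
# merge: MV = MVP, MVD = 0; skip: like merge, but no residual is transmitted.
def inter_signalling(mode, mv, mvp_list, has_residual):
    # Pick the candidate closest to the MV (illustrative selection rule).
    mvp_idx = min(range(len(mvp_list)),
                  key=lambda i: abs(mv[0] - mvp_list[i][0]) + abs(mv[1] - mvp_list[i][1]))
    mvp = mvp_list[mvp_idx]
    if mode == "AMVP":
        mvd = (mv[0] - mvp[0], mv[1] - mvp[1])       # MVD = MV - MVP
        return {"mvp_index": mvp_idx, "mvd": mvd, "residual": has_residual}
    if mode == "merge":                               # MV taken directly from candidate
        return {"mvp_index": mvp_idx, "residual": has_residual}
    if mode == "skip":                                # merge without any residual
        return {"mvp_index": mvp_idx}

candidates = [(3, 1), (0, 0), (-2, 4)]
print(inter_signalling("AMVP", (4, 2), candidates, True))
print(inter_signalling("skip", (3, 1), candidates, False))
```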
The decoding end performs operations corresponding to those of the encoding end. First, the residual information is obtained through entropy decoding, inverse quantization, and inverse transform, and whether the current image block uses intra prediction or inter prediction is determined from the decoded code stream. If intra prediction is used, prediction information is constructed from the reconstructed image blocks in the current frame according to the intra prediction method. If inter prediction is used, the motion information needs to be parsed out, and the reference block is determined in the reconstructed image by using the parsed motion information to obtain the prediction information. Then, the prediction information is superimposed on the residual information, and the reconstructed information is obtained through the filtering operation.
This application mainly relates to encoding and decoding images or videos based on neural network-based encoding and decoding technologies (which may also be referred to as "intelligent encoding and decoding technologies"). Therefore, the related content of neural networks is briefly introduced below.
A neural network (NN), also known as an "artificial" neural network (ANN), is an information processing system that shares certain performance characteristics with biological neural networks. A neural network system consists of many simple and highly interconnected processing components that process information through their dynamic state response to external inputs. A processing component can be thought of as a neuron in the human brain, where each perceptron accepts multiple inputs and computes a weighted sum of the inputs. In the field of neural networks, the perceptron is considered a mathematical model of a biological neuron. Furthermore, these interconnected processing components are usually organized in layers. For recognition applications, the external input may correspond to a pattern presented to the network, which communicates with one or more intermediate layers, also known as "hidden layers", where the actual processing is done through a system of weighted "connections".
An ANN can use different architectures to specify which variables are involved in the network and their topological relationships. For example, the variables involved in a neural network may be the weights of the connections between neurons, as well as the activities of the neurons. Most artificial neural networks contain some form of "learning rule" that modifies the weights of the connections according to the input patterns presented. In a sense, artificial neural networks learn by example just like their biological counterparts.
A deep neural network (DNN), or deep multilayer neural network, corresponds to a neural network with multiple levels of interconnected nodes, which allows it to compactly represent highly nonlinear and highly varying functions. However, the computational complexity of a DNN grows rapidly as the number of nodes associated with its many layers increases.
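The "weighted sum per neuron, organized in layers" description above can be illustrated with a tiny forward pass. The layer sizes, random weights, and ReLU activation below are assumptions chosen only to make the sketch self-contained.

```python
# Minimal sketch of layered neurons computing weighted sums of their inputs.
# Assumptions: 4-8-2 layer sizes, random weights, ReLU hidden activation.
import numpy as np

def dense_layer(x, weights, bias):
    # Each output neuron computes a weighted sum of its inputs plus a bias.
    return np.maximum(0.0, weights @ x + bias)   # ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # external input pattern
w1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # hidden layer
w2, b2 = rng.normal(size=(2, 8)), np.zeros(2)    # output layer
hidden = dense_layer(x, w1, b1)
output = w2 @ hidden + b2
print(output)
```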
At present, for the management and transmission of the model parameters of the neural network model included in a neural network-based encoding technology, one approach is to fix the model parameters of the neural network model in a common library file available to both the encoder and the decoder. The encoder and the decoder can obtain the model parameters of the neural network model from the common library file when encoding or decoding. However, with this approach, once some encoding tools of the encoder change, its effectiveness is reduced.
Another approach is to transmit the model parameters of the neural network model to the decoding end through the code stream, so that the model parameters can be flexibly adjusted according to the needs of the encoder. However, for deep-learning neural network models with many layers, the number of parameters is generally large; if the parameters are placed directly in the code stream for transmission, the bit consumption is increased and the video compression rate is reduced. In addition, encoding based on neural network-based encoding technologies poses a greater challenge to the low-latency transmission of image transmission scenarios, because of the management and transmission requirements of the model parameters.
The present application provides an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus, which can well solve the problem of managing and transmitting the model parameters of the neural network model, and also reduce the bit consumption of the code stream. In addition, the challenge that the management and transmission of model parameters pose to low-latency transmission can also be reduced.
The encoding method 300 provided by an embodiment of the present application is described in detail below with reference to FIG. 3.
FIG. 3 is a schematic diagram of an encoding method 300 provided by an embodiment of the present application. The method 300 may include steps 310 to 330.
310: Encode an image to be encoded by using a neural network-based encoding technology to obtain a code stream of the image to be encoded.
In the embodiments of the present application, encoding the image to be encoded by using a neural network-based encoding technology (which may also be called an "intelligent encoding technology") can be understood as: using an encoding module based on a neural network model to replace or supplement part of the encoding modules in the encoding and decoding framework to encode the image to be encoded, or using an encoding framework based on a neural network model to replace the original encoding framework to encode the image to be encoded.
Neural network-based encoding technologies mainly include three aspects: hybrid neural network video coding (that is, embedding neural network encoding modules into the traditional video coding framework in place of traditional encoding modules), neural network rate-distortion optimized coding, and end-to-end neural network video coding.
Optionally, in some embodiments, for at least one of prediction, transform, quantization, entropy coding, and filtering in the encoding process, an encoding module based on a neural network model may be used to replace part of the encoding modules in the encoding framework to encode the image to be encoded.
It should be specially noted that, for the filtering in the encoding process, a filtering module based on a neural network model may also be added to the existing filtering modules to filter the reconstructed image.
Exemplarily, for intra prediction in the encoding process, a neural-network-based intra prediction method is compared with the original intra prediction method to decide the optimal intra prediction method for prediction; for inter prediction in the encoding process, a neural-network-based image super-resolution technology may be used for prediction, which can improve motion estimation performance and thereby improve the efficiency of inter prediction.
For entropy coding in the encoding process, a context probability estimation method based on neural network technology may replace the traditional rule-based context probability prediction model and be applied to the entropy coding of coefficients or of some other syntax elements. For filtering in the encoding process, a neural network filter (NNF) technology may be added after deblocking filtering.
End-to-end neural network video coding completely abandons the traditional hybrid video coding framework; video images and related information are input directly, and neural-network-based prediction is used for prediction and compression coding.
320: Transmit the code stream of the image to be encoded through a first communication link.
330: Transmit, through a second communication link, the model parameters of the neural network model included in the neural network-based encoding technology.
Optionally, the neural network model in the embodiments of the present application may include a neural network model trained offline or a neural network model trained online, which is not limited.
The neural network model in the embodiments of the present application may include a DNN, a convolutional neural network (CNN), a recurrent neural network (RNN), or another neural network variant, which is not specifically limited in this application.
The model parameters of the neural network model in the embodiments of the present application include, but are not limited to, the number of layers of the neural network, the number of neurons, the weights of the connections between neurons, and the like.
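Steps 320 and 330 can be pictured with the following conceptual sketch of the encoder side: the code stream goes out on the first link and the serialized model parameters on the second link. The Link class, the pickle-based serialization, and the link names are assumptions for illustration only.

```python
# Conceptual sketch of steps 320/330: bitstream on the first link, serialized
# model parameters on the second link. Transport details are stand-ins.
import pickle

class Link:
    def __init__(self, name):
        self.name = name
        self.sent = []
    def send(self, payload: bytes):
        self.sent.append(payload)        # stand-in for the real transport

def transmit(bitstream: bytes, model_params: dict,
             video_link: Link, param_link: Link):
    video_link.send(bitstream)                       # first communication link
    param_link.send(pickle.dumps(model_params))      # second communication link

params = {"num_layers": 4, "weights": [[0.1, -0.3], [0.7, 0.2]]}
transmit(b"\x00\x01\x02", params, Link("image-transmission"), Link("public-network"))
```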
Correspondingly, the decoding end can decode the code stream based on the received model parameters of the neural network model. FIG. 4 is a schematic diagram of a decoding method 400 provided by an embodiment of the present application. The method 400 may include steps 410 to 430.
410: Receive a code stream of an image to be decoded through a first communication link.
420: Receive model parameters of a neural network model through a second communication link.
430: Decode the code stream by using the model parameters of the neural network model to obtain a decoded image.
Optionally, in some embodiments, for at least one of entropy decoding, inverse quantization, inverse transform, prediction reconstruction, or filtering in the decoding process, a decoding module based on a neural network model may be used to replace part of the decoding modules in the decoding framework to decode the image to be decoded.
It should be specially noted that, for the filtering in the decoding process, a filtering module based on a neural network model may also be added to the existing filtering modules to filter the predicted and reconstructed image.
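The decoder-side counterpart of steps 410 to 430 could be sketched as follows. All names here are hypothetical stand-ins: the received parameters are loaded into a neural-network-based module before the code stream is decoded.

```python
# Decoder-side sketch of steps 410-430 with stand-in objects; the real
# entropy decoding / reconstruction and the real neural-network module are
# replaced by placeholders.
import pickle

class NNFilterStub:
    def load_params(self, params: dict):
        self.params = params              # hypothetical parameter loading
    def apply(self, frame):
        return frame                      # a real module would run inference

def decode(bitstream_bytes: bytes, serialized_params: bytes):
    nn_filter = NNFilterStub()
    nn_filter.load_params(pickle.loads(serialized_params))   # from 2nd link
    frames = [list(bitstream_bytes)]      # stand-in for decoding the bitstream
    return [nn_filter.apply(f) for f in frames]

params_bytes = pickle.dumps({"num_layers": 4})
print(decode(b"\x00\x01\x02", params_bytes))
```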
In the solution provided by this application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model through the first communication link and the second communication link respectively, and the decoding end correspondingly receives the code stream and the model parameters of the neural network model through these two communication links. This solves well the problem of managing and transmitting the model parameters of the neural network model; because the model parameters of the neural network model are transmitted through the second communication link, the bit consumption of the code stream is reduced; in addition, the challenge that the management and transmission of model parameters pose to low-latency transmission is also reduced.
Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
The physical characteristics in the embodiments of the present application may be determined by the specific application requirements; exemplarily, they may include transmission latency and/or transmission bandwidth. It should be understood, however, that the physical characteristics may also be other characteristics, which should not particularly limit the present application.
Optionally, in some embodiments, the transmission latency of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
In the embodiments of the present application, compared with the second communication link, the first communication link has a lower transmission latency, or a lower transmission bandwidth, or both a lower transmission latency and a lower transmission bandwidth; in other words, the transmission latency and/or transmission bandwidth of the second communication link is higher than that of the first communication link.
It should be noted that, in the process of encoding the image to be encoded, the encoding end may transmit the encoded code stream in real time, so that the decoding end can decode it promptly. Therefore, the first communication link, with its lower latency, may be considered for transmitting the code stream of the image to be encoded.
It should also be noted that, in the process in which the encoding end encodes the image to be encoded by using the neural network model, the model parameters of the neural network model may be transmitted through the second communication link. Because a neural network model has many model parameters, and the model parameters of a neural network model with many layers are especially complex, the second communication link, with its higher bandwidth, may be considered for transmitting the model parameters of the neural network model.
In the solution provided by this application, the code stream of the image to be encoded is transmitted through the first communication link with lower transmission latency, and the model parameters of the neural network model are transmitted through the second communication link with higher transmission bandwidth. This can well solve the problem of managing and transmitting the model parameters of the neural network model, and can further reduce the challenge that the management and transmission requirements of the model parameters of the neural network model pose to low-latency transmission.
Optionally, in some embodiments, the first communication link includes a link whose latency is less than or equal to a first threshold, and the second communication link includes a link whose bandwidth is greater than or equal to a second threshold.
The first threshold and/or the second threshold in the embodiments of the present application may be specified by a protocol or configured by a server; the first threshold and/or the second threshold may be a fixed value or a continuously adjusted, changing value; this is not limited.
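The threshold-based choice of links described above can be sketched as follows. The threshold values, link names, and latency/bandwidth figures are illustrative assumptions, not measured properties of any particular system.

```python
# Sketch of choosing links by thresholds: the code stream prefers a link whose
# latency is <= the first threshold; the model parameters prefer a link whose
# bandwidth is >= the second threshold. All figures below are assumptions.
LINKS = [
    {"name": "SDR image transmission", "latency_ms": 20, "bandwidth_mbps": 20},
    {"name": "4G/5G public network",   "latency_ms": 80, "bandwidth_mbps": 100},
]

def pick_links(links, latency_threshold_ms=30, bandwidth_threshold_mbps=50):
    first = min((l for l in links if l["latency_ms"] <= latency_threshold_ms),
                key=lambda l: l["latency_ms"])
    second = max((l for l in links if l["bandwidth_mbps"] >= bandwidth_threshold_mbps),
                 key=lambda l: l["bandwidth_mbps"])
    return first["name"], second["name"]

print(pick_links(LINKS))   # -> ('SDR image transmission', '4G/5G public network')
```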
Optionally, in some embodiments, the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link includes a link based on a mobile communication protocol.
Optionally, in some embodiments, the proprietary image transmission protocol includes a software-defined radio (SDR) protocol, the wireless local area network protocol includes a Wireless Fidelity (WiFi) protocol, and the mobile communication protocol includes a 4G or 5G protocol.
It should be noted that, in the embodiments of the present application, the SDR protocol is given as an example of the proprietary image transmission protocol; the proprietary image transmission protocol may also include other protocols, such as the Open Network Video Interface Forum (ONVIF) protocol, which is not limited.
In addition to the WiFi protocol shown above, the wireless local area network protocol in the embodiments of the present application may also include other protocols, such as Bluetooth and ZigBee, which is not limited.
In addition, besides the 4G or 5G protocol shown above, the mobile communication protocol in the embodiments of the present application may also include other protocols, such as a future 6G protocol.
Of course, in some embodiments, the first communication link may also be a link based on the SDR protocol, and the second communication link may be a link based on the WiFi protocol.
Optionally, in some embodiments, the first communication link includes a private image transmission link, and the second communication link includes a public network transmission link.
FIG. 5 is a schematic diagram of image transmission using the intelligent coding technology provided by an embodiment of the present application. After the capture end (which can also be understood as the encoding end) captures a video, the code stream and the model parameters of the corresponding neural network model are obtained after compression and encoding by the encoder and the neural network computing platform, and the code stream and the model parameters of the neural network model are then transmitted to the display end through a wireless image transmission system. After receiving the code stream and the model parameters of the corresponding neural network model, the display end (which can also be understood as the decoding end) performs decoding through the decoder and the neural network computing platform of the display end to obtain the reconstructed video, and displays the reconstructed video on the display of the display end.
The above wireless image transmission system may be a wireless video transmission system and may include a private image transmission link and a public network transmission link, which are used to transmit the code stream and the model parameters of the neural network model respectively.
The private image transmission link in the embodiments of the present application may include a link based on the SDR protocol, a link based on the WiFi protocol, or a link based on the ONVIF protocol; the public network transmission link in the embodiments of the present application may include a link based on a 4G, 5G, or 6G protocol.
In the solution provided by this application, the code stream of the image to be encoded is transmitted through the private image transmission link, and the model parameters of the neural network model are transmitted through the public network transmission link, which can well solve the problem of managing and transmitting the model parameters of the neural network model.
Optionally, in some embodiments, the first communication link and/or the second communication link are selected from one or more of the following links:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, and a link based on an Ethernet protocol.
In the embodiments of the present application, when transmitting the code stream of the image to be encoded and the model parameters of the neural network model, the encoding end may also flexibly select from one or more links, which can improve flexibility.
As pointed out above, the neural network model includes a neural network model trained offline or a neural network model trained online. Taking these two kinds of neural network model as examples, the following respectively introduces the related content of encoding the image to be encoded by using a neural network-based encoding technology.
Case 1: Encoding the image to be encoded by using an encoding technology based on an online-trained neural network
Manner 1:
Optionally, in some embodiments, the neural network model is an online-trained neural network model, and encoding the image to be encoded by using a neural network-based encoding technology includes:
for the n-th target image, encoding the n-th target image by using a neural network model obtained by training on the (n-m)-th target image, where the target image is an image obtained by partitioning the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
The target image in the embodiments of the present application may be an image obtained by partitioning the image to be encoded at any one of the video sequence level, the group of pictures (GOP) level, and the picture level (or frame level). The length of a GOP can be defined independently; generally, the images in the video sequence from the current I frame up to (but not including) the next I frame can be taken as one GOP.
To facilitate understanding of the solution of this application, the GOP is briefly introduced first. A GOP includes a group of consecutive pictures, consisting of one I frame and several B frames and/or P frames, and is the basic unit accessed by video image encoders and decoders. An I frame (also called a key frame) is an intra-coded image frame, which can be understood as a complete preservation of that picture. A B frame is a bidirectional reference frame or bidirectional difference frame, which can be understood as recording the difference between this frame and the frames before and after it; when decoding, the previously buffered image and the image decoded afterwards are obtained, and the final image is obtained by superimposing the data of the preceding and following images on the data of this frame. A P frame is a forward reference frame or forward prediction frame, which can be understood as the difference between this frame and the previous frame; when decoding, the difference defined by this frame needs to be superimposed on the previously buffered image to generate the final image.
In addition, the target image in the embodiments of the present application differs depending on the level at which the image to be encoded is partitioned.
If the target image is obtained by partitioning the image to be encoded at the video sequence level, encoding the n-th target image by using the neural network model obtained by training on the (n-m)-th target image can be understood as: taking all the images to be encoded contained in the video as one target image and encoding it based on a pre-trained neural network model.
If the target image is obtained by partitioning the image to be encoded at the group-of-pictures level, encoding the n-th target image by using the neural network model obtained by training on the (n-m)-th target image can be understood as: encoding each image included in the n-th GOP by using the neural network model obtained by training on the (n-m)-th GOP, where a GOP here includes one I frame and several B frames and/or P frames.
If the target image is obtained by partitioning the image to be encoded at the picture level, encoding the n-th target image by using the neural network model obtained by training on the (n-m)-th target image can be understood as: encoding the n-th image by using the neural network model obtained by training on the (n-m)-th image, where an image here can be understood as an image frame, such as the I frame, B frame, or P frame described above.
It should be noted that, in the embodiments of the present application, m < n, that is, the n-th target image can be encoded by using any target image that precedes it.
Exemplarily, the third target image may be encoded by using the neural network model obtained by training on the second target image, and the model parameters of that model are transmitted to the decoding end at the same time to facilitate decoding at the decoding end. Alternatively, the third target image may be encoded by using the neural network model obtained by training on the first target image; because the model parameters of that model were already transmitted when the second target image was encoded, in this case the model parameters used for the third target image do not need to be transmitted again, and it is only necessary to notify the decoding end of which model parameters are used. Especially for a complex neural network model with many layers, this manner not only satisfies the latency requirement that is otherwise hard to meet when there are many model parameters, but also further saves bandwidth.
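A high-level sketch of Manner 1 at the GOP level follows: GOP n is encoded with the model trained on GOP n-m, and the first m GOPs fall back to a preset model. Training and encoding are stand-ins; the value m = 2 and the string labels are assumptions made only so the example runs.

```python
# Sketch of Manner 1: encode GOP n with the model trained online on GOP n-m;
# the first m GOPs fall back to a preset model. All labels are stand-ins.
def encode_sequence(gops, m=2, preset_model="preset_model"):
    trained = {}        # GOP index -> model trained on that GOP (stand-in)
    log = []
    for n, gop in enumerate(gops, start=1):
        model = trained.get(n - m, preset_model)     # model from GOP n-m
        log.append(f"GOP {n}: encoded with {model}")
        trained[n] = f"model_trained_on_GOP{n}"      # online training stand-in
    return log

for line in encode_sequence(["g1", "g2", "g3", "g4", "g5"]):
    print(line)
```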
In the solution provided by this application, by encoding the n-th target image using the neural network model obtained by training on the (n-m)-th target image, the flexibility of encoding can be improved and various scenarios can be accommodated; moreover, with a neural network model trained for each scenario, the prediction effect based on the neural network model is better. In addition, encoding with the model parameters of a neural network model trained on an earlier, spaced-apart target image not only satisfies the latency requirement that is otherwise hard to meet when there are many model parameters, but also further saves bandwidth.
As mentioned above, the neural network-based encoding technology can be applied to any stage of the encoding process. The following takes the neural-network-based filtering technology as an example for description.
Optionally, in some embodiments, encoding the n-th target image by using the neural network model obtained by training on the (n-m)-th target image includes:
filtering the n-th target image by using the neural network model obtained by training on the (n-m)-th target image, where the (n-m)-th target image is the image of the (n-m)-th coded image that has not been filtered by the neural network model, and the n-th target image is the image of the n-th coded image that has not been filtered by the neural network model.
FIG. 6 is a schematic diagram of a coding framework 2 provided by an embodiment of the present application.
After obtaining reconstructed pixels through inverse quantization 206 and inverse transform 207, the encoding end may filter 211 the reconstructed pixels. In the filtering process, any one or more of a deblocking filter (DF), NNF, sample adaptive offset (SAO), or adaptive loop filter (ALF) may be applied to the reconstructed pixels, and the filtered reconstructed image is output.
Taking the target image being a GOP as an example, when encoding the first GOP of the current image to be encoded, the images of the first GOP are used as the training set to carry out the model training process of the neural-network-based filtering technology. The entire training process may refer to the following: during encoding, the current image to be encoded may be fed into the framework shown in FIG. 6 for encoding and training; for example, the first GOP is inverse-quantized and inverse-transformed to obtain the reconstructed image, and DF, NNF, SAO, and ALF are performed on the reconstructed image. After all images in the first GOP have been encoded, the code stream of all the encoded images in the first GOP and the neural network model obtained by training the neural network framework on all the images in the first GOP are obtained.
When encoding the second GOP of the current image to be encoded, the neural network model trained on the first GOP is used for the neural-network-based filtering technology. The specific filtering process may refer to the following: when encoding the images in the second GOP, the second GOP is inverse-quantized and inverse-transformed to obtain the reconstructed image, and the reconstructed image is then sent to the filtering module in which the neural network model trained on the first GOP is deployed, to obtain the filtered reconstructed image; the filtering module here includes a DF module, an NNF module, an SAO module, and an ALF module, and the NNF module includes the neural network model trained on the first GOP.
By analogy, when encoding the n-th GOP, the neural network model trained on the (n-m)-th GOP is used for the neural-network-based filtering technology. The specific filtering process may refer to the following: when encoding the images in the n-th GOP, the n-th GOP is inverse-quantized and inverse-transformed to obtain the reconstructed image, and the reconstructed image is then sent to the filtering module in which the neural network model trained on the (n-m)-th GOP is deployed, to obtain the filtered reconstructed image; similarly, the filtering module here includes a DF module, an NNF module, an SAO module, and an ALF module, and the NNF module includes the neural network model trained on the (n-m)-th GOP.
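The in-loop filter chain of FIG. 6 (DF, then NNF, then SAO, then ALF), with the NNF stage using a model trained on an earlier GOP, can be sketched as below. The individual filter implementations are placeholders; only the structure of the chain is illustrated.

```python
# Sketch of the FIG. 6 filter chain (DF -> NNF -> SAO -> ALF) where the NNF
# stage uses the model trained on the (n-m)-th GOP. Filters are placeholders.
import numpy as np

def deblocking(img):
    return img                      # DF stand-in

def sample_adaptive_offset(img):
    return img                      # SAO stand-in

def adaptive_loop_filter(img):
    return img                      # ALF stand-in

def nnf(img, model):
    # Stand-in for neural-network filtering; a real NNF would run inference
    # with the model trained on the earlier GOP.
    return np.clip(img + model.get("offset", 0), 0, 255)

def filter_reconstruction(reconstructed, nnf_model):
    out = deblocking(reconstructed)
    out = nnf(out, nnf_model)
    out = sample_adaptive_offset(out)
    return adaptive_loop_filter(out)

recon = np.random.randint(0, 256, (8, 8)).astype(np.float64)
model_from_prev_gop = {"offset": 1}          # illustrative trained parameters
print(filter_reconstruction(recon, model_from_prev_gop).shape)
```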
It can be understood that the filtering order shown in FIG. 6 is only an example; other orders, such as DF, SAO, NNF and then ALF, may also be used, and this should not be construed as specifically limiting the present application.
It should be noted that m may be any positive integer smaller than n. For example, the third GOP may be encoded using the neural network model trained on the second GOP, or using the neural network model trained on the first GOP; this is not limited here.
In the solution provided by the present application, filtering the n-th target image with the neural network model obtained by training on the (n-m)-th target image improves the flexibility of filtering.
Optionally, in some embodiments, m is a parameter fixed or preset at the encoding end before encoding; or m is a parameter formed during the encoding process.
In this embodiment of the present application, m may be a parameter formed during the encoding process. For example, if m = 2, the third GOP may be encoded using the neural network model trained on the first GOP, the fourth GOP using the model trained on the second GOP, and the fifth GOP using the model trained on the third GOP; this is not limited here. The images in the first and second GOPs may then be encoded using a preset neural network model.
It should be understood that the above values are only examples; other values may also be used, and this should not be construed as specifically limiting the present application.
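The following is a minimal sketch, in Python, of the per-GOP model selection just described: for GOP n, the NNF model trained on GOP n-m is used, and the first m GOPs fall back to a preset model. The helper names (select_nnf_model, pretrained_model, trained_models) are illustrative assumptions, not part of any codec API.

```python
# Hypothetical sketch of per-GOP model selection for online-trained NNF filtering.
def select_nnf_model(n, m, trained_models, pretrained_model):
    """Return the NNF model used when encoding GOP n.

    trained_models[k] holds the model trained on GOP k (1-based indexing).
    The first m GOPs fall back to a preset model, as described above.
    """
    if n <= m:
        return pretrained_model          # no earlier GOP n-m exists yet
    return trained_models[n - m]         # model trained on GOP n-m

# Usage: after encoding GOP n, its images can be used to train trained_models[n],
# which then becomes available when GOP n + m is encoded.
```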
方式二:Method two:
可选地,在一些实施例中,所述神经网络模型为在线训练的神经网络模型,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:Optionally, in some embodiments, the neural network model is an online training neural network model, and the encoding of the to-be-encoded image using a neural network-based encoding technology includes:
For a first target image to be encoded, encoding the first target image using a neural network model obtained by training on an encoded second target image, where the first target image and the second target image are images obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level;
the transmitting, through the second communication link, the model parameters included in the neural-network-based encoding technology includes: transmitting, through the second communication link, the model parameters of the neural network model obtained by training on the second target image.
可选地,在一些实施例中,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。Optionally, in some embodiments, the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
类似地,本申请实施例中的第一目标图像和第二目标图像可以为对待编码图像按照视频序列级、GOP级、图像级中的任意一种进行划分后的图像。Similarly, the first target image and the second target image in this embodiment of the present application may be images obtained by dividing the image to be encoded according to any one of video sequence level, GOP level, and image level.
In this embodiment of the present application, when the first target image is encoded, it may be encoded using a neural network model obtained by training on an already-encoded second target image. Assuming the first target image is the second GOP of the current image to be encoded, it may be encoded using the neural network model obtained by training on the already-encoded first GOP; assuming the first target image is the third GOP, it may be encoded using the neural network model obtained by training on the already-encoded second GOP; this application does not specifically limit this.
In this embodiment of the present application, q may be an integer greater than or equal to 0. Taking a GOP as the target image: if q is 0, the second target image is the previous GOP adjacent to the first target image; if q is 1, the second target image is the previous GOP separated from the first target image by one GOP; if q is 2, the second target image is the previous GOP separated from the first target image by two GOPs.
可选地,在一些实施例中,所述q为编码前固化到或预设在编码端的参数。Optionally, in some embodiments, the q is a parameter that is fixed or preset at the encoding end before encoding.
In the solution provided by the present application, the encoding end encodes the first target image to be encoded using a neural network model obtained by training on the already-encoded second target image. This improves encoding flexibility, adapts to a variety of scenes, and, because the neural network model is trained for each scene, yields better prediction performance based on the neural network model.
相应地,对于解码端,利用接收到的对第二目标图像进行训练获得的神经网络模型的模型参数对第一目标图像的码流进行解码,以获得所述第一目标图像。Correspondingly, for the decoding end, the code stream of the first target image is decoded by using the received model parameters of the neural network model obtained by training the second target image to obtain the first target image.
可选地,在一些实施例中,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。Optionally, in some embodiments, the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
可选地,在一些实施例中,所述q为解码前固化到或预设在解码端的参数。Optionally, in some embodiments, the q is a parameter that is solidified or preset at the decoding end before decoding.
In this embodiment of the present application, the decoding end may decode the code stream of the first target image using the received model parameters of the neural network model obtained by training on the second target image. During decoding, the decoding end can determine, according to the parameter q fixed or preset at the decoding end, which to-be-decoded target image corresponds to the model parameters of the neural network model obtained by training on the second target image.
For example, taking a GOP as the target image: if q is 0, the decoding end decodes the code stream of the first target image using the model parameters of the neural network model obtained by training on the adjacent previous GOP; if q is 1, it uses the model parameters of the model trained on the previous GOP separated by one GOP; if q is 2, it uses the model parameters of the model trained on the previous GOP separated by two GOPs.
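A small sketch of the decoder-side mapping above, assuming 1-based GOP indices; model_index_for is an illustrative helper, not part of any real decoder.

```python
# Hypothetical sketch: which previously received model the decoder applies when
# decoding GOP index n, given a preset offset q.
def model_index_for(n, q):
    """Model parameters trained on GOP n - (q + 1) are used for GOP n.

    q = 0 -> the adjacent previous GOP, q = 1 -> the GOP one earlier, etc.
    Returns None when no earlier GOP exists (a preset model would be used instead).
    """
    k = n - (q + 1)
    return k if k >= 1 else None
```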
本申请提供的方案,解码端利用接收到的对第二目标图像进行训练获得的神经网络模型的模型参数对第一目标图像的码流进行解码,可以提高灵活性,能够适应于多种场景,而且针对各种场景训练的神经网络模型,基于神经网络模型的预测重建效果更佳。In the solution provided by the present application, the decoding end uses the received model parameters of the neural network model obtained by training the second target image to decode the code stream of the first target image, which can improve flexibility and adapt to various scenarios. Moreover, for the neural network models trained in various scenarios, the prediction and reconstruction effect based on the neural network model is better.
情况二:利用基于离线训练的神经网络的编码技术对待编码图像进行编码Scenario 2: Encoding the to-be-encoded image using an offline-trained neural network-based encoding technique
可选地,在一些实施例中,所述神经网络模型为离线训练的神经网络模型,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:Optionally, in some embodiments, the neural network model is an offline training neural network model, and the encoding of the to-be-encoded image using a neural network-based encoding technology includes:
For a p-th target image, encoding the p-th target image using the offline-trained neural network model, where the target image is an image obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level, and p is an integer greater than or equal to 0.
In this embodiment of the present application, encoding the p-th target image using the offline-trained neural network model may include performing at least one of prediction, transform, quantization, entropy coding, or filtering on the p-th target image using the offline-trained neural network model.
The following description takes neural-network-based filtering as an example, assuming the target image is obtained by dividing the image to be encoded at the GOP level. First, a neural network model for neural-network-based filtering can be trained on a training set that covers most video scenes, yielding a trained neural network model. The acquisition end deploys the trained neural network model on its neural network computing platform and, together with the encoder, encodes the p-th GOP to obtain a code stream; the code stream and the model parameters of the neural network model are then transmitted to the display end through the wireless image transmission system (using the dual-communication-link approach). After receiving the model parameters and the code stream, the display end deploys the neural network model based on these model parameters on its own neural network computing platform and, together with the decoder, decodes the code stream to obtain the reconstructed video for display.
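A minimal sketch of this dual-link transmission of one GOP, assuming a generic encoder interface and caller-supplied transport callbacks (encoder.encode, nnf_model.serialize, link1_send and link2_send are all illustrative assumptions, not real APIs).

```python
# Hypothetical sketch of shipping the bitstream and the model parameters on separate links.
def transmit_gop(encoder, nnf_model, gop_images, link1_send, link2_send):
    """Encode one GOP and send bitstream and model parameters over the two links."""
    bitstream = encoder.encode(gop_images, filter_model=nnf_model)  # assumed encoder interface
    link1_send(bitstream)                 # first link: low latency, carries the video bitstream
    link2_send(nnf_model.serialize())     # second link: high bandwidth, carries model parameters
```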
In the solution provided by the present application, the encoding end encodes the p-th target image using the offline-trained neural network model and transmits the code stream and the model parameters of the neural network model through the first communication link and the second communication link, respectively, which addresses the management and transmission of the model parameters of the neural network model well.
As described above, after the image to be encoded has been encoded using the neural-network-based encoding technology, the model parameters of the neural network model included in the technology can be transmitted through the second communication link. For an online-trained neural network model, the model parameters are continuously updated as encoding proceeds, so the model parameters of multiple neural network models are transmitted to the decoding end, and the decoding end needs to determine which model parameters correspond to the code stream to be decoded. For an offline-trained neural network model, the encoding end may train multiple neural network models and may use different models for different target images, so the model parameters of multiple models are likewise transmitted to the decoding end, and the decoding end again needs to determine which model parameters correspond to the code stream to be decoded. The following describes how the decoding end determines the model parameters of the neural network model corresponding to the code stream to be decoded.
Optionally, in some embodiments, the code stream of the image to be encoded further includes first indication information, where the first indication information indicates the identifier of the model parameters of the neural network model used when encoding the image to be encoded.
情况一:利用在线训练的神经网络模型进行的编码Case 1: Encoding using an online trained neural network model
As noted above, for the n-th GOP, the neural network model trained on the (n-m)-th GOP is used for neural-network-based filtering when the n-th GOP is encoded. The specific filtering process may be as follows: when encoding the images in the n-th GOP, the images in the (n-m)-th GOP are fed into the HM platform for encoding to obtain reconstructed images; the reconstructed images are then fed into the filtering module in which the neural network model trained on the (n-m)-th GOP has been deployed, to obtain filtered reconstructed images, which then go through the subsequent SAO and ALF processes to obtain the final reconstructed images.
In this embodiment of the present application, when the encoding end encodes the first GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the first GOP; when it encodes the second GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters used when encoding the second GOP; and so on, when it encodes the n-th GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the n-th GOP.
After receiving the code stream and the model parameters of the neural network models, the decoding end can identify, from the first indication information in the code stream, the identifier of the model parameters corresponding to the code stream of the image to be decoded. Based on this identifier, it can determine the model parameters of the neural network model that were used when the code stream of the image to be decoded was encoded, and decode the code stream according to those model parameters to obtain the final reconstructed image.
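A minimal sketch of this identifier-based lookup at the decoding end. The dictionary, the parsed field first_indication, and decoder.decode are illustrative assumptions rather than a defined bitstream or decoder API.

```python
# Hypothetical sketch: model parameters received on the second link are stored by
# identifier; the first indication information in the bitstream selects the entry.
received_models = {}          # identifier -> model parameters, filled from link 2

def store_model(identifier, params):
    received_models[identifier] = params

def decode_gop(bitstream_unit, decoder):
    model_id = bitstream_unit.first_indication            # assumed parsed field
    params = received_models[model_id]
    return decoder.decode(bitstream_unit, nnf_params=params)  # assumed decoder interface
```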
情况二:利用离线训练的神经网络模型进行的编码Case 2: Encoding using offline trained neural network models
In this embodiment of the present application, when training the neural network model for neural-network-based filtering, the encoding end may train multiple neural network models; for example, when filtering different target images within the image to be encoded, different neural network models may be used.
For example, suppose the encoding end trains three neural network models: neural network model 1, neural network model 2 and neural network model 3. When encoding the first GOP, neural network model 1 is used for filtering, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the first GOP; when encoding the second GOP, neural network model 2 is used, and the corresponding code stream may include first indication information indicating the identifier of the model parameters used when encoding the second GOP; when encoding the third GOP, neural network model 3 is used for filtering, and the corresponding code stream may include first indication information indicating the identifier of the model parameters used when encoding the third GOP.
Correspondingly, after receiving the code stream and the model parameters of the neural network models, the decoding end can identify, from the first indication information in the code stream, the identifier of the model parameters corresponding to the code stream of the image to be decoded, determine on that basis the model parameters of the neural network model used when the code stream was encoded, and decode the code stream according to those model parameters to obtain the final reconstructed image.
It should be noted that, when filtering a target image, the encoding end may use any of the trained neural network models. For example, neural network model 1 may be used to filter both the first GOP and the second GOP, and neural network model 3 may be used to filter the third GOP; this is not limited here.
In the solution provided by the present application, adding the first indication information to the code stream to indicate the identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded makes it easy for the decoding end to determine the model parameters corresponding to the image to be decoded during decoding, which improves decoding accuracy.
Optionally, in some embodiments, if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information indicating the identifier of the model parameters of the neural network model used when encoding the image to be encoded. If the image to be encoded is a non-key frame, the code stream may not include the first indication information.
可选地,在一些实施例中,所述第一指示信息还用于指示在对当前关键帧至下一个关键帧之间的其它帧进行编码时采用的神经网络模型的模型参数的标识。Optionally, in some embodiments, the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when encoding other frames between the current key frame and the next key frame.
The key frame in this embodiment of the present application is the I frame described above. It can be understood that, if the current frame to be encoded is an I frame, the code stream obtained by encoding the current I frame may include first indication information indicating the identifier of the model parameters of the neural network model used when encoding the current I frame, which facilitates correct decoding at the decoding end.
In addition, if the current frame to be encoded is an I frame and the subsequent frames to be encoded are B frames and/or P frames, the first indication information may indicate not only the identifier of the model parameters of the neural network model used when encoding the current I frame, but also the identifier of the model parameters used when encoding the other frames (such as B frames and/or P frames) between the current I frame and the next I frame.
In other words, when encoding the current I frame, the encoding end adds first indication information to the encoded code stream of the current I frame, and this first indication information can indicate the identifier of the model parameters of the neural network model used for encoding the current I frame and the other frames (such as B frames and/or P frames) between the current I frame and the next I frame.
At the decoding end, when an I frame is decoded and the frames following it are B frames and/or P frames, the model parameters used for decoding that I frame can continue to be used to decode the subsequent adjacent B frames and/or P frames, until the next I frame is decoded.
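A minimal sketch of the decoder-side behaviour just described: the identifier parsed from an I frame stays in effect for the following B/P frames until the next I frame. The frame fields and decoder.decode are illustrative assumptions.

```python
# Hypothetical sketch: reuse the I-frame model identifier for subsequent B/P frames.
def decode_sequence(frames, decoder, received_models):
    current_params = None                              # preset model could be used before the first I frame
    for frame in frames:
        if frame.is_key_frame:                         # I frame carries the first indication information
            current_params = received_models[frame.first_indication]
        yield decoder.decode(frame, nnf_params=current_params)
```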
In the solution provided by the present application, the first indication information included in the code stream indicates the identifier of the model parameters of the neural network model used for encoding the current I frame and the other frames between the current I frame and the next I frame, making it easy for the decoding end to determine the model parameters corresponding to the image to be decoded and improving decoding accuracy. In addition, the first indication information can be carried only in I frames and omitted from B frames and P frames, which further reduces bit consumption.
可选地,在一些实施例中,所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:Optionally, in some embodiments, the transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
通过所述第二通信链路传输所述神经网络模型的模型参数以及所述神经网络模型的模型参数对应的标识。The model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are transmitted through the second communication link.
相应地,对于解码端,通过所述第二通信链路接收所述神经网络模型的模型参数和所述神经网络模型模型参数对应的标识。Correspondingly, for the decoding end, the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are received through the second communication link.
In this embodiment of the present application, the encoding end may also transmit, through the second communication link, the model parameters of the neural network model together with the identifier corresponding to those model parameters. The decoding end receives the model parameters and the corresponding identifier through the second communication link, uses the identifier to determine the model parameters of the neural network model to be used when decoding the code stream, and decodes the code stream based on the determined model parameters.
在一些实施例中,针对基于神经网络的编码技术,可以基于已有的视频场景训练集离线训练得到一个可用的神经网络模型,作为编码端实现的一个基本的神经网络模型。In some embodiments, for the coding technology based on the neural network, an available neural network model can be obtained by offline training based on the existing video scene training set, as a basic neural network model implemented by the coding end.
In practical applications, the neural network model can be retrained for different application scenarios. During encoding, the encoding end may choose to use either the retrained neural network model or the existing basic neural network model, and encode a syntax element indicating whether the retrained neural network model is used into the code stream.
When the encoding end chooses the retrained neural network model for encoding, the model parameters of the retrained model can be transmitted through the second communication link, and a syntax element indicating that the retrained model is used is encoded into the code stream. When the encoding end chooses the existing basic neural network model for encoding, the model parameters of a retrained model do not need to be transmitted to the decoding end through the second communication link; further, a syntax element indicating that the retrained model is not used may be encoded into the code stream.
The decoding end decodes, from the received code stream, the indication of whether the retrained neural network model is used, and accordingly decodes either with the existing basic neural network model or with the retrained neural network model transmitted through the second communication link.
In one implementation, suppose the encoding end obtains a usable neural network model 1 by offline training on an existing video scene training set. When encoding the first GOP of the image to be encoded, if the encoding end determines that neural network model 1 cannot be used to encode this first GOP, the first GOP can be used as a training set to train the neural network model of the neural-network-based encoding technology, obtaining the code stream of all encoded images in the first GOP together with the neural network model trained on all images in the first GOP through the neural network framework. The model parameters of this neural network model can then be transmitted to the decoding end through the second communication link, and a syntax element indicating that the retrained neural network model is used is written into the code stream. The other GOPs of the image to be encoded can be encoded in a similar manner, which is not repeated here.
In another implementation, suppose again that the encoding end obtains a usable neural network model 1 by offline training on an existing video scene training set. When encoding the first GOP of the image to be encoded, if the encoding end determines that neural network model 1 can be used to encode this first GOP, neural network model 1 is used to encode the first GOP to obtain the code stream of all encoded images in the first GOP, and a syntax element indicating that the retrained neural network model is not used is written into the code stream. The model parameters of neural network model 1 may be preset at the decoding end or transmitted to the decoding end through the second communication link; this is not limited here. The other GOPs of the image to be encoded can be encoded in a similar manner, which is not repeated here.
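A minimal sketch of the encoder-side choice between the preset base model and a model retrained on the current GOP. The flag name use_retrained_model_flag, the quality check, and the encoder/bitstream methods are assumptions for illustration, not from any published specification.

```python
# Hypothetical sketch of signalling whether a retrained model is used for a GOP.
def encode_gop_with_model_choice(gop, encoder, base_model, link2_send):
    if encoder.base_model_is_adequate(gop, base_model):       # assumed quality check
        bitstream = encoder.encode(gop, model=base_model)
        bitstream.set_flag("use_retrained_model_flag", 0)     # base model: nothing sent on link 2
    else:
        retrained = encoder.train_model(gop)                  # online retraining on this GOP
        bitstream = encoder.encode(gop, model=retrained)
        bitstream.set_flag("use_retrained_model_flag", 1)
        link2_send(retrained.serialize())                     # ship parameters on the second link
    return bitstream
```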
本申请提供的方案,编码端可以选择使用已有的基本的神经网络模型进行编码,也可以选择使用重新训练的神经网络模型进行编码,可以提高编码的灵活性。In the solution provided by this application, the encoding end can choose to use the existing basic neural network model for encoding, or choose to use the retrained neural network model for encoding, which can improve the flexibility of encoding.
Optionally, in some embodiments, the code stream further includes second indication information, where the second indication information indicates whether the neural network model is trained on a group-of-pictures basis, a frame basis, or a sequence basis.
本申请实施例中,通过第一通信链路传输的码流中可以包括第二指示信息,指示神经网络模型是基于图像组、帧、或序列中的一种进行训练的。In this embodiment of the present application, the code stream transmitted through the first communication link may include second indication information, indicating that the neural network model is trained based on one of image groups, frames, or sequences.
After receiving the code stream transmitted by the encoding end, the decoding end can determine the training granularity of the neural network model from the second indication information in the code stream, determine the neural network model by combining this training granularity with the received model parameters, and decode the code stream based on that neural network model.
上文指出,编码端通过第二通信链路传输神经网络模型的模型参数,在一些实现方式中,可以对该神经网络模型的模型参数进行处理,并将处理后的模型参数传输至解码端,具体请参见下文。It is pointed out above that the encoding end transmits the model parameters of the neural network model through the second communication link. In some implementations, the model parameters of the neural network model may be processed, and the processed model parameters are transmitted to the decoding end, See below for details.
可选地,在一些实施例中,所述方法300还包括:Optionally, in some embodiments, the method 300 further includes:
将所述神经网络模型的模型参数转换为目标格式;converting the model parameters of the neural network model into a target format;
对所述目标格式的模型参数进行压缩,得到压缩后的模型参数;compressing the model parameters of the target format to obtain the compressed model parameters;
所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:通过所述第二通信链路传输所述压缩后的模型参数。The transmitting through the second communication link the model parameters of the neural network model included in the neural network-based coding technology includes: transmitting the compressed model parameters through the second communication link.
相应地,对于解码端,上述步骤420中通过第二通信链路接收神经网络 模型的模型参数,包括:通过所述第二通信链路接收压缩后的模型参数;Correspondingly, for the decoding end, receiving the model parameters of the neural network model through the second communication link in the above-mentioned step 420, including: receiving the compressed model parameters through the second communication link;
Decoding the code stream using the model parameters of the neural network model in the above step 430 includes: decompressing the compressed model parameters to obtain the target format; converting the target format; and decoding the code stream using the model parameters in the converted format.
In this embodiment of the present application, after encoding the image to be encoded using the neural network model, the encoding end can convert the format of the model parameters of the neural network model to obtain a target format suitable for compression, compress the target format to obtain compressed model parameters, and transmit the compressed model parameters to the decoding end through the second communication link. After receiving the compressed model parameters, the decoding end first decompresses them to obtain the target format, then converts the target format to obtain the model parameters of the neural network model corresponding to the code stream, and decodes the code stream using those model parameters.
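A minimal sketch of this parameter path, assuming generic to_exchange_format / compress / decompress helpers supplied by the caller rather than any specific library.

```python
# Hypothetical sketch: convert, compress and transmit model parameters; invert at the decoder.
def send_model_params(model, to_exchange_format, compress, link2_send):
    exported = to_exchange_format(model)      # e.g. an NNEF/ONNX-style serialization
    link2_send(compress(exported))            # compressed parameters over the second link

def receive_model_params(payload, decompress, from_exchange_format):
    exported = decompress(payload)
    return from_exchange_format(exported)     # parameters ready to deploy for decoding
```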
可选地,在一些实施例中,所述目标格式包括神经网络变换格式(Neural Network Exchange Format,NNEF)或开放神经网络变换(Open Neural Network Exchange)ONNX格式。Optionally, in some embodiments, the target format includes Neural Network Exchange Format (NNEF) or Open Neural Network Exchange (Open Neural Network Exchange) ONNX format.
本申请实施例中的NNEF和ONNX格式是两种相近的开放格式,用于表示和交换深度学习框架和推理引擎之间的神经网络。两种格式的核心都是基于一组常用操作,可以从这些操作中构建网络。The NNEF and ONNX formats in the embodiments of this application are two similar open formats used to represent and exchange neural networks between deep learning frameworks and inference engines. At their core, both formats are based on a set of common operations from which networks can be built.
NNEF reduces the fragmentation of machine learning deployment by enabling applications on a wide range of devices and platforms to use a rich combination of neural network training tools and inference engines; its main goal is to allow networks to be exported from deep learning frameworks and imported into hardware vendors' inference engines.
ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format, so that artificial intelligence (AI) developers can use models with a variety of frameworks, tools, runtimes, encoders and decoders.
这两种格式均可以存储常用的深度学习框架生成的神经网络模型,其目的是可以实现神经网络模型在不同深度学习框架之间的交互和通用。Both formats can store neural network models generated by commonly used deep learning frameworks. The purpose is to realize the interaction and commonality of neural network models between different deep learning frameworks.
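As one possible toolchain, not prescribed by this application, a small PyTorch filtering network could be exported to ONNX as follows; the TinyNNF module and the file name are illustrative stand-ins for a trained in-loop filter model.

```python
# Illustrative export of a toy residual filtering model to the ONNX exchange format.
import torch

class TinyNNF(torch.nn.Module):            # stand-in for an in-loop filter network
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv(x)             # residual filtering of the reconstruction

model = TinyNNF().eval()
dummy = torch.randn(1, 1, 64, 64)           # dummy reconstructed luma block
torch.onnx.export(model, dummy, "nnf.onnx", input_names=["recon"], output_names=["filtered"])
```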
In the solution provided by the present application, converting the model parameters of the neural network model into the target format allows the neural network model to be exchanged and used across different deep learning frameworks; in addition, compressing the converted target format and transmitting the compressed model parameters further saves bandwidth.
Optionally, in some embodiments, compressing the model parameters in the target format includes: compressing the model parameters in the target format using the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG), to obtain an NNR code stream;
所述通过所述第二通信链路传输所述压缩后的模型参数,包括:通过所述第二通信链路传输所述NNR的码流。The transmitting the compressed model parameters through the second communication link includes: transmitting the code stream of the NNR through the second communication link.
相应地,对于解码端,通过所述第二通信链路接收所述NNR的码流;并对所述NNR的码流进行解压缩。Correspondingly, for the decoding end, the code stream of the NNR is received through the second communication link; and the code stream of the NNR is decompressed.
本申请实施例中,NNR是一种采用类似于视频压缩编码的方式对神经网络模型进行表示和压缩的方法。通过采用权重稀疏化、网络参数剪枝、量化、低秩近似、预测编码、熵编码等方式将神经网络模型参数压缩成由多个NNR单元组成的NNR比特流(bitstream)。In this embodiment of the present application, NNR is a method for representing and compressing a neural network model in a manner similar to video compression coding. The neural network model parameters are compressed into an NNR bitstream composed of multiple NNR units by adopting methods such as weight sparsification, network parameter pruning, quantization, low-rank approximation, predictive coding, and entropy coding.
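As a purely illustrative stand-in for the kind of quantization step such a parameter codec applies before entropy coding (this is not the MPEG NNR algorithm), a uniform weight quantizer could look as follows.

```python
# Illustrative only: crude uniform quantization of model weights prior to entropy coding.
import numpy as np

def quantize_weights(weights, step=1e-3):
    q = np.round(np.asarray(weights) / step).astype(np.int32)
    return q, step                           # integer symbols to entropy-code, plus the scale

def dequantize_weights(q, step):
    return q.astype(np.float32) * step       # approximate reconstruction at the decoding end
```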
本申请实施例中,对于神经网络模型的训练可以通过以下方式进行训练。In this embodiment of the present application, the training of the neural network model may be performed in the following manner.
As shown in FIG. 7, which is a schematic flowchart of training a neural network model provided by an embodiment of the present application: the image to be encoded is compressed and encoded by the encoder together with the neural network computing platform to obtain a neural network model; the model parameters of this neural network model are converted into the NNEF or ONNX format, and the converted NNEF or ONNX model parameters are then compressed using the MPEG NNR compression method to obtain an NNR code stream.
编码端利用基于神经网络的编码技术对待编码图像进行编码时,可以基于图8a所示的过程进行编码。When the encoding end uses the neural network-based encoding technology to encode the to-be-encoded image, the encoding may be performed based on the process shown in FIG. 8a.
As shown in FIG. 8a, which is a schematic flowchart of a video encoder applying the intelligent coding technology provided by an embodiment of the present application: after the NNR code stream is obtained, the MPEG NNR can be recovered by decompression; the MPEG NNR is then inversely converted from the NNEF or ONNX format to obtain the model parameters of the neural network model, and these model parameters are deployed on the neural network computing platform. When encoding an image or video, the encoder can work together with the neural network computing platform on which the model parameters have been deployed to generate the encoded code stream.
In a specific implementation, encoding with the neural-network-based encoding technology can be realized by modifying a conventional encoder. For example, when encoding the header information syntax elements, syntax elements related to intelligent coding can be added to the existing header parameter sets: a syntax element identifying the switch that turns the neural-network-based coding technology on or off can be added to the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), or the Slice Header. This syntax element can be added directly to the above parameter sets, or it can be added to the User Extension Data of those parameter sets.
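A minimal sketch of adding such an on/off flag to a header parameter set. The writer/reader objects, their methods, and the field layout are assumptions for illustration and do not reflect any standard bitstream syntax.

```python
# Hypothetical sketch of a 1-bit switch for the NN coding tool in a header parameter set.
def write_sps(writer, sps, nn_coding_enabled):
    writer.write_bits(sps.profile_idc, 8)          # existing header fields ...
    writer.write_flag(int(nn_coding_enabled))      # new 1-bit switch for the NN tool

def read_sps(reader):
    profile_idc = reader.read_bits(8)
    nn_coding_enabled = bool(reader.read_flag())
    return profile_idc, nn_coding_enabled
```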
对于解码端,可以基于图8b所示的过程进行解码。For the decoding end, decoding can be performed based on the process shown in Figure 8b.
As shown in FIG. 8b, which is a schematic flowchart of a video decoder applying the intelligent coding technology provided by an embodiment of the present application: after the NNR code stream is received, the MPEG NNR can be recovered by decompression; the MPEG NNR is then inversely converted from the NNEF or ONNX format to obtain the model parameters of the neural network model, and these model parameters are deployed on the neural network computing platform. When decoding the code stream of the image or video to be decoded, the decoder can work together with the neural network computing platform on which the model parameters have been deployed to obtain the image or video to be decoded.
Optionally, in some embodiments, compressing the model parameters in the target format includes: compressing the model parameters in the target format using a compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA), to obtain compressed data;
所述通过所述第二通信链路传输所述压缩后的模型参数,包括:通过所述第二通信链路传输所述压缩数据。The transmitting the compressed model parameters through the second communication link includes: transmitting the compressed data through the second communication link.
相应地,对于解码端,通过所述第二通信链路接收所述压缩数据;并对所述压缩数据进行解压缩。Correspondingly, for the decoding end, the compressed data is received through the second communication link; and the compressed data is decompressed.
In this embodiment of the present application, the encoding end compresses the model parameters in the target format using the AITISA compression method and transmits the resulting compressed data through the second communication link; the decoding end receives the compressed data and decompresses it. The specific implementation is similar to the compression, encoding and decoding process using the MPEG NNR described above and is not repeated here.
本申请实施例中的压缩数据也可以为称为压缩码流,不予限制。The compressed data in this embodiment of the present application may also be called a compressed code stream, which is not limited.
可选地,在一些实施例中,所述码流中还包括第三指示信息,所述第三指示信息用于指示是否开启了利用神经网络的编码技术进行编码。Optionally, in some embodiments, the code stream further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
本申请实施例中,通过第一通信链路传输的码流中还可以包括第三指示信息,指示编码端在编码时是否开启了利用神经网络的编码技术进行编码。解码端在接收到码流后,可以根据该码流中的第三指示信息确定是否利用神经网络的解码技术进行解码。In this embodiment of the present application, the code stream transmitted through the first communication link may further include third indication information, indicating whether the encoding end has enabled encoding using the neural network encoding technology during encoding. After receiving the code stream, the decoding end can determine whether to use the decoding technology of the neural network to perform decoding according to the third indication information in the code stream.
If the third indication information indicates that encoding with the neural-network-based encoding technology was enabled, the decoding end decodes using the neural-network-based decoding technology; if it indicates that encoding with the neural-network-based encoding technology was not enabled, the decoding end does not use the neural-network-based decoding technology for decoding.
For example, suppose "1" and "0" respectively indicate that encoding with the neural-network-based encoding technology is enabled and disabled. If the third indication information in the code stream indicates "1", the decoding end, upon receiving this indication, determines to decode using the neural-network-based decoding technology; if it indicates "0", the decoding end determines not to use the neural-network-based decoding technology for decoding.
It should be understood that using "1" and "0" to indicate that encoding with the neural-network-based encoding technology is enabled or disabled is only an example; other identifiers (such as "a", "b", etc.) may also be used, and this should not be construed as specifically limiting the present application.
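A minimal sketch of the decoder-side decision driven by this flag; parse_third_indication and the two decode paths are caller-supplied assumptions, not a real decoder API.

```python
# Hypothetical sketch: switch between the NN decoding path and the conventional path.
def decode_with_switch(bitstream, decoder, parse_third_indication):
    """parse_third_indication is a caller-supplied helper returning the flag value."""
    if parse_third_indication(bitstream) == 1:      # "1": NN coding tool was enabled
        return decoder.decode_with_nn(bitstream)
    return decoder.decode_conventional(bitstream)   # "0": fall back to the conventional path
```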
In the solution provided by the present application, adding the third indication information to the code stream to indicate whether encoding with the neural-network-based encoding technology was enabled allows the decoding end, after receiving the code stream, to determine from this indication whether to decode with the neural-network-based decoding technology, which further improves decoding accuracy.
可选地,在一些实施例中,所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:Optionally, in some embodiments, the transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
通过所述第二通信链路传输所述神经网络模型的部分模型参数或全部模型参数。Some or all of the model parameters of the neural network model are transmitted over the second communication link.
相应地,对于解码端,通过所述第二通信链路接收所述神经网络模型的部分模型参数或全部模型参数。Correspondingly, for the decoding end, part or all of the model parameters of the neural network model are received through the second communication link.
In this embodiment of the present application, the model parameters of the neural network model transmitted through the second communication link may be some or all of the model parameters of the neural network model. The partial model parameters may include, but are not limited to, the number of layers of the neural network and the number of neurons, while the other model parameters (including, but not limited to, the weights of the connections between neurons) may be preset at both the encoding end and the decoding end.
After receiving part of the model parameters of the neural network model through the second communication link, the decoding end can decode the code stream of the image to be decoded by combining them with the other model parameters preset at the decoding end; alternatively, it can decode the code stream of the image to be decoded using all of the model parameters of the neural network model received through the second communication link.
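A minimal sketch of merging partially transmitted parameters with parameters preset at both ends; all field names are illustrative assumptions.

```python
# Hypothetical sketch: combine received partial parameters with preset weights.
def build_model_params(received_partial, preset_weights):
    params = dict(preset_weights)                                   # weights agreed on in advance
    params["num_layers"] = received_partial["num_layers"]           # transmitted structure info
    params["neurons_per_layer"] = received_partial["neurons_per_layer"]
    return params
```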
In the solution provided by the present application, when the encoding end transmits only part of the model parameters of the neural network model through the second communication link and the other model parameters are preset at the encoding end and the decoding end, transmission bandwidth is saved while the decoding end can still decode the code stream accurately; when the encoding end transmits all of the model parameters of the neural network model through the second communication link, the decoding end can likewise guarantee decoding accuracy.
The method embodiments of the present application have been described in detail above with reference to FIG. 1 to FIG. 8b; the apparatus embodiments of the present application are described below with reference to FIG. 9 to FIG. 13. The apparatus embodiments correspond to the method embodiments, so for parts not described in detail, reference may be made to the foregoing method embodiments.
图9为本申请一实施例提供的一种编码装置900,该编码装置900可以包括处理器910。FIG. 9 is an encoding apparatus 900 provided by an embodiment of the present application. The encoding apparatus 900 may include a processor 910 .
处理器910,用于利用基于神经网络的编码技术对待编码图像进行编码,以获得所述待编码图像的码流;A processor 910, configured to encode an image to be encoded by using a neural network-based encoding technology to obtain a code stream of the image to be encoded;
通过第一通信链路传输所述待编码图像的码流;transmitting the code stream of the to-be-encoded image through the first communication link;
通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数。The model parameters of the neural network model included in the neural network-based encoding technique are transmitted over the second communication link.
The first communication link may be provided by a first communication module, and the second communication link may be provided by a second communication module. For example, the first communication module is an SDR module or a WiFi module, and the second communication module is a 4G/5G module.
可选地,在一些实施例中,所述第一通信链路和所述第二通信链路具有不同的物理特性。Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
可选地,在一些实施例中,所述第一通信链路的传输时延低于所述第二通信链路,和/或,所述第二通信链路的传输带宽高于所述第一通信链路。Optionally, in some embodiments, the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the second communication link. a communication link.
可选地,在一些实施例中,所述第一通信链路包括时延小于或等于第一阈值的链路,所述第二通信链路包括带宽大于或等于第二阈值的链路。Optionally, in some embodiments, the first communication link includes a link with a delay less than or equal to a first threshold, and the second communication link includes a link with a bandwidth greater than or equal to a second threshold.
可选地,在一些实施例中,所述第一通信链路包括基于私有图传协议或无线局域网协议的链路,所述第二通信链路包括基于移动通信协议的链路。Optionally, in some embodiments, the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link includes a link based on a mobile communication protocol.
可选地,在一些实施例中,所述私有图传协议包括软件无线电SDR协议,所述无线局域网协议包括无线保真WiFi协议,所述移动通信协议包括4G或5G协议。Optionally, in some embodiments, the private image transmission protocol includes a software-defined radio SDR protocol, the wireless local area network protocol includes a wireless fidelity WiFi protocol, and the mobile communication protocol includes a 4G or 5G protocol.
可选地,在一些实施例中,所述第一通信链路包括私有图传链路,所述第二通信链路包括公网传输链路。Optionally, in some embodiments, the first communication link includes a private image transmission link, and the second communication link includes a public network transmission link.
可选地,在一些实施例中,所述神经网络模型包括离线训练的神经网络模型或在线训练的神经网络模型。Optionally, in some embodiments, the neural network model includes an offline trained neural network model or an online trained neural network model.
可选地,在一些实施例中,所述神经网络模型为在线训练的神经网络模型,所述处理器910进一步用于:Optionally, in some embodiments, the neural network model is an online training neural network model, and the processor 910 is further configured to:
For an n-th target image, encode the n-th target image using a neural network model obtained by training on the (n-m)-th target image, where the target image is an image obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
利用对所述第n-m个目标图像进行训练获得的神经网络模型对所述第n个目标图像进行滤波,所述第n-m个目标图像为第n-m个编码图像未经过神经网络模型进行滤波的图像,所述第n个目标图像为第n个编码图像未经过神经网络模型进行滤波的图像。The n-th target image is filtered by using the neural network model obtained by training the n-mth target image, and the n-mth target image is an image of the n-mth encoded image that has not been filtered by the neural network model, The n-th target image is an image of the n-th coded image that has not been filtered by the neural network model.
可选地,在一些实施例中,所述m为编码前固化到或预设在编码端的参数;或,所述m为编码过程中形成的参数。Optionally, in some embodiments, the m is a parameter that is solidified or preset at the encoding end before encoding; or, the m is a parameter formed during the encoding process.
可选地,在一些实施例中,所述神经网络模型为在线训练的神经网络模型,所述处理器910进一步用于:Optionally, in some embodiments, the neural network model is an online training neural network model, and the processor 910 is further configured to:
For a first target image to be encoded, encode the first target image using a neural network model obtained by training on an encoded second target image, where the first target image and the second target image are images obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level;
通过所述第二通信链路传输对所述第二目标图像进行训练获得的神经网络模型的模型参数。The model parameters of the neural network model obtained by training the second target image are transmitted through the second communication link.
可选地,在一些实施例中,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。Optionally, in some embodiments, the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
可选地,在一些实施例中,所述q为编码前固化到或预设在编码端的参数。Optionally, in some embodiments, the q is a parameter that is fixed or preset at the encoding end before encoding.
可选地,在一些实施例中,所述神经网络模型为离线训练的神经网络模型,所述处理器910进一步用于:Optionally, in some embodiments, the neural network model is an offline training neural network model, and the processor 910 is further configured to:
For a p-th target image, encode the p-th target image using the offline-trained neural network model, where the target image is an image obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level, and p is an integer greater than or equal to 0.
Optionally, in some embodiments, the code stream of the image to be encoded further includes first indication information, where the first indication information indicates the identifier of the model parameters of the neural network model used when encoding the image to be encoded.
Optionally, in some embodiments, if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, where the first indication information indicates the identifier of the model parameters of the neural network model used when encoding the image to be encoded.
可选地,在一些实施例中,所述第一指示信息还用于指示在对当前关键帧至下一个关键帧之间的其它帧进行编码时采用的神经网络模型的模型参数的标识。Optionally, in some embodiments, the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when encoding other frames between the current key frame and the next key frame.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
通过所述第二通信链路传输所述神经网络模型的模型参数以及所述神经网络模型的模型参数对应的标识。The model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are transmitted through the second communication link.
Optionally, in some embodiments, the code stream of the image to be encoded further includes second indication information, where the second indication information indicates whether the neural network model is trained on a group-of-pictures basis, a frame basis, or a sequence basis.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
将所述神经网络模型的模型参数转换为目标格式;converting the model parameters of the neural network model into a target format;
对所述目标格式的模型参数进行压缩,得到压缩后的模型参数;compressing the model parameters of the target format to obtain the compressed model parameters;
通过所述第二通信链路传输所述压缩后的模型参数。The compressed model parameters are transmitted over the second communication link.
可选地，在一些实施例中，所述目标格式包括神经网络变换格式NNEF或开放神经网络变换ONNX格式。Optionally, in some embodiments, the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
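As one plausible way to produce such a target format, the sketch below exports a toy PyTorch module to an ONNX file with torch.onnx.export. The TinyFilter network, input shape, and file name are placeholders introduced here; NNEF export or any other toolchain could equally be used.

```python
import torch
import torch.nn as nn

# A toy in-loop filter network standing in for the codec's neural network model.
class TinyFilter(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv(x)  # residual filtering of a single-channel frame

model = TinyFilter().eval()
dummy = torch.randn(1, 1, 64, 64)  # dummy input fixes the exported graph's shape
torch.onnx.export(model, dummy, "filter.onnx",
                  input_names=["frame"], output_names=["filtered"])
# "filter.onnx" now holds the model parameters in the ONNX target format; they can then be
# compressed (e.g. with an NNR-style or general-purpose compressor) and sent over the second link.
```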
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
利用动态图像专家组MPEG的神经网络表达NNR压缩方法对所述目标格式的模型参数进行压缩，以获得NNR的码流；The model parameters in the target format are compressed using the neural network representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to obtain an NNR code stream;
通过所述第二通信链路传输所述NNR的码流。The code stream of the NNR is transmitted through the second communication link.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
利用人工智能产业技术创新战略联盟AITISA的压缩方法对所述目标格式的模型参数进行压缩,以获得压缩数据;Compress the model parameters of the target format by using the compression method of the artificial intelligence industry technology innovation strategic alliance AITISA to obtain compressed data;
通过所述第二通信链路传输所述压缩数据。The compressed data is transmitted over the second communication link.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
利用所述基于神经网络的编码技术对所述待编码图像进行预测、变换、 量化、熵编码或滤波中的至少一种。At least one of prediction, transformation, quantization, entropy encoding or filtering is performed on the to-be-encoded image by using the neural network-based encoding technology.
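As an illustration of just one of these tools, the following sketch applies a toy neural-network-style loop filter (a single 3x3 convolution with a residual connection) to a reconstructed block. The weights, block size, and clipping range are arbitrary assumptions; a real codec-side network would be far larger and trained rather than hand-set.

```python
import numpy as np

def nn_filter(frame: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Toy stand-in for a neural-network loop filter: one 3x3 convolution plus a residual add."""
    padded = np.pad(frame, 1, mode="edge")
    out = np.zeros_like(frame, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += weights[dy, dx] * padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return np.clip(frame + out + bias, 0, 255)

recon = np.random.randint(0, 256, (8, 8)).astype(np.float32)  # reconstructed block before filtering
weights = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)        # illustrative "learned" weights
filtered = nn_filter(recon, weights, bias=0.0)                # would augment or replace conventional filtering
```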
可选地,在一些实施例中,所述待编码图像的码流中还包括第三指示信息,所述第三指示信息用于指示是否开启了利用神经网络的编码技术进行编码。Optionally, in some embodiments, the code stream of the image to be encoded further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
可选地,在一些实施例中,所述处理器910进一步用于:Optionally, in some embodiments, the processor 910 is further configured to:
通过所述第二通信链路传输所述神经网络模型的部分模型参数或全部模型参数。Some or all of the model parameters of the neural network model are transmitted over the second communication link.
可选地,在一些实施例中,所述第一通信链路和/或所述第二通信链路是从以下一种或多种链路中选择的:Optionally, in some embodiments, the first communication link and/or the second communication link is selected from one or more of the following links:
基于无线局域网协议的链路、基于移动通信协议的链路、基于以太网协议的链路。Links based on wireless local area network protocols, links based on mobile communication protocols, links based on Ethernet protocols.
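A simple sketch of how an encoder might pick the two links from such a set, assuming per-link latency and bandwidth estimates and illustrative threshold values that stand in for the first and second thresholds referred to elsewhere in this application; the Link structure and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Link:
    name: str
    latency_ms: float
    bandwidth_mbps: float

def pick_links(links: List[Link], max_latency_ms: float,
               min_bandwidth_mbps: float) -> Tuple[Optional[Link], Optional[Link]]:
    """Pick a low-latency link for the image code stream and a high-bandwidth link
    for the model parameters; the thresholds play the role of the first and second thresholds."""
    first = next((l for l in links if l.latency_ms <= max_latency_ms), None)
    second = next((l for l in links if l.bandwidth_mbps >= min_bandwidth_mbps), None)
    return first, second

links = [Link("SDR image link", 15, 20), Link("WiFi", 30, 100), Link("5G", 40, 300)]
first_link, second_link = pick_links(links, max_latency_ms=20, min_bandwidth_mbps=200)
```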
图10为本申请一实施例提供的一种解码装置1000,该解码装置1000可以包括处理器1010。FIG. 10 is a decoding apparatus 1000 provided by an embodiment of the present application. The decoding apparatus 1000 may include a processor 1010 .
处理器1010,用于通过第一通信链路接收待解码图像的码流;a processor 1010, configured to receive the code stream of the image to be decoded through the first communication link;
通过第二通信链路接收神经网络模型的模型参数;receiving model parameters of the neural network model via the second communication link;
利用所述神经网络模型的模型参数对所述码流进行解码,获得解码后的图像。The code stream is decoded using the model parameters of the neural network model to obtain a decoded image.
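A minimal decoder-side sketch of this two-link arrangement: picture code streams drain from a queue fed by the first (low-latency) link, model-parameter updates drain from a queue fed by the second (high-bandwidth) link, and the newest parameters received so far are used for decoding. The queues, decode stub, and parameter dictionaries are hypothetical stand-ins, not part of this disclosure.

```python
import queue

bitstreams = queue.Queue()     # filled from the first (low-latency) communication link
param_updates = queue.Queue()  # filled from the second (high-bandwidth) communication link

def decode(bitstream: bytes, params: dict) -> str:
    """Hypothetical stand-in for decoding one picture with the current model parameters."""
    return f"decoded {len(bitstream)} bytes with parameter set {params['param_id']}"

def decoder_loop(num_pictures: int) -> None:
    current_params = {"param_id": 0}      # e.g. an offline-trained default model
    for _ in range(num_pictures):
        while not param_updates.empty():  # adopt the newest parameters received so far
            current_params = param_updates.get()
        print(decode(bitstreams.get(), current_params))

param_updates.put({"param_id": 1})        # arrives over the second link
bitstreams.put(b"\x00" * 10)              # code stream arrives over the first link
decoder_loop(num_pictures=1)
```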
可选地,在一些实施例中,所述第一通信链路和所述第二通信链路具有不同的物理特性。Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
可选地，在一些实施例中，所述第一通信链路的传输时延低于所述第二通信链路，和/或，所述第二通信链路的传输带宽高于所述第一通信链路。Optionally, in some embodiments, the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
可选地,在一些实施例中,所述第一通信链路包括时延小于或等于第一阈值的链路,所述第二通信链路包括带宽大于或等于第二阈值的链路。Optionally, in some embodiments, the first communication link includes a link with a delay less than or equal to a first threshold, and the second communication link includes a link with a bandwidth greater than or equal to a second threshold.
可选地,在一些实施例中,所述第一通信链路包括基于私有图传协议或无线局域网协议的链路,所述第二通信链路包括基于移动通信协议的链路。Optionally, in some embodiments, the first communication link includes a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link includes a link based on a mobile communication protocol.
可选地,在一些实施例中,所述私有图传协议包括软件无线电SDR协议,所述无线局域网协议包括无线保真WiFi协议,所述移动通信协议包括4G或5G协议。Optionally, in some embodiments, the private image transmission protocol includes a software-defined radio SDR protocol, the wireless local area network protocol includes a wireless fidelity WiFi protocol, and the mobile communication protocol includes a 4G or 5G protocol.
可选地,在一些实施例中,所述第一通信链路包括私有图传链路,所述第二通信链路包括公网传输链路。Optionally, in some embodiments, the first communication link includes a private image transmission link, and the second communication link includes a public network transmission link.
可选地,在一些实施例中,所述神经网络模型包括离线训练的神经网络 模型或在线训练的神经网络模型。Optionally, in some embodiments, the neural network model includes an offline trained neural network model or an online trained neural network model.
可选地,在一些实施例中,所述码流中还包括第一指示信息,所述第一指示信息用于指示编码端在对待编码图像进行编码时采用的神经网络模型的模型参数的标识;Optionally, in some embodiments, the code stream further includes first indication information, where the first indication information is used to indicate the identifier of the model parameter of the neural network model used by the encoding end when encoding the image to be encoded. ;
所述处理器1010进一步用于:The processor 1010 is further configured to:
利用所述模型参数的标识对应的神经网络模型的模型参数对所述码流进行解码。The code stream is decoded by using the model parameter of the neural network model corresponding to the identifier of the model parameter.
可选地，在一些实施例中，若所述码流中待解码图像为关键帧，则所述码流中还包括第一指示信息，所述第一指示信息用于指示编码端在对待编码图像进行编码时采用的神经网络模型的模型参数的标识；Optionally, in some embodiments, if the image to be decoded in the code stream is a key frame, the code stream further includes first indication information, where the first indication information is used to indicate an identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
所述处理器1010进一步用于:The processor 1010 is further configured to:
利用所述神经网络模型的模型参数和所述模型参数的标识对所述码流进行解码。The code stream is decoded using the model parameters of the neural network model and the identification of the model parameters.
可选地,在一些实施例中,所述第一指示信息还用于指示在对当前关键帧至下一个关键帧之间的其它帧进行解码时采用的神经网络模型的模型参数的标识。Optionally, in some embodiments, the first indication information is further used to indicate an identifier of a model parameter of the neural network model used when decoding other frames between the current key frame and the next key frame.
可选地,在一些实施例中,所述神经网络模型为在线训练的神经网络模型,所述处理器1010进一步用于:Optionally, in some embodiments, the neural network model is an online training neural network model, and the processor 1010 is further configured to:
利用接收到的对第二目标图像进行训练获得的神经网络模型的模型参数对第一目标图像的码流进行解码，以获得所述第一目标图像，所述第一目标图像和所述第二目标图像为待解码图像按照视频序列级、图像组级、图像级中的任意一种进行划分后的图像。The code stream of the first target image is decoded using the received model parameters of the neural network model obtained by training on the second target image to obtain the first target image, where the first target image and the second target image are images obtained by dividing the image to be decoded according to any one of the video sequence level, the group of pictures level, or the picture level.
可选地,在一些实施例中,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。Optionally, in some embodiments, the images of the first target image and the second target image are separated by q target images, where q is a positive integer greater than or equal to 0.
可选地,在一些实施例中,所述q为解码前固化到或预设在解码端的参数。Optionally, in some embodiments, the q is a parameter that is solidified or preset at the decoding end before decoding.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
通过所述第二通信链路接收所述神经网络模型的模型参数和所述神经网络模型模型参数对应的标识。The model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are received through the second communication link.
可选地，在一些实施例中，所述码流中还包括第二指示信息，所述第二指示信息用于指示所述神经网络模型是基于图像组、帧、或序列中的一种进行训练的。Optionally, in some embodiments, the code stream further includes second indication information, where the second indication information is used to indicate that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
通过所述第二通信链路接收压缩后的模型参数;receiving compressed model parameters via the second communication link;
对所述压缩后的模型参数进行解压缩,以获得目标格式;decompressing the compressed model parameters to obtain a target format;
对所述目标格式进行转换;converting the target format;
利用转换后的格式的模型参数对所述码流进行解码。The code stream is decoded using the model parameters in the converted format.
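A sketch of this receive-decompress-convert path on the decoding side, using zlib and JSON purely as stand-ins for the actual compression scheme (e.g. NNR or AITISA) and interchange format (e.g. NNEF or ONNX) named in this application; the parameter names and shapes are invented for illustration.

```python
import json
import zlib

import numpy as np

def receive_params_on_second_link() -> bytes:
    """Stand-in for the second link: returns compressed model parameters."""
    params_in_target_format = json.dumps({"conv.weight": [[0.1, 0.2], [0.3, 0.4]]})
    return zlib.compress(params_in_target_format.encode("utf-8"))

def load_params(compressed: bytes) -> dict:
    """Decompress, then convert from the interchange format into in-memory tensors."""
    target_format = json.loads(zlib.decompress(compressed).decode("utf-8"))
    return {name: np.asarray(value, dtype=np.float32) for name, value in target_format.items()}

params = load_params(receive_params_on_second_link())
# `params` would then be loaded into the decoder's neural network before decoding the code stream.
```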
可选地，在一些实施例中，所述目标格式包括神经网络变换格式NNEF或开放神经网络变换ONNX格式。Optionally, in some embodiments, the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
通过所述第二通信链路接收神经网络表达NNR的码流；A neural network representation (NNR) code stream is received through the second communication link;
对所述NNR的码流进行解压缩。Decompress the code stream of the NNR.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
通过所述第二通信链路接收压缩数据;receiving compressed data over the second communication link;
对所述压缩数据进行解压缩。The compressed data is decompressed.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
利用所述神经网络的模型参数对所述码流进行熵解码、反量化、反变换、预测重建或滤波中的至少一种。At least one of entropy decoding, inverse quantization, inverse transformation, predictive reconstruction or filtering is performed on the code stream using the model parameters of the neural network.
可选地,在一些实施例中,所述码流中还包括第三指示信息,所述第三指示信息用于指示是否开启了利用神经网络的编码技术进行编码。Optionally, in some embodiments, the code stream further includes third indication information, where the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
可选地,在一些实施例中,所述处理器1010进一步用于:Optionally, in some embodiments, the processor 1010 is further configured to:
通过所述第二通信链路接收所述神经网络模型的部分模型参数或全部模型参数。Some or all of the model parameters of the neural network model are received over the second communication link.
可选地,在一些实施例中,所述第一通信链路和/或所述第二通信链路是从以下一种或多种链路中选择的:Optionally, in some embodiments, the first communication link and/or the second communication link is selected from one or more of the following links:
基于无线局域网协议的链路、基于移动通信协议的链路、基于以太网协议的链路。Links based on wireless local area network protocols, links based on mobile communication protocols, links based on Ethernet protocols.
图11是本申请再一实施例提供的编码装置的示意性结构图。图11所示的编码装置1100包括处理器1110,处理器1110可以从存储器中调用并运行计算机程序,以实现本申请实施例中所述的编码方法。FIG. 11 is a schematic structural diagram of an encoding apparatus provided by still another embodiment of the present application. The encoding apparatus 1100 shown in FIG. 11 includes a processor 1110, and the processor 1110 can call and run a computer program from a memory to implement the encoding method described in the embodiments of the present application.
可选地,如图11所示,编码装置1100还可以包括存储器1120。其中,处理器1110可以从存储器1120中调用并运行计算机程序,以实现本申请实 施例中的编码方法。Optionally, as shown in FIG. 11 , the encoding apparatus 1100 may further include a memory 1120 . The processor 1110 may call and run a computer program from the memory 1120 to implement the encoding method in the embodiments of the present application.
其中,存储器1120可以是独立于处理器1110的一个单独的器件,也可以集成在处理器1110中。The memory 1120 may be a separate device independent of the processor 1110, or may be integrated in the processor 1110.
可选地，如图11所示，编码装置1100还可以包括收发器1130，处理器1110可以控制该收发器1130与其他装置进行通信，具体地，可以向其他装置发送信息或数据，或接收其他装置发送的信息或数据。Optionally, as shown in FIG. 11, the encoding apparatus 1100 may further include a transceiver 1130, and the processor 1110 may control the transceiver 1130 to communicate with other apparatuses; specifically, it may send information or data to other apparatuses, or receive information or data sent by other apparatuses.
可选地，该编码装置例如可以是编码器、终端（包括但不限于手机、相机、无人机等），并且该编码装置可以实现本申请实施例的各个编码方法中的相应流程，为了简洁，在此不再赘述。Optionally, the encoding apparatus may be, for example, an encoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the encoding apparatus may implement the corresponding processes in the encoding methods of the embodiments of the present application; for brevity, details are not repeated here.
图12是本申请再一实施例提供的解码装置的示意性结构图。图12所示的解码装置1200包括处理器1210,处理器1210可以从存储器中调用并运行计算机程序,以实现本申请实施例中所述的解码方法。FIG. 12 is a schematic structural diagram of a decoding apparatus provided by still another embodiment of the present application. The decoding apparatus 1200 shown in FIG. 12 includes a processor 1210, and the processor 1210 can call and run a computer program from a memory to implement the decoding method described in the embodiments of the present application.
可选地,如图12所示,解码装置1200还可以包括存储器1220。其中,处理器1210可以从存储器1220中调用并运行计算机程序,以实现本申请实施例中的解码方法。Optionally, as shown in FIG. 12 , the decoding apparatus 1200 may further include a memory 1220 . The processor 1210 may call and run a computer program from the memory 1220 to implement the decoding method in the embodiment of the present application.
其中,存储器1220可以是独立于处理器1210的一个单独的器件,也可以集成在处理器1210中。The memory 1220 may be a separate device independent of the processor 1210, or may be integrated in the processor 1210.
可选地，如图12所示，解码装置1200还可以包括收发器1230，处理器1210可以控制该收发器1230与其他装置进行通信，具体地，可以向其他装置发送信息或数据，或接收其他装置发送的信息或数据。Optionally, as shown in FIG. 12, the decoding apparatus 1200 may further include a transceiver 1230, and the processor 1210 may control the transceiver 1230 to communicate with other apparatuses; specifically, it may send information or data to other apparatuses, or receive information or data sent by other apparatuses.
可选地，该解码装置例如可以是解码器、终端（包括但不限于手机、相机、无人机等），并且该解码装置可以实现本申请实施例的各个解码方法中的相应流程，为了简洁，在此不再赘述。Optionally, the decoding apparatus may be, for example, a decoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the decoding apparatus may implement the corresponding processes in the decoding methods of the embodiments of the present application; for brevity, details are not repeated here.
图13是本申请实施例的芯片的示意性结构图。图13所示的芯片1300包括处理器1310,处理器1310可以从存储器中调用并运行计算机程序,以实现本申请实施例中的编码方法或解码方法。FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip 1300 shown in FIG. 13 includes a processor 1310, and the processor 1310 can call and run a computer program from a memory to implement the encoding method or the decoding method in the embodiments of the present application.
可选地,如图13所示,芯片1300还可以包括存储器1320。其中,处理器1310可以从存储器1320中调用并运行计算机程序,以实现本申请实施例中的编码方法或解码方法。Optionally, as shown in FIG. 13 , the chip 1300 may further include a memory 1320 . The processor 1310 may call and run a computer program from the memory 1320 to implement the encoding method or the decoding method in the embodiments of the present application.
其中,存储器1320可以是独立于处理器1310的一个单独的器件,也可以集成在处理器1310中。The memory 1320 may be a separate device independent of the processor 1310, or may be integrated in the processor 1310.
可选地,该芯片1300还可以包括输入接口1330。其中,处理器1310可以控制该输入接口1330与其他装置或芯片进行通信,具体地,可以获取其他装置或芯片发送的信息或数据。Optionally, the chip 1300 may further include an input interface 1330 . The processor 1310 can control the input interface 1330 to communicate with other devices or chips, and specifically, can obtain information or data sent by other devices or chips.
可选地,该芯片1300还可以包括输出接口1340。其中,处理器1310可以控制该输出接口1340与其他装置或芯片进行通信,具体地,可以向其他装置或芯片输出信息或数据。Optionally, the chip 1300 may further include an output interface 1340 . The processor 1310 can control the output interface 1340 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.
应理解，本申请实施例提到的芯片还可以称为系统级芯片、系统芯片、芯片系统或片上系统芯片等。It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-chip (SoC), or the like.
应理解,本申请实施例的处理器可能是一种集成电路图像处理***,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。It should be understood that the processor in this embodiment of the present application may be an integrated circuit image processing system, which has signal processing capability. In the implementation process, each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software. The above-mentioned processor can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other available Programming logic devices, discrete gate or transistor logic devices, discrete hardware components. The methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art. The storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
可以理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、 双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。应注意,本文描述的***和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。It can be understood that the memory in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. Wherein, the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically programmable read-only memory (Erasable PROM, EPROM). Erase programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. Volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM) ) and direct memory bus random access memory (Direct Rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory.
本申请实施例中的存储器可以向处理器提供指令和数据。存储器的一部分还可以包括非易失性随机存取存储器。例如,存储器还可以存储设备类型的信息。该处理器可以用于执行存储器中存储的指令,并且该处理器执行该指令时,该处理器可以执行上述方法实施例中与终端设备对应的各个步骤。The memory in the embodiments of the present application may provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information. The processor may be configured to execute the instruction stored in the memory, and when the processor executes the instruction, the processor may execute each step corresponding to the terminal device in the foregoing method embodiments.
在实现过程中,上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器执行存储器中的指令,结合其硬件完成上述方法的步骤。为避免重复,这里不再详细描述。In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software. The steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art. The storage medium is located in the memory, and the processor executes the instructions in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, detailed description is omitted here.
还应理解，在本申请实施例中，图像中的像素点可以位于不同的行和/或列，其中，A的长度可以对应于A包括的位于同一行的像素点个数，A的高度可以对应于A包括的位于同一列的像素点个数。此外，A的长度和高度也可以分别称为A的宽度和深度，本申请实施例对此不做限定。It should also be understood that, in the embodiments of the present application, the pixels in an image may be located in different rows and/or columns, where the length of A may correspond to the number of pixels in the same row included in A, and the height of A may correspond to the number of pixels in the same column included in A. In addition, the length and height of A may also be referred to as the width and depth of A, respectively, which is not limited in the embodiments of the present application.
还应理解，在本申请实施例中，"与A的边界间隔分布"可以指与A的边界间隔至少一个像素点，也可以称为"不与A的边界相邻"或者"不位于A的边界"，本申请实施例对此不做限定，其中，A可以是图像、矩形区域或子图像，等等。It should also be understood that, in the embodiments of the present application, "spaced apart from the boundary of A" may mean being separated from the boundary of A by at least one pixel, and may also be expressed as "not adjacent to the boundary of A" or "not located at the boundary of A", which is not limited in the embodiments of the present application, where A may be an image, a rectangular area, a sub-image, or the like.
还应理解,上文对本申请实施例的描述着重于强调各个实施例之间的不同之处,未提到的相同或相似之处可以互相参考,为了简洁,这里不再赘述。It should also be understood that the above description of the embodiments of the present application focuses on emphasizing the differences between the various embodiments, and the unmentioned same or similar points can be referred to each other, and are not repeated here for brevity.
本申请实施例还提供了一种计算机可读存储介质,用于存储计算机程序。Embodiments of the present application further provide a computer-readable storage medium for storing a computer program.
可选的，该计算机可读存储介质可应用于本申请实施例中的编码装置或解码装置，并且该计算机程序使得计算机执行本申请实施例的各个方法中由编码装置或解码装置实现的相应流程，为了简洁，在此不再赘述。Optionally, the computer-readable storage medium may be applied to the encoding apparatus or the decoding apparatus in the embodiments of the present application, and the computer program causes a computer to execute the corresponding processes implemented by the encoding apparatus or the decoding apparatus in the methods of the embodiments of the present application; for brevity, details are not repeated here.
本申请实施例还提供了一种计算机程序产品,包括计算机程序指令。Embodiments of the present application also provide a computer program product, including computer program instructions.
可选的,该计算机程序产品可应用于本申请实施例中的编码装置或解码装置,并且该计算机程序指令使得计算机执行本申请实施例的各个方法中由编码装置或解码装置实现的相应流程,为了简洁,在此不再赘述。Optionally, the computer program product can be applied to the encoding device or the decoding device in the embodiments of the present application, and the computer program instructions cause the computer to execute the corresponding processes implemented by the encoding device or the decoding device in each method of the embodiments of the present application, For brevity, details are not repeated here.
本申请实施例还提供了一种计算机程序。The embodiments of the present application also provide a computer program.
可选的，该计算机程序可应用于本申请实施例中的编码装置或解码装置，当该计算机程序在计算机上运行时，使得计算机执行本申请实施例的各个方法中由编码装置或解码装置实现的相应流程，为了简洁，在此不再赘述。Optionally, the computer program may be applied to the encoding apparatus or the decoding apparatus in the embodiments of the present application; when the computer program runs on a computer, the computer is caused to execute the corresponding processes implemented by the encoding apparatus or the decoding apparatus in the methods of the embodiments of the present application; for brevity, details are not repeated here.
应理解,在本申请实施例中,术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系。例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。It should be understood that, in this embodiment of the present application, the term "and/or" is only an association relationship for describing associated objects, indicating that there may be three kinds of relationships. For example, A and/or B can mean that A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" in this text generally indicates that the related objects are an "or" relationship.
本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、计算机软件或者二者的结合来实现，为了清楚地说明硬件和软件的可互换性，在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the system, device and unit described above may refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.
在本申请所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另外，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接，也可以是电的，机械的或其它的形式连接。In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division of logical functions, and there may be other ways of division in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may also be an electrical, mechanical or other form of connection.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作 为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本申请实施例方案的目的。The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application are essentially or part of contributions to the prior art, or all or part of the technical solutions can be embodied in the form of software products, and the computer software products are stored in a storage medium , including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The foregoing is merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (108)

  1. 一种编码方法,其特征在于,包括:An encoding method, comprising:
    利用基于神经网络的编码技术对待编码图像进行编码,以获得所述待编码图像的码流;Utilize neural network-based coding technology to encode the to-be-coded image to obtain a code stream of the to-be-coded image;
    通过第一通信链路传输所述待编码图像的码流;transmitting the code stream of the to-be-encoded image through the first communication link;
    通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数。The model parameters of the neural network model included in the neural network-based encoding technique are transmitted over the second communication link.
  2. 根据权利要求1所述的方法,其特征在于,所述第一通信链路和所述第二通信链路具有不同的物理特性。The method of claim 1, wherein the first communication link and the second communication link have different physical characteristics.
  3. 根据权利要求1或2所述的方法，其特征在于，所述第一通信链路的传输时延低于所述第二通信链路，和/或，所述第二通信链路的传输带宽高于所述第一通信链路。The method according to claim 1 or 2, wherein the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  4. 根据权利要求1至3中任一项所述的方法，其特征在于，所述第一通信链路包括时延小于或等于第一阈值的链路，所述第二通信链路包括带宽大于或等于第二阈值的链路。The method according to any one of claims 1 to 3, wherein the first communication link comprises a link with a delay less than or equal to a first threshold, and the second communication link comprises a link with a bandwidth greater than or equal to a second threshold.
  5. 根据权利要求1至3中任一项所述的方法，其特征在于，所述第一通信链路包括基于私有图传协议或无线局域网协议的链路，所述第二通信链路包括基于移动通信协议的链路。The method according to any one of claims 1 to 3, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
  6. 根据权利要求5所述的方法,其特征在于,所述私有图传协议包括软件无线电SDR协议,所述无线局域网协议包括无线保真WiFi协议,所述移动通信协议包括4G或5G协议。The method according to claim 5, wherein the private image transmission protocol includes a software defined radio (SDR) protocol, the wireless local area network protocol includes a wireless fidelity WiFi protocol, and the mobile communication protocol includes a 4G or 5G protocol.
  7. 根据权利要求1或2所述的方法,其特征在于,所述第一通信链路包括私有图传链路,所述第二通信链路包括公网传输链路。The method according to claim 1 or 2, wherein the first communication link comprises a private image transmission link, and the second communication link comprises a public network transmission link.
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述神经网络模型包括离线训练的神经网络模型或在线训练的神经网络模型。The method according to any one of claims 1 to 7, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
  9. 根据权利要求1或8所述的方法,其特征在于,所述神经网络模型为在线训练的神经网络模型,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:The method according to claim 1 or 8, wherein the neural network model is an online training neural network model, and the encoding of the image to be encoded by using a neural network-based encoding technology includes:
    对于第n个目标图像，利用对第n-m个目标图像进行训练获得的神经网络模型对所述第n个目标图像进行编码，所述目标图像为对所述待编码图像按照视频序列级、图像组级、图像级中的任意一种进行划分后的图像；所述n为大于等于1的整数，所述m为大于等于1的整数。For the n-th target image, the n-th target image is encoded using a neural network model obtained by training on the (n-m)-th target image, where the target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the group of pictures level, or the picture level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
  10. 根据权利要求9所述的方法,其特征在于,所述利用对第n-m个目标图像进行训练获得的神经网络模型对所述第n个目标图像进行编码,包括:The method according to claim 9, wherein the encoding the n-th target image by using a neural network model obtained by training the n-m-th target images comprises:
    利用对所述第n-m个目标图像进行训练获得的神经网络模型对所述第n个目标图像进行滤波,所述第n-m个目标图像为第n-m个编码图像未经过神经网络模型进行滤波的图像,所述第n个目标图像为第n个编码图像未经过神经网络模型进行滤波的图像。The n-th target image is filtered by using the neural network model obtained by training the n-mth target image, and the n-mth target image is an image of the n-mth encoded image that has not been filtered by the neural network model, The n-th target image is an image of the n-th coded image that has not been filtered by the neural network model.
  11. 根据权利要求9或10所述的方法,其特征在于,所述m为编码前固化到或预设在编码端的参数;或,所述m为编码过程中形成的参数。The method according to claim 9 or 10, wherein the m is a parameter that is solidified or preset at the encoding end before encoding; or, the m is a parameter formed during the encoding process.
  12. 根据权利要求1或8所述的方法,其特征在于,所述神经网络模型为在线训练的神经网络模型,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:The method according to claim 1 or 8, wherein the neural network model is an online training neural network model, and the encoding of the image to be encoded by using a neural network-based encoding technology includes:
    对于待编码的第一目标图像，利用对已编码的第二目标图像进行训练获得的神经网络模型对所述第一目标图像进行编码，所述第一目标图像和所述第二目标图像为对所述待编码图像按照视频序列级、图像组级、图像级中的任意一种进行划分后的图像；For the first target image to be encoded, the first target image is encoded using a neural network model obtained by training on the already encoded second target image, where the first target image and the second target image are images obtained by dividing the image to be encoded according to any one of the video sequence level, the group of pictures level, or the picture level;
    所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:The transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
    通过所述第二通信链路传输对所述第二目标图像进行训练获得的神经网络模型的模型参数。The model parameters of the neural network model obtained by training the second target image are transmitted through the second communication link.
  13. 根据权利要求12所述的方法,其特征在于,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。The method according to claim 12, wherein the images of the first target image and the second target image are separated by q target images, and q is a positive integer greater than or equal to 0.
  14. 根据权利要求13所述的方法,其特征在于,所述q为编码前固化到或预设在编码端的参数。The method according to claim 13, wherein the q is a parameter that is solidified or preset at the encoding end before encoding.
  15. 根据权利要求1或8所述的方法,其特征在于,所述神经网络模型为离线训练的神经网络模型,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:The method according to claim 1 or 8, wherein the neural network model is an offline training neural network model, and the encoding of the image to be encoded by using a neural network-based encoding technology includes:
    对于第p个目标图像，利用所述离线训练的神经网络模型对所述第p个目标图像进行编码，所述目标图像为对所述待编码图像按照视频序列级、图像组级、图像级中的任意一种进行划分后的图像，所述p为大于或等于0的整数。For the p-th target image, the p-th target image is encoded using the offline-trained neural network model, where the target image is an image obtained by dividing the image to be encoded according to any one of the video sequence level, the group of pictures level, or the picture level, and p is an integer greater than or equal to 0.
  16. 根据权利要求1至11或15中任一项所述的方法，其特征在于，所述待编码图像的码流中还包括第一指示信息，所述第一指示信息用于指示在对所述待编码图像进行编码时采用的神经网络模型的模型参数的标识。The method according to any one of claims 1 to 11 or 15, wherein the code stream of the image to be encoded further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used when encoding the image to be encoded.
  17. 根据权利要求1至11或15中任一项所述的方法，其特征在于，若所述待编码图像为关键帧，则所述待编码图像的码流中还包括第一指示信息，所述第一指示信息用于指示在对所述待编码图像进行编码时采用的神经网络模型的模型参数的标识。The method according to any one of claims 1 to 11 or 15, wherein if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used when encoding the image to be encoded.
  18. 根据权利要求17所述的方法，其特征在于，所述第一指示信息还用于指示在对当前关键帧至下一个关键帧之间的其它帧进行编码时采用的神经网络模型的模型参数的标识。The method according to claim 17, wherein the first indication information is further used to indicate an identifier of the model parameters of the neural network model used when encoding other frames between the current key frame and the next key frame.
  19. 根据权利要求1至18中任一项所述的方法,其特征在于,所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:The method according to any one of claims 1 to 18, wherein the transmitting, through the second communication link, the model parameters of the neural network model included in the neural network-based encoding technology, comprising:
    通过所述第二通信链路传输所述神经网络模型的模型参数以及所述神经网络模型的模型参数对应的标识。The model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model are transmitted through the second communication link.
  20. 根据权利要求1至19中任一项所述的方法，其特征在于，所述待编码图像的码流中还包括第二指示信息，所述第二指示信息用于指示所述神经网络模型是基于图像组、帧、或序列中的一种进行训练的。The method according to any one of claims 1 to 19, wherein the code stream of the image to be encoded further includes second indication information, and the second indication information is used to indicate that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
  21. 根据权利要求1至20中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 20, wherein the method further comprises:
    将所述神经网络模型的模型参数转换为目标格式;converting the model parameters of the neural network model into a target format;
    对所述目标格式的模型参数进行压缩,得到压缩后的模型参数;compressing the model parameters of the target format to obtain the compressed model parameters;
    所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:The transmission of the model parameters of the neural network model included in the neural network-based coding technology through the second communication link includes:
    通过所述第二通信链路传输所述压缩后的模型参数。The compressed model parameters are transmitted over the second communication link.
  22. 根据权利要求21所述的方法，其特征在于，所述目标格式包括神经网络变换格式NNEF或开放神经网络变换ONNX格式。The method according to claim 21, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  23. 根据权利要求21或22所述的方法,其特征在于,所述对所述目标格式的模型参数进行压缩,包括:The method according to claim 21 or 22, wherein the compressing the model parameters of the target format comprises:
    利用动态图像专家组MPEG的神经网络表达NNR压缩方法对所述目标格式的模型参数进行压缩，以获得NNR的码流；compressing the model parameters in the target format using the neural network representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to obtain an NNR code stream;
    所述通过所述第二通信链路传输所述压缩后的模型参数,包括:The transmitting the compressed model parameters through the second communication link includes:
    通过所述第二通信链路传输所述NNR的码流。The code stream of the NNR is transmitted through the second communication link.
  24. 根据权利要求21或22所述的方法,其特征在于,所述对所述目标格式的模型参数进行压缩,包括:The method according to claim 21 or 22, wherein the compressing the model parameters of the target format comprises:
    利用人工智能产业技术创新战略联盟AITISA的压缩方法对所述目标格式的模型参数进行压缩,以获得压缩数据;Compress the model parameters of the target format by using the compression method of the artificial intelligence industry technology innovation strategic alliance AITISA to obtain compressed data;
    所述通过所述第二通信链路传输所述压缩后的模型参数,包括:The transmitting the compressed model parameters through the second communication link includes:
    通过所述第二通信链路传输所述压缩数据。The compressed data is transmitted over the second communication link.
  25. 根据权利要求1至24中任一项所述的方法,其特征在于,所述利用基于神经网络的编码技术对待编码图像进行编码,包括:The method according to any one of claims 1 to 24, wherein the encoding the image to be encoded using a neural network-based encoding technology comprises:
    利用所述基于神经网络的编码技术对所述待编码图像进行预测、变换、量化、熵编码或滤波中的至少一种。At least one of prediction, transformation, quantization, entropy encoding or filtering is performed on the to-be-encoded image using the neural network-based encoding technology.
  26. 根据权利要求1至25中任一项所述的方法，其特征在于，所述待编码图像的码流中还包括第三指示信息，所述第三指示信息用于指示是否开启了利用神经网络的编码技术进行编码。The method according to any one of claims 1 to 25, wherein the code stream of the image to be encoded further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network-based encoding technique is enabled.
  27. 根据权利要求1至26中任一项所述的方法,其特征在于,所述通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数,包括:The method according to any one of claims 1 to 26, wherein the transmitting, through the second communication link, the model parameters of the neural network model included in the neural network-based coding technology, comprising:
    通过所述第二通信链路传输所述神经网络模型的部分模型参数或全部模型参数。Some or all of the model parameters of the neural network model are transmitted over the second communication link.
  28. 根据权利要求1至27中任一项所述的方法,其特征在于,所述第一通信链路和/或所述第二通信链路是从以下一种或多种链路中选择的:The method of any one of claims 1 to 27, wherein the first communication link and/or the second communication link is selected from one or more of the following links:
    基于无线局域网协议的链路、基于移动通信协议的链路、基于以太网协议的链路。Links based on wireless local area network protocols, links based on mobile communication protocols, links based on Ethernet protocols.
  29. 一种解码方法,其特征在于,包括:A decoding method, comprising:
    通过第一通信链路接收待解码图像的码流;Receive the code stream of the image to be decoded through the first communication link;
    通过第二通信链路接收神经网络模型的模型参数;receiving model parameters of the neural network model via the second communication link;
    利用所述神经网络模型的模型参数对所述码流进行解码,获得解码后的图像。The code stream is decoded using the model parameters of the neural network model to obtain a decoded image.
  30. 根据权利要求29所述的方法,其特征在于,所述第一通信链路和所述第二通信链路具有不同的物理特性。30. The method of claim 29, wherein the first communication link and the second communication link have different physical characteristics.
  31. 根据权利要求29或30所述的方法，其特征在于，所述第一通信链路的传输时延低于所述第二通信链路，和/或，所述第二通信链路的传输带宽高于所述第一通信链路。The method according to claim 29 or 30, wherein the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  32. 根据权利要求29至31中任一项所述的方法，其特征在于，所述第一通信链路包括时延小于或等于第一阈值的链路，所述第二通信链路包括带宽大于或等于第二阈值的链路。The method according to any one of claims 29 to 31, wherein the first communication link comprises a link with a delay less than or equal to a first threshold, and the second communication link comprises a link with a bandwidth greater than or equal to a second threshold.
  33. 根据权利要求29至31中任一项所述的方法，其特征在于，所述第一通信链路包括基于私有图传协议或无线局域网协议的链路，所述第二通信链路包括基于移动通信协议的链路。The method according to any one of claims 29 to 31, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
  34. 根据权利要求33所述的方法,其特征在于,所述私有图传协议包括软件无线电SDR协议,所述无线局域网协议包括无线保真WiFi协议,所述移动通信协议包括4G或5G协议。The method according to claim 33, wherein the private image transmission protocol includes a software defined radio (SDR) protocol, the wireless local area network protocol includes a wireless fidelity WiFi protocol, and the mobile communication protocol includes a 4G or 5G protocol.
  35. 根据权利要求29或30所述的方法,其特征在于,所述第一通信链路包括私有图传链路,所述第二通信链路包括公网传输链路。The method according to claim 29 or 30, wherein the first communication link comprises a private image transmission link, and the second communication link comprises a public network transmission link.
  36. 根据权利要求29至35中任一项所述的方法,其特征在于,所述神经网络模型包括离线训练的神经网络模型或在线训练的神经网络模型。The method according to any one of claims 29 to 35, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
  37. 根据权利要求29至36中任一项所述的方法，其特征在于，所述码流中还包括第一指示信息，所述第一指示信息用于指示编码端在对待编码图像进行编码时采用的神经网络模型的模型参数的标识；The method according to any one of claims 29 to 36, wherein the code stream further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
    所述利用所述神经网络模型的模型参数对所述码流进行解码,包括:The decoding of the code stream using the model parameters of the neural network model includes:
    利用所述模型参数的标识对应的神经网络模型的模型参数对所述码流进行解码。The code stream is decoded by using the model parameter of the neural network model corresponding to the identifier of the model parameter.
  38. 根据权利要求29至36中任一项所述的方法，其特征在于，若所述码流中待解码图像为关键帧，则所述码流中还包括第一指示信息，所述第一指示信息用于指示编码端在对待编码图像进行编码时采用的神经网络模型的模型参数的标识；The method according to any one of claims 29 to 36, wherein if the image to be decoded in the code stream is a key frame, the code stream further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
    所述利用所述神经网络模型的模型参数对所述码流进行解码,包括:The decoding of the code stream using the model parameters of the neural network model includes:
    利用所述神经网络模型的模型参数和所述模型参数的标识对所述码流进行解码。The code stream is decoded using the model parameters of the neural network model and the identification of the model parameters.
  39. 根据权利要求38所述的方法，其特征在于，所述第一指示信息还用于指示在对当前关键帧至下一个关键帧之间的其它帧进行解码时采用的神经网络模型的模型参数的标识。The method according to claim 38, wherein the first indication information is further used to indicate an identifier of the model parameters of the neural network model used when decoding other frames between the current key frame and the next key frame.
  40. 根据权利要求29或36所述的方法,其特征在于,所述神经网络模型为在线训练的神经网络模型,所述利用所述神经网络模型的模型参数对所述码流进行解码,获得解码后的图像,包括:The method according to claim 29 or 36, wherein the neural network model is an online training neural network model, and the code stream is decoded by using the model parameters of the neural network model, and the decoded images, including:
    利用接收到的对第二目标图像进行训练获得的神经网络模型的模型参数对第一目标图像的码流进行解码，以获得所述第一目标图像，所述第一目标图像和所述第二目标图像为待解码图像按照视频序列级、图像组级、图像级中的任意一种进行划分后的图像。decoding the code stream of the first target image using the received model parameters of the neural network model obtained by training on the second target image to obtain the first target image, where the first target image and the second target image are images obtained by dividing the image to be decoded according to any one of the video sequence level, the group of pictures level, or the picture level.
  41. 根据权利要求40所述的方法,其特征在于,所述第一目标图像和所述第二目标图像的图像间隔q个目标图像,所述q为大于或等于0的正整数。The method according to claim 40, wherein the images of the first target image and the second target image are separated by q target images, and q is a positive integer greater than or equal to 0.
  42. 根据权利要求41所述的方法,其特征在于,所述q为解码前固化到或预设在解码端的参数。The method according to claim 41, wherein the q is a parameter that is solidified or preset at the decoding end before decoding.
  43. 根据权利要求29至42中任一项所述的方法,其特征在于,所述通过第二通信链路接收神经网络模型的模型参数,包括:The method according to any one of claims 29 to 42, wherein the receiving the model parameters of the neural network model through the second communication link comprises:
    通过所述第二通信链路接收所述神经网络模型的模型参数和所述神经网络模型的模型参数对应的标识。The model parameters of the neural network model and the identifiers corresponding to the model parameters of the neural network model are received through the second communication link.
  44. 根据权利要求29至43中任一项所述的方法，其特征在于，所述码流中还包括第二指示信息，所述第二指示信息用于指示所述神经网络模型是基于图像组、帧、或序列中的一种进行训练的。The method according to any one of claims 29 to 43, wherein the code stream further includes second indication information, and the second indication information is used to indicate that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
  45. 根据权利要求29至44中任一项所述的方法,其特征在于,所述通过第二通信链路接收神经网络模型的模型参数,包括:The method according to any one of claims 29 to 44, wherein the receiving the model parameters of the neural network model through the second communication link comprises:
    通过所述第二通信链路接收压缩后的模型参数;receiving compressed model parameters via the second communication link;
    所述利用所述神经网络模型的模型参数对所述码流进行解码,包括:The decoding of the code stream using the model parameters of the neural network model includes:
    对所述压缩后的模型参数进行解压缩,以获得目标格式;decompressing the compressed model parameters to obtain a target format;
    对所述目标格式进行转换;converting the target format;
    利用转换后的格式的模型参数对所述码流进行解码。The code stream is decoded using the model parameters in the converted format.
  46. 根据权利要求45所述的方法，其特征在于，所述目标格式包括神经网络变换格式NNEF或开放神经网络变换ONNX格式。The method according to claim 45, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  47. 根据权利要求45或46所述的方法,其特征在于,所述通过所述第二通信链路接收压缩后的模型参数,包括:The method according to claim 45 or 46, wherein the receiving the compressed model parameters through the second communication link comprises:
    通过所述第二通信链路接收神经网络表达NNR的码流；receiving a neural network representation (NNR) code stream through the second communication link;
    所述对所述压缩后的模型参数进行解压缩,包括:The decompressing the compressed model parameters includes:
    对所述NNR的码流进行解压缩。Decompress the code stream of the NNR.
  48. 根据权利要求45或46所述的方法,其特征在于,所述通过所述第二通信链路接收压缩后的模型参数,包括:The method according to claim 45 or 46, wherein the receiving the compressed model parameters through the second communication link comprises:
    通过所述第二通信链路接收压缩数据;receiving compressed data over the second communication link;
    所述对所述压缩后的模型参数进行解压缩,包括:The decompressing the compressed model parameters includes:
    对所述压缩数据进行解压缩。The compressed data is decompressed.
  49. 根据权利要求29至48中任一项所述的方法,其特征在于,所述利用所述神经网络模型的模型参数对所述码流进行解码,包括:The method according to any one of claims 29 to 48, wherein the decoding the code stream using the model parameters of the neural network model comprises:
    利用所述神经网络的模型参数对所述码流进行熵解码、反量化、反变换、预测重建或滤波中的至少一种。At least one of entropy decoding, inverse quantization, inverse transformation, predictive reconstruction or filtering is performed on the code stream using the model parameters of the neural network.
  50. 根据权利要求29至49中任一项所述的方法，其特征在于，所述码流中还包括第三指示信息，所述第三指示信息用于指示是否开启了利用神经网络的编码技术进行编码。The method according to any one of claims 29 to 49, wherein the code stream further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network-based encoding technique is enabled.
  51. 根据权利要求29至50中任一项所述的方法,其特征在于,所述通过第二通信链路接收神经网络模型的模型参数,包括:The method according to any one of claims 29 to 50, wherein the receiving the model parameters of the neural network model through the second communication link comprises:
    通过所述第二通信链路接收所述神经网络模型的部分模型参数或全部模型参数。Some or all of the model parameters of the neural network model are received over the second communication link.
  52. 根据权利要求29至51中任一项所述的方法,其特征在于,所述第一通信链路和/或所述第二通信链路是从以下一种或多种链路中选择的:The method of any one of claims 29 to 51, wherein the first communication link and/or the second communication link is selected from one or more of the following links:
    基于无线局域网协议的链路、基于移动通信协议的链路、基于以太网协议的链路。Links based on wireless local area network protocols, links based on mobile communication protocols, links based on Ethernet protocols.
  53. 一种编码装置,其特征在于,包括:An encoding device, comprising:
    处理器,用于利用基于神经网络的编码技术对待编码图像进行编码,以获得所述待编码图像的码流;a processor, configured to encode the to-be-encoded image using a neural network-based encoding technology to obtain a code stream of the to-be-encoded image;
    通过第一通信链路传输所述待编码图像的码流;transmitting the code stream of the to-be-encoded image through the first communication link;
    通过第二通信链路传输所述基于神经网络的编码技术所包含的神经网络模型的模型参数。The model parameters of the neural network model included in the neural network-based encoding technique are transmitted over the second communication link.
  54. 根据权利要求53所述的编码装置,其特征在于,所述第一通信链路和所述第二通信链路具有不同的物理特性。The encoding apparatus of claim 53, wherein the first communication link and the second communication link have different physical characteristics.
  55. 根据权利要求53或54所述的编码装置，其特征在于，所述第一通信链路的传输时延低于所述第二通信链路，和/或，所述第二通信链路的传输带宽高于所述第一通信链路。The encoding apparatus according to claim 53 or 54, wherein the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  56. 根据权利要求53至55中任一项所述的编码装置，其特征在于，所述第一通信链路包括时延小于或等于第一阈值的链路，所述第二通信链路包括带宽大于或等于第二阈值的链路。The encoding apparatus according to any one of claims 53 to 55, wherein the first communication link comprises a link with a delay less than or equal to a first threshold, and the second communication link comprises a link with a bandwidth greater than or equal to a second threshold.
  57. 根据权利要求53至55中任一项所述的编码装置，其特征在于，所述第一通信链路包括基于私有图传协议或无线局域网协议的链路，所述第二通信链路包括基于移动通信协议的链路。The encoding apparatus according to any one of claims 53 to 55, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
  58. 根据权利要求57所述的编码装置,其特征在于,所述私有图传协议包括软件无线电SDR协议,所述无线局域网协议包括无线保真WiFi协议,所述移动通信协议包括4G或5G协议。The encoding device according to claim 57, wherein the private image transmission protocol includes a software defined radio (SDR) protocol, the wireless local area network protocol includes a wireless fidelity WiFi protocol, and the mobile communication protocol includes a 4G or 5G protocol.
  59. The encoding apparatus according to claim 53, wherein the first communication link comprises a private image transmission link, and the second communication link comprises a public network transmission link.
  60. The encoding apparatus according to any one of claims 53 to 59, wherein the neural network model comprises an offline-trained neural network model or an online-trained neural network model.
  61. The encoding apparatus according to claim 53 or 60, wherein the neural network model is an online-trained neural network model, and the processor is further configured to:
    for the n-th target image, encode the n-th target image using a neural network model obtained by training on the (n-m)-th target image, the target images being images obtained by partitioning the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level, where n is an integer greater than or equal to 1 and m is an integer greater than or equal to 1.
  62. The encoding apparatus according to claim 61, wherein the processor is further configured to:
    filter the n-th target image using the neural network model obtained by training on the (n-m)-th target image, the (n-m)-th target image being the (n-m)-th encoded image before it is filtered by a neural network model, and the n-th target image being the n-th encoded image before it is filtered by a neural network model.
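Illustrative note (not part of the claims): the online-training schedule of claims 61-62 codes (here, filters) the n-th target image with the model trained on the (n-m)-th one. The sketch below uses stub training and filtering functions and a preset lag m, as claim 63 permits; all names are illustrative.

```python
# Hedged sketch of the n / (n-m) online-training schedule of claims 61-62.
def train_model(image_index: int) -> str:
    return f"model(trained_on_image_{image_index})"

def filter_with(model: str, image_index: int) -> str:
    return f"image_{image_index}_filtered_by_{model}"

def encode_sequence(num_images: int, m: int = 2):
    models = {}
    for n in range(num_images):
        if n - m >= 0:
            model = models[n - m]          # model trained m target images earlier
        else:
            model = "initial_model"        # fallback before enough history exists
        yield filter_with(model, n)
        models[n] = train_model(n)         # train on the current image for later use

for out in encode_sequence(6, m=2):
    print(out)
```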
  63. The encoding apparatus according to claim 61 or 62, wherein m is a parameter fixed or preset at the encoding end before encoding, or m is a parameter formed during the encoding process.
  64. The encoding apparatus according to claim 53 or 60, wherein the neural network model is an online-trained neural network model, and the processor is further configured to:
    for a first target image to be encoded, encode the first target image using a neural network model obtained by training on an already encoded second target image, the first target image and the second target image being images obtained by partitioning the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level; and
    transmit, through the second communication link, model parameters of the neural network model obtained by training on the second target image.
  65. The encoding apparatus according to claim 64, wherein the first target image and the second target image are separated by q target images, q being an integer greater than or equal to 0.
  66. The encoding apparatus according to claim 65, wherein q is a parameter fixed or preset at the encoding end before encoding.
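Illustrative note (not part of the claims): claims 64-66 use a model trained on an already encoded target image to encode a later target image separated from it by q images, with the model parameters sent over the second link. The pairing schedule can be sketched as below; the helper name `schedule_pairs` and the chosen gap are illustrative.

```python
# Hedged sketch of the q-image gap between the training image and the image
# it is used to encode, as in claims 64-66.
def schedule_pairs(num_images: int, q: int = 1):
    """Yield (training_image, encoded_image) index pairs separated by q images."""
    for second in range(num_images - q - 1):
        first = second + q + 1
        yield second, first

for trained_on, encoded in schedule_pairs(6, q=1):
    print(f"params(model trained on image {trained_on}) -> second link; "
          f"used to encode image {encoded}")
```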
  67. The encoding apparatus according to claim 53 or 60, wherein the neural network model is an offline-trained neural network model, and the processor is further configured to:
    for the p-th target image, encode the p-th target image using the offline-trained neural network model, the target images being images obtained by partitioning the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level, where p is an integer greater than or equal to 0.
  68. The encoding apparatus according to any one of claims 53 to 63 or 67, wherein the code stream of the image to be encoded further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used when encoding the image to be encoded.
  69. The encoding apparatus according to any one of claims 53 to 63 or 67, wherein, if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used when encoding the image to be encoded.
  70. The encoding apparatus according to claim 69, wherein the first indication information is further used to indicate an identifier of the model parameters of the neural network model used when encoding the other frames between the current key frame and the next key frame.
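Illustrative note (not part of the claims): claims 68-70 carry first indication information identifying the model parameters, at least for key frames, and let that identifier also cover the frames up to the next key frame. The byte layout below is a made-up container used only to show that behaviour, not the actual bitstream syntax.

```python
# Hedged sketch: a key frame carries a model identifier that stays in effect
# until the next key frame, as in claims 69-70.
import struct

def wrap_frame(payload: bytes, is_key: bool, model_id: int = 0) -> bytes:
    header = struct.pack(">BH", 1 if is_key else 0, model_id if is_key else 0)
    return header + payload

def read_frames(frames):
    active_model_id = None
    for raw in frames:
        is_key, model_id = struct.unpack(">BH", raw[:3])
        if is_key:
            active_model_id = model_id       # applies until the next key frame
        yield raw[3:], active_model_id

stream = [
    wrap_frame(b"frame0", is_key=True, model_id=7),
    wrap_frame(b"frame1", is_key=False),
    wrap_frame(b"frame2", is_key=False),
    wrap_frame(b"frame3", is_key=True, model_id=8),
]
for payload, mid in read_frames(stream):
    print(payload, "-> model", mid)
```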
  71. The encoding apparatus according to any one of claims 53 to 70, wherein the processor is further configured to:
    transmit, through the second communication link, the model parameters of the neural network model and identifiers corresponding to the model parameters of the neural network model.
  72. The encoding apparatus according to any one of claims 53 to 71, wherein the code stream of the image to be encoded further includes second indication information, and the second indication information is used to indicate that the neural network model is trained on one of a group of pictures, a frame, or a sequence.
  73. The encoding apparatus according to any one of claims 53 to 72, wherein the processor is further configured to:
    convert the model parameters of the neural network model into a target format;
    compress the model parameters in the target format to obtain compressed model parameters; and
    transmit the compressed model parameters through the second communication link.
  74. The encoding apparatus according to claim 73, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
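Illustrative note (not part of the claims): claims 73-74 convert the parameters to an interchange format (NNEF or ONNX) and compress them before transmission. The sketch below uses JSON and zlib purely as stand-ins for the interchange format and for the MPEG NNR / AITISA compression named in claims 75-76; the function names are hypothetical.

```python
# Hedged sketch of the convert -> compress -> transmit pipeline of claims 73-75.
import json
import zlib

def to_target_format(params: dict) -> bytes:
    return json.dumps(params, sort_keys=True).encode()   # stand-in for NNEF/ONNX export

def compress_params(blob: bytes) -> bytes:
    return zlib.compress(blob, 9)                        # stand-in for NNR/AITISA coding

def transmit_over_second_link(payload: bytes) -> None:
    print(f"second link <- {len(payload)} bytes")

params = {"conv1.weight": [0.01] * 64, "conv1.bias": [0.0] * 8}
transmit_over_second_link(compress_params(to_target_format(params)))
```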
  75. The encoding apparatus according to claim 73 or 74, wherein the processor is further configured to:
    compress the model parameters in the target format using the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to obtain an NNR code stream; and
    transmit the NNR code stream through the second communication link.
  76. The encoding apparatus according to claim 73 or 74, wherein the processor is further configured to:
    compress the model parameters in the target format using a compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to obtain compressed data; and
    transmit the compressed data through the second communication link.
  77. The encoding apparatus according to any one of claims 53 to 76, wherein the processor is further configured to:
    perform at least one of prediction, transform, quantization, entropy encoding, or filtering on the image to be encoded using the neural-network-based encoding technique.
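Illustrative note (not part of the claims): claim 77 allows the neural-network tool to replace any subset of the coding stages. The sketch below only shows that per-stage switch; every stage is a trivial stub rather than a real codec component, and all names are illustrative.

```python
# Hedged sketch of switching the neural-network tool into selected stages.
CLASSIC = {
    "prediction": lambda x: x,
    "transform": lambda x: x,
    "quantization": lambda x: x,
    "entropy": lambda x: x,
    "filtering": lambda x: x,
}

def nn_stage(name):
    return lambda x: f"nn_{name}({x})"

def build_pipeline(nn_stages):
    stages = dict(CLASSIC)
    for name in nn_stages:
        stages[name] = nn_stage(name)
    def run(image):
        out = image
        for name in ["prediction", "transform", "quantization", "entropy", "filtering"]:
            out = stages[name](out)
        return out
    return run

encode = build_pipeline(nn_stages={"filtering"})   # NN replaces only the loop filter
print(encode("frame0"))
```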
  78. The encoding apparatus according to any one of claims 53 to 77, wherein the code stream of the image to be encoded further includes third indication information, and the third indication information is used to indicate whether encoding with the neural-network-based encoding technique is enabled.
  79. The encoding apparatus according to any one of claims 53 to 78, wherein the processor is further configured to:
    transmit some or all of the model parameters of the neural network model through the second communication link.
  80. The encoding apparatus according to any one of claims 53 to 79, wherein the first communication link and/or the second communication link is selected from one or more of the following links:
    a link based on a wireless local area network protocol, a link based on a mobile communication protocol, and a link based on an Ethernet protocol.
  81. A decoding apparatus, comprising:
    a processor configured to receive a code stream of an image to be decoded through a first communication link;
    receive model parameters of a neural network model through a second communication link; and
    decode the code stream using the model parameters of the neural network model to obtain a decoded image.
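Illustrative note (not part of the claims): the decoder of claim 81 receives the code stream and the model parameters on different links, so it may need to match them before decoding. The buffering-by-identifier scheme below is an assumption, not something the claims prescribe; the handler names are hypothetical.

```python
# Hedged sketch of pairing data arriving on the two links at the decoder.
received_params = {}   # model_id -> parameters, filled from the second link

def on_second_link(model_id: int, params: dict) -> None:
    received_params[model_id] = params

def on_first_link(bitstream: bytes, model_id: int) -> str:
    params = received_params.get(model_id)
    if params is None:
        return "waiting for model parameters"   # params not yet delivered
    return f"decoded({bitstream!r}) with model {model_id}"

on_second_link(3, {"conv1.weight": [0.1]})
print(on_first_link(b"frame-bits", model_id=3))
print(on_first_link(b"frame-bits", model_id=4))
```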
  82. The decoding apparatus according to claim 81, wherein the first communication link and the second communication link have different physical characteristics.
  83. The decoding apparatus according to claim 81 or 82, wherein the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
  84. The decoding apparatus according to any one of claims 81 to 83, wherein the first communication link comprises a link whose delay is less than or equal to a first threshold, and the second communication link comprises a link whose bandwidth is greater than or equal to a second threshold.
  85. The decoding apparatus according to any one of claims 81 to 83, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
  86. The decoding apparatus according to claim 85, wherein the proprietary image transmission protocol comprises a software defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
  87. The decoding apparatus according to claim 81, wherein the first communication link comprises a private image transmission link, and the second communication link comprises a public network transmission link.
  88. The decoding apparatus according to any one of claims 81 to 87, wherein the neural network model comprises an offline-trained neural network model or an online-trained neural network model.
  89. The decoding apparatus according to any one of claims 81 to 88, wherein the code stream further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
    the processor being further configured to:
    decode the code stream using the model parameters of the neural network model corresponding to the identifier of the model parameters.
  90. The decoding apparatus according to any one of claims 81 to 88, wherein, if the image to be decoded in the code stream is a key frame, the code stream further includes first indication information, and the first indication information is used to indicate an identifier of the model parameters of the neural network model used by the encoding end when encoding the image to be encoded;
    the processor being further configured to:
    decode the code stream using the model parameters of the neural network model and the identifier of the model parameters.
  91. The decoding apparatus according to claim 90, wherein the first indication information is further used to indicate an identifier of the model parameters of the neural network model used when decoding the other frames between the current key frame and the next key frame.
  92. The decoding apparatus according to claim 81 or 88, wherein the neural network model is an online-trained neural network model, and the processor is further configured to:
    decode a code stream of a first target image using received model parameters of a neural network model obtained by training on a second target image, to obtain the first target image, the first target image and the second target image being images obtained by partitioning the image to be decoded at any one of the video sequence level, the group-of-pictures level, and the picture level.
  93. The decoding apparatus according to claim 92, wherein the first target image and the second target image are separated by q target images, q being an integer greater than or equal to 0.
  94. The decoding apparatus according to claim 93, wherein q is a parameter fixed or preset at the decoding end before decoding.
  95. The decoding apparatus according to any one of claims 81 to 94, wherein the processor is further configured to:
    receive, through the second communication link, the model parameters of the neural network model and identifiers corresponding to the model parameters of the neural network model.
  96. The decoding apparatus according to any one of claims 81 to 95, wherein the code stream further includes second indication information, and the second indication information is used to indicate that the neural network model is trained on one of a group of pictures, a frame, or a sequence.
  97. The decoding apparatus according to any one of claims 81 to 96, wherein the processor is further configured to:
    receive compressed model parameters through the second communication link;
    decompress the compressed model parameters to obtain model parameters in a target format;
    convert the target format; and
    decode the code stream using the model parameters in the converted format.
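Illustrative note (not part of the claims): claim 97 reverses the encoder-side pipeline: decompress the received parameters, convert the interchange format back to the decoder's internal representation, then decode with it. JSON and zlib again stand in for the NNEF/ONNX and NNR/AITISA tooling named in claims 98-100; the function names are hypothetical.

```python
# Hedged sketch of the receive -> decompress -> convert -> decode chain of claim 97.
import json
import zlib

def receive_from_second_link(payload: bytes) -> dict:
    blob = zlib.decompress(payload)                  # undo the placeholder compression
    return json.loads(blob)                          # parse the interchange format

def decode_with(params: dict, code_stream: bytes) -> str:
    return f"decoded {len(code_stream)} bytes with {len(params)} parameter tensors"

payload = zlib.compress(json.dumps({"conv1.weight": [0.01] * 64}).encode())
print(decode_with(receive_from_second_link(payload), b"\x00" * 128))
```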
  98. The decoding apparatus according to claim 97, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
  99. The decoding apparatus according to claim 97 or 98, wherein the processor is further configured to:
    receive a Neural Network Representation (NNR) code stream through the second communication link; and
    decompress the NNR code stream.
  100. The decoding apparatus according to claim 97 or 98, wherein the processor is further configured to:
    receive compressed data through the second communication link; and
    decompress the compressed data.
  101. The decoding apparatus according to any one of claims 81 to 100, wherein the processor is further configured to:
    perform at least one of entropy decoding, inverse quantization, inverse transform, prediction reconstruction, or filtering on the code stream using the model parameters of the neural network.
  102. The decoding apparatus according to any one of claims 81 to 101, wherein the code stream further includes third indication information, and the third indication information is used to indicate whether encoding with the neural-network-based encoding technique is enabled.
  103. The decoding apparatus according to any one of claims 81 to 102, wherein the processor is further configured to:
    receive some or all of the model parameters of the neural network model through the second communication link.
  104. The decoding apparatus according to any one of claims 81 to 103, wherein the first communication link and/or the second communication link is selected from one or more of the following links:
    a link based on a wireless local area network protocol, a link based on a mobile communication protocol, and a link based on an Ethernet protocol.
  105. An encoding apparatus, comprising a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and run the computer program stored in the memory to perform the method according to any one of claims 1 to 28.
  106. A decoding apparatus, comprising a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and run the computer program stored in the memory to perform the method according to any one of claims 29 to 52.
  107. A computer-readable storage medium, comprising instructions for performing the encoding method according to any one of claims 1 to 28.
  108. A computer-readable storage medium, comprising instructions for performing the decoding method according to any one of claims 29 to 52.
PCT/CN2020/134085 2020-12-04 2020-12-04 Coding method, decoding method, coding apparatus, and decoding apparatus WO2022116207A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/134085 WO2022116207A1 (en) 2020-12-04 2020-12-04 Coding method, decoding method, coding apparatus, and decoding apparatus
CN202080078054.3A CN114731406A (en) 2020-12-04 2020-12-04 Encoding method, decoding method, encoding device, and decoding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/134085 WO2022116207A1 (en) 2020-12-04 2020-12-04 Coding method, decoding method, coding apparatus, and decoding apparatus

Publications (1)

Publication Number Publication Date
WO2022116207A1 true WO2022116207A1 (en) 2022-06-09

Family

ID=81853685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134085 WO2022116207A1 (en) 2020-12-04 2020-12-04 Coding method, decoding method, coding apparatus, and decoding apparatus

Country Status (2)

Country Link
CN (1) CN114731406A (en)
WO (1) WO2022116207A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115720257A (en) * 2022-10-13 2023-02-28 华能信息技术有限公司 Communication security management method and system of video conference system
WO2024149245A1 (en) * 2023-01-13 2024-07-18 杭州海康威视数字技术股份有限公司 Decoding method and apparatus, encoding method and apparatus, and devices thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117440185A (en) * 2022-07-14 2024-01-23 杭州海康威视数字技术股份有限公司 Image decoding and encoding method, device and equipment based on neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013121793A1 (en) * 2012-02-16 2013-08-22 日本放送協会 Multi-channel sound system, transmitting device, receiving device, program for transmitting, and program for receiving
CN103580773A (en) * 2012-07-18 2014-02-12 中兴通讯股份有限公司 Method and device for transmitting data frame
CN109874018A (en) * 2018-12-29 2019-06-11 深兰科技(上海)有限公司 Image encoding method, system, terminal and storage medium neural network based
CN110870310A (en) * 2018-09-04 2020-03-06 深圳市大疆创新科技有限公司 Image encoding method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229644A (en) * 2016-12-15 2018-06-29 上海寒武纪信息科技有限公司 The device of compression/de-compression neural network model, device and method
CN108271026B (en) * 2016-12-30 2020-03-31 上海寒武纪信息科技有限公司 Compression/decompression device and system, chip, electronic device and method
EP3942700A1 (en) * 2019-03-18 2022-01-26 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Methods and apparatuses for compressing parameters of neural networks


Also Published As

Publication number Publication date
CN114731406A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
WO2022116207A1 (en) Coding method, decoding method, coding apparatus, and decoding apparatus
Lombardo et al. Deep generative video compression
TWI830107B (en) Encoding by indicating feature map data
WO2022155974A1 (en) Video coding and decoding and model training method and apparatus
CN110740319B (en) Video encoding and decoding method and device, electronic equipment and storage medium
US20240064318A1 (en) Apparatus and method for coding pictures using a convolutional neural network
CN116235496A (en) Encoding method, decoding method, encoder, decoder, and encoding system
JP2023548507A (en) Decoding using segmentation information signaling
US11638025B2 (en) Multi-scale optical flow for learned video compression
WO2022063265A1 (en) Inter-frame prediction method and apparatus
KR20200109904A (en) System and method for DNN based image or video coding
KR20200005403A (en) System and method for DNN based image or video coding based on an entire codec
US20240107015A1 (en) Encoding method, decoding method, code stream, encoder, decoder and storage medium
Han et al. Deep generative video compression
TW202337211A (en) Conditional image compression
JP2023544562A (en) Intra prediction method and device
CN115604485A (en) Video image decoding method and device
WO2021196087A1 (en) Video quality improvement method and apparatus
JP2024513693A (en) Configurable position of auxiliary information input to picture data processing neural network
US11893783B2 (en) Apparatus and method for transceiving feature map extracted using MPEG-VCM
US20230252300A1 (en) Methods and apparatus for hybrid training of neural networks for video coding
TWI826160B (en) Image encoding and decoding method and apparatus
WO2022147745A1 (en) Encoding method, decoding method, encoding apparatus, decoding apparatus
KR20240039178A (en) Encoding and decoding methods and devices
CN116803078A (en) Encoding/decoding method, code stream, encoder, decoder, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964052

Country of ref document: EP

Kind code of ref document: A1