
Motion compensation residual error determination method, device, terminal equipment and storage medium

Info

Publication number
CN118200608A
Authority
CN
China
Prior art keywords
residual, frame, reconstructed, residual error, motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211614426.7A
Other languages
Chinese (zh)
Inventor
李帅
高寒
叶茂
王之奎
张雯
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202211614426.7A
Publication of CN118200608A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Some embodiments of the application relate to a motion compensation residual determination method, a motion compensation residual determination device, a terminal device and a storage medium, and in particular relate to the technical field of video coding. The method comprises: performing motion compensation on a reconstructed frame of a historical frame based on a reconstructed motion parameter to obtain a reconstructed predicted frame, where the reconstructed motion parameter is obtained by encoding and then decoding a motion parameter, and the motion parameter is determined based on the reconstructed frame of the historical frame and the current frame; performing motion compensation on the historical frame based on the reconstructed motion parameter to obtain an original predicted frame; determining a reconstructed residual according to the reconstructed predicted frame and the current frame; determining an original residual according to the original predicted frame and the current frame; and fusing the reconstructed residual and the original residual to obtain a target fusion residual. Some embodiments of the present application address the problem that the motion compensation residual calculated in the related art is distorted, with image structure information lost or erroneous.

Description

Motion compensation residual error determination method, device, terminal equipment and storage medium
Technical Field
Embodiments of the present application relate to the field of video coding technologies, and in particular, to a method, an apparatus, a terminal device, and a storage medium for determining a motion compensation residual.
Background
Deep learning based video coding is currently implemented through a feature-domain video coding (Feature based Video Compression, FVC) framework and a context-based video coding (Deep Contextual Video Compression, DCVC) framework, where the FVC framework and the DCVC framework differ only in the manner of motion estimation. In the video coding process, a motion compensation residual and motion parameters need to be calculated, and a coded bit stream is finally generated based on them. When calculating the motion compensation residual, motion estimation is first performed on a reconstructed frame and the current frame to generate a predicted frame; the generated predicted frame is then subtracted from the current frame to obtain the motion compensation residual, where the reconstructed frame is obtained by reconstructing the encoded previous frame.
However, in lossy video coding, since the reconstructed frame obtained by reconstructing the encoded previous frame has random distortion compared with the original frame (i.e., the previous frame), the motion compensation residual calculated based on the reconstructed frame also has distortion, i.e., image structure information is lost or erroneous, which may lead to inaccurate information after the final video encoding and reduced video coding performance.
Disclosure of Invention
In order to solve the above technical problems or at least partially solve the above technical problems, some embodiments of the present application provide a method, an apparatus, an encoding device and a storage medium for determining a motion compensation residual, which can accurately calculate the motion compensation residual, improve the accuracy of information after video encoding, and improve the video encoding performance.
In order to achieve the above object, some embodiments of the present application provide the following technical solutions:
In a first aspect, a motion compensation residual determination method is provided, including:
performing motion compensation on a reconstructed frame of a historical frame based on a reconstructed motion parameter to obtain a reconstructed predicted frame, wherein the reconstructed motion parameter is obtained after encoding and decoding a motion parameter, and the motion parameter is determined based on the reconstructed frame of the historical frame and a current frame;
performing motion compensation on the historical frame based on the reconstructed motion parameter to obtain an original predicted frame;
determining a reconstructed residual according to the reconstructed predicted frame and the current frame;
determining an original residual according to the original predicted frame and the current frame; and
fusing the reconstructed residual and the original residual to obtain a target fusion residual.
In a second aspect, there is provided a motion compensation residual determination apparatus comprising:
a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the motion compensation residual determination method according to the first aspect or any of its optional implementations.
In a third aspect, a computer readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the motion compensation residual determination method according to the first aspect or any optional implementation thereof.
In a fourth aspect, a computer program product is provided, in which a computer program is stored; when executed by a processor, the computer program implements the motion compensation residual determination method according to the first aspect or any optional implementation thereof, namely: performing motion compensation on a reconstructed frame of the historical frame based on the reconstructed motion parameter to obtain a reconstructed predicted frame, wherein the reconstructed motion parameter is obtained after encoding and decoding the motion parameter, and the motion parameter is determined based on the reconstructed frame of the historical frame and the current frame; performing motion compensation on the historical frame based on the reconstructed motion parameter to obtain an original predicted frame; determining a reconstructed residual according to the reconstructed predicted frame and the current frame; determining an original residual according to the original predicted frame and the current frame; and fusing the reconstructed residual and the original residual to obtain a target fusion residual.
According to the motion compensation residual error determination method provided by some embodiments of the application, the reconstructed frame of the historical frame is subjected to motion compensation based on the reconstructed motion parameter so as to obtain a reconstructed predicted frame, the reconstructed motion parameter is obtained after the motion parameter is encoded and decoded, and the motion parameter is determined based on the reconstructed frame of the historical frame and the current frame; performing motion compensation on the historical frame based on the reconstructed motion parameters to obtain an original predicted frame; determining a reconstructed residual error according to the reconstructed predicted frame and the current frame; determining an original residual error according to the original predicted frame and the current frame; and fusing the reconstructed residual error and the original residual error to obtain a target fusion residual error.
In this scheme, the reconstructed residual and the original residual are calculated based on the reconstructed frame of the historical frame and on the historical frame itself. Because the historical frame has not been encoded and reconstructed, it contains no distortion, so the calculated original residual is undistorted and retains complete image structure information. Using the target fusion residual obtained by fusing the reconstructed residual and the original residual as the final motion compensation residual supplements the image structure information damaged in the reconstructed residual, making the finally determined motion compensation residual more accurate.
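As an illustration only, the overall method can be sketched in a few lines of PyTorch-style code; the helper callables motion_compensate and residual_fusion are hypothetical stand-ins for the modules described later in this application, not names used by the application itself.

```python
import torch

def determine_motion_compensation_residual(
        recon_prev_feat: torch.Tensor,   # features of the reconstructed historical frame
        orig_prev_feat: torch.Tensor,    # features of the original (uncoded) historical frame
        cur_feat: torch.Tensor,          # features of the current frame
        recon_motion: torch.Tensor,      # reconstructed motion parameter (after motion codec)
        motion_compensate,               # hypothetical: f(motion, features) -> predicted features
        residual_fusion,                 # hypothetical: fusion module (mode one or mode two)
    ) -> torch.Tensor:
    # Step 1: motion-compensate the reconstructed historical frame
    recon_pred = motion_compensate(recon_motion, recon_prev_feat)
    # Step 2: motion-compensate the original historical frame with the same motion
    orig_pred = motion_compensate(recon_motion, orig_prev_feat)
    # Steps 3-4: reconstructed residual and original (undistorted) residual
    recon_residual = cur_feat - recon_pred
    orig_residual = cur_feat - orig_pred
    # Step 5: fuse the two residuals into the target fusion residual
    return residual_fusion(recon_residual, orig_residual)
```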
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with some embodiments of the application and together with the description, serve to explain the principles of some embodiments of the application.
In order to more clearly illustrate some embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1A is a block diagram of a video coding system 100 in some embodiments of the application;
FIG. 1B is a schematic diagram of a video coding basic framework provided in the related art;
Fig. 2 is a schematic diagram of a video encoding process according to some embodiments of the present application;
FIG. 3 is a schematic diagram of a basic framework of a coding model according to some embodiments of the present application;
fig. 4 is a flowchart illustrating a motion compensation residual determination method according to some embodiments of the present application;
FIG. 5 is a schematic diagram of a framework of a residual fusion module applied to a fusion method according to some embodiments of the present application;
Fig. 6 is a schematic frame diagram of a residual fusion module applied in a fusion manner two according to some embodiments of the present application;
fig. 7 is a flowchart of a video encoding method according to some embodiments of the present application;
Fig. 8 is a flowchart of a video encoding method according to some embodiments of the present application;
FIG. 9 is a schematic diagram of a basic framework for video decoding according to some embodiments of the present application;
Fig. 10 is a schematic structural diagram of a motion compensation residual determining apparatus according to some embodiments of the present application.
Detailed Description
In order that the above objects, features and advantages of some embodiments of the application may be more clearly understood, a further description of aspects of some embodiments of the application will be provided below. It should be noted that, without conflict, embodiments of some embodiments of the application and features of the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of some embodiments of the application, but some embodiments of the application may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the application.
A video can be seen as a sequence of multiple video frames (images). Video playback may be viewed as the display of the video frames at a preset rate (e.g., 24 frames/second, 30 frames/second, 60 frames/second) in the order of the sequence. Theoretically, the amount of data of a video is positively correlated with the resolution of its video frames: the higher the resolution of a video frame, the larger the amount of data of the video. If the data of all video frames were stored directly in the video file, the amount of video data would be enormous, making the video difficult to store and transmit; video coding was proposed to solve this problem to a certain extent. Video coding mainly includes video encoding and video decoding. Video encoding can be understood as the process of compressing video data, and video decoding can be understood as the process of restoring the compressed video data.
Fig. 1A is a block diagram of a video coding system 100 in some embodiments of the application. As shown in fig. 1A, the video coding system 100 includes a source device 11 and a destination device 12. The source device 11 can obtain original video data through the video source 111, encode the original video data through the video encoder 112 to obtain video encoded data, and provide the video encoded data to the destination device 12 through the output interface 113. The destination device 12 may obtain the video encoded data provided by the source device 11 through the input interface 121, decode the video encoded data through the video decoder 122 to obtain video decoded data, and input the video decoded data to the player 123 to play the video. Source device 11 and destination device 12 may comprise any of a wide range of devices, such as personal computers (PCs), notebook computers, tablet computers, set-top boxes, cell phones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and the like.
In some embodiments, the video source 111 of the source device 11 may be a video capturing device, such as a video camera. In other embodiments, video source 111 may be a component capable of generating video based on computer graphics. Such as a screen recording component, an animation generation component, etc.
In some embodiments, destination device 12 may receive video encoded data provided by source device 11 via a computer readable medium. The computer readable medium may include any type of medium or device capable of moving video encoded data from source device 11 to destination device 12. In one example, the computer-readable medium may comprise a communication medium to enable source device 11 to transmit video encoded data directly to destination device 12 in real-time. The video encoded data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 12. Communication media may include any wireless or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet network (e.g., a local area network, a wide area network, or a global network such as the internet). The communication medium may include a router, a switch, a base station, or any other apparatus that may be used to facilitate communication from source device 11 to destination device 12.
In some examples, video encoded data may be output from output interface 113 to a storage device. Accordingly, the video encoded data may be accessed from the storage device through the input interface 121. The storage device may include any of a variety of distributed or locally accessed data storage media such as hard drives, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing video encoded data.
In another example, the storage device may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 11. Destination device 12 may access stored video data from a storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 12. The file servers described above include, for example, web servers (e.g., for web sites), FTP servers, network Attached Storage (NAS) devices, or local disk drives. Destination device 12 may access the encoded video data over any standard data connection, including an internet connection. This may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
As described above, video encoding may be understood as a process of compressing video data, and video decoding as a process of restoring compressed video data. The video encoder 112 may be understood as a set of standard rules used in compressing video data, and the video decoder 122 as a set of standard rules used in restoring video data; in general, the video decoder 122 needs to use a decoding method corresponding to the encoding method used by the video encoder 112 to correctly restore the video data.
Currently, video coding standards have gradually evolved from the earliest ISO/IEC MPEG-1, through ISO/IEC MPEG-2, ISO/IEC MPEG-4, Advanced Video Coding (AVC), and High Efficiency Video Coding (HEVC), to Versatile Video Coding (VVC). The motion compensation residual determination method, the video encoding method, and the video decoding method provided in the embodiments of the present application are applicable to any suitable video coding standard, for example the HEVC standard, the Low Complexity Enhancement Video Coding (LCEVC) standard, and the like.
In video coding, learning-based video coding has attracted more and more attention over the past few years. Previous hybrid coding methods rely on pixel-space operations to reduce spatio-temporal redundancy, which may lead to inaccurate motion estimation or less efficient motion compensation. To ameliorate these problems, a feature-domain video coding (Feature based Video Compression, FVC) scheme was proposed that performs all major operations (i.e., motion estimation, motion compression, motion compensation, and residual compression) in feature space. Specifically, motion estimation is first applied in feature space to generate motion information (i.e., an offset map), which is compressed using an auto-encoder network; motion compensation is then performed using deformable convolution, and a predicted frame is generated. The residual between the features of the current frame and the predicted frame is then compressed.
In the context-based video coding (Deep Contextual Video Compression, DCVC) scheme, the DCVC framework has a lower entropy bound than the usual residual coding framework. DCVC can adaptively learn intra-frame coding and inter-frame coding. In DCVC, the condition is defined as a context feature. Compared with traditional three-channel RGB pixels, the context feature has a higher dimensionality and can carry richer temporal information to assist coding and recover high-frequency details.
The entropy of the residual coding mode in traditional video coding methods is often greater than or equal to the entropy of conditional coding. Through the conversion from residual coding to conditional coding, the context-based video compression framework (DCVC) is constructed, providing a new idea and a new method for deep learning based video compression. Experiments show that this video compression framework has a lower information-entropy lower bound than the common residual coding framework, can adaptively learn intra-frame coding and inter-frame coding, and is suitable for recovering high-frequency details.
Current deep learning based video coding is implemented through a feature-domain video coding (Feature based Video Compression, FVC) framework and a context-based video coding (Deep Contextual Video Compression, DCVC) framework. In the video coding process, a motion compensation residual and motion parameters need to be calculated, and a coded bit stream is finally generated based on them. When calculating the motion compensation residual, motion estimation is first performed on a reconstructed frame and the current frame to generate a predicted frame; the generated predicted frame is then subtracted from the current frame to obtain the motion compensation residual, where the reconstructed frame is obtained by reconstructing the encoded previous frame.
Fig. 1B is a schematic diagram of a video coding basic framework provided in the related art.
As shown in fig. 1B, the video coding basic framework includes: a feature extraction module 101, a motion estimation module 102, a motion codec (motion compression) module 103, a motion compensation module 104, a residual codec (Residual Compression) module 105, an entropy coding module 106, a post-enhancement processing (Recon-Net) module 107, a residual acquisition module 108, and a reconstruction module 109. The video coding basic framework shown in fig. 1B may be the FVC framework or the DCVC framework; only the motion estimation algorithm in the motion estimation module 102 differs between the FVC framework and the DCVC framework, and the algorithms of the other modules are the same. The motion estimation algorithm used by the motion estimation module 102 in the FVC framework is motion estimation based on deformable convolution, and the motion estimation algorithm used by the motion estimation module 102 in the DCVC framework is motion estimation based on optical flow.
The feature extraction module 101 is configured to extract image features from an input image, where the image features are features used to represent characteristics of the image. The input image may be the current frame or a reconstructed frame of the historical frame. Image features may include, but are not limited to: geometric features, shape features, magnitude features, histogram features, color features, local binary pattern (Local Binary Patterns, LBP) features, and the like.
The motion estimation module 102 is configured to calculate motion parameters (also referred to as motion vectors) based on the extracted features of the reconstructed frames of the current frame and the historical frame. The motion estimation module generally divides an image in a video into a plurality of blocks, detects a corresponding position of each block in a current frame in an image of a historical frame (such as a previous frame of the current frame), so as to estimate displacement of the block, and represents the displacement of the block by a motion vector, and a process of obtaining the motion vector is called motion estimation.
The motion encoding and decoding module 103 is configured to encode a motion parameter, and decode the encoded motion parameter to obtain a reconstructed motion parameter.
The motion compensation module 104 is configured to calculate a reconstructed predicted frame according to the motion parameter and the reconstructed frame of the historical frame. Motion compensation is a method for describing the difference between the current frame and the reconstructed frame of the historical frame; specifically, for each small block of the reconstructed frame of the historical frame, the motion parameters are used to predict the position to which the block moves in the current frame, thereby predicting the reconstructed predicted frame.
The residual obtaining module 108 is configured to calculate a motion compensation residual according to the reconstructed predicted frame and the current frame.
The residual coding and decoding module 105 is configured to code the motion compensation residual and decode the coded motion compensation residual to obtain a recovered motion compensation residual.
The entropy encoding (Entropy coding) module 106 is configured to perform entropy encoding on the encoded motion compensation residual and the encoded motion parameter to obtain a bitstream. Entropy encoding is encoding that, according to the entropy principle, loses no information during the encoding process. Entropy encoding methods include Shannon coding, Huffman coding, arithmetic coding, and the like.
The reconstruction module 109 is configured to reconstruct a feature of a reconstructed frame of the current frame based on the recovered motion compensation residual error and the predicted frame.
The post-enhancement processing module 107 is configured to perform feature enhancement on the features of the input reconstructed frame of the current frame to obtain the reconstructed frame of the current frame.
For a specific application of the above module in the overall video coding flow, the following description will be made with reference to fig. 2.
Fig. 2 is a schematic diagram of a video encoding process according to some embodiments of the present application.
As shown in fig. 2, the video encoding process may include, but is not limited to, the following steps:
201. And extracting features of the current frame and of the reconstructed frame of the historical frame.
The reconstructed frame of the history frame is obtained by encoding the history frame and then decoding the encoded history frame. The history frame may refer to a frame previous to the current frame, or the history frame may refer to other frames preceding the current frame.
In fig. 1B, the current frame is the t-th frame in the video frames and may be represented as X_t; the historical frame is the (t-1)-th frame in the video frames, and the reconstructed frame of the historical frame is represented as X̂_{t-1}. Feature extraction is performed on X_t by the feature extraction module 101 shown in fig. 1B to obtain the feature F_t of the current frame; feature extraction is performed on X̂_{t-1} by the feature extraction module 101 shown in fig. 1B to obtain the feature F̂_{t-1} of the reconstructed frame.
202. And performing motion estimation based on the characteristics of the current frame and the characteristics of the reconstructed frame of the historical frame to obtain motion parameters.
In fig. 1B, the feature F_t of the current frame and the feature F̂_{t-1} of the reconstructed frame of the historical frame may be input to the motion estimation module 102, and the motion estimation module 102 then calculates the motion parameter, shown as θ_t in fig. 1B, from F_t and F̂_{t-1}.
203. And coding the motion parameters to obtain coded motion parameters.
204. And decoding the encoded motion parameters to obtain reconstructed motion parameters.
In fig. 1B, the motion parameter θ_t is input to the motion codec module 103, which encodes it to obtain the encoded motion parameter M_t and then decodes M_t to obtain the reconstructed motion parameter θ̂_t.
205. And performing motion compensation on the reconstructed frame of the historical frame according to the reconstructed motion parameters to obtain a reconstructed predicted frame.
The reconstructed predicted frame is shown in fig. 1B. The above motion compensation process is the process in which the motion compensation module 104 in fig. 1B uses the reconstructed motion parameter to deform the reconstructed (reference) frame features F̂_{t-1} and generate the reconstructed predicted frame.
206. And determining a reconstruction residual error according to the reconstruction predicted frame and the current frame.
The reconstructed residual may be determined from the reconstructed predicted frame and the current frame as follows: the reconstructed predicted frame is subtracted from the feature F_t of the current frame to obtain the reconstructed residual R_t.
207. And coding the reconstructed residual error to obtain a coded residual error.
The encoded residual may be denoted Y_t in fig. 1B; the reconstructed residual R_t is encoded by the residual codec module 105 shown in fig. 1B, resulting in the encoded residual Y_t.
208. And generating a bit stream based on the coded motion parameters and the coded residual, and sending the bit stream to a decoding device.
In fig. 1B, the encoded motion parameter M_t and the encoded residual Y_t are input into the entropy encoding module 106, which encodes them to generate a bitstream that is transmitted to the decoding device at the opposite end.
209. And decoding the coded residual error to obtain a recovered residual error.
In fig. 1B, the encoded residual Y_t is decoded by the residual codec module 105 to obtain the restored residual.
210. And generating reconstructed frame characteristics of the current frame based on the recovered residual error and the reconstructed predicted frame.
In fig. 1B, the reconstruction module 109 adds the restored residual and the reconstructed predicted frame to obtain the reconstructed frame feature of the current frame.
211. And carrying out characteristic enhancement on the reconstructed frame characteristics of the current frame to obtain the reconstructed frame of the current frame.
In fig. 1B, the reconstructed frame feature is input into the post-enhancement processing module 107, which performs feature enhancement on it to obtain the reconstructed frame X̂_t of the current frame.
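For readability, the data flow of steps 201-211 can be summarized in the following sketch, assuming Python callables as stand-ins for modules 101-109; the names and the return conventions of the codec stand-ins are illustrative assumptions, not part of the FVC or DCVC implementations.

```python
import torch

def related_art_encode_step(x_t, x_hat_prev, modules):
    """One encoding step of the fig. 1B framework; `modules` is a dict of
    hypothetical callables standing in for modules 101-109."""
    f_t = modules["feature_extraction"](x_t)                            # step 201
    f_hat_prev = modules["feature_extraction"](x_hat_prev)              # step 201
    theta_t = modules["motion_estimation"](f_t, f_hat_prev)             # step 202
    m_t, theta_hat_t = modules["motion_codec"](theta_t)                 # steps 203-204
    pred_t = modules["motion_compensation"](theta_hat_t, f_hat_prev)    # step 205
    r_t = f_t - pred_t                                                  # step 206
    y_t, r_hat_t = modules["residual_codec"](r_t)                       # steps 207 and 209
    bitstream = modules["entropy_coding"](m_t, y_t)                     # step 208
    f_hat_t = r_hat_t + pred_t                                          # step 210
    x_hat_t = modules["recon_net"](f_hat_t)                             # step 211
    return bitstream, x_hat_t
```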
In the process shown in fig. 2, since the reconstructed frame of the historical frame is an encoded and reconstructed frame, it contains random distortion and its image structure information is damaged (lost or erroneous); the motion compensation residual calculated based on the reconstructed frame of the historical frame is therefore also distorted, that is, its image structure information is lost or erroneous, which may result in inaccurate information after the final video encoding and reduced video coding performance.
The image structure information includes texture information of the image and semantic information of the image. Texture information refers to the arrangement and frequency of tone variations across an image; texture may include coarse texture and smooth texture, and tone is the relative brightness of an image, expressed as color in a color image. Semantic information refers to the meaning of the image content. Semantic information may be expressed in language, including natural language and symbolic language (mathematical language), but its expression is not limited to natural language; it corresponds to all the ways in which the human visual system understands the image. For example, for a puppy image, the image semantics may include the natural-language word "puppy" or a symbol representing the puppy in the image.
The image structure information can be divided into high-frequency information and low-frequency information according to the frequency of the image, where the frequency of the image is an indicator of the intensity of gray-value change, i.e., the gradient of the gray level in the planar space. Low-frequency information represents regions of the image where the brightness or gray value changes slowly, i.e., large flat areas; it describes the main part of the image and is a comprehensive measure of the intensity of the whole image. High-frequency information corresponds to the strongly varying parts of the image, i.e., the edges (contours), noise, and detail parts, and is mainly a measure of the edges and contours of the image.
When there is a loss of the above-mentioned image structure information, there is a loss of high-frequency information, which may cause difficulty in recognizing the edges and contours of the image. For example, the high frequency information of the image indicates that a face contour exists in the image, and after the high frequency information is lost, the face contour cannot be recognized. This can lead to inaccurate information after such images are ultimately encoded, degrading video coding performance.
In order to calculate an accurate motion compensation residual and improve video coding performance, some embodiments of the present application provide a motion compensation residual determination method, which may be implemented based on a motion compensation residual determination device, where the motion compensation residual determination device may be a coding apparatus or may be a functional module or a functional entity in the coding apparatus. The encoding device can be any device capable of realizing video encoding, such as a mobile phone, a television, a computer, a server and the like.
Fig. 3 is a schematic diagram of a basic framework of a coding model according to some embodiments of the present application.
As shown in fig. 3, the basic framework of the coding model may include: a feature extraction module 301, a motion estimation module 302, a motion codec module 303, a motion compensation module 304, a residual codec module 305, an entropy coding module 306, a post-enhancement processing module 307, a residual acquisition module 308, a reconstruction module 309, and a residual fusion (Residual Fusion) module 310. Compared with the basic framework shown in fig. 1B, the basic framework of the coding model shown in fig. 3 adds the residual fusion module 310, and the residual acquisition module 308 additionally obtains the original residual; the algorithms of the other modules may be the same as those of the corresponding modules in fig. 1B.
The residual fusion module 310 is configured to fuse the reconstructed residual, calculated based on the reconstructed frame of the historical frame, with the original residual, calculated based on the historical frame, to obtain a more accurate target fusion residual. Because the historical frame has not been encoded and reconstructed, the calculated original residual is undistorted and contains complete image structure information; using the target fusion residual obtained by fusing the reconstructed residual and the original residual as the final motion compensation residual supplements the image structure information damaged in the reconstructed residual, so that the finally determined motion compensation residual is more accurate.
The fusion mode for fusing the reconstructed residual and the original residual includes, but is not limited to, a first fusion mode and a second fusion mode described below.
Fusion mode one: splicing the reconstructed residual error and the original residual error to obtain a splicing result, and extracting characteristics of the splicing result to obtain a target fusion residual error.
In the first fusion mode, the reconstructed residual and the original residual are directly spliced to obtain the fused target fusion residual; this fusion mode involves fewer operations and has higher computational efficiency.
And a second fusion mode: splicing the reconstructed residual error and the original residual error to obtain a splicing result; extracting features of the spliced results to obtain initial fusion residual errors; processing the initial fusion residual error according to the Sigmoid activation function to obtain attention weight; performing feature enhancement on the reconstructed residual error according to the attention weight to obtain an enhancement result; and extracting the characteristics of the enhanced result to obtain a target fusion residual error.
The second fusion mode, on the basis of splicing the reconstructed residual and the original residual, also uses an attention mechanism for fusion: the attention weight for assisting the residual frame is obtained first, and the residual coding is then enhanced with this attention weight. Compared with the first fusion mode, the second fusion mode therefore has better performance, and the calculated target fusion residual is more accurate.
The coding model shown in fig. 3 may be trained based on a sample data set; wherein the sample dataset comprises: a plurality of sets of training data, each set of training data comprising: a video frame, a historical video frame of the video frame, and a reconstructed frame of the historical video frame. The historical video frame may be the previous frame of the one video frame, or the historical video frame may be any frame before the one video frame.
The coding model shown in fig. 3 differs from the model corresponding to the basic video coding framework shown in fig. 1B in training: in addition to a video frame and the reconstructed frame of its previous video frame, each set of training data also includes the previous video frame itself.
When model training is performed, the coding model shown in fig. 3 may adopt an overall network cost function of λ·D + R, where D is the mean square error loss (MSE loss) between the original (input) video and the reconstructed (decoded) video, R is the code rate of the entropy-coded bitstream, and λ is a constant used to balance code rate and distortion. An initial learning rate, a learning rate adjustment rule, the number of training cycles, and the like are also set. The MSE loss function is easy to compute because its gradient is simple to derive.
Illustratively, λ may be set to 1024, the initial learning rate may be set to 0.0001, and the learning rate adjustment rule may be: multiply the learning rate by 0.1 every 60 training cycles, with the total number of training cycles set to 240. One training cycle is considered complete after every group of training data in the sample data set has gone through one training pass.
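A sketch of this training configuration, assuming PyTorch and an Adam optimizer (the optimizer is not specified in this application); only the λ·D+R objective, λ = 1024, the 0.0001 initial learning rate, the ×0.1 decay every 60 cycles, and the 240 total cycles come from the description above, while the model interface and data loader are placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train_coding_model(model: nn.Module, train_loader, lmbda: float = 1024.0):
    """Rate-distortion training loop. model(x_t, x_prev, x_hat_prev) is assumed to
    return (reconstructed frame, estimated rate R) -- an assumed interface."""
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
    mse = nn.MSELoss()
    for epoch in range(240):                              # 240 training cycles in total
        for x_t, x_prev, x_hat_prev in train_loader:      # one group of training data
            x_hat_t, rate = model(x_t, x_prev, x_hat_prev)
            loss = lmbda * mse(x_t, x_hat_t) + rate       # overall cost λ·D + R
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                                  # learning rate ×0.1 every 60 cycles
```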
The trained coding model can be directly used for determining motion compensation residual errors and performing video coding.
Fig. 4 is a flowchart illustrating a motion compensation residual determination method according to some embodiments of the present application.
The motion compensation residual determination method as shown in fig. 4 may be implemented based on the basic framework of the coding model shown in fig. 3, and may include, but is not limited to, the following steps:
401. And determining motion parameters according to the reconstructed frame of the historical frame and the current frame.
In fig. 3, the current frame is the t-th frame of the video frames and is denoted X_t; the historical frame is the (t-1)-th frame, and the reconstructed frame of the historical frame is denoted X̂_{t-1}. Feature extraction is performed on X_t by the feature extraction module 301 shown in fig. 3 to obtain the feature F_t of the current frame; feature extraction is performed on X̂_{t-1} by the feature extraction module 301 shown in fig. 3 to obtain the feature F̂_{t-1} of the reconstructed frame.
402. And encoding the motion parameters to obtain encoded motion parameters.
In fig. 3, the feature F_t of the current frame and the feature F̂_{t-1} of the reconstructed frame of the (t-1)-th frame may be input to the motion estimation module 302, and the motion estimation module 302 then calculates the motion parameter θ_t from F_t and F̂_{t-1}.
403. And decoding the encoded motion parameters to obtain reconstructed motion parameters.
In fig. 3, the motion parameter θ_t is input to the motion codec module 303, which encodes it to obtain the encoded motion parameter M_t and then decodes M_t to obtain the reconstructed motion parameter θ̂_t.
404. And performing motion compensation on the reconstructed frame of the historical frame based on the reconstructed motion parameters to obtain a reconstructed predicted frame.
Wherein the reconstructed predicted frame is also referred to hereinafter as the feature of the reconstructed predicted frame.
405. And performing motion compensation on the historical frame based on the reconstructed motion parameters to obtain an original predicted frame.
Wherein the original predicted frame is also referred to as the feature of the original predicted frame hereinafter.
Illustratively, the process in which the motion compensation module 304 of fig. 3 generates the reconstructed predicted frame P̂_t and the original predicted frame P_t based on the reconstructed motion parameter θ̂_t, the reconstructed frame feature F̂_{t-1} of the historical frame, and the historical frame feature F_{t-1} can be expressed as:

P̂_t = f(θ̂_t, F̂_{t-1})
P_t = f(θ̂_t, F_{t-1})

where f() represents the computation function of the motion compensation module, which may remain the same as in the FVC framework or the DCVC framework; θ̂_t represents the reconstructed motion parameter obtained from motion estimation, F̂_{t-1} represents the reconstructed frame feature of the historical frame, and F_{t-1} represents the historical frame feature. P̂_t represents the reconstructed predicted frame, i.e., the predicted frame feature generated based on the reconstructed frame, and P_t represents the original predicted frame, i.e., the predicted frame feature generated based on the historical frame.
In fig. 3, the feature extraction module 301 may perform feature extraction on the historical frame X_{t-1} to obtain the historical frame feature F_{t-1}; the motion compensation module 304 then uses the reconstructed motion parameter θ̂_t to deform the historical frame feature F_{t-1} and generate the original predicted frame P_t.
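As one possible instantiation of the compensation function f(), assuming optical-flow-style warping (the DCVC-style choice; an FVC-style deformable-convolution version would follow the same pattern), the sketch below emphasizes that the same f() and the same reconstructed motion parameter are applied to both the reconstructed frame features and the original historical frame features.

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Optical-flow warping used here as one assumed motion-compensation f();
    feat: (N, C, H, W), flow: (N, 2, H, W) in pixels."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # normalise sampling coordinates to [-1, 1] for grid_sample
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

def motion_compensate_both(recon_motion, recon_prev_feat, orig_prev_feat):
    pred_recon = warp(recon_prev_feat, recon_motion)  # P̂_t = f(θ̂_t, F̂_{t-1})
    pred_orig = warp(orig_prev_feat, recon_motion)    # P_t  = f(θ̂_t, F_{t-1})
    return pred_recon, pred_orig
```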
406. And determining a reconstruction residual error according to the reconstruction predicted frame and the current frame.
407. And determining an original residual error according to the original predicted frame and the current frame.
As shown in fig. 3, r̂_t represents the reconstructed residual and r_t represents the original residual. When determining the reconstructed residual r̂_t, the residual acquisition module 308 subtracts the reconstructed predicted frame P̂_t from the feature F_t of the current frame; when determining the original residual r_t, it subtracts the original predicted frame P_t from the feature F_t of the current frame.
Illustratively, the process of determining the reconstructed residual and the original residual may be expressed as:

r̂_t = F_t − P̂_t
r_t = F_t − P_t
408. and fusing the reconstructed residual error and the original residual error to obtain a target fusion residual error.
The target fusion residual is denoted R_t in fig. 3; the reconstructed residual r̂_t and the original residual r_t are fused by the residual fusion module 310, which outputs the target fusion residual R_t.
The fusion mode for fusing the reconstructed residual and the original residual includes, but is not limited to, a first fusion mode and a second fusion mode described below.
Fusion mode one: splicing the reconstructed residual error and the original residual error to obtain a splicing result, and extracting characteristics of the splicing result to obtain a target fusion residual error.
Fig. 5 is a schematic frame diagram of a residual fusion module applied in a fusion manner according to some embodiments of the present application.
In fig. 5, the framework includes a stitching module 501 and a feature extraction module 502, where the feature extraction module 502 includes one convolutional layer, one activation layer, and another convolutional layer. Illustratively, each convolutional layer in the framework may be a convolutional layer with a 3×3 kernel, and the activation layer may be a Rectified Linear Unit (ReLU) activation function.
As shown in fig. 5, the reconstructed residual r̂_t and the original residual r_t are input to the stitching module 501. Both r̂_t and r_t are matrices; they are spliced into one larger matrix as the splicing result, and the splicing result is input to the feature extraction module 502 for feature extraction to obtain the target fusion residual R_t. The first fusion mode can be expressed by the following formula:

R_t = Conv(ReLU(Conv(Concat(r̂_t, r_t))))    (5)

In the above formula (5), Conv() represents the algorithm of a convolutional layer, ReLU() represents the algorithm of the activation layer, and Concat(r̂_t, r_t) represents matrix splicing of the reconstructed residual r̂_t and the original residual r_t.
In the first fusion mode, the calculated channel number of the target fusion residual is the same as the channel number of the original residual, that is, the neuron number of the convolution layer in the residual fusion module shown in fig. 5 is the same as the neuron number of the convolution layer used for feature extraction in the FVC framework or DCVC framework.
In the first fusion mode, the reconstructed residual r̂_t and the original residual r_t are directly spliced to obtain the fused target fusion residual; this fusion mode involves fewer operations and has higher computational efficiency.
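A minimal PyTorch sketch of the fusion mode one module of fig. 5, assuming 3×3 convolutions and ReLU as stated above; the channel count is illustrative.

```python
import torch
import torch.nn as nn

class ResidualFusionV1(nn.Module):
    """Fusion mode one: concatenate r̂_t and r_t, then Conv-ReLU-Conv (fig. 5)."""
    def __init__(self, channels: int = 64):  # channels of the original residual (illustrative)
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, recon_residual: torch.Tensor, orig_residual: torch.Tensor) -> torch.Tensor:
        concat = torch.cat((recon_residual, orig_residual), dim=1)  # splicing along channels
        return self.fuse(concat)  # target fusion residual R_t
```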
And a second fusion mode: splicing the reconstructed residual error and the original residual error to obtain a splicing result; extracting features of the spliced results to obtain initial fusion residual errors; processing the initial fusion residual error according to the Sigmoid activation function to obtain attention weight; performing feature enhancement on the reconstructed residual error according to the attention weight to obtain an enhancement result; and extracting the characteristics of the enhanced result to obtain a target fusion residual error.
Fig. 6 is a schematic diagram of a framework of a residual fusion module applied in a fusion manner two according to some embodiments of the present application.
In fig. 6, the framework includes a stitching module 601, a first feature extraction module 602, an activation function 603, a feature enhancement module 604, and a second feature extraction module 605, where each of the first feature extraction module 602 and the second feature extraction module 605 may include one convolutional layer, one activation layer, and another convolutional layer. Illustratively, each convolutional layer in the framework may be a convolutional layer with a 3×3 kernel, the activation layer may be a Rectified Linear Unit (ReLU) activation function, and the activation function 603 may be a Sigmoid function.
As shown in fig. 6, the reconstructed residual r̂_t and the original residual r_t are input to the stitching module 601. Both r̂_t and r_t are matrices; they are spliced into one larger matrix as the splicing result, which is input to the first feature extraction module 602 for feature extraction, and the initial fusion residual obtained by feature extraction is input to the activation function 603 to generate the attention weight A_t. The attention weight A_t and the reconstructed residual r̂_t are input to the feature enhancement module 604, where A_t is multiplied with r̂_t to perform feature enhancement on the reconstructed residual; the feature-enhanced result is then input to the second feature extraction module 605 for feature extraction to obtain the target fusion residual R_t.
The second fusion mode can be expressed by the following formulas:

A_t = Sigmoid(Conv(ReLU(Conv(Concat(r̂_t, r_t)))))
R_t = Conv(ReLU(Conv(A_t ⊙ r̂_t)))

In the above formulas, Sigmoid() represents the Sigmoid function, Conv() represents the algorithm of a convolutional layer, ReLU() represents the algorithm of the activation layer, Concat(r̂_t, r_t) represents matrix splicing of the reconstructed residual r̂_t and the original residual r_t, and A_t ⊙ r̂_t represents multiplying the attention weight A_t with the reconstructed residual r̂_t.
The number of channels of the initial fusion residual (A t) is smaller than or equal to that of the original residual, and the number of channels of the target fusion residual is the same as that of the original residual.
Illustratively, the number of neurons in the convolutional layer before the activation function 603 is 1, and the number of neurons in the other convolutional layers is the same as the number of channels of the original residual, so that the calculated initial fusion residual is single-channel and the attention weight A_t is also a single-channel attention weight.
In the case where the number of channels of the initial fusion residual is smaller than the number of channels of the reconstruction residual, the amount of calculation in the fusion process can be reduced.
In the second fusion mode, on the basis of splicing the reconstructed residual r̂_t and the original residual r_t, an attention mechanism is also used for fusion: the attention weight A_t for assisting the residual frame is obtained first, and the residual coding is then enhanced with A_t. Compared with the first fusion mode, this mode therefore has better performance, and the calculated target fusion residual is more accurate.
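A corresponding PyTorch sketch of the fusion mode two module of fig. 6, with a single-channel attention weight A_t as described above; layer sizes other than the single-channel output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualFusionV2(nn.Module):
    """Fusion mode two: attention-weighted fusion of r̂_t and r_t (fig. 6)."""
    def __init__(self, channels: int = 64):  # channels of the original residual (illustrative)
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),  # single-channel initial fusion residual
            nn.Sigmoid(),                                      # attention weight A_t
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, recon_residual: torch.Tensor, orig_residual: torch.Tensor) -> torch.Tensor:
        a_t = self.attention(torch.cat((recon_residual, orig_residual), dim=1))
        enhanced = a_t * recon_residual   # feature enhancement of r̂_t by A_t
        return self.fuse(enhanced)        # target fusion residual R_t
```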
According to the motion compensation residual determination method provided by some embodiments of the application, motion compensation is performed on the reconstructed frame of the historical frame based on the reconstructed motion parameter to obtain a reconstructed predicted frame, where the reconstructed motion parameter is obtained by encoding and then decoding the motion parameter determined from the reconstructed frame of the historical frame and the current frame; motion compensation is performed on the historical frame based on the reconstructed motion parameter to obtain an original predicted frame; a reconstructed residual is determined according to the reconstructed predicted frame and the current frame; an original residual is determined according to the original predicted frame and the current frame; and the reconstructed residual and the original residual are fused to obtain a target fusion residual. In this scheme, the reconstructed residual and the original residual are calculated based on the reconstructed frame of the historical frame and on the historical frame itself; because the historical frame has not been encoded and reconstructed, there is no distortion, so the calculated original residual is undistorted and includes complete image structure information. Using the target fusion residual obtained by fusing the reconstructed residual and the original residual as the final motion compensation residual supplements the damaged image structure information in the reconstructed residual, so that the finally determined motion compensation residual is more accurate.
After the target fusion residual is calculated, encoding is performed based on the target fusion residual to obtain an encoded residual, and a bitstream is generated from the encoded residual and the encoded motion parameters, so that the whole video encoding process can be realized; the information obtained by video encoding is more accurate, and the encoding performance is improved.
Fig. 7 is a flowchart of a video encoding method according to some embodiments of the present application. Fig. 7 shows the process in which subsequent residual coding is added after the target fusion residual is calculated by the motion compensation residual determination method shown in fig. 4.
In connection with fig. 4, as shown in fig. 7, following step 408 in fig. 4, the following steps may also be performed:
409. and encoding the target fusion residual error to obtain an encoded residual error.
As shown in fig. 3, the target fusion residual may be input to the residual codec module 305, which encodes it to obtain the encoded residual Y_t.
410. A bitstream is generated based on the encoded motion parameters and the encoded residual.
The coding motion parameters are obtained by coding the motion parameters.
As shown in fig. 3, the encoded motion parameter M_t and the encoded residual Y_t are input into the entropy encoding module 306 and encoded by it to generate a bitstream. After the bitstream is generated, it may be transmitted to the decoding device at the opposite end to enable data transmission after video encoding.
After the target fusion residual is calculated, encoding is performed based on the target fusion residual to obtain an encoded residual, and a bitstream is generated from the encoded residual and the encoded motion parameters, so that the whole video encoding process can be realized; the information obtained by video encoding is more accurate, and the encoding performance is improved.
Exemplarily, as shown in Table 1 below, on the Class C test sequences under the HEVC common test conditions, the coding performance of the video coding method provided by some embodiments of the present application (calculating the target fusion residual using the fusion mode described above and performing video coding) is compared with that of the video coding method using the FVC framework.
TABLE 1
In Table 1, in the field of image and video compression, the bit rate after compression refers to the coding length required per pixel, and its unit is generally bits per pixel (bpp). Peak Signal-to-Noise Ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation; it is measured in decibels (dB) and is often used as a measure of reconstruction quality in fields such as image compression.
The overall cost function of a video coding method is λ·D + R; the larger the value calculated by the cost function, the worse the performance of the corresponding video coding method, and the smaller the value, the better the performance.
Here λ is a constant and is 1024 in this experiment, D = −0.1 × PSNR, and R represents the code rate.
Computing the overall cost function λ·D + R from the data in Table 1, the value obtained for the video coding method using the FVC framework is 0.9726, while the value obtained for the video coding method provided by some embodiments of the present application is 0.96955424. Since the latter is smaller, the video coding method provided by some embodiments of the present application performs better than the video coding method using the FVC framework, i.e., it improves the overall coding performance.
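For reference, the expression being compared is simply λ·D + R with D = −0.1 × PSNR; the helper below only shows how that expression is evaluated for arbitrary inputs and makes no attempt to reproduce the Table 1 figures quoted above, whose exact scaling is not detailed in the text.

```python
def overall_cost(psnr_db: float, rate: float, lmbda: float = 1024.0) -> float:
    """Overall cost λ·D + R with D = -0.1 * PSNR and R the code rate;
    a smaller value indicates better rate-distortion performance."""
    return lmbda * (-0.1 * psnr_db) + rate
```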
Fig. 8 is a flowchart of a video encoding method according to some embodiments of the present application. Fig. 8 adds a reconstruction process for the current frame to the video encoding method shown in fig. 7.
In connection with fig. 7, as shown in fig. 8, after step 409 in fig. 7, the following steps may also be performed:
411. Decoding the encoded residual to obtain a reconstructed fusion residual.
As shown in fig. 3, the residual codec module 305 decodes the encoded residual Y_t to obtain a reconstructed fusion residual.
412. Determining reconstructed features according to the reconstructed fusion residual and the reconstructed predicted frame.
As shown in fig. 3, the reconstruction module 309 may add the reconstructed fusion residual and the reconstructed predicted frame to obtain the features of the reconstructed frame of the current frame.
413. Performing feature enhancement on the reconstructed features to obtain a reconstructed frame of the current frame.
As shown in fig. 3, the reconstructed frame features are input into the post-enhancement processing module 307, and the post-enhancement processing module 307 performs feature enhancement on the reconstructed frame features to obtain the reconstructed frame of the current frame.
In some embodiments of the present application, a reconstructed frame of the current frame may be reconstructed based on the target fusion residual, for use in a subsequent video encoding process. For example, when video encoding a next frame of a current frame, the current frame may be used as a history frame of the next frame, and the reconstructed frame of the current frame may be used as a reconstructed frame of the history frame to perform an encoding process.
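For illustration, the following sketch shows steps 412 and 413 in feature space: the reconstructed fusion residual is added to the features of the reconstructed predicted frame, and a small residual convolution block stands in for the post-enhancement processing module 307, whose actual architecture is not specified by the present application. All shapes and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class PostEnhancement(nn.Module):
    """Stand-in for the post-enhancement processing module 307: a small residual
    conv block that maps reconstructed-frame features back to an RGB frame."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.to_frame = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.to_frame(feat + self.refine(feat))   # feature enhancement, then frame

# Step 412: add the reconstructed fusion residual to the reconstructed predicted frame (features)
recon_residual = torch.rand(1, 64, 32, 32)   # hypothetical reconstructed fusion residual
pred_feat = torch.rand(1, 64, 32, 32)        # hypothetical reconstructed predicted frame features
recon_feat = pred_feat + recon_residual

# Step 413: feature enhancement yields the reconstructed frame of the current frame
x_hat = PostEnhancement()(recon_feat)
print(x_hat.shape)    # torch.Size([1, 3, 32, 32])
```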
After the bit stream is obtained by encoding according to the video encoding method provided by some embodiments of the present application, the bit stream may be sent to a decoding device, and after the decoding device receives the bit stream, video decoding may be performed to implement a video transmission process.
Fig. 9 is a schematic diagram of a basic framework for video decoding according to some embodiments of the present application.
The frame shown in fig. 9 includes: an entropy decoding module 901, a motion parameter decoding module 902, a feature extraction module 903, a motion compensation module 904, a residual decoding module 905, a reconstruction module 906, and a feature enhancement and reconstruction module 907.
The entropy decoding module 901 is configured to perform entropy decoding on the received bitstream; entropy decoding can recover the data without distortion, and the entropy decoding method may include: exponential Golomb decoding, context-adaptive variable length coding (Context Adaptive Variable Length Coding, CAVLC) decoding, and context-adaptive binary arithmetic coding (Context Adaptive Binary Arithmetic Coding, CABAC) decoding. In the embodiments of the present application, the entropy decoding module 901 may perform entropy decoding on the bitstream sent by the encoding device and received by the decoding device, to obtain the encoded motion parameter M_t and the encoded residual Y_t.
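As an illustration of one of the listed methods, the following is a minimal order-0 unsigned exponential Golomb decoder; it is not the entropy decoding module 901 itself, only an example of the decoding technique named above.

```python
def decode_exp_golomb_ue(bits: str, pos: int = 0):
    """Decode one unsigned order-0 Exp-Golomb codeword from a bit string.
    Returns (value, next_position). E.g. '1'->0, '010'->1, '011'->2, '00100'->3."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    pos += zeros + 1                      # skip leading zeros and the '1' separator
    suffix = bits[pos:pos + zeros]
    pos += zeros
    value = (1 << zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos

# Decode the sequence 0, 1, 2, 3 from their concatenated codewords
stream, p, out = "101001100100", 0, []
while p < len(stream):
    v, p = decode_exp_golomb_ue(stream, p)
    out.append(v)
print(out)   # [0, 1, 2, 3]
```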
The motion parameter decoding module 902 is configured to perform data decoding; after receiving the encoded motion parameter M_t, it performs decoding to obtain the decoded reconstructed motion parameters.
The feature extraction module 903 is configured to perform feature extraction on an image frame to obtain the features of the image frame. In the present application, feature extraction may be performed on the reconstructed frame of the history frame to obtain the features of the reconstructed frame of the history frame.
The motion compensation module 904 is similar in function to the motion compensation module 304 shown in fig. 3. The motion compensation module 904 can perform motion compensation on the features of the reconstructed frame of the history frame according to the reconstructed motion parameters, to obtain a reconstructed predicted frame for the reconstructed frame of the history frame.
The residual decoding module 905 is configured to perform data decoding; after the encoded residual Y_t is obtained, it is decoded by the residual decoding module 905 to obtain a reconstructed fusion residual.
The reconstruction module 906 is configured to add the reconstructed fusion residual and the reconstructed predicted frame to obtain the features of the reconstructed frame of the current frame.
The feature enhancement and reconstruction module 907 is configured to perform feature enhancement and reconstruction on the features of the reconstructed frame of the current frame, finally obtaining the reconstructed frame of the current frame.
As shown in fig. 9, after the decoding device receives a bitstream, the bitstream is entropy decoded by the entropy decoding module 901 to obtain the encoded motion parameter M_t and the encoded residual Y_t. The encoded motion parameter M_t is input to the motion parameter decoding module 902 and decoded to obtain the reconstructed motion parameters. The feature extraction module 903 performs feature extraction on the reconstructed frame of the history frame to obtain the features of the reconstructed frame of the history frame. The features of the reconstructed frame of the history frame and the reconstructed motion parameters are input to the motion compensation module 904; after the motion compensation module 904 performs motion compensation on the features of the reconstructed frame of the history frame according to the reconstructed motion parameters, a reconstructed predicted frame for the reconstructed frame of the history frame is obtained. The encoded residual Y_t is input to the residual decoding module 905 and decoded to obtain a reconstructed fusion residual. The reconstruction module 906 then adds the reconstructed fusion residual and the reconstructed predicted frame to obtain the features of the reconstructed frame of the current frame, and the feature enhancement and reconstruction module 907 performs feature enhancement and reconstruction on these features to obtain the reconstructed frame of the current frame.
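For illustration only, the decoding flow of fig. 9 can be sketched as below. Each learned module is replaced by a generic convolution block, and the latents produced by the entropy decoding module 901 are replaced by random placeholders; the architectures, channel counts, and shapes are assumptions, not those of the present application.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Module:
    # Generic stand-in for the learned modules of fig. 9; real architectures are not specified.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1))

C = 64
feature_extraction  = conv_block(3, C)        # module 903
motion_decoder      = conv_block(C, C)        # module 902 (stand-in)
motion_compensation = conv_block(2 * C, C)    # module 904 (stand-in)
residual_decoder    = conv_block(C, C)        # module 905 (stand-in)
enhance_and_reconstruct = nn.Sequential(conv_block(C, C), nn.Conv2d(C, 3, 3, padding=1))  # module 907

# Hypothetical inputs: entropy decoding (module 901) is assumed to have produced these latents
ref_recon      = torch.rand(1, 3, 64, 64)     # reconstructed frame of the history frame
coded_motion   = torch.rand(1, C, 64, 64)     # encoded motion parameter M_t after entropy decoding
coded_residual = torch.rand(1, C, 64, 64)     # encoded residual Y_t after entropy decoding

ref_feat   = feature_extraction(ref_recon)                                  # features of history reconstructed frame
motion_hat = motion_decoder(coded_motion)                                   # reconstructed motion parameters
pred_feat  = motion_compensation(torch.cat([ref_feat, motion_hat], dim=1))  # reconstructed predicted frame (features)
res_hat    = residual_decoder(coded_residual)                               # reconstructed fusion residual
recon_feat = pred_feat + res_hat                                            # module 906: addition
x_hat      = enhance_and_reconstruct(recon_feat)                            # reconstructed current frame
print(x_hat.shape)   # torch.Size([1, 3, 64, 64])
```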
In some embodiments of the present application, because the decoding device only holds reconstructed frames of the history frames and not the original history frames, the original residual cannot simply be used in place of the reconstructed residual at the encoding device. Instead, the original residual is used to assist the distorted reconstructed residual during encoding, which improves coding efficiency without affecting decoding at the decoding device.
Fig. 10 is a schematic structural diagram of a motion compensation residual determining apparatus according to some embodiments of the present application. The motion compensation residual determination apparatus may include: a processor 1001, a memory 1002 and a computer program stored on the memory 1002 and executable on the processor 1001, which when executed by the processor 1001 implements the above described motion compensation residual determination method provided in an embodiment of the disclosure.
In some embodiments, the processor 1001 is configured to: performing motion compensation on a reconstructed frame of a historical frame based on a reconstructed motion parameter to obtain a reconstructed predicted frame, wherein the reconstructed motion parameter is obtained by encoding and decoding the motion parameter, and the motion parameter is determined based on the reconstructed frame of the historical frame and the current frame; performing motion compensation on the historical frame based on the reconstructed motion parameters to obtain an original predicted frame;
Determining a reconstructed residual error according to the reconstructed predicted frame and the current frame;
determining an original residual error according to the original predicted frame and the current frame;
And fusing the reconstructed residual error and the original residual error to obtain a target fusion residual error.
In some embodiments, the processor 1001 is further configured to: after the reconstructed residual error and the original residual error are fused to obtain a target fusion residual error, the target fusion residual error is encoded to obtain an encoded residual error;
And generating a bit stream based on the coding motion parameter and the coding residual, wherein the coding motion parameter is obtained by coding the motion parameter.
In some embodiments, the processor 1001 is specifically configured such that fusing the reconstructed residual and the original residual to obtain the target fusion residual comprises the following steps:
Splicing the reconstructed residual error and the original residual error to obtain a splicing result;
and extracting features of the splicing result to obtain the target fusion residual error.
In some embodiments, the processor 1001 is specifically configured such that fusing the reconstructed residual and the original residual to obtain the target fusion residual comprises the following steps:
Splicing the reconstructed residual error and the original residual error to obtain a splicing result;
extracting features of the splicing results to obtain initial fusion residual errors;
Processing the initial fusion residual error according to a Sigmoid activation function to obtain attention weight;
performing feature enhancement on the reconstructed residual error according to the attention weight to obtain an enhancement result;
And extracting the characteristics of the enhanced result to obtain the target fusion residual error.
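For illustration, the following is a minimal sketch of this attention-based fusion: the two residuals are concatenated, an initial fusion residual with fewer channels is extracted, a Sigmoid produces attention weights, the reconstructed residual is enhanced by element-wise multiplication with these weights (one plausible reading of "feature enhancement according to the attention weight"), and a final convolution yields a target fusion residual with the same channel count as the original residual. The layer choices and channel numbers are assumptions and are not specified by the present application.

```python
import torch
import torch.nn as nn

class AttentionResidualFusion(nn.Module):
    """Sketch of the attention-based fusion described above; layers are placeholders."""
    def __init__(self, channels: int = 64, mid_channels: int = 32):
        super().__init__()
        # mid_channels <= channels, matching the constraint on the initial fusion residual
        self.initial = nn.Conv2d(2 * channels, mid_channels, 3, padding=1)
        self.expand = nn.Conv2d(mid_channels, channels, 1)         # map attention back to residual channels (assumption)
        self.final = nn.Conv2d(channels, channels, 3, padding=1)   # target residual keeps the original channel count

    def forward(self, recon_residual: torch.Tensor, orig_residual: torch.Tensor) -> torch.Tensor:
        concat = torch.cat([recon_residual, orig_residual], dim=1)   # splice the two residuals
        initial_fusion = self.initial(concat)                        # initial fusion residual
        attention = torch.sigmoid(self.expand(initial_fusion))       # Sigmoid attention weights
        enhanced = recon_residual * attention                        # feature enhancement of the reconstructed residual
        return self.final(enhanced)                                  # target fusion residual

fusion = AttentionResidualFusion()
target = fusion(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
print(target.shape)   # torch.Size([1, 64, 32, 32])
```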
In some embodiments, the number of channels of the initial fusion residual is less than or equal to the number of channels of the original residual.
In some embodiments, the number of channels of the target fusion residual is the same as the number of channels of the original residual.
In some embodiments, the processor 1001 is further configured to: after encoding the target fusion residual error to obtain an encoded residual error, decoding the encoded residual error to obtain a reconstructed fusion residual error; determining reconstruction characteristics according to the reconstruction fusion residual error and the reconstruction prediction frame; and carrying out feature enhancement on the reconstruction features to obtain a reconstruction frame of the current frame.
The processor 1001 implements the above functions based on a coding model, which is trained based on a sample data set;
wherein the sample dataset comprises: a plurality of sets of training data, each set of training data comprising: a video frame, a historical video frame of the video frame, and a reconstructed frame of the historical video frame.
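As an illustration of how such a sample data set could be organized, the following sketch groups each training sample as (video frame, history frame, reconstructed history frame); the random tensors are placeholders for decoded video data, and the class name is an assumption.

```python
import torch
from torch.utils.data import Dataset

class ResidualFusionSamples(Dataset):
    """Each training sample groups a video frame, its history frame, and the
    reconstructed frame of that history frame, as described above."""
    def __init__(self, num_samples: int = 8, size: int = 64):
        self.frames = [
            {
                "current_frame": torch.rand(3, size, size),
                "history_frame": torch.rand(3, size, size),
                "history_reconstructed": torch.rand(3, size, size),
            }
            for _ in range(num_samples)
        ]

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        return self.frames[idx]

sample = ResidualFusionSamples()[0]
print(sample["current_frame"].shape)   # torch.Size([3, 64, 64])
```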
The embodiment of the disclosure also provides a terminal device, which comprises the motion compensation residual error determining device.
The motion compensation residual determination apparatus may be any apparatus capable of implementing the motion compensation residual determination method, for example a video encoding chip, and the terminal device may be a video encoding device including the video encoding chip.
Some embodiments of the present application provide a computer readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements each process of the motion compensation residual determination method in the above method embodiments and can achieve the same technical effects; to avoid repetition, details are not described here again.
The computer readable storage medium may be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
Some embodiments of the present application provide a computer program product storing a computer program. When the computer program is executed by a processor, it implements each process of the motion compensation residual determination method in the above method embodiments and can achieve the same technical effects; to avoid repetition, details are not described here again.
It will be appreciated by those skilled in the art that embodiments of the application may be provided as methods, systems, or computer program products. Accordingly, some embodiments of the application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, some embodiments of the application may take the form of a computer program product on one or more computer-usable storage media having computer-usable program code embodied therein.
In some embodiments of the application, the processor may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In some embodiments of the application, the memory may include a volatile memory, a random access memory (RAM), and/or a non-volatile memory in a computer readable medium, such as a read-only memory (ROM) or a flash memory (flash RAM). Memory is an example of a computer-readable medium.
In some embodiments of the application, computer readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above is merely a specific implementation of some embodiments of the application to enable a person skilled in the art to understand or practice them. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of some embodiments of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for motion compensated residual determination, comprising:
performing motion compensation on a reconstructed frame of a historical frame based on a reconstructed motion parameter to obtain a reconstructed predicted frame, wherein the reconstructed motion parameter is obtained by encoding and decoding the motion parameter, and the motion parameter is determined based on the reconstructed frame of the historical frame and a current frame;
performing motion compensation on the historical frame based on the reconstructed motion parameters to obtain an original predicted frame;
Determining a reconstructed residual error according to the reconstructed predicted frame and the current frame;
determining an original residual error according to the original predicted frame and the current frame;
And fusing the reconstructed residual error and the original residual error to obtain a target fusion residual error.
2. The method of claim 1, wherein after fusing the reconstructed residual and the original residual to obtain a target fused residual, the method further comprises:
Encoding the target fusion residual error to obtain an encoded residual error;
And generating a bit stream based on the coding motion parameter and the coding residual, wherein the coding motion parameter is obtained by coding the motion parameter.
3. The method according to claim 1, wherein the fusing the reconstructed residual and the original residual to obtain a target fused residual comprises:
Splicing the reconstructed residual error and the original residual error to obtain a splicing result;
and extracting features of the splicing result to obtain the target fusion residual error.
4. The method according to claim 1, wherein the fusing the reconstructed residual and the original residual to obtain a target fused residual comprises:
Splicing the reconstructed residual error and the original residual error to obtain a splicing result;
extracting features of the splicing results to obtain initial fusion residual errors;
Processing the initial fusion residual error according to a Sigmoid activation function to obtain attention weight;
performing feature enhancement on the reconstructed residual error according to the attention weight to obtain an enhancement result;
And extracting the characteristics of the enhanced result to obtain the target fusion residual error.
5. The method of claim 4, wherein the initial fusion residual has a channel number less than or equal to the channel number of the original residual.
6. The method of claim 3 or 4, wherein the number of channels of the target fusion residual is the same as the number of channels of the original residual.
7. The method of claim 2, wherein after encoding the target fusion residual to obtain an encoded residual, the method further comprises:
decoding the encoded residual error to obtain a reconstructed fusion residual error;
determining reconstruction characteristics according to the reconstruction fusion residual error and the reconstruction prediction frame;
and carrying out feature enhancement on the reconstruction features to obtain a reconstruction frame of the current frame.
8. The method of claim 1, wherein the method is implemented based on a coding model that is trained based on a sample dataset;
wherein the sample dataset comprises: a plurality of sets of training data, each set of training data comprising: a video frame, a historical video frame of the video frame, and a reconstructed frame of the historical video frame.
9. A motion compensated residual determination apparatus, comprising:
A processor, a memory and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the motion compensation residual determination method of any one of claims 1 to 8.
10. A terminal device, comprising: the motion compensated residual determination device of claim 9.
11. A computer-readable storage medium, comprising: the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the motion compensated residual determination method according to any of claims 1 to 8.
CN202211614426.7A 2022-12-13 2022-12-13 Motion compensation residual error determination method, device, terminal equipment and storage medium Pending CN118200608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211614426.7A CN118200608A (en) 2022-12-13 2022-12-13 Motion compensation residual error determination method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211614426.7A CN118200608A (en) 2022-12-13 2022-12-13 Motion compensation residual error determination method, device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118200608A true CN118200608A (en) 2024-06-14

Family

ID=91407232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211614426.7A Pending CN118200608A (en) 2022-12-13 2022-12-13 Motion compensation residual error determination method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118200608A (en)


Legal Events

Date Code Title Description
PB01 Publication