CN113949883A - Coding method, decoding method and coding and decoding system of bidirectional prediction frame


Info

Publication number
CN113949883A
Authority
CN
China
Prior art keywords
frame
optical flow
bidirectional prediction
image
bidirectional
Prior art date
Legal status
Pending
Application number
CN202010687711.6A
Other languages
Chinese (zh)
Inventor
樊顺利
陈巍
肖云雷
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202010687711.6A priority Critical patent/CN113949883A/en
Publication of CN113949883A publication Critical patent/CN113949883A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a coding method, a decoding method, and a coding and decoding system for bidirectional prediction frames. The method includes: acquiring a first optical flow of a forward frame and a second optical flow of a backward frame corresponding to a bidirectional prediction frame to be coded; and determining motion compensation information corresponding to the bidirectional prediction frame through a preset image coding model, based on the bidirectional prediction frame, the first optical flow, and the second optical flow, so as to obtain a coding file corresponding to the bidirectional prediction frame. In the embodiments of the application, the bidirectional prediction frame, the first optical flow, and the second optical flow serve as inputs to the image coding model, which outputs the motion compensation information corresponding to the bidirectional prediction frame. Obtaining the motion compensation information through the image coding model in this way improves its accuracy, avoids distortion in the motion compensation frame derived from that information, and thus improves the quality of the motion compensation frame corresponding to the bidirectional prediction frame.

Description

Coding method, decoding method and coding and decoding system of bidirectional prediction frame
Technical Field
The present application relates to the field of image coding technologies, and in particular to a coding method, a decoding method, and a coding and decoding system for bidirectional prediction frames.
Background
Bidirectional prediction frames are the most heavily compressed part of video compression and can effectively reduce the bit rate of a video. A bidirectional prediction frame is conventionally coded with a bidirectional-prediction optical flow technique, and the motion compensation is determined by a manually designed motion compensation algorithm. However, manually designing a motion compensation algorithm relies on prior knowledge, and a lack of prior knowledge may introduce errors into the motion compensation, which makes the reconstructed bidirectional prediction frame determined from that motion compensation less accurate.
Disclosure of Invention
To address the deficiencies of the prior art, the present application provides a coding method, a decoding method, and a coding and decoding system for bidirectional prediction frames.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides a method for encoding a bidirectional prediction frame, the method including:
Acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be coded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame;
and determining motion compensation information corresponding to the bidirectional prediction frame through a preset image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow so as to obtain a coding file corresponding to the bidirectional prediction frame.
In one possible embodiment, the obtaining of the first optical flow and the second optical flow corresponding to the bidirectional prediction frame to be encoded specifically includes:
acquiring a forward frame and a backward frame corresponding to the bidirectional prediction frame, wherein the forward frame is positioned in front of and adjacent to the bidirectional prediction frame, and the backward frame is positioned behind and adjacent to the bidirectional prediction frame;
and acquiring a first optical flow corresponding to the forward frame and a second optical flow corresponding to the backward frame to obtain a first optical flow and a second optical flow corresponding to the bidirectional prediction frame.
In a possible embodiment, the determining, by a preset image coding model, motion compensation information corresponding to the bidirectional prediction frame based on the bidirectional prediction frame, the first optical flow, and the second optical flow to obtain a coding file corresponding to the bidirectional prediction frame specifically includes:
determining an encoded image frame corresponding to the bidirectional prediction frame based on the bidirectional prediction frame, the first optical flow and the second optical flow;
and inputting the coded image frame into the image coding model, and determining motion compensation information corresponding to the bidirectional prediction frame through the image coding model to obtain a coded file corresponding to the bidirectional prediction frame.
In a possible embodiment, said determining, based on said bidirectional predicted frame, said first optical flow and said second optical flow, an encoded image frame corresponding to said bidirectional predicted frame specifically comprises:
performing an affine transformation on the forward frame based on the first optical flow to obtain a target forward frame;
performing an affine transformation on the backward frame based on the second optical flow to obtain a target backward frame;
and splicing the bidirectional prediction frame, the first optical flow, the second optical flow, the target forward frame and the target backward frame according to channels to obtain an encoded image frame corresponding to the bidirectional prediction frame.
In one possible embodiment, the affine transformation specifically includes: performing a spatial-movement warp operation on a reference image frame to obtain a target image frame, wherein when the reference image frame is the forward frame, the target image frame is the target forward frame; and when the reference image frame is the backward frame, the target image frame is the target backward frame.
In one possible embodiment, the image coding model comprises a plurality of cascaded convolution modules and a fusion module; the inputting the encoded image frame into a preset image coding model, and determining motion compensation information corresponding to the bidirectional prediction frame through the image coding model to obtain an encoded file corresponding to the bidirectional prediction frame, specifically includes:
determining a plurality of feature maps corresponding to the image coding model based on a plurality of cascaded convolution modules, wherein the feature maps correspond to the cascaded convolution modules one to one;
and inputting the characteristic graphs into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module to obtain a coding file corresponding to the bidirectional prediction frame.
In one possible embodiment, the image coding model includes a quantization module, the inputting the feature maps into the fusion module, and the determining, by the fusion module, the motion compensation information corresponding to the bidirectional prediction frame to obtain the coding file corresponding to the bidirectional prediction frame specifically includes:
inputting the feature maps into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module;
and inputting the motion compensation information into the quantization module, and generating a coding file corresponding to the bidirectional prediction frame through the quantization module.
A second aspect of the present embodiment provides a method for decoding a bidirectional predictive frame, which is used for decoding an encoded file encoded based on the bidirectional predictive frame encoding method according to any one of claims 1 to 7, the method including:
inputting a coding file into a preset image decoding model, and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the coding file through the image decoding model;
determining a target forward frame based on the first optical flow, and a target backward frame based on the second optical flow;
and determining a motion compensation frame corresponding to the coding file according to the target forward frame, the target backward frame and the fusion coefficient.
In one possible embodiment, the determining the target forward frame based on the first optical flow and the determining the target backward frame based on the second optical flow specifically includes:
acquiring a reconstructed forward frame and a reconstructed backward frame corresponding to the bidirectional prediction frame;
affine transforming the reconstructed forward frame based on the first optical flow to obtain the target forward frame;
performing affine transformation on the reconstructed backward frame based on the second optical flow to obtain the target backward frame.
A third aspect of the present embodiment provides a coding/decoding system for bidirectional predictive frames, which includes an encoding module and a decoding module;
the encoding module is used for acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be encoded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame; determining motion compensation information corresponding to the bidirectional prediction frame through an image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow to obtain a coding file corresponding to the bidirectional prediction frame;
the decoding module is used for inputting a coding file into a preset image decoding model and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the coding file through the image decoding model; determining a target forward frame based on the first optical flow and a target backward frame based on the second optical flow; and determining a motion compensation frame corresponding to the coding file according to the target forward frame, the target backward frame and the fusion coefficient.
A fourth aspect of the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the method for encoding a bidirectional predictive frame as described above or to implement the steps in the method for decoding a bidirectional predictive frame as described above.
A fifth aspect of the present embodiment provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method for encoding a bidirectional predictive frame as described above or implements the steps in the method for decoding a bidirectional predictive frame as described above.
Advantageous effects: compared with the prior art, the present application provides a method for coding a bidirectional prediction frame, the method including: acquiring a first optical flow of a forward frame and a second optical flow of a backward frame corresponding to a bidirectional prediction frame to be coded; and determining motion compensation information corresponding to the bidirectional prediction frame through an image coding model, based on the bidirectional prediction frame, the first optical flow, and the second optical flow, so as to obtain a coding file corresponding to the bidirectional prediction frame. In the embodiments of the application, the bidirectional prediction frame, the first optical flow, and the second optical flow serve as inputs to the image coding model, which outputs the motion compensation information corresponding to the bidirectional prediction frame. Obtaining the motion compensation information through the image coding model improves its accuracy, avoids distortion in the motion compensation frame derived from that information, and thus improves the quality of the motion compensation frame corresponding to the bidirectional prediction frame.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without any inventive work.
Fig. 1 is a flowchart of a method for encoding a bidirectional predicted frame provided in the present application.
Fig. 2 is a diagram illustrating an example of a group of pictures in the method for encoding a bidirectional predicted frame provided in the present application.
Fig. 3 is a diagram illustrating an example of a bidirectional predicted frame in the method for encoding a bidirectional predicted frame provided in the present application.
Fig. 4 is an exemplary diagram of a first optical flow corresponding to a bidirectional predicted frame in the bidirectional predicted frame encoding method provided in the present application.
Fig. 5 is a diagram illustrating an example of a target forward frame corresponding to a bidirectional predicted frame in the bidirectional predicted frame encoding method provided in the present application.
Fig. 6 is a flowchart of a method for decoding a bidirectional predicted frame provided in the present application.
Fig. 7 is a diagram illustrating an example of a first predicted optical flow corresponding to an encoded file in the bidirectional predicted frame encoding method provided in the present application.
Fig. 8 is an exemplary diagram of a fusion coefficient map corresponding to an encoded file in the bidirectional predictive frame encoding method provided in the present application.
Fig. 9 is a diagram illustrating an example of a motion compensation frame corresponding to a bidirectional predicted frame corresponding to an encoded file in the bidirectional predicted frame encoding method provided in the present application.
Fig. 10 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
In order to make the purpose, technical solutions, and effects of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For convenience of understanding, concepts related to the embodiments of the present application are introduced first.
Video encoding: the process of compressing a video (an image sequence) into a code stream.
Video decoding: the process of restoring a code stream into reconstructed image frames according to specific syntax rules and processing methods.
In most coding frameworks, a video comprises a series of pictures, one picture being called a frame. A picture is divided into at least one slice, and each slice is in turn divided into image blocks. Video encoding or decoding operates in units of image blocks. For example, the encoding process or the decoding process may proceed from left to right and from top to bottom, one line after another, starting from the position of the upper-left corner of the picture. Here, an image block may be a macroblock (MB) in the video coding standard H.264, or a coding unit (CU) in the High Efficiency Video Coding (HEVC) standard; this is not specifically limited in the embodiments of the present application.
In video encoding, image frames may be divided into I frames, P frames, and B frames according to the prediction type of the image frame. I-frames are frames encoded as independent still pictures, providing random access points in the video stream. A P frame is a frame predicted from a previous I frame or P frame adjacent thereto, and can be used as a reference frame for a next P frame or B frame. The B frame is a frame obtained by bidirectional prediction using the nearest previous frame and the nearest subsequent frame (which may be an I frame or a P frame) as reference frames. In the embodiment of the present application, the bidirectional predicted frame is a B frame.
Motion compensation is a method of describing the difference between adjacent frames, where "adjacent" means adjacent in coding order; two such frames are not necessarily adjacent in playing order.
The inventor finds that the video coding compression technology mainly adopts block-based hybrid video coding, and realizes video coding compression through the steps of intra-frame prediction (intra-prediction), inter-frame prediction (inter-prediction), transformation (transform), quantization (quantization), entropy coding (entropy encoding), in-loop filtering (in-loop filtering) (mainly de-blocking filtering) and the like. Inter-frame prediction may also be referred to as Motion Compensation Prediction (MCP), in which motion information of a video frame is obtained first, and then a predicted pixel value of the video frame is determined according to the motion information. A process of calculating motion information of a video frame is called Motion Estimation (ME), and a process of determining a predicted pixel value of the video frame based on the motion information is called Motion Compensation (MC). Inter prediction includes forward prediction, backward prediction, and bi-directional prediction according to a difference in prediction direction.
For a bidirectional prediction frame, the encoding process is mainly based on the bidirectional-prediction optical flow technique, and the motion compensation is determined by a manually designed motion compensation algorithm. However, manually designing a motion compensation algorithm relies on prior knowledge, and a lack of prior knowledge may introduce errors into the motion compensation, which makes the reconstructed bidirectional prediction frame determined from that motion compensation less accurate.
In order to solve the above problem, in the embodiments of the present application, after the first optical flow of the forward frame and the second optical flow of the backward frame corresponding to a bidirectional prediction frame to be encoded are obtained, the motion compensation information corresponding to the bidirectional prediction frame is determined through a preset image coding model based on the bidirectional prediction frame, the first optical flow, and the second optical flow, so as to obtain the encoded file corresponding to the bidirectional prediction frame. Because the bidirectional prediction frame, the first optical flow, and the second optical flow serve as inputs to the image coding model, which outputs the corresponding motion compensation information, obtaining the motion compensation information through the image coding model improves its accuracy; this in turn avoids distortion of the reconstructed bidirectional prediction frame obtained from that information and improves the reconstruction quality.
The method for encoding a bidirectional prediction frame provided in this embodiment may be executed by an encoding device; the device may be implemented in software or hardware and applied to a smart terminal equipped with an operating system, such as a smartphone, a tablet computer, or a personal digital assistant. Referring to fig. 1, this embodiment provides a method for encoding a bidirectional prediction frame, which specifically includes:
S10, acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be coded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame;
specifically, the bidirectional prediction frame is an image frame obtained by using a most adjacent previous frame and a most adjacent next frame as two reference frames and performing prediction based on the two reference frames, and the bidirectional prediction frame is also called a B frame. The first optical flow is an optical flow of a forward frame corresponding to the bidirectional prediction frame, and the second optical flow is an optical flow of a backward frame corresponding to the bidirectional prediction frame, wherein the forward frame can be an I frame or a P frame, and the backward frame can be an I frame or a P frame. For example, the forward frame is an I frame, the backward frame is an I frame, or the forward frame is a P frame, and the backward frame is a P frame, or the forward frame is an I frame, and the backward frame is a P frame, or the forward frame is a P frame, and the backward frame is an I frame. The I frame is a frame coded as an independent static image, and when the I frame is decoded, the I frame can be reconstructed to obtain a complete image corresponding to the I frame only based on a coding file corresponding to the I frame without participation of other image frames; a P frame (also called a predicted frame) is a frame predicted from a previous I frame or P frame adjacent thereto, the P frame compresses the amount of transmission data by reducing redundant information of a previously encoded frame in an image sequence, and the P frame can be used as a reference frame for a P frame or B frame located therebehind.
Further, the bidirectional prediction frame, the forward frame, and the backward frame are video frames in the same video sequence; in the playing order of the video sequence, the forward frame is located before the bidirectional prediction frame and the backward frame after it. In coding order, the coding time of the forward frame is earlier than that of the bidirectional prediction frame, and the coding time of the backward frame is also earlier than that of the bidirectional prediction frame. It can be understood that, when the encoded bidirectional prediction frame is decoded and reconstructed, the decoded and reconstructed forward frame and backward frame are already available.
By way of example: the bidirectional prediction frame, the forward frame, and the backward frame belong to one group of pictures (GOP) in a video sequence (i.e., a group of consecutive pictures in the video sequence). The allocation of I frames, P frames, and B frames in a group of pictures is shown in fig. 2 (where the abscissa represents the frame number and the ordinate the encoding size); the group of pictures includes 13 video frames, where I1 denotes the 1st I frame (key frame) in the GOP, B1 denotes the 1st B frame in the GOP, P1 denotes the 1st P frame in the GOP, and so on: In denotes the nth I frame, Bn the nth B frame, and Pn the nth P frame in the GOP, n being a positive integer. For each B frame, the corresponding forward frame is the I frame or P frame located before it, and the corresponding backward frame is the I frame or P frame located after it. For example, for the first B frame B1, the forward frame is the first I frame I1 and the backward frame is the first P frame P1; for the third B frame B3, the forward frame is the first P frame P1 and the backward frame is the second P frame P2.
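As an illustration of this reference-frame selection, the following is a minimal Python sketch; the list-of-frame-types representation and the function name are assumptions made for the example, not part of the disclosed method:

```python
def reference_frames(gop, b_index):
    """Pick the nearest preceding and nearest following I/P frame for a B frame.

    gop: frame types in display order, e.g. ["I", "B", "B", "P", "B", "B", "P"];
    b_index: position of the B frame within the group of pictures.
    """
    forward = next(i for i in range(b_index - 1, -1, -1) if gop[i] in ("I", "P"))
    backward = next(i for i in range(b_index + 1, len(gop)) if gop[i] in ("I", "P"))
    return forward, backward

# B1 at index 1 references I1 (index 0) and P1 (index 3), as in the example above.
print(reference_frames(["I", "B", "B", "P", "B", "B", "P"], 1))  # -> (0, 3)
```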
Further, in an implementation manner of this embodiment, the acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be encoded specifically includes:
S11, acquiring a forward frame and a backward frame corresponding to the bidirectional prediction frame;
S12, acquiring a first optical flow corresponding to the forward frame and a second optical flow corresponding to the backward frame, to obtain the first optical flow and the second optical flow corresponding to the bidirectional prediction frame.
Specifically, an optical flow reflects the speed and direction of pattern motion in an image, where pattern motion means that when an object moves, the brightness pattern of the corresponding pixel points in the image of the object moves as well. An optical flow therefore contains the motion information of the object (e.g., the x and y displacements of the object's pixels during movement of the picture), and the motion information of the object can be determined from it. For example, fig. 3 shows a bidirectional prediction frame, and the optical flow map corresponding to it can be as shown in fig. 4. The first optical flow reflects the motion information between the forward frame and the bidirectional prediction frame, and the second optical flow reflects the motion information between the backward frame and the bidirectional prediction frame. Thus, the first optical flow is determined based on the forward frame and the bidirectional prediction frame, and the second optical flow is determined based on the backward frame and the bidirectional prediction frame.
Further, the first optical flow and the second optical flow may be calculated by a conventional method (for example, the Lucas-Kanade optical flow method) or determined by a deep learning network. In one implementation of this embodiment, the first optical flow and the second optical flow are each determined by a deep learning network (e.g., a convolutional neural network); the inputs for the first optical flow are the forward frame and the bidirectional prediction frame, and the inputs for the second optical flow are the bidirectional prediction frame and the backward frame. Thus, assuming the deep learning network is IRR_PWC, the first optical flow is v_{t-1}, the second optical flow is v_{t+1}, the bidirectional prediction frame is f_t, the forward frame is f_{t-1}, and the backward frame is f_{t+1}, the calculation of the first optical flow v_{t-1} and the second optical flow v_{t+1} can be expressed as: v_{t-1} = IRR_PWC(f_{t-1}, f_t); v_{t+1} = IRR_PWC(f_t, f_{t+1}).
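As an illustration, the two flow computations can be sketched in Python as follows; flow_net stands in for the IRR_PWC network (any two-frame optical flow estimator with this calling convention fits), and the tensor shapes are assumptions:

```python
def bidirectional_flows(flow_net, f_prev, f_cur, f_next):
    """Compute the two optical flows for a bidirectional prediction frame f_cur.

    flow_net: two-frame flow estimator, flow_net(a, b) -> flow of shape (N, 2, H, W);
    f_prev, f_cur, f_next: forward frame, B frame, backward frame, each (N, 3, H, W).
    """
    v_prev = flow_net(f_prev, f_cur)   # first optical flow:  v_{t-1} = IRR_PWC(f_{t-1}, f_t)
    v_next = flow_net(f_cur, f_next)   # second optical flow: v_{t+1} = IRR_PWC(f_t, f_{t+1})
    return v_prev, v_next
```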
S20, determining motion compensation information corresponding to the bidirectional prediction frame through a preset image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow to obtain a coding file corresponding to the bidirectional prediction frame.
Specifically, the image coding model is a trained network model used to determine the motion compensation information corresponding to a bidirectional prediction frame. The image coding model is obtained by training on a preset training sample set; the training sample set includes a plurality of training image groups, each of which includes a training bidirectional prediction frame, the forward frame corresponding to that training bidirectional prediction frame, and the backward frame corresponding to it. Further, the output of the image coding model is the motion compensation information corresponding to the bidirectional prediction frame; this motion compensation information is generated by the image coding model based on the bidirectional prediction frame, the first optical flow, and the second optical flow, so the corresponding input is determined based on the bidirectional prediction frame, the first optical flow, and the second optical flow.
Based on this, the determining, based on the bidirectional predicted frame, the first optical flow, and the second optical flow, an encoded image frame corresponding to the bidirectional predicted frame specifically includes:
performing an affine transformation on the forward frame based on the first optical flow to obtain the target forward frame;
performing an affine transformation on the backward frame based on the second optical flow to obtain the target backward frame;
and splicing the bidirectional prediction frame, the first optical flow, the second optical flow, the target forward frame and the target backward frame according to channels to obtain an encoded image frame corresponding to the bidirectional prediction frame.
Specifically, the affine transformation is a warp operation that spatially moves an image frame; the warp operation moves each pixel in the image according to the optical flow (i.e., moves the pixel by the x and y displacements given in the optical flow). It can be understood that the target forward frame results from warping the forward frame according to the first optical flow, and the target backward frame results from warping the backward frame according to the second optical flow; for example, the target forward frame corresponding to the bidirectional prediction frame shown in fig. 3 may be as shown in fig. 5. Therefore, the image scale of the target forward frame is the same as that of the forward frame, and the image scale of the target backward frame is the same as that of the backward frame. And because the forward frame and the backward frame are video frames in the same video sequence, the image scale of the forward frame is the same as that of the backward frame, so the image scale of the target forward frame is also the same as that of the target backward frame.
The image scale of the bidirectional prediction frame is the same as that of the forward frame and, correspondingly, the same as that of the target forward frame; likewise, the image scale of the bidirectional prediction frame is the same as that of the target backward frame. The first optical flow is the optical flow map corresponding to the forward frame, so its image size matches that of the forward frame. For example, if the image scale of the bidirectional prediction frame is 224 × 224 × 3, then the image scale of the target forward frame is 224 × 224 × 3, the image scale of the target backward frame is 224 × 224 × 3, and the image scales of the first optical flow and the second optical flow are each 224 × 224 × 2.
Further, after the bidirectional prediction frame, the first optical flow, the second optical flow, the target forward frame, and the target backward frame are obtained, their image sizes are all the same, so they are spliced by channel into one multi-channel image; the resulting multi-channel coding feature map is the input of the image coding model. For example, when the image scale of the bidirectional prediction frame is 224 × 224 × 3, the image scale of the spliced coding feature map is 224 × 224 × 13 (3 + 2 + 2 + 3 + 3 channels), that is, the coding feature map is a 13-channel image.
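To make the warp and splicing steps concrete, the following PyTorch-style sketch builds the 13-channel coding feature map; the x-then-y channel order of the flow and the align_corners choice are assumptions:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Move each pixel of frame (N, C, H, W) by the (x, y) displacements in flow (N, 2, H, W)."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=frame.dtype),
                            torch.arange(w, dtype=frame.dtype), indexing="ij")
    # grid_sample expects sampling positions normalized to [-1, 1], shaped (N, H, W, 2)
    gx = 2.0 * (xs.to(frame.device) + flow[:, 0]) / max(w - 1, 1) - 1.0
    gy = 2.0 * (ys.to(frame.device) + flow[:, 1]) / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1),
                         mode="bilinear", align_corners=True)

def build_coding_feature_map(b_frame, fwd, bwd, v_fwd, v_bwd):
    """Channel-wise splice: 3 (B frame) + 2 + 2 (flows) + 3 + 3 (target frames) = 13."""
    target_fwd = warp(fwd, v_fwd)   # target forward frame
    target_bwd = warp(bwd, v_bwd)   # target backward frame
    return torch.cat([b_frame, v_fwd, v_bwd, target_fwd, target_bwd], dim=1)
```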
Further, in an implementation manner of this embodiment, the image coding model includes a plurality of cascaded convolution modules and a fusion module; the inputting the encoded image frame into the preset image coding model, and determining motion compensation information corresponding to the bidirectional prediction frame through the image coding model to obtain an encoded file corresponding to the bidirectional prediction frame, specifically includes:
determining a plurality of feature maps corresponding to the image coding model based on a plurality of cascaded convolution modules, wherein the feature maps correspond to the cascaded convolution modules one to one;
and inputting the characteristic graphs into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module to obtain a coding file corresponding to the bidirectional prediction frame.
Specifically, the plurality of cascaded convolution modules are stacked in sequence, and the modules have the same structure. For any two adjacent convolution modules in the cascade, the output of the earlier module in the cascade order is the input of the later module; the input of the first convolution module in the cascade order is the coding feature map, and the outputs of all the cascaded convolution modules are input into the fusion module. It can be understood that the feature maps output by the cascaded convolution modules are all inputs of the fusion module, and the fusion module splices the feature maps output by the convolution modules to obtain the motion compensation information corresponding to the bidirectional prediction frame.
In one implementation of this embodiment, the image coding model includes four convolution modules, each of which may adopt bilinear-interpolation downsampling; with 2x bilinear-interpolation downsampling, the width and height of the feature map output by a convolution module become one half of those of its input. Because each convolution module downsamples its input feature map by a factor of 2, the feature maps output by the different convolution modules have different image sizes. To enable the fusion module to splice its input feature maps, the feature map from each convolution module other than the fourth is first passed through a convolution unit, which outputs a feature map with the same image size as the feature map output by the fourth convolution module, before being input into the fusion module.
Based on this, the image coding model includes three convolution units, denoted the first, second, and third convolution units, and the four convolution modules are denoted the first, second, third, and fourth convolution modules. The first convolution module is connected with the fusion module through the first convolution unit, the second convolution module through the second convolution unit, and the third convolution module through the third convolution unit; the fourth convolution module is connected with the fusion module directly. The image sizes of the feature maps output by the first, second, and third convolution units and by the fourth convolution module are all the same. For example, if the image size of the coding feature map is 224 × 224, then the image size of the feature map output by the first convolution module is 112 × 112, that of the second convolution module is 56 × 56, that of the third convolution module is 28 × 28, and that of the fourth convolution module is 14 × 14; the image sizes of the feature maps output by the first, second, and third convolution units are all 14 × 14.
In addition, in a specific implementation manner of this embodiment, the four convolution modules have the same model structure; each convolution module includes a convolution layer and a normalization layer. The parameters of the convolution layer are: convolution kernel 5 × 5, number of convolution kernels 192, stride 2. The normalization layer is a GDN layer, which applies generalized divisive normalization to the feature map output by the convolution layer so as to normalize it. The parameters of the first convolution unit are: convolution kernel 9 × 9, number of convolution kernels 192, stride 8; the parameters of the second convolution unit are: convolution kernel 5 × 5, number of convolution kernels 192, stride 4; and the parameters of the third convolution unit are: convolution kernel 3 × 3, number of convolution kernels 192, stride 2 (matching the 28 × 28 to 14 × 14 reduction above).
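A rough PyTorch sketch of this encoder structure follows; nn.Identity() is used as a stand-in for the GDN layers, and the padding values are assumptions chosen so that the 224 to 14 reductions of the example work out:

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """One cascaded convolution module: 5x5 conv, 192 kernels, stride 2, then normalization."""
    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 192, kernel_size=5, stride=2, padding=2)
        self.norm = nn.Identity()   # placeholder for the GDN layer (an assumption)

    def forward(self, x):
        return self.norm(self.conv(x))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.m1 = ConvModule(13)   # takes the 13-channel coding feature map
        self.m2, self.m3, self.m4 = ConvModule(192), ConvModule(192), ConvModule(192)
        # Convolution units that bring the first three feature maps to the scale
        # of the fourth module's output (112/56/28 -> 14 for a 224 x 224 input).
        self.u1 = nn.Conv2d(192, 192, kernel_size=9, stride=8, padding=4)
        self.u2 = nn.Conv2d(192, 192, kernel_size=5, stride=4, padding=2)
        self.u3 = nn.Conv2d(192, 192, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        f1 = self.m1(x); f2 = self.m2(f1); f3 = self.m3(f2); f4 = self.m4(f3)
        # Fusion module: splice the four aligned feature maps by channel.
        return torch.cat([self.u1(f1), self.u2(f2), self.u3(f3), f4], dim=1)
```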
Further, in order to reduce the amount of data to be encoded, the motion compensation information may be quantized before encoding, and the quantized motion compensation information then encoded. Correspondingly, in an implementation manner of this embodiment, the image coding model includes a quantization module; the inputting the feature maps into the fusion module and determining, by the fusion module, the motion compensation information corresponding to the bidirectional prediction frame to obtain the coding file corresponding to the bidirectional prediction frame specifically includes:
inputting the feature maps into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module;
and inputting the motion compensation information into the quantization module, and generating a coding file corresponding to the bidirectional prediction frame through the quantization module.
Specifically, the quantization module is configured to quantize the motion compensation information, where quantization refers to dividing the value range of each pixel of the feature map into several intervals and mapping all pixel values within the same interval to a single value. Any existing quantization method capable of quantizing an image may be used. In an implementation manner of this embodiment, the motion compensation information is quantized by clustering quantization, which may proceed as follows: given the clustering quantization center points, compute the distance between each pixel point of the motion compensation information and each quantization center point, and take the minimum of all the obtained distances as the quantization value, where the distance between a pixel point and a quantization center point may be computed as:
Q(input_x_i) := argmin_j (input_x_i − c_j),

where input_x_i denotes the ith datum of the input motion compensation information, c_j denotes the jth component of the clustering quantization center points C = {c_1, c_2, ..., c_L}, j ∈ [1, L], and L is a positive integer.
Further, in order to preserve error backpropagation, the distance between each pixel point and the quantization center point is first soft-quantized and then hard-quantized. The soft quantization is performed as follows:
soft_Q(input_x_i) = (formula given only as an image in the original document)
the processing procedure of the hard quantization processing is as follows:
stop_gradient(Q(input_x_i) − soft_Q(input_x_i)) + soft_Q(input_x_i),

where stop_gradient(·) stops the gradient computation.
After the quantization processing, each distance obtained by quantization is rounded, a quantization value is determined from the rounded distances, and finally the motion compensation information is quantized according to the quantization values to obtain the quantized motion compensation information.
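The clustering quantization with its soft/hard two-stage pass can be sketched as follows; the softmax form of the soft quantization and the sigma parameter are assumptions, since the original gives the soft quantization formula only as an image:

```python
import torch

def cluster_quantize(x, centers, sigma=1.0):
    """Quantize x to the nearest clustering quantization center point.

    x: motion compensation information (any shape); centers: 1-D tensor of the
    L center points c_1..c_L; sigma: assumed softness of the soft quantization.
    """
    d = (x.unsqueeze(-1) - centers) ** 2                              # distance to each center
    hard = centers[d.argmin(dim=-1)]                                  # Q(x): nearest center
    soft = (torch.softmax(-sigma * d, dim=-1) * centers).sum(dim=-1)  # soft_Q(x)
    # stop_gradient(Q - soft_Q) + soft_Q: the forward pass emits the hard value,
    # while gradients flow through the soft value, keeping training differentiable.
    return (hard - soft).detach() + soft
```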
Further, in an implementation manner of this embodiment, the image coding model corresponds to an image decoding model, and the difference quantity between the motion compensation frame determined for a bidirectional prediction frame based on the image decoding model and that bidirectional prediction frame satisfies a preset condition. The difference quantity is the number of difference pixels in the motion compensation frame corresponding to the bidirectional prediction frame, where a difference pixel is a pixel whose value differs from that of its target pixel, and the target pixel is the pixel at the corresponding position in the image to be coded. For example, if the difference pixel is the (5,5) pixel in the motion compensation frame and its value is 155, then the target pixel is the (5,5) pixel in the image to be encoded and its value is not 155. The preset condition may be that the difference quantity between the motion compensation frame determined based on the image decoding model and the bidirectional prediction frame is less than a preset threshold, for example, 10.
Further, the image coding model and the image decoding model may be trained together synchronously, where synchronous training means that the image decoding model is trained at the same time as the image coding model, and the two models share the same training-completion condition. Correspondingly, the training process of the image coding model and the image decoding model may be:
L10, acquiring a first optical flow and a second optical flow corresponding to a training bidirectional prediction frame, wherein the first optical flow is the optical flow corresponding to the forward frame of the training bidirectional prediction frame, and the second optical flow is the optical flow corresponding to the backward frame of the training bidirectional prediction frame;
L20, determining a coded image frame corresponding to the training bidirectional prediction frame based on the training bidirectional prediction frame, the first optical flow, and the second optical flow, and determining motion compensation information corresponding to the training bidirectional prediction frame through the image coding model to obtain a coded file;
L30, determining the motion compensation frame corresponding to the coded file based on the image decoding model;
L40, constructing a loss function based on the motion compensation frame corresponding to the training bidirectional prediction frame and the training bidirectional prediction frame;
L50, updating the model parameters of the image coding model and/or the model parameters of the image decoding model based on the loss function.
Specifically, in step L10, the training bidirectional prediction frame may be a B frame randomly selected from a group of pictures of a video sequence, where the video sequence may be actual video captured in different scenes, a network video, or the like. Alternatively, the training bidirectional prediction frame may be chosen according to the format of the video sequence, the type of scene the video sequence covers, and so on; for example, if the video sequence is footage of a monitored scene, the training bidirectional prediction frame may be a B frame in a group of pictures selected from the surveillance video. The process of determining the first optical flow and the second optical flow is the same as in step S10, to which reference may be made; it is not repeated here.
Further, in step L20, when the coded image frame determined based on the training bidirectional prediction frame is input into the image coding model for the first time, initial model parameters may be set to initialize the image coding model; for each subsequent input operation, the model parameters of the image coding model are those updated after the previous input operation. In addition, when the coded image frame is input into the image coding model, the plurality of cascaded convolution modules included in the image coding model extract features from the coded image frame to obtain the motion compensation information of the training bidirectional prediction frame, and the motion compensation information is encoded to obtain the coded file.
Further, in step L30, the input of the image decoding model is the coded file, and its outputs are a first predicted optical flow, a second predicted optical flow, and a fusion coefficient, which are used to determine the motion compensation frame corresponding to the training bidirectional prediction frame based on the reconstructed forward frame corresponding to the forward frame and the reconstructed backward frame corresponding to the backward frame. In one implementation of this embodiment, the image decoding model may include a residual module, a first deconvolution module, and a second deconvolution module; the residual module is connected with the first deconvolution module, and the second deconvolution module is connected with the first deconvolution module. The residual module comprises 3 concatenated residual blocks. The first deconvolution module comprises two cascaded deconvolution units with the same model structure, each comprising an IGDN layer (i.e., the inverse transformation of GDN), a deconvolution layer, and a convolution layer; the parameters of the deconvolution layer are convolution kernel 3 × 3, number of convolution kernels 192, stride 2, and the parameters of the convolution layer are convolution kernel 3 × 3, number of convolution kernels 193, stride 2. The second deconvolution module comprises an IGDN layer and a deconvolution layer whose parameters are convolution kernel 3 × 3, number of convolution kernels 5, stride 2.
In addition, the image decoding module further includes a first deconvolution layer, a first convolution layer and a second convolution layer, the first deconvolution layer is located between the first convolution layer and the residual error module, an output item of the residual error module is an input item of the first deconvolution layer, an output item of the first deconvolution layer is an input item of the first convolution layer, the first convolution layer is connected with the first deconvolution module, and an output item of the first convolution layer is an input item of the first deconvolution module; and the second convolutional layer is positioned behind the second deconvolution module, the output item of the second deconvolution module is the input item of the second convolutional layer, and the output item of the second convolutional layer is the output item of the image decoding model, namely the output item of the second convolutional layer is the first predicted optical flow, the second predicted optical flow and the fusion coefficient.
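A loose PyTorch sketch of this decoder layout follows; the residual-block internals, the nn.Identity() stand-ins for IGDN, the channel-matching first layer, and the sigmoid on the fusion coefficient are all assumptions made to obtain a runnable, shape-consistent example:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Assumed form of one residual block; the original only names the block."""
    def __init__(self, ch=192):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

def up(cin, cout):
    """Stride-2 deconvolution layer; output_padding chosen to double H and W."""
    return nn.ConvTranspose2d(cin, cout, 3, stride=2, padding=1, output_padding=1)

class Decoder(nn.Module):
    def __init__(self, in_ch=768, ch=192):   # 768 = 4 x 192 spliced encoder maps
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 1),                                     # assumed channel matching
            *[ResBlock(ch) for _ in range(3)],                           # residual module
            up(ch, ch), nn.Conv2d(ch, ch, 3, padding=1),                 # first deconv + conv layer
            nn.Identity(), up(ch, ch), nn.Conv2d(ch, ch, 3, padding=1),  # deconvolution unit 1
            nn.Identity(), up(ch, ch), nn.Conv2d(ch, ch, 3, padding=1),  # deconvolution unit 2
            nn.Identity(), up(ch, 5))                                    # second deconv module

    def forward(self, code):
        y = self.net(code)                     # four stride-2 upsamplings: 14 -> 224
        v1_hat, v2_hat = y[:, 0:2], y[:, 2:4]  # first / second predicted optical flow
        alpha = torch.sigmoid(y[:, 4:5])       # fusion coefficient map in (0, 1)
        return v1_hat, v2_hat, alpha
```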
Further, after the image decoding model outputs the first predicted optical flow, the second predicted optical flow, and the fusion coefficient, a reference forward frame is determined based on the first predicted optical flow and a reference backward frame based on the second predicted optical flow, and the motion compensation frame corresponding to the coded file is determined from the reference forward frame, the reference backward frame, and the fusion coefficient. The reference forward frame is obtained by performing a warp operation on the reconstructed forward frame based on the first predicted optical flow, and the reference backward frame is obtained by performing a warp operation on the reconstructed backward frame based on the second predicted optical flow. The correspondence between the motion compensation frame corresponding to the bidirectional prediction frame and the reference forward frame and reference backward frame is:
motion compensation frame = α · (reference forward frame) + (1 − α) · (reference backward frame),

where the reference forward frame is the warp of the reconstructed forward frame by the first predicted optical flow, the reference backward frame is the warp of the reconstructed backward frame by the second predicted optical flow, and α is the fusion coefficient, applied element-wise.
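Reusing the warp sketch from the encoding section, this fusion can be written as follows; treating the fusion coefficient as an element-wise weighting map is the assumed reading of the formula:

```python
def motion_compensated_frame(recon_fwd, recon_bwd, v1_hat, v2_hat, alpha):
    """Fuse the two warped reconstructed references with the fusion coefficient."""
    ref_fwd = warp(recon_fwd, v1_hat)   # reference forward frame
    ref_bwd = warp(recon_bwd, v2_hat)   # reference backward frame
    return alpha * ref_fwd + (1.0 - alpha) * ref_bwd
```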
Further, in step L40, the loss function may be used to represent the difference between the predicted result of the neural network and the true value, that is, to characterize the accuracy of the network's prediction. In this embodiment, the loss function may be constructed based on the difference between the motion compensation frame corresponding to the training bidirectional prediction frame and the training bidirectional prediction frame itself. For example, the loss function L may be constructed as:
L = (1/N) · Σ_{i=1}^{N} (I_i − J_i)²,

where I_i is the element value of the ith element in the training bidirectional prediction frame, J_i is the element value of the ith element in the corresponding motion compensation frame, and N is the number of elements.
Further, in step L50, updating the model parameters of the image coding model and/or the model parameters of the image decoding model based on the loss function means updating them by gradient descent. In this embodiment, updating the model parameters reduces the difference between the training bidirectional prediction frame and the motion compensation frame produced by the updated models, so steps L10 to L50 are iterated multiple times to gradually reduce the value of the loss function, that is, to gradually reduce the error between the training bidirectional prediction frame and its corresponding motion compensation frame.
Further, before updating the model parameters of the image coding model and/or the image decoding model based on the loss function, it may be determined whether the loss function satisfies a preset condition; if it does not, the update step is performed. The preset condition is that the loss function value meets a preset requirement or that the number of training iterations reaches a preset number, for example, 5000. The check therefore proceeds as follows: first judge whether the loss function value meets the preset requirement; if it does, end the training; if it does not, judge whether the number of training iterations of the preset network model has reached the preset number, and if not, correct the network parameters of the preset network model according to the loss function value; if the preset number has been reached, end the training. Judging whether training is finished by both the loss function value and the training count prevents the training of the preset network model from entering an endless loop when the loss function value cannot meet the preset requirement.
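A hedged sketch of this training loop in PyTorch; the model call signatures, the loss threshold, the optimizer choice and the 5000-step cap are assumptions for illustration:

```python
import torch

def train(encoder, decoder, batches, lr=1e-4, loss_goal=1e-3, max_steps=5000):
    # Jointly update the image coding and image decoding models by gradient
    # descent until the loss meets the preset requirement or the training
    # count reaches the preset number.
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for step, (frame_b, flow1, flow2, recon_fwd, recon_bwd) in enumerate(batches, 1):
        code = encoder(frame_b, flow1, flow2)              # encode (steps L10-L30)
        comp = decoder(code, recon_fwd, recon_bwd)         # motion compensation frame
        loss = torch.mean((frame_b - comp) ** 2)           # step L40
        opt.zero_grad()
        loss.backward()
        opt.step()                                         # step L50
        if loss.item() <= loss_goal or step >= max_steps:  # preset condition
            break
```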
In summary, the present embodiment provides an encoding method for bidirectional prediction frames, the method comprising: acquiring a first optical flow of a forward frame and a second optical flow of a backward frame corresponding to a bidirectional prediction frame to be encoded; and determining motion compensation information corresponding to the bidirectional prediction frame through an image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow, so as to obtain an encoded file corresponding to the bidirectional prediction frame. By taking the bidirectional prediction frame, the first optical flow and the second optical flow as inputs of the image coding model and outputting the motion compensation information through the model, the accuracy of the motion compensation information is improved, distortion of the motion compensation frame obtained from that information is avoided, and the quality of the motion compensation frame corresponding to the bidirectional prediction frame is improved.
Based on the above bidirectional predicted frame encoding method, the present embodiment provides a bidirectional predicted frame decoding method, as shown in fig. 6, the method includes:
M10, inputting the encoded file into a preset image decoding model, and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the encoded file through the image decoding model.
Specifically, the encoded file carries the motion compensation information corresponding to a bidirectional prediction frame. After the encoded file is obtained, it may be decoded to recover the motion compensation information it contains. In an implementation of this embodiment, the motion compensation information is decoded in a lossless manner, so that all of the motion compensation information corresponding to the bidirectional prediction frame carried by the encoded file can be recovered, thereby avoiding loss of motion compensation information and reducing distortion of the reconstructed bidirectional prediction frame relative to the bidirectional prediction frame corresponding to the encoded file. The encoded file is generated by the encoding method described above, i.e., it is obtained by encoding the bidirectional prediction frame according to the encoding method of this embodiment; see the descriptions of step S10 and step S20 in the above embodiment.
Further, the image decoding model is an image decoding model corresponding to the image coding model, and a model structure and a working process of the image decoding model are the same as those of the image decoding model corresponding to the image coding model in the coding method of the bidirectional prediction frame, and specifically refer to the description of the image decoding model corresponding to the image coding model in the coding method of the bidirectional prediction frame, which is not repeated here. In addition, the training process of the image decoding model is the same as the training process of the image decoding model in the above example, and the training process described in the above example may be specifically referred to. For example, the image decoding model is trained in synchronization with the image coding model.
Further, the output items of the image decoding model are a first optical flow, a second optical flow and a fusion coefficient, wherein the first optical flow is used to reflect motion information between a forward frame and the bidirectional prediction frame, and the second optical flow is used to reflect motion information between a backward frame and the bidirectional prediction frame; for example, the first optical flow may be represented as an optical flow map as shown in fig. 7. The fusion coefficient provides the weighting coefficients of the target forward frame and the target backward frame when the two are fused. The fusion coefficient may be represented as a fusion coefficient map, for example as shown in fig. 8, in which the element value of each element is a fusion coefficient greater than 0 and less than 1.
M20, determining a target forward frame based on the first optical flow, and determining a target backward frame based on the second optical flow.
Specifically, the first optical flow reflects motion information between the forward frame and the bidirectional prediction frame, and the second optical flow reflects motion information between the backward frame and the bidirectional prediction frame. The target forward frame is determined, based on the first optical flow, from the reconstructed forward frame corresponding to the forward frame of the bidirectional prediction frame, and the target backward frame is determined, based on the second optical flow, from the reconstructed backward frame corresponding to the backward frame of the bidirectional prediction frame.
Based on this, the determining the target forward frame based on the first optical flow and the determining the target backward frame based on the second optical flow specifically include:
acquiring a reconstructed forward frame and a reconstructed backward frame corresponding to the bidirectional prediction frame;
affine transforming the reconstructed forward frame based on the first optical flow to obtain the target forward frame;
performing affine transformation on the reconstructed backward frame based on the second optical flow to obtain the target backward frame.
Specifically, the affine transformation is a warp operation that spatially moves the image frame: each pixel in the image is moved according to the optical flow (i.e., shifted by the x and y displacements given in the optical flow). That is, the target forward frame is obtained by applying the warp operation to the reconstructed forward frame based on the first optical flow, and the target backward frame is obtained by applying the warp operation to the reconstructed backward frame based on the second optical flow. Consequently, the image scale of the target forward frame is the same as that of the reconstructed forward frame, and the image scale of the target backward frame is the same as that of the reconstructed backward frame.
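A minimal sketch of the warp operation using bilinear sampling, a common realization; the patent does not prescribe this exact implementation, and the function name is an assumption:

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # frame: (N, C, H, W) reconstructed frame; flow: (N, 2, H, W) x/y displacements.
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W) pixel coordinates
    coords = grid.unsqueeze(0) + flow                              # shift each pixel by its flow vector
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)        # (N, H, W, 2)
    return F.grid_sample(frame, sample_grid, mode="bilinear", align_corners=True)
```

Because the warp only resamples pixels, the output has the same spatial size as the input, which is why the target frames share the image scale of the reconstructed frames.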
M30, determining the motion compensation frame corresponding to the coding file according to the target forward frame, the target backward frame and the fusion coefficient.
Specifically, the fusion coefficient provides the weighting coefficients of the target forward frame and the target backward frame when the two are fused. The fusion coefficient may be represented as a fusion coefficient map, for example as shown in fig. 8, in which the element value of each element is a fusion coefficient greater than 0 and less than 1. After the fusion coefficient is determined, the weight coefficient corresponding to the target forward frame and the weight coefficient corresponding to the target backward frame may be derived from it; for example, the weight of the target forward frame is the fusion coefficient and the weight of the target backward frame is 1 minus the fusion coefficient.
In addition, after the weight coefficients of the target forward frame and the target backward frame are determined, the two frames may be weighted and summed to obtain the motion compensation frame corresponding to the bidirectional prediction frame (as shown in fig. 9). The fusion formula for the motion compensation frame corresponding to the bidirectional prediction frame is then:
$$\hat{I}_B = \alpha \cdot \mathrm{warp}(\hat{I}_f, v_1) + (1 - \alpha) \cdot \mathrm{warp}(\hat{I}_b, v_2)$$

where $\hat{I}_B$ is the motion compensation frame corresponding to the bidirectional prediction frame; $\mathrm{warp}(\hat{I}_f, v_1)$ is the target forward frame, with $\hat{I}_f$ the reconstructed forward frame and $v_1$ the first optical flow; $\mathrm{warp}(\hat{I}_b, v_2)$ is the target backward frame, with $\hat{I}_b$ the reconstructed backward frame and $v_2$ the second optical flow; and $\alpha$ is the fusion coefficient.
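Putting steps M10-M30 together, a hedged sketch of the decoding path; the decoder call signature is a placeholder, and the `warp` helper is the one sketched above:

```python
def decode_bidirectional_frame(decoder, code, recon_fwd, recon_bwd):
    # M10: run the preset image decoding model on the losslessly decoded code.
    flow1, flow2, alpha = decoder(code)
    # M20: warp the reconstructed forward/backward frames into the target frames
    # (uses the warp helper sketched earlier).
    target_fwd = warp(recon_fwd, flow1)
    target_bwd = warp(recon_bwd, flow2)
    # M30: fuse with the per-pixel fusion coefficient.
    return alpha * target_fwd + (1.0 - alpha) * target_bwd
```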
Based on the above coding method for bidirectional predicted frame, the present embodiment further provides a coding and decoding system for bidirectional predicted frame, where the coding and decoding system includes a coding module and a decoding module;
the encoding module is used for acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be encoded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame; determining motion compensation information corresponding to the bidirectional prediction frame through an image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow to obtain a coding file corresponding to the bidirectional prediction frame;
the decoding module is used for inputting an encoded file into a preset image decoding model and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the encoded file through the image decoding model; determining a target forward frame based on the first optical flow and a target backward frame based on the second optical flow; and determining a motion compensation frame corresponding to the encoded file according to the target forward frame, the target backward frame and the fusion coefficient.
Specifically, the execution process of the encoding module is the same as the execution process of the encoding method of the bidirectional predictive frame in the foregoing embodiment, and the execution process of the decoding module is the same as the execution process of the decoding method of the bidirectional predictive frame in the foregoing embodiment, which is not repeated here, and specific reference may be made to the description of the encoding method of the bidirectional predictive frame and the decoding method of the bidirectional predictive frame.
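For illustration only, the two modules could be wired together as a single system; the class and method names are assumptions, not from the patent, and `decode_bidirectional_frame` is the sketch shown above:

```python
class BidirectionalFrameCodec:
    # Hypothetical wrapper pairing the encoding module and the decoding module.
    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder

    def encode(self, frame_b, flow1, flow2):
        # Bidirectional prediction frame plus the first/second optical flows in,
        # encoded file (motion compensation information) out.
        return self.encoder(frame_b, flow1, flow2)

    def decode(self, code, recon_fwd, recon_bwd):
        # Encoded file plus reconstructed forward/backward frames in,
        # motion compensation frame out.
        return decode_bidirectional_frame(self.decoder, code, recon_fwd, recon_bwd)
```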
Based on the above bidirectional predicted frame encoding method, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the bidirectional predicted frame encoding method according to the above embodiment.
Based on the above bidirectional predictive frame encoding method, the present application also provides a terminal device, as shown in fig. 10, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory)22, and may further include a communication Interface (Communications Interface)23 and a bus 24. The processor 20, the display 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory. The storage medium may be any of a variety of media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and may also be a transitory storage medium.
In addition, the specific processes loaded and executed by the storage medium and the instruction processors in the terminal device are described in detail in the method, and are not stated herein.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (12)

1. A method of encoding a bi-directionally predicted frame, the method comprising:
acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be coded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame;
and determining motion compensation information corresponding to the bidirectional prediction frame through a preset image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow so as to obtain a coding file corresponding to the bidirectional prediction frame.
2. The method according to claim 1, wherein said obtaining a first optical flow and a second optical flow corresponding to a bidirectional predicted frame to be encoded comprises:
acquiring a forward frame and a backward frame corresponding to the bidirectional prediction frame, wherein the forward frame is positioned in front of and adjacent to the bidirectional prediction frame, and the backward frame is positioned behind and adjacent to the bidirectional prediction frame;
and acquiring a first optical flow corresponding to the forward frame and a second optical flow corresponding to the backward frame to obtain a first optical flow and a second optical flow corresponding to the bidirectional prediction frame.
3. The method as claimed in claim 1, wherein the determining motion compensation information corresponding to the bidirectional predicted frame by a preset image coding model based on the bidirectional predicted frame, the first optical flow and the second optical flow to obtain a coding file corresponding to the bidirectional predicted frame specifically comprises:
determining an encoded image frame corresponding to the bidirectional prediction frame based on the bidirectional prediction frame, the first optical flow and the second optical flow;
and inputting the coded image frame into the image coding model, and determining motion compensation information corresponding to the bidirectional prediction frame through the image coding model to obtain a coded file corresponding to the bidirectional prediction frame.
4. The method according to claim 3, wherein said determining, based on said bidirectional predicted frame, said first optical flow and said second optical flow, an encoded image frame corresponding to said bidirectional predicted frame specifically comprises:
affine transforming the forward frame based on the first optical flow to obtain a target forward frame;
affine transforming the backward frame based on the second optical flow to obtain a target backward frame;
and splicing the bidirectional prediction frame, the first optical flow, the second optical flow, the target forward frame and the target backward frame according to channels to obtain an encoded image frame corresponding to the bidirectional prediction frame.
5. Method for encoding bidirectional predicted frames according to claim 4, characterized in that said affine transformation is in particular: performing spatial movement warp operation on the reference image frame to obtain a target image frame, wherein when the reference image frame is a forward frame, the target image frame is a target forward frame; when the reference image frame is a backward frame, the target image frame is a target backward frame.
6. The method of claim 3, wherein the image coding model comprises a plurality of cascaded convolution modules and fusion modules; the inputting the encoded image frame to a preset image encoding model, and determining motion compensation information corresponding to the bidirectional prediction frame through the image encoding model to obtain an encoded file corresponding to the bidirectional prediction frame specifically include:
determining a plurality of feature maps corresponding to the image coding model based on a plurality of cascaded convolution modules, wherein the feature maps correspond to the cascaded convolution modules one to one;
and inputting the characteristic graphs into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module to obtain a coding file corresponding to the bidirectional prediction frame.
7. The method according to claim 6, wherein the image coding model comprises a quantization module, and the inputting the feature maps into the fusion module, and the determining motion compensation information corresponding to the bidirectional prediction frame by the fusion module to obtain the encoded file corresponding to the bidirectional prediction frame specifically comprises:
inputting the feature maps into the fusion module, and determining motion compensation information corresponding to the bidirectional prediction frame through the fusion module;
and inputting the motion compensation information into the quantization module, and generating a coding file corresponding to the bidirectional prediction frame through the quantization module.
8. A method for decoding a bidirectionally predicted frame, said method being used for decoding an encoded file encoded based on the bidirectionally predicted frame encoding method according to any one of claims 1 to 7, said method comprising:
inputting a coding file into a preset image decoding model, and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the coding file through the image decoding model;
determining a target forward frame based on the first optical flow and a target backward frame based on the second optical flow;
and determining a motion compensation frame corresponding to the coding file according to the target forward frame, the target backward frame and the fusion coefficient.
9. The method for decoding bidirectional predicted frames according to claim 8, wherein said determining a target forward frame based on said first optical flow and a target backward frame based on said second optical flow specifically comprises:
acquiring a reconstructed forward frame and a reconstructed backward frame corresponding to the bidirectional prediction frame;
affine transforming the reconstructed forward frame based on the first optical flow to obtain the target forward frame;
performing affine transformation on the reconstructed backward frame based on the second optical flow to obtain the target backward frame.
10. A coding and decoding system of bidirectional prediction frame is characterized in that the coding and decoding system comprises a coding module and a decoding module;
the encoding module is used for acquiring a first optical flow and a second optical flow corresponding to a bidirectional prediction frame to be encoded, wherein the first optical flow is an optical flow corresponding to a forward frame of the bidirectional prediction frame, and the second optical flow is an optical flow corresponding to a backward frame of the bidirectional prediction frame; determining motion compensation information corresponding to the bidirectional prediction frame through an image coding model based on the bidirectional prediction frame, the first optical flow and the second optical flow to obtain a coding file corresponding to the bidirectional prediction frame;
the decoding module is used for inputting an encoded file into a preset image decoding model and outputting a first optical flow, a second optical flow and a fusion coefficient corresponding to the encoded file through the image decoding model; determining a target forward frame based on the first optical flow and a target backward frame based on the second optical flow; and determining a motion compensation frame corresponding to the encoded file according to the target forward frame, the target backward frame and the fusion coefficient.
11. A computer readable storage medium, storing one or more programs which are executable by one or more processors to implement the steps in the method for encoding a bidirectional predictive frame as recited in any one of claims 1 to 7, or to implement the steps in the method for decoding a bidirectional predictive frame as recited in any one of claims 8 to 9.
12. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in a method of encoding a bidirectional predictive frame as recited in any one of claims 1 to 7, or implements the steps in a method of decoding a bidirectional predictive frame as recited in any one of claims 8 to 9.
CN202010687711.6A 2020-07-16 2020-07-16 Coding method, decoding method and coding and decoding system of bidirectional prediction frame Pending CN113949883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010687711.6A CN113949883A (en) 2020-07-16 2020-07-16 Coding method, decoding method and coding and decoding system of bidirectional prediction frame


Publications (1)

Publication Number Publication Date
CN113949883A true CN113949883A (en) 2022-01-18


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023197717A1 (en) * 2022-04-15 2023-10-19 华为技术有限公司 Image decoding method and apparatus, and image coding method and apparatus


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination