CN114513660B - Interframe image mode decision method based on convolutional neural network - Google Patents


Info

Publication number
CN114513660B
Authority
CN
China
Prior art keywords
layer
residual
mode
coding
tree
Prior art date
Legal status
Active
Application number
CN202210407485.0A
Other languages
Chinese (zh)
Other versions
CN114513660A (en)
Inventor
Jiang Xiantao (蒋先涛)
Zhang Jizhuang (张纪庄)
Guo Yongmei (郭咏梅)
Guo Yongyang (郭咏阳)
Current Assignee
Ningbo Kangda Kaineng Medical Technology Co ltd
Original Assignee
Ningbo Kangda Kaineng Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Ningbo Kangda Kaineng Medical Technology Co ltd filed Critical Ningbo Kangda Kaineng Medical Technology Co ltd
Priority to CN202210407485.0A
Publication of CN114513660A
Application granted
Publication of CN114513660B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/109Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive


Abstract

The invention discloses an interframe image mode decision method based on a convolutional neural network, relating to the technical field of image processing and comprising the following steps: acquire the coded image and the residual image of the next coding depth after the target inter-frame image executes merge mode; take the concatenated coded image and residual image as input information and extract bottom-layer features from the input information through the convolutional layer of the multi-layer tree CNN; perform layer-by-layer convolution based on the bottom-layer features through residual layers at preset levels of the multi-layer tree CNN; fully connect the convolution outputs of all layers through the fully connected layer of the multi-layer tree CNN to obtain the partition mode of each coding block at the current coding depth of the target inter-frame image; and encode the target inter-frame image according to the partition mode of each coding block at each coding depth. The method exploits the low bit-rate cost of merge mode and the feature-learning ability of the convolutional neural network to reduce encoding time while maintaining rate-distortion performance.

Description

Interframe image mode decision method based on convolutional neural network
Technical Field
The invention relates to the technical field of image processing, in particular to an interframe image mode decision method based on a convolutional neural network.
Background
With the development of multimedia technology, new video formats such as ultra high definition (UHD), virtual reality (VR), and 360-degree video have appeared, and demand is growing for new video coding standards that support higher resolution with higher coding efficiency. Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of VCEG and MPEG and was finalized in July 2020. As the latest video coding standard, VVC adopts several new coding schemes and tools, such as a coding tree unit (CTU) with a maximum size of 128×128, a quadtree plus multi-type tree (QT+MTT) coding unit (CU) partitioning structure, and affine motion compensated prediction. These new techniques achieve approximately 50% bit-rate reduction over the HEVC standard, but the computational complexity of encoding and decoding increases correspondingly and dramatically.
The VVC encoder exploits the redundancy that exists between pictures. After block division, motion compensation is performed on each coding block. There are two main coding methods for inter prediction: the advanced motion vector prediction (AMVP) mode and the merge mode. In AMVP mode, the optimal value among multiple motion vector candidates, the motion vector difference, the reference picture number, and the unidirectional/bidirectional prediction mode are all encoded. In merge mode, only the optimal value among the candidate motion vectors is encoded. AMVP mode has the advantage that parameters are determined and encoded freely, but the number of bits required to encode them is high, and a complex encoding process with motion estimation is needed. For merge mode, the number of bits required for encoding is very small, but the prediction is less accurate. Some studies on coding under the VVC standard have been completed; however, few have considered the characteristics of inter prediction, and related studies show that CNN-based methods are well suited to processing images. We therefore define the problem as using a convolutional neural network (CNN) to decide the split mode of the coding tree unit (CTU), and propose a new CNN-based fast mode decision method for VVC inter prediction.
Disclosure of Invention
In order to reduce the high computational complexity of inter-frame coding caused by the advanced motion vector prediction mode, the invention provides an interframe image mode decision method based on a convolutional neural network, comprising the following steps:
s1: acquiring the coded image and the residual image of the next coding depth after the target inter-frame image executes merge mode;
s2: taking the concatenated coded image and residual image as input information, and extracting bottom-layer features from the input information through the convolutional layer of the multi-layer tree CNN;
s3: performing layer-by-layer convolution based on the bottom-layer features through residual layers at preset levels of the multi-layer tree CNN, and acquiring the convolution output of each layer;
s4: fully connecting the convolution outputs of all layers through the fully connected layer of the multi-layer tree CNN, and obtaining the partition mode of each coding block at the current coding depth of the target inter-frame image;
s5: judging whether the current coding depth reaches the maximum depth; if so, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth; if not, entering the next coding depth and returning to step S1.
Further, the step S1 is preceded by the step of:
s0: training the multi-layer tree CNN based on the partition mode selection results at each coding depth acquired with the advanced motion vector prediction mode and the corresponding inter-frame images.
Further, in the step S0 the multi-layer tree CNN is trained with a weighted classification cross entropy loss function, which can be expressed as:

$$ loss = \sum_{l=1}^{L} w_l \, CE_l $$

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, l is the layer index (a counter starting at 1), w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss at the l-th residual layer.
Further, the partition modes include the non-split mode, quadtree mode, horizontal binary tree mode, vertical binary tree mode, horizontal ternary tree mode, and vertical ternary tree mode.
Further, the step S4 is followed by the steps of:
s41: judging whether the partition mode of the coding block at the current coding depth is the non-split mode; if so, entering step S42; if not, entering step S5;
s42: stopping the partition mode decision for subsequent coding depths of this coding block; after the partition mode decisions of all coding blocks are completed, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth.
Further, after the step S3, the method further includes the step of:
s31: concatenating, as an information vector, the convolution output with the picture-number information and the quantization parameter of the coding block at the current coding depth and partition.
Further, the multi-layer tree CNN includes:
a convolutional layer, containing 3×3 convolution kernels, for extracting bottom-layer features from the input information;
a transition residual layer, for outputting the first residual block according to the bottom-layer features;
a head-end residual layer, for outputting the first convolution output and the second residual block by convolution between the bottom-layer features and the first residual block;
a middle residual layer, for outputting the second convolution output and the third residual block by convolution between the bottom-layer features and the second residual block;
a tail-end residual layer, for outputting the third convolution output by convolution between the bottom-layer features and the third residual block;
a fully connected layer, for fully connecting the first, second, and third convolution outputs and outputting the partition mode decision;
the convolutional layer, the transition residual layer, the head-end residual layer, the middle residual layer, and the tail-end residual layer are connected in sequence.
Furthermore, an information vector connection layer is connected between each of the head-end residual layer, the middle residual layer, and the tail-end residual layer and the fully connected layer, and is used to concatenate the convolution output with the corresponding picture-number information and the quantization parameter of the coding block.
Compared with the prior art, the invention at least has the following beneficial effects:
(1) the interframe image mode decision method based on a convolutional neural network combines the merge mode of inter prediction with a convolutional neural network (CNN), exploiting the low bit-rate cost of merge mode and the feature learning of the CNN to reduce the time required for encoding while maintaining rate-distortion performance;
(2) the picture-number information and the quantization parameters of the coding blocks at the current coding depth and partition are added into the multi-layer tree CNN, so that the network better matches the parameter characteristics of the actual encoding process, further improving decision accuracy;
(3) for the coding block partitioning problem, considering that block division resembles a tree-shaped split structure, the hierarchical split-tree characteristics of coding block partitioning are learned through the multi-layer tree CNN with layer weights that vary across training iterations;
(4) during training of the multi-layer tree CNN, higher weights are set for the corresponding levels at different training stages, so that the trained multi-layer tree CNN solves complex problems more effectively.
Drawings
FIG. 1 is a diagram of method steps for an interframe image mode decision method based on a convolutional neural network;
FIG. 2 is a schematic diagram of the multi-layer tree CNN architecture.
Detailed Description
The following are specific embodiments of the present invention and are further described with reference to the drawings, but the present invention is not limited to these embodiments.
Embodiment 1
VVC inherits the quadtree partitioning of HEVC and, to better adapt to encoding ultra-high-definition video, raises the maximum allowed coding tree unit size to 128×128. For the VVC inter-coding and QT+MTT partitioning problems, a new computational-complexity optimization method is needed. Under the QT+MTT partition structure, a coding unit can be partitioned by a quadtree (QT), a binary tree (BT), or a ternary tree (TT); BT and TT can additionally split in the horizontal (H) or vertical (V) direction. A coding unit therefore has 6 split modes in total (the non-split mode Non-split, quadtree mode QT, horizontal binary tree mode BT_H, vertical binary tree mode BT_V, horizontal ternary tree mode TT_H, and vertical ternary tree mode TT_V, referred to by the numerals 0 to 5 in the present invention). More specifically, the coding tree unit is first partitioned by the QT structure, and the coding unit in each QT leaf node can then be further partitioned by a QT or MTT structure. The split modes and their indices can be captured in a small enumeration, as sketched below.
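For illustration only, a minimal Python sketch of the six split modes and the 0-5 numbering used in this description (the enum and its member names are this document's terminology, not part of any encoder API):

```python
from enum import IntEnum

class SplitMode(IntEnum):
    """The six QT+MTT split modes, numbered 0-5 as in this description."""
    NON_SPLIT = 0  # keep the coding block whole
    QT = 1         # quadtree: four equal sub-blocks
    BT_H = 2       # horizontal binary tree: two halves stacked vertically
    BT_V = 3       # vertical binary tree: two halves side by side
    TT_H = 4       # horizontal ternary tree: rows split 1/4, 1/2, 1/4
    TT_V = 5       # vertical ternary tree: columns split 1/4, 1/2, 1/4
```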
Since VVC enlarges the maximum allowed coding tree unit size of HEVC and introduces multi-type tree partitioning, VVC generally adopts the advanced motion vector prediction (AMVP) mode to better encode an inter-frame image, which requires a complex encoding process with motion estimation and a high number of bits for encoding parameters. As a result, VVC inter prediction with AMVP mode needs a large amount of computation to find the optimal coding mode, and coding efficiency is lower than desired. Meanwhile, considering that the block division of a coding unit resembles a tree-shaped split structure, the invention provides an interframe image mode decision method based on a convolutional neural network, as shown in fig. 1, comprising the following steps:
s1: acquiring the coded image and the residual image of the next coding depth after the target inter-frame image executes merge mode;
s2: taking the concatenated coded image and residual image as input information, and extracting bottom-layer features from the input information through the convolutional layer of the multi-layer tree CNN;
s3: performing layer-by-layer convolution based on the bottom-layer features through residual layers at preset levels of the multi-layer tree CNN, and acquiring the convolution output of each layer;
s4: fully connecting the convolution outputs of all layers through the fully connected layer of the multi-layer tree CNN, and obtaining the partition mode of each coding block at the current coding depth of the target inter-frame image;
s5: judging whether the current coding depth reaches the maximum depth; if so, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth; if not, entering the next coding depth and returning to step S1.
The multi-layer tree CNN appearing in these steps is designed for the QT+MTT structure of the VVC coding standard. As shown in fig. 2, the network consists mainly of one convolutional layer and four residual layers of different sizes (each ResBlock contains a BN layer, a ReLU layer, and a Conv layer connected in sequence), organized into three hierarchical levels. The convolutional layer (Conv 3, 32), transition residual layer (ResBlock, 32), head-end residual layer (ResBlock, 64), middle residual layer (ResBlock, 128), and tail-end residual layer (ResBlock, 256) are connected in sequence. First, the lower-precision motion vector prediction result obtained after executing merge mode on the inter-frame image at the current coding depth is acquired, comprising the coded image and the residual image of the next coding depth. The coded image and the residual image are then concatenated (both are used, but kept as independent inputs) as the input information of the multi-layer tree CNN.
In the multi-layer tree CNN, pixel-level bottom-layer features are first extracted by the convolutional layer with a 3×3 kernel. The transition residual layer then produces the first residual block from the bottom-layer features. Next, the head-end, middle, and tail-end residual layers each convolve the image residual information of the residual block output by the preceding residual layer (the first, second, and third residual blocks, respectively) with the bottom-layer features, outputting the next residual block and the convolution output of that level (the first convolution output from the head-end layer, the second from the middle layer, and the third from the tail-end layer). Finally, the convolution outputs of the three levels of residual layers are fused by the fully connected layer (FC), which outputs the partition mode of each coding block at the current coding depth of the target inter-frame image.
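A minimal PyTorch sketch of this architecture follows. The channel widths (32/64/128/256), the BN→ReLU→Conv ordering inside each residual layer, and the fusion of the three level outputs with the (picture number, QP) information vector follow the description above; the skip projections, the pooling, the fully connected layer size, and the two input channels (one luma plane each for the coded and residual images) are assumptions made only so the sketch runs. The purely sequential chaining of residual layers is also a simplification of the described interplay between the bottom-layer features and each residual block.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual layer: BN -> ReLU -> Conv, plus a 1x1 skip projection
    (the projection is an assumption to make channel counts match)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.skip(x) + self.body(x)

class MultiLayerTreeCNN(nn.Module):
    """Conv(3x3, 32) -> ResBlock(32) -> ResBlock(64) -> ResBlock(128) ->
    ResBlock(256); the head/middle/tail outputs are pooled, concatenated
    with the (picture number, QP) info vector, and fused by an FC layer
    into a 6-way split-mode decision."""
    def __init__(self, in_ch=2, n_modes=6):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, 32, kernel_size=3, padding=1)  # bottom-layer features
        self.transition = ResBlock(32, 32)    # first residual block
        self.head = ResBlock(32, 64)          # first convolution output
        self.middle = ResBlock(64, 128)       # second convolution output
        self.tail = ResBlock(128, 256)        # third convolution output
        self.pool = nn.AdaptiveAvgPool2d(1)   # pooling choice is an assumption
        self.fc = nn.Linear(64 + 128 + 256 + 3 * 2, n_modes)

    def forward(self, coded_img, residual_img, pic_no, qp):
        # concatenate coded image and residual image as independent channels
        x = self.stem(torch.cat([coded_img, residual_img], dim=1))
        h = self.head(self.transition(x))
        m = self.middle(h)
        t = self.tail(m)
        # info vector: picture number and QP (set to zero when unavailable)
        info = torch.stack([pic_no, qp], dim=1).float()
        levels = [torch.cat([self.pool(y).flatten(1), info], dim=1) for y in (h, m, t)]
        return self.fc(torch.cat(levels, dim=1))  # logits over the 6 split modes
```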
Since one merge-mode pass plus the multi-layer tree CNN can only decide the motion vector selection and partition mode at a single coding depth, the above operations are repeated at each coding depth until the maximum coding depth is reached. Once it is reached, the target inter-frame image can be encoded according to the partition mode and motion vector of each coding block at each coding depth.
It should be noted that, because of the non-split mode, a non-split decision indicates that the block at the current coding depth already achieves the optimal partitioning and requires no further division. Therefore, during operation of the multi-layer tree CNN, step S4 is followed by the steps of:
s41: judging whether the partition mode of the coding block at the current coding depth is the non-split mode; if so, entering step S42; if not, entering step S5;
s42: stopping the partition mode decision for subsequent coding depths of this coding block; after the partition mode decisions of all coding blocks are completed, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth. A sketch of this per-depth loop with the early stop follows.
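Purely as an illustration of steps S1-S5 with the S41/S42 early stop, the following Python sketch walks one block through the per-depth decision loop; run_merge_mode, split_block, the block.pic_no/block.qp attributes, and max_depth are hypothetical stand-ins for the encoder internals, while SplitMode and the network refer to the earlier sketches.

```python
def decide_partitions(block, net, depth=0, max_depth=4):
    """Return a {block: SplitMode} map for this block and its sub-blocks."""
    # S1: a merge-mode pass yields the coded image and next-depth residual image
    coded_img, residual_img = run_merge_mode(block)          # hypothetical helper
    # S2-S4: the multi-layer tree CNN maps them to a split-mode decision
    logits = net(coded_img, residual_img, block.pic_no, block.qp)
    mode = SplitMode(int(logits.argmax()))                   # single-block batch assumed
    decisions = {block: mode}
    # S41/S42: a non-split decision (or reaching max depth) ends recursion here
    if mode == SplitMode.NON_SPLIT or depth + 1 >= max_depth:
        return decisions
    # S5: otherwise descend one coding depth into each resulting sub-block
    for sub in split_block(block, mode):                     # hypothetical helper
        decisions.update(decide_partitions(sub, net, depth + 1, max_depth))
    return decisions
```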
In order to keep the quantity of parameters involved in the block division decision consistent with the actual encoding process and to improve training performance, an information vector connection layer (Info. Vector) is connected between each of the head-end residual layer, the middle residual layer, and the tail-end residual layer and the fully connected layer, to concatenate each convolution output with the corresponding picture-number information and the quantization parameter of the coding block (set to zero if the information is unavailable). Correspondingly, after the step S3, the method further includes the step of:
s31: concatenating, as an information vector, the convolution output with the picture-number information and the quantization parameter of the coding block at the current coding depth and partition.
Of course, a multi-layer tree CNN specified only by the structure and functions described above cannot be applied directly to actual inter-frame image coding; a training process is required before actual operation. Therefore, before the multi-layer tree CNN is put into use, that is, before the step S1, the method further includes the step of:
s0: training the multi-layer tree CNN based on the partition mode selection results at each coding depth acquired with the advanced motion vector prediction mode and the corresponding inter-frame images.
Since partition mode selection results produced by the method of the invention are not yet available at the initial training stage, training data are first collected with the advanced motion vector prediction mode. After the multi-layer tree CNN has been trained and put into operation for a period of time, it can be updated with the partition mode selection results obtained by the method itself.
In the initial design stage of the multi-layer tree CNN, the cross entropy loss function adopted for training is:

$$ CE = -\sum_{c=1}^{C} p_c(s) \log q_c(s) $$

where s is the input sample, C is the total number of classes, and p_c(s) and q_c(s) are respectively the true probability and the predicted probability that sample s belongs to block-split class c. Further, q_c(s) can be expressed as the softmax:

$$ q_c(s) = \frac{e^{s_c}}{\sum_{j=1}^{C} e^{s_j}} $$

where j is the class index, e is the base of the natural logarithm, and s_c is the network output value for class c.
Therefore, in order to make the multi-layer tree CNN better fit the block division characteristics of coding units under the VVC coding standard, the invention trains it with a weighted classification cross entropy loss function, expressed as:

$$ loss = \sum_{l=1}^{L} w_l \, CE_l $$

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, l is the layer index (starting from 1), w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss at the l-th residual layer. In addition, w_l is iterated through further training once the multi-layer tree CNN has been in operation for a period of time. As the formula shows, when training the residual layers at different levels of the multi-layer tree CNN, more weight can be given to the loss of the head-end residual layer early in training, and as learning progresses the deeper residual layers (the middle and tail-end residual layers) also receive more weight, so that the multi-layer tree CNN trained under this cross entropy loss function handles complex cases more effectively. A sketch of this weighted loss follows.
In summary, the interframe image mode decision method based on a convolutional neural network combines the merge mode of inter prediction with a convolutional neural network (CNN), exploiting the low bit-rate cost of merge mode and the feature learning of the CNN to reduce the time required for encoding while maintaining rate-distortion performance.
The picture-number information and the quantization parameters of the coding blocks at the current coding depth and partition are added into the multi-layer tree CNN, so that the network better matches the parameter characteristics of the actual encoding process, further improving decision accuracy.
For the coding block partitioning problem, considering that block division resembles a tree-shaped split structure, the hierarchical split-tree characteristics of coding block partitioning are learned through the multi-layer tree CNN with layer weights that vary across training iterations. Meanwhile, during training, higher weights are set for the corresponding levels at different training stages, so that the trained multi-layer tree CNN solves complex problems more effectively.
Embodiment 2
In order to better verify the technical effect of the method of the present invention, this embodiment presents a set of specific experimental data. Specifically, the performance of the algorithm is verified by comparing its rate distortion and computational complexity against the VVC reference model (VTM) encoder, with standard VVC test video sequences used in the experiments. The multi-layer tree CNN is trained with the Adam optimizer at an initial learning rate of 0.0008. To evaluate the proposed algorithm, BDBR (Bjøntegaard Delta Bit Rate) is used to assess the overall rate-distortion characteristics, and the reduction in coding computational complexity is measured by the average coding time saving (ΔT):
$$ \Delta T = \frac{1}{N} \sum_{QP} \frac{T_{VTM}(QP) - T_{prop}(QP)}{T_{VTM}(QP)} \times 100\% $$

where T_{VTM} and T_{prop} are respectively the coding time of the reference software and of the algorithm proposed in this patent at each tested quantization parameter (QP) value, and N is the number of tested QP values. The experimental results are shown in Table 1: the method of the invention reduces encoding time by 34% while coding efficiency is lost by only 1.1%, confirming the effectiveness of the invention.
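A toy computation of ΔT under this definition, with made-up timings for two QP points (the values are illustrative only, not measurements from the patent):

```python
def delta_t(t_vtm, t_prop):
    """Average percentage coding-time saving over the tested QP points."""
    savings = [(t_vtm[qp] - t_prop[qp]) / t_vtm[qp] for qp in t_vtm]
    return 100.0 * sum(savings) / len(savings)

# e.g. VTM vs. proposed encoder times (seconds) at two QP values
print(delta_t({22: 100.0, 27: 80.0}, {22: 66.0, 27: 53.0}))  # ~33.9% saved
```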
Table 1: List of experimental results
[Table image not reproduced; per the text above, the average coding time saving ΔT is approximately 34% with a BDBR increase of approximately 1.1%.]
It should be noted that all directional indicators in the embodiments of the present invention (such as up, down, left, right, front, and rear) are only used to explain the relative positional relationships, movement, and the like between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicators change accordingly.
Moreover, descriptions in the present invention involving "first," "second," and the like are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly, and for example, "secured" may be a fixed connection, a removable connection, or an integral part; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be interconnected within two elements or in a relationship where two elements interact with each other unless otherwise specifically limited. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.

Claims (6)

1. An interframe image mode decision method based on a convolutional neural network, characterized by comprising the following steps:
s1: acquiring the coded image and the residual image of the next coding depth after the target inter-frame image executes merge mode;
s2: taking the concatenated coded image and residual image as input information, and extracting bottom-layer features from the input information through the convolutional layer of the multi-layer tree CNN;
s3: performing layer-by-layer convolution based on the bottom-layer features through residual layers at preset levels of the multi-layer tree CNN, and acquiring the convolution output of each layer;
s4: fully connecting the convolution outputs of all layers through the fully connected layer of the multi-layer tree CNN, and obtaining the partition mode of each coding block at the current coding depth of the target inter-frame image;
s5: judging whether the current coding depth reaches the maximum depth; if so, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth; if not, entering the next coding depth and returning to step S1;
the partition modes comprise the non-split mode, quadtree mode, horizontal binary tree mode, vertical binary tree mode, horizontal ternary tree mode, and vertical ternary tree mode;
the step S4 is followed by the steps of:
s41: judging whether the partition mode of the coding block at the current coding depth is the non-split mode; if so, entering step S42; if not, entering step S5;
s42: stopping the partition mode decision for subsequent coding depths of this coding block; after the partition mode decisions of all coding blocks are completed, encoding the target inter-frame image according to the partition mode of each coding block at each coding depth.
2. The convolutional neural network-based interframe image mode decision method of claim 1, wherein the step S1 is preceded by the step of:
s0: training the multi-layer tree CNN based on the partition mode selection results at each coding depth acquired with the advanced motion vector prediction mode and the corresponding inter-frame images.
3. The convolutional neural network-based interframe image mode decision method as claimed in claim 2, wherein in the step S0 the multi-layer tree CNN is trained with a weighted classification cross entropy loss function, which can be expressed as:

$$ loss = \sum_{l=1}^{L} w_l \, CE_l $$

where loss is the weighted classification loss, L is the total number of residual layers in the multi-layer tree CNN, l is the layer index (starting from 1), w_l is the weight of the l-th residual layer, and CE_l is the cross entropy loss at the l-th residual layer.
4. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein after the step S3 the method further comprises the step of:
s31: concatenating, as an information vector, the convolution output with the picture-number information and the quantization parameter of the coding block at the current coding depth and partition.
5. The convolutional neural network-based interframe image mode decision method as claimed in claim 1, wherein the multi-layer tree CNN comprises:
a convolutional layer, containing 3×3 convolution kernels, for extracting bottom-layer features from the input information;
a transition residual layer, for outputting the first residual block according to the bottom-layer features;
a head-end residual layer, for outputting the first convolution output and the second residual block by convolution between the bottom-layer features and the first residual block;
a middle residual layer, for outputting the second convolution output and the third residual block by convolution between the bottom-layer features and the second residual block;
a tail-end residual layer, for outputting the third convolution output by convolution between the bottom-layer features and the third residual block;
a fully connected layer, for fully connecting the first, second, and third convolution outputs and outputting the partition mode decision;
wherein the convolutional layer, the transition residual layer, the head-end residual layer, the middle residual layer, and the tail-end residual layer are connected in sequence.
6. The convolutional neural network-based interframe image mode decision method as claimed in claim 5, wherein an information vector connection layer is connected between each of the head-end residual layer, the middle residual layer, and the tail-end residual layer and the fully connected layer, and is used to concatenate the convolution output with the corresponding picture-number information and the quantization parameter of the coding block.
CN202210407485.0A 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network Active CN114513660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210407485.0A CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210407485.0A CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN114513660A CN114513660A (en) 2022-05-17
CN114513660B true CN114513660B (en) 2022-09-06

Family

ID=81555492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210407485.0A Active CN114513660B (en) 2022-04-19 2022-04-19 Interframe image mode decision method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114513660B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695803B (en) * 2023-01-03 2023-05-12 Ningbo Kangda Kaineng Medical Technology Co ltd Inter-frame image coding method based on extreme learning machine

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609601B (en) * 2017-09-28 2021-01-22 Beijing Institute of Computer Technology and Application Ship target identification method based on multilayer convolutional neural network
CN111742553A (en) * 2017-12-14 2020-10-02 InterDigital VC Holdings, Inc. Deep learning based image partitioning for video compression
CN108924558B (en) * 2018-06-22 2021-10-22 University of Electronic Science and Technology of China Video predictive coding method based on neural network
CN113767400A (en) * 2019-03-21 2021-12-07 Google LLC Using rate distortion cost as a loss function for deep learning
CN110087087B (en) * 2019-04-09 2023-05-12 Tongji University VVC inter-frame coding unit prediction mode early decision and block division early termination method
US11936864B2 (en) * 2019-11-07 2024-03-19 Bitmovin, Inc. Fast multi-rate encoding for adaptive streaming using machine learning
CN115606179A (en) * 2020-05-15 2023-01-13 Huawei Technologies Co., Ltd. (CN) CNN filter for learning-based downsampling for image and video coding using learned downsampling features
US11477464B2 (en) * 2020-09-16 2022-10-18 Qualcomm Incorporated End-to-end neural network based video coding
CN112261414B (en) * 2020-09-27 2021-06-29 University of Electronic Science and Technology of China Video coding convolution filtering method divided by attention mechanism fusion unit
CN112702599B (en) * 2020-12-24 2022-05-20 Chongqing University of Technology VVC intra-frame rapid coding method based on deep learning
CN112887712B (en) * 2021-02-03 2021-11-19 Chongqing University of Posts and Telecommunications HEVC intra-frame CTU partitioning method based on convolutional neural network
CN114286093A (en) * 2021-12-24 2022-04-05 Hangzhou Dianzi University Rapid video coding method based on deep neural network

Also Published As

Publication number Publication date
CN114513660A (en) 2022-05-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant