WO2021196582A1 - Video compression method, video decompression method, intelligent terminal, and storage medium - Google Patents

Video compression method, video decompression method, intelligent terminal, and storage medium

Info

Publication number
WO2021196582A1
WO2021196582A1 (PCT/CN2020/125529)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
backward
motion compensation
video compression
Prior art date
Application number
PCT/CN2020/125529
Other languages
English (en)
French (fr)
Inventor
Fan Shunli (樊顺利)
Original Assignee
Wuhan TCL Group Industrial Research Institute Co., Ltd. (武汉TCL集团工业研究院有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co., Ltd.
Publication of WO2021196582A1 publication Critical patent/WO2021196582A1/zh

Links

Images

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/30 using hierarchical techniques, e.g. scalability
              • H04N19/395 involving distributed video coding [DVC], e.g. Wyner-Ziv video coding or Slepian-Wolf video coding
            • H04N19/10 using adaptive coding
              • H04N19/169 characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                • H04N19/17 the unit being an image region, e.g. an object
                  • H04N19/172 the region being a picture, frame or field
            • H04N19/42 characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
            • H04N19/50 using predictive coding
              • H04N19/503 involving temporal prediction
                • H04N19/51 Motion estimation or motion compensation
                  • H04N19/573 Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
                  • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
            • H04N19/85 using pre-processing or post-processing specially adapted for video compression
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods

Definitions

  • the present disclosure relates to the technical field of computer data processing, and in particular to a video compression method, a video decompression method, an intelligent terminal, and a storage medium.
  • B frames are also called bidirectional predictive frames. Forward and backward frames are required for their encoding and decoding. They are the part of video compression with the largest compression ratio and can effectively reduce the video encoding bitrate.
  • When a frame is compressed into a B frame, it is compressed according to the differences among the adjacent previous frame, the current frame, and the next frame; that is, only the differences between the current frame and its neighboring frames are recorded. Only with such video compression can a compression ratio as high as 200:1 be achieved.
  • In general, I frames have the lowest compression efficiency, P frames have higher compression efficiency, and B frames have the highest compression efficiency.
  • current B-frame encoding and decoding mainly refers to B-frame coding in traditional video codecs.
  • traditional B-frame coding requires a great deal of elaborate hand-crafted algorithm design, which leads to poor coding performance and makes the B-frame encoding and decoding process complex.
  • the main purpose of the present disclosure is to provide a video compression method, a video decompression method, an intelligent terminal, and a storage medium, aiming to solve the problems of poor B-frame coding and decoding effects and complicated B-frame coding and decoding processes in the prior art.
  • the present disclosure provides a video compression method, the video compression method includes the following steps:
  • the reconstructed B frame is encoded according to the encoded forward frame and the backward frame.
  • the forward frame is an I frame or a P frame
  • the backward frame is an I frame or a P frame.
  • in the video compression method, after obtaining the forward frame and the backward frame of the original B frame according to the picture group, the method further includes:
  • the forward optical flow and the reverse optical flow of the forward frame are calculated through a spatial pyramid network, and the forward optical flow and the reverse optical flow of the backward frame are calculated through the spatial pyramid network; after the optical flow calculation, the method further includes:
  • a spatial movement operation is performed on the forward frame and the backward frame to obtain the forward frame and the backward frame after the spatial movement operation, respectively.
  • the encoding the forward frame and the backward frame specifically includes:
  • the performing motion compensation on the original B-frames through a motion compensation network specifically includes:
  • the B-frame original frame is subjected to motion compensation through a motion compensation network, and a motion compensation picture is output.
  • the step of performing motion compensation on the original B-frames through a motion compensation network further includes:
  • the residual between the motion-compensated video frame and the original frame is calculated according to the motion compensation result.
  • the reconstructing B frames specifically includes:
  • the residual is obtained, and the reconstructed B frame is calculated according to the residual and the motion compensation result.
  • a picture compression algorithm is used to encode the I frame
  • a distributed video coding algorithm is used to encode the P frame.
  • the present disclosure provides a video decompression method
  • the video decompression method includes:
  • the picture group including the encoded B frame and the forward and backward frames of the B frame;
  • the B frame is decoded according to the decoded forward frame and the backward frame; wherein the B frame is an encoded B frame obtained based on the video compression method.
  • the forward frame is an I frame or a P frame
  • the backward frame is an I frame or a P frame.
  • a picture compression algorithm is used to decode the I frame
  • a distributed video coding algorithm is used to decode the P frame.
  • the present disclosure also provides an intelligent terminal, wherein the intelligent terminal includes: a memory, a processor, and a video compression program or video decompression program stored in the memory and runnable on the processor.
  • the video compression program, when executed by the processor, implements the steps of the video compression method described above; the video decompression program, when executed by the processor, implements the steps of the video decompression method described above.
  • the present disclosure also provides a storage medium, wherein the storage medium stores a video compression program or a video decompression program; when the video compression program is executed by a processor, the steps of the video compression method described above are implemented.
  • when the video decompression program is executed by the processor, the steps of the video decompression method described above are implemented.
  • the video compression method includes: obtaining a picture group of the video, and obtaining the forward and backward frames of the original B frame according to the picture group; obtaining the original B frame, performing motion compensation on the original B frame through the motion compensation network, and reconstructing the B frame; encoding the forward frame and the backward frame; and encoding the reconstructed B frame according to the encoded forward frame and the backward frame.
  • the video decompression method includes: obtaining an encoded and compressed picture group in the video, the picture group including the encoded B frame and the forward and backward frames of the B frame; decoding the forward frame and the backward frame; and decoding the B frame according to the decoded forward frame and the backward frame.
  • the present disclosure encodes and decodes B-frames based on deep learning, which can improve the effect of B-frame encoding and decoding, and simplify the process of B-frame encoding and decoding.
  • Figure 1 is a flowchart of a preferred embodiment of the video compression method of the present disclosure
  • FIG. 2 is a flowchart of a preferred embodiment of the video decompression method of the present disclosure
  • FIG. 3 is a schematic diagram of a GOP in the video codec in the preferred embodiment of the video compression method of the present disclosure
  • FIG. 4 is a schematic diagram of the original B frame in the preferred embodiment of the video compression method of the present disclosure
  • FIG. 5 is a schematic diagram of the B-frame motion compensation result in the preferred embodiment of the video compression method of the present disclosure
  • FIG. 6 is a schematic diagram of a B-frame original image and a residual error result of motion compensation in a preferred embodiment of the video compression method of the present disclosure
  • FIG. 7 is a schematic diagram of B-frame reconstruction results in a preferred embodiment of the video compression method of the present disclosure.
  • FIG. 8 is a schematic diagram of a B-frame encoding and decoding process in a preferred embodiment of the video compression method of the present disclosure
  • FIG. 9 is a schematic diagram of the principle of the motion compensation network structure in the preferred embodiment of the video compression method of the present disclosure.
  • FIG. 10 is a schematic diagram of the operating environment of the preferred embodiment of the smart terminal of the present disclosure.
  • the video compression method includes the following steps:
  • Step S11 Obtain a picture group of the video, and obtain the forward frame and the backward frame of the original frame of the B frame according to the picture group.
  • a GOP (group of pictures) is a group of continuous pictures.
  • MPEG (Moving Picture Experts Group) coding divides pictures (i.e., frames) into three types: I, P, and B.
  • I is an intra-coded frame
  • P is a forward predicted frame
  • B is a bidirectional interpolation frame.
  • the abscissa represents the frame number, showing the 1st through the 13th frame in turn, and the ordinate represents the code size.
  • I1 represents the first I frame (main frame) in each GOP, B1 represents the first B frame in each GOP, and P1 represents the first P frame in each GOP.
  • In represents the nth I frame in each GOP, Bn represents the nth B frame in each GOP, and Pn represents the nth P frame in each GOP, where n is a natural number.
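As a concrete sketch of the GOP structure described above (an illustration only: the exact frame arrangement of Figure 3 is not reproduced here, and the IBBP pattern below is an assumption), the frame types of a 13-frame GOP can be generated as follows:

```python
def gop_frame_types(gop_size=13, b_between=2):
    """Label the frames of one GOP in display order.

    Frame 1 is the I frame (main frame); every (b_between + 1)-th
    frame after it is a P frame, and the frames in between are B
    frames predicted from the nearest I/P frame on each side.
    """
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")
        elif i % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types

print("".join(gop_frame_types()))  # prints IBBPBBPBBPBBP
```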
  • the forward frame can be an I frame or a P frame, and the backward frame can likewise be an I frame or a P frame; the forward frame and the backward frame are each denoted by a corresponding symbol.
  • Step S12 Obtain the B-frame original frame, perform motion compensation on the B-frame original frame through a motion compensation network, and reconstruct the B-frame.
  • spyNet (Spatial Pyramid Network) is a model that calculates optical flow by combining the classic spatial pyramid method with deep learning. Unlike FlowNet, a pure deep-learning method for calculating optical flow, spyNet does not need to handle larger motions directly; these are all handled by the pyramid, which gives spyNet the following advantages:
  • spyNet outperforms FlowNet in both accuracy and speed on standard datasets, showing that combining the classic optical-flow method with deep learning is a promising direction.
  • optical flow has magnitude and direction. If the optical flow from frame 1 to frame 2 is defined as the forward optical flow, then the optical flow from frame 2 to frame 1 can be regarded as the reverse optical flow;
  • the forward optical flow and reverse optical flow are calculated as follows:
  • the bidirectional optical flows of a B frame on the left of the time axis (such as B1) and of a B frame on the right of the time axis (such as B2) are obtained in this way. As a result, the optical flow does not need to be encoded and decoded when predicting the B frame, which effectively saves transmission bitstream.
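The spatial movement operation applied to the forward and backward frames can be sketched as warping a frame with an optical-flow field. The following is a minimal nearest-neighbor sketch (the function name and the sampling scheme are illustrative assumptions; practical implementations typically use bilinear sampling):

```python
import numpy as np

def warp_nearest(frame, flow):
    """Spatially move (warp) a frame using an optical-flow field.

    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) vectors.
    Each output pixel is sampled from frame at (y + dy, x + dx),
    with nearest-neighbor sampling and clamping at the borders.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

frame = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 1] = -1.0  # every pixel samples from one column to its left
warped = warp_nearest(frame, flow)
```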
  • the original B frame is subjected to motion compensation through a motion compensation network (as shown in Figure 9). In Figure 9, Conv(3, 64, 1) denotes a convolution with kernel size 3, 64 output channels, and stride 1 (in image processing, a convolution kernel makes each pixel of the output image a weighted average of a small area of the input image, with the weights defined by a function called the convolution kernel); Conv, ReLU, and Leaky ReLU are all standard deep-learning operations, and skip denotes a skip connection. The original B frame is passed through the motion compensation network, which outputs a motion-compensated picture. Specifically, the motion compensation network has 16 input channels; after the two branches are processed, their features are concatenated along the channel dimension, and a three-channel motion-compensated picture is output. Refer to Figure 5 for an example of motion compensation.
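The data flow of the motion compensation network just described (16 input channels, two branches, channel concatenation, three output channels) can be sketched at the shape level as follows. This is only an illustration: 1x1 convolutions stand in for the Conv(3, 64, 1) layers, the branch widths are made up, and the skip connections and exact layer counts of Figure 9 are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def motion_compensation(x):
    """Shape-level sketch of the two-branch compensation network.

    x: 16-channel input (e.g. warped frames, flows, reference frames).
    Each branch maps 16 -> 32 channels (widths are assumptions); the
    branch features are channel-concatenated and then projected to a
    3-channel motion-compensated picture.
    """
    w_a = rng.standard_normal((32, 16)) * 0.1
    w_b = rng.standard_normal((32, 16)) * 0.1
    w_out = rng.standard_normal((3, 64)) * 0.1
    branch_a = leaky_relu(conv1x1(x, w_a))
    branch_b = leaky_relu(conv1x1(x, w_b))
    features = np.concatenate([branch_a, branch_b], axis=0)  # 64 channels
    return conv1x1(features, w_out)  # 3-channel output picture

out = motion_compensation(rng.standard_normal((16, 8, 8)))
```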
  • the residual refers to the difference between the actual observation value and the estimated value (fitting value).
  • the residual is calculated as follows:
  • the residual is encoded and decoded with an encoder-decoder (codec) network containing GDN/IGDN layers; such a residual network is easy to optimize, and its accuracy can be improved by increasing the depth. The encoded residual feature is denoted r't. Refer to Figure 6 for an example of residuals.
  • the quantization operation adds uniform noise to the encoded features during training, and rounds to the nearest integer during testing (or inference).
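The quantization behavior just described (uniform noise during training, nearest-integer rounding at inference) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, training):
    """Quantization for learned coding: add uniform noise in
    [-0.5, 0.5) during training (keeps the operation smooth for
    gradient descent); round to the nearest integer at inference."""
    if training:
        return features + rng.uniform(-0.5, 0.5, size=features.shape)
    return np.rint(features)

f = np.array([0.2, 1.7, -2.4])
q_infer = quantize(f, training=False)   # [0., 2., -2.]
q_train = quantize(f, training=True)    # f plus noise, within 0.5 of f
```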
  • the decoded residual is then obtained. Refer to Figure 7 for an example of B-frame reconstruction.
  • the B frame is reconstructed as:
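In the standard learned residual-coding formulation (assumed here, since the symbols of the reconstruction formula are not reproduced in the text above), the reconstructed B frame is the motion-compensation result plus the decoded residual:

```python
import numpy as np

def reconstruct_b_frame(mc_result, decoded_residual):
    """Reconstructed B frame = motion-compensated picture plus the
    decoded residual (assumed standard residual-coding formulation)."""
    return mc_result + decoded_residual

mc = np.array([[100.0, 102.0], [98.0, 101.0]])   # motion compensation output
res = np.array([[1.0, -2.0], [0.5, 0.0]])        # decoded residual
recon = reconstruct_b_frame(mc, res)             # [[101., 100.], [98.5, 101.]]
```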
  • Entropy coding uses a convolutional neural network (CNN, a deep-learning model commonly used in computer vision; convolutional neural networks have representation-learning capability and can classify input information in a translation-invariant way according to their hierarchical structure) for distribution estimation during the training phase. In the inference stage, the trained entropy estimation model is used to calculate probabilities, and the calculated probabilities are used for interval coding.
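The role of the entropy model can be illustrated by how the probabilities it predicts translate into code length: an ideal interval (arithmetic) coder spends about -log2 p bits per symbol. A minimal sketch, with a made-up toy distribution:

```python
import numpy as np

def estimated_bits(symbols, probs):
    """Code length an ideal interval (arithmetic) coder would spend:
    -log2 of the probability the entropy model assigns each symbol."""
    p = np.array([probs[s] for s in symbols])
    return float(np.sum(-np.log2(p)))

# Toy distribution an entropy model might predict for residual symbols.
probs = {0: 0.7, 1: 0.2, -1: 0.1}
bits = estimated_bits([0, 0, 1, -1], probs)  # about 6.67 bits
```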
  • the loss function is a function that maps the value of a random event or its related random variable to a non-negative real number to represent the "risk” or "loss" of the random event.
  • the loss function is usually used as a learning criterion to be associated with optimization problems, that is, to solve and evaluate the model by minimizing the loss function.
  • the calculation formula of the loss function (the loss function treats pixels whose absolute residual between the reconstructed image and the original image is less than a certain threshold separately from the rest) is as follows:
  • loss represents the total loss
  • the weight on the bit-rate term is 0.01
  • H(*) represents the number of bits of the code
  • th represents the threshold
  • th is 0.008
  • x represents the residual pixel value
  • at positions where the absolute residual is less than th, the loss is calculated using 0.5x²; the remaining positions are calculated using the other branch of the formula.
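Putting the pieces of the loss description together, a hedged sketch: the 0.5x² branch below the threshold is stated in the text, the linear branch for the remaining positions is assumed to follow the standard Huber form, and 0.01 is taken to be the weight on the bit-rate term H(*):

```python
import numpy as np

TH = 0.008     # residual threshold given in the text
LAMBDA = 0.01  # assumed to be the weight on the bit-rate term

def distortion(x, th=TH):
    """Huber-style distortion. The 0.5*x**2 branch for |x| < th is
    stated in the text; the linear branch th*(|x| - 0.5*th) for the
    remaining positions is an assumption (standard Huber form)."""
    x = np.asarray(x, dtype=float)
    quad = 0.5 * x ** 2
    lin = th * (np.abs(x) - 0.5 * th)
    return np.where(np.abs(x) < th, quad, lin)

def total_loss(residual, code_bits):
    """loss = distortion + lambda * H(code), H(*) being the bit count."""
    return float(np.sum(distortion(residual)) + LAMBDA * code_bits)

loss = total_loss(np.array([0.001, 0.05, -0.2]), code_bits=120.0)
```

The two branches meet continuously at |x| = th, which is the point of the Huber form.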
  • Step S13 Encoding the forward frame and the backward frame.
  • encoding and decoding refer to the process of compressing and decompressing video (such as digital video).
  • encoding is the process of converting information from one form or format to another: a predetermined method is used to encode characters, numbers, or other objects into digits, or to convert information and data into specified electrical pulse signals.
  • encoding is widely used in electronic computers, television, remote control, and communications; that is to say, encoding is the process of converting information from one form or format to another.
  • Decoding is the reverse process of encoding.
  • the forward frame and the backward frame are either I frame or P frame.
  • the I frame is encoded using a picture compression algorithm
  • the P frame is encoded using a distributed video coding algorithm (DVC algorithm).
  • the I frame (main frame) is encoded first, and any picture compression algorithm can be used; then the P frame is encoded, and the P frame can be encoded using the DVC algorithm (a distributed video coding algorithm that uses independent encoding and joint decoding, transferring complex motion estimation from the encoder to the decoder and thereby greatly simplifying the encoder).
  • Step S14 encoding the reconstructed B frame according to the encoded forward frame and the backward frame.
  • the reconstructed B frame is encoded according to the encoded forward frame and the backward frame; for example, the B frame is encoded according to the I frame and the P frame. Subsequent frames are coded in this order.
  • the video decompression method includes the following steps:
  • Step S21 Obtain an encoded and compressed picture group in the video, the picture group including the encoded B frame and the forward and backward frames of the B frame;
  • Step S22 Decoding the forward frame and the backward frame
  • Step S23 Decode the B frame according to the decoded forward frame and the backward frame; wherein the B frame is an encoded B frame obtained based on the video compression method.
  • the video decompression method and the video compression method in the present disclosure are a corresponding process, and the video is decoded after encoding, which is the reverse process of encoding.
  • the encoded original B frame f_t is obtained, together with the motion-compensation result produced for the original B frame by the motion compensation network and the reconstructed B frame. The forward frame and the backward frame of the original B frame are obtained according to the picture group; the forward frame and the backward frame are decoded, and the B frame is decoded according to the decoded forward frame and backward frame, which completes the B-frame decoding process.
  • the forward frame and the backward frame are I-frames or P-frames.
  • a picture compression algorithm is used to decode the I frame (matching the way it was encoded), and a distributed video coding algorithm (DVC algorithm) is used to decode the P frame (matching the way it was encoded).
  • the I frame (main frame) is decoded first, and any picture decompression algorithm can be used; then the P frame is decoded, and the P frame can be decoded using the DVC algorithm (distributed video coding with independent encoding and joint decoding).
  • the present disclosure performs B-frame encoding and decoding based on deep learning, simplifies the encoding and decoding process, and saves video encoding and decoding code streams.
  • the present disclosure also provides an intelligent terminal correspondingly.
  • the intelligent terminal includes a processor 10, a memory 20 and a display 30.
  • FIG. 10 only shows part of the components of the smart terminal, but it should be understood that it is not required to implement all the shown components, and more or fewer components may be implemented instead.
  • the memory 20 may be an internal storage unit of the smart terminal, such as a hard disk or a memory of the smart terminal.
  • the memory 20 may also be an external storage device of the smart terminal, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the smart terminal.
  • the memory 20 may also include both an internal storage unit of the smart terminal and an external storage device.
  • the memory 20 is used to store application software and various types of data installed on the smart terminal, such as the program code of the installed smart terminal.
  • the memory 20 can also be used to temporarily store data that has been output or will be output.
  • a video compression program or a video decompression program 40 is stored in the memory 20 and can be executed by the processor 10, so as to implement the video compression method or the video decompression method in this application.
  • in some embodiments, the processor 10 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code stored in the memory 20 or to process data, for example to execute the video compression method or the video decompression method.
  • the display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc.
  • the display 30 is used for displaying information on the smart terminal and for displaying a visualized user interface.
  • the components 10-30 of the smart terminal communicate with each other via a system bus.
  • the reconstructed B frame is encoded according to the encoded forward frame and the backward frame.
  • the forward frame is an I frame or a P frame; the backward frame is an I frame or a P frame.
  • a spatial movement operation is performed on the forward frame and the backward frame to obtain the forward frame and the backward frame after the spatial movement operation, respectively.
  • the encoding the forward frame and the backward frame specifically includes:
  • the performing motion compensation of the B-frame original frame through a motion compensation network specifically includes:
  • the B-frame original frame is subjected to motion compensation through a motion compensation network, and a motion compensation picture is output.
  • the number of channels input by the motion compensation network is 16, and after the two branches are processed, the features are channel-connected, and a three-channel motion compensation picture is output.
  • Said performing motion compensation on the B-frame original frame through a motion compensation network further includes:
  • the residual between the motion-compensated video frame and the original frame is calculated according to the motion compensation result.
  • the reconstructed B frame specifically includes:
  • the residual is obtained, and the reconstructed B frame is calculated according to the residual and the motion compensation result.
  • the residual refers to the difference between the actual observation value and the estimated value.
  • the I frame is coded using a picture compression algorithm
  • the P frame is coded using a distributed video coding algorithm.
  • the picture group including the encoded B frame and the forward and backward frames of the B frame;
  • the B frame is decoded according to the decoded forward frame and the backward frame; wherein the B frame is an encoded B frame obtained based on the video compression method.
  • the forward frame is an I frame or a P frame; the backward frame is an I frame or a P frame.
  • a picture compression algorithm is used to decode the I frame, and a distributed video coding algorithm is used to decode the P frame.
  • the present disclosure also provides a storage medium, wherein the storage medium stores a video compression program or a video decompression program, and when the video compression program or the video decompression program is executed by a processor, the above-mentioned video compression method or video decompression is implemented Method steps.
  • the present disclosure provides a video compression method, a video decompression method, an intelligent terminal, and a storage medium.
  • the video compression method includes: acquiring a picture group of a video, and acquiring the forward and backward frames of the original B frame according to the picture group; acquiring the original B frame, performing motion compensation on the original B frame through the motion compensation network, and reconstructing the B frame; encoding the forward frame and the backward frame; and encoding the reconstructed B frame according to the encoded forward frame and the backward frame.
  • the video decompression method includes: obtaining an encoded and compressed picture group in the video, the picture group including the encoded B frame and the forward and backward frames of the B frame; decoding the forward frame and the backward frame; and decoding the B frame according to the decoded forward frame and the backward frame.
  • the present disclosure encodes and decodes B-frames based on deep learning, which can improve the effect of B-frame encoding and decoding, and simplify the process of B-frame encoding and decoding.
  • the processes in the methods of the above-mentioned embodiments can be implemented by instructing relevant hardware (such as a processor or a controller) through a computer program, and the program can be stored in a computer-readable storage medium.
  • the program may include the processes of the foregoing method embodiments when executed.
  • the storage medium mentioned may be a memory, a magnetic disk, an optical disk, and the like.


Abstract

The present disclosure relates to a video compression method, a video decompression method, an intelligent terminal, and a storage medium. The video compression method includes: acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of an original B-frame according to the group of pictures; acquiring the original B-frame, performing motion compensation on it through a motion compensation network, and reconstructing the B-frame; encoding the forward frame and the backward frame; and encoding the reconstructed B-frame according to the encoded forward frame and backward frame. The video decompression method includes: obtaining an encoded and compressed group of pictures in the video, the group of pictures including the encoded B-frame and the forward and backward frames of the B-frame; decoding the forward frame and the backward frame; and decoding the B-frame according to the decoded forward frame and backward frame. The present disclosure can improve the effect of B-frame encoding and decoding.

Description

Video compression method, video decompression method, intelligent terminal, and storage medium
Priority
This disclosure claims priority to the Chinese patent application filed with the China Patent Office on March 31, 2020, with application number "202010244040.6" and title "Video compression method, video decompression method, intelligent terminal, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of computer data processing, and in particular to a video compression method, a video decompression method, an intelligent terminal, and a storage medium.
Background
A B-frame, also called a bidirectional predicted frame, requires a forward frame and a backward frame for encoding and decoding. It is the most heavily compressed part of video compression and can effectively reduce the video bitrate. When a frame is compressed into a B-frame, it is compressed according to the differences among the adjacent previous frame, the current frame, and the next frame; that is, only the differences between the current frame and its neighboring frames are recorded. Only with such video compression can a high compression ratio of 200:1 be reached. In general, I-frames have the lowest compression efficiency, P-frames higher, and B-frames the highest.
Current B-frame codecs mainly refer to B-frame coding in traditional video codecs. Traditional B-frame coding requires a large amount of finely hand-crafted algorithm design, which leads to poor coding results and a complex encoding and decoding process.
Therefore, the prior art still needs improvement and development.
Summary of the Disclosure
The main purpose of the present disclosure is to provide a video compression method, a video decompression method, an intelligent terminal, and a storage medium, aiming to solve the problems in the prior art that B-frame coding performs poorly and that the B-frame encoding and decoding process is complex.
To achieve the above purpose, the present disclosure provides a video compression method, which includes the following steps:
acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of an original B-frame according to the group of pictures;
acquiring the original B-frame, performing motion compensation on the original B-frame through a motion compensation network, and reconstructing the B-frame;
encoding the forward frame and the backward frame;
encoding the reconstructed B-frame according to the encoded forward frame and backward frame.
Optionally, in the video compression method, the forward frame is an I-frame or a P-frame, and the backward frame is an I-frame or a P-frame.
Optionally, in the video compression method, after acquiring the forward frame and the backward frame of the original B-frame according to the group of pictures, the method further includes:
computing the forward optical flow and the backward optical flow of the forward frame through a spatial pyramid network;
computing the forward optical flow and the backward optical flow of the backward frame through a spatial pyramid network.
Optionally, in the video compression method, after computing the forward and backward optical flows of the forward frame and of the backward frame through the spatial pyramid network, the method further includes:
after the optical flow computation is completed, performing a spatial shifting (warp) operation on the forward frame and the backward frame to obtain the warped forward frame and the warped backward frame, respectively.
Optionally, in the video compression method, encoding the forward frame and the backward frame specifically includes:
encoding the warped forward frame and the warped backward frame.
Optionally, in the video compression method, performing motion compensation on the original B-frame through the motion compensation network specifically includes:
performing motion compensation on the original B-frame through the motion compensation network and outputting a motion-compensated picture.
Optionally, in the video compression method, after performing motion compensation on the original B-frame through the motion compensation network, the method further includes:
after the motion compensation of the original B-frame is completed, computing the residual between the motion-compensated video frame and the original frame according to the motion compensation result.
Optionally, in the video compression method, reconstructing the B-frame specifically includes:
obtaining the residual, and computing the reconstructed B-frame according to the residual and the motion compensation result.
Optionally, in the video compression method, the I-frame is encoded with a picture compression algorithm, and the P-frame is encoded with a distributed video coding algorithm.
In addition, to achieve the above purpose, the present disclosure provides a video decompression method, which includes:
obtaining an encoded and compressed group of pictures in a video, the group of pictures including an encoded B-frame and the forward and backward frames of the B-frame;
decoding the forward frame and the backward frame;
decoding the B-frame according to the decoded forward frame and backward frame, where the B-frame is an encoded B-frame obtained by the video compression method described above.
Optionally, in the video decompression method, the forward frame is an I-frame or a P-frame, and the backward frame is an I-frame or a P-frame.
Optionally, in the video decompression method, the I-frame is decoded with a picture compression algorithm, and the P-frame is decoded with a distributed video coding algorithm.
In addition, to achieve the above purpose, the present disclosure further provides an intelligent terminal, which includes: a memory, a processor, and a video compression program or video decompression program stored in the memory and executable on the processor; when executed by the processor, the video compression program implements the steps of the video compression method described above, or the video decompression program implements the steps of the video decompression method described above.
In addition, to achieve the above purpose, the present disclosure further provides a storage medium storing a video compression program or a video decompression program; when executed by a processor, the video compression program implements the steps of the video compression method described above, or the video decompression program implements the steps of the video decompression method described above.
In the present disclosure, the video compression method includes: acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of an original B-frame according to the group of pictures; acquiring the original B-frame, performing motion compensation on it through a motion compensation network, and reconstructing the B-frame; encoding the forward frame and the backward frame; and encoding the reconstructed B-frame according to the encoded forward frame and backward frame. The video decompression method includes: obtaining an encoded and compressed group of pictures in the video, the group of pictures including the encoded B-frame and the forward and backward frames of the B-frame; decoding the forward frame and the backward frame; and decoding the B-frame according to the decoded forward frame and backward frame. By encoding and decoding B-frames based on deep learning, the present disclosure can improve the effect of B-frame encoding and decoding and simplify the B-frame encoding and decoding process.
Brief Description of the Drawings
FIG. 1 is a flowchart of a preferred embodiment of the video compression method of the present disclosure;
FIG. 2 is a flowchart of a preferred embodiment of the video decompression method of the present disclosure;
FIG. 3 is a schematic diagram of a GOP in video coding in a preferred embodiment of the video compression method of the present disclosure;
FIG. 4 is a schematic diagram of an original B-frame picture in a preferred embodiment of the video compression method of the present disclosure;
FIG. 5 is a schematic diagram of a B-frame motion compensation result in a preferred embodiment of the video compression method of the present disclosure;
FIG. 6 is a schematic diagram of the residual between the original B-frame picture and the motion compensation result in a preferred embodiment of the video compression method of the present disclosure;
FIG. 7 is a schematic diagram of a B-frame reconstruction result in a preferred embodiment of the video compression method of the present disclosure;
FIG. 8 is a schematic diagram of the B-frame encoding and decoding flow in a preferred embodiment of the video compression method of the present disclosure;
FIG. 9 is a schematic diagram of the structure of the motion compensation network in a preferred embodiment of the video compression method of the present disclosure;
FIG. 10 is a schematic diagram of the operating environment of a preferred embodiment of the intelligent terminal of the present disclosure.
Detailed Description
To make the purpose, technical solutions, and advantages of the present disclosure clearer and more explicit, the present disclosure is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present disclosure, not to limit it.
Embodiment 1
As shown in FIG. 1 and FIG. 8, the video compression method according to a preferred embodiment of the present disclosure includes the following steps:
Step S11: acquire a group of pictures of a video, and acquire the forward frame and the backward frame of the original B-frame according to the group of pictures.
Specifically, according to a preset configuration, a GOP (group of pictures, i.e., a group of consecutive pictures) of the video is acquired, as shown in FIG. 3. MPEG (Moving Picture Experts Group) coding divides pictures (i.e., frames) into three types: I, P, and B, where I is an intra-coded frame, P is a forward-predicted frame, and B is a bidirectionally interpolated frame.
In FIG. 3, the horizontal axis is the frame number (frames 1 through 13) and the vertical axis is the coded size. For example, I1 denotes the first I-frame (key frame) in each GOP, B1 the first B-frame, and P1 the first P-frame; by analogy, In denotes the n-th I-frame in a GOP, Bn the n-th B-frame, and Pn the n-th P-frame, where n is a natural number.
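The way each B-frame in a GOP relates to its nearest forward and backward reference frames can be sketched in a few lines. The GOP pattern "IBBPBBPBBPBB" and the helper names below are illustrative assumptions only; the disclosure does not fix a particular GOP layout:

```python
# Toy lookup of a B-frame's forward/backward references within one GOP.
# The pattern "IBBPBBPBBPBB" is a hypothetical example layout.

def frame_types(gop_pattern="IBBPBBPBBPBB"):
    """Return (frame_number, type) pairs for one GOP, numbered from 1."""
    return list(enumerate(gop_pattern, start=1))

def references_for_b(gop_pattern, b_index):
    """Nearest non-B frame before and after a B-frame (both I or P)."""
    fwd = next(i for i in range(b_index - 1, 0, -1)
               if gop_pattern[i - 1] != "B")
    bwd = next(i for i in range(b_index + 1, len(gop_pattern) + 1)
               if gop_pattern[i - 1] != "B")
    return fwd, bwd

# Frame 2 is a B-frame; its forward reference is frame 1 (I) and its
# backward reference is frame 4 (P).
print(references_for_b("IBBPBBPBBPBB", 2))  # → (1, 4)
```

B-frames at the tail of a GOP reference the I-frame of the next GOP, which this toy single-GOP lookup does not model.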
According to the group of pictures, the forward frame (an I-frame or a P-frame, i.e., the forward frame may be either) and the backward frame (an I-frame or a P-frame, i.e., the backward frame may be either) of the original B-frame are acquired and denoted \hat{x}_{t-1} and \hat{x}_{t+1}, respectively.
Step S12: acquire the original B-frame, perform motion compensation on the original B-frame through the motion compensation network, and reconstruct the B-frame.
Specifically, the original B-frame currently to be encoded (shown in FIG. 4) is acquired and denoted f_t; the motion compensation result (i.e., the output of the motion compensation network for the original B-frame) is denoted \bar{f}_t (shown in FIG. 5), and the reconstructed B-frame is denoted \hat{f}_t (shown in FIG. 7).
In the present disclosure, spyNet (spatial pyramid network) is a model that computes optical flow by combining the classical spatial pyramid method with deep learning. Unlike FlowNet, a pure deep learning method for optical flow, spyNet does not need to handle large motions itself; the pyramid handles them. This gives spyNet three advantages:
(1) In terms of model parameters, spyNet is smaller and simpler, only 4% the size of FlowNet, which makes it better suited to embedded development;
(2) Because the motions to be handled at each pyramid level are small, applying convolutions to a set of warped images is effective;
(3) Unlike FlowNet, the filters learned by the spyNet network are very similar to classical spatio-temporal filters, which helps model optimization.
In short, spyNet outperforms FlowNet in both accuracy and speed on standard datasets, showing that combining classical optical flow methods with deep learning is a promising direction.
Further, the forward optical flow and the backward optical flow between \hat{x}_{t-1} and \hat{x}_{t+1} are computed with the spatial pyramid network spyNet. (Optical flow has both magnitude and direction; if the flow from frame 1 to frame 2 is defined as the forward flow, then the flow from frame 2 to frame 1 can be regarded as the backward flow.) The forward and backward flows are computed as:
v_{t-1→t+1} = spyNet(\hat{x}_{t-1}, \hat{x}_{t+1})
v_{t+1→t-1} = spyNet(\hat{x}_{t+1}, \hat{x}_{t-1})
where v_{t-1→t+1} is the forward flow and v_{t+1→t-1} is the backward flow.
Assuming that the motion is uniform within a very short time interval, the bidirectional optical flows of a B-frame nearer the left end of the time axis (e.g., B1) and of a B-frame nearer the right end (e.g., B2) are obtained by scaling v_{t-1→t+1} and v_{t+1→t-1} in proportion to each B-frame's temporal distance from the two reference frames. In this way, no optical flow needs to be encoded or decoded when predicting B-frames, which effectively saves transmission bitrate.
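Under the uniform-motion assumption above, the per-B-frame flows can be obtained by scaling the two reference-to-reference flows, so no flow field needs to be transmitted for B-frames. A minimal sketch, in which the relative position alpha (e.g., 1/3 for B1 and 2/3 for B2 when two B-frames sit evenly between the references) is an assumption for illustration:

```python
import numpy as np

# Scale the reference-to-reference flows by temporal distance to obtain
# per-B-frame flows. The exact scaling coefficients are not fixed by the
# disclosure; alpha parameterises the B-frame's position in time.

def b_frame_flows(v_fwd, v_bwd, alpha):
    """alpha in (0, 1): relative position of the B-frame between the
    forward reference (alpha = 0) and the backward reference (alpha = 1)."""
    flow_from_fwd_ref = alpha * v_fwd        # warps x_{t-1} toward the B-frame
    flow_from_bwd_ref = (1 - alpha) * v_bwd  # warps x_{t+1} toward the B-frame
    return flow_from_fwd_ref, flow_from_bwd_ref

v_fwd = np.ones((2, 4, 4))   # toy 2-channel (dx, dy) flow field
v_bwd = -np.ones((2, 4, 4))

# B1 sits one third of the way from the forward reference.
f1, b1 = b_frame_flows(v_fwd, v_bwd, alpha=1 / 3)
```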
After the optical flow computation is completed, a warp operation (spatially shifting pixels according to the optical flow) is further applied to \hat{x}_{t-1} and \hat{x}_{t+1}, yielding x^w_{t-1} (the first operation result) and x^w_{t+1} (the second operation result), where the superscript w denotes the result of warping a frame, t denotes the current frame, t-1 the previous frame, and t+1 the next frame.
The warp results for the B-frames nearer the left and the right ends of the time axis are obtained in the same way, using the correspondingly scaled bidirectional flows.
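The warp operation described above (moving pixels spatially according to the flow) can be sketched as a backward warp. The nearest-neighbour sampling here is a simplification; learned codecs typically use bilinear sampling (e.g., a grid-sample operation), which the disclosure does not specify:

```python
import numpy as np

# Minimal nearest-neighbour backward warp: each output pixel samples the
# input frame at the location pointed to by the flow, clipped to bounds.

def warp(frame, flow):
    """frame: (H, W); flow: (2, H, W) with flow[0] = dx, flow[1] = dy.
    Output pixel (y, x) samples frame[y + dy, x + dx]."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((2, 4, 4))
flow[0] += 1.0  # every pixel samples its right-hand neighbour
shifted = warp(frame, flow)
```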
The original B-frame is motion-compensated through the motion compensation network (shown in FIG. 9). In FIG. 9, Conv(3, 64, 1) denotes a convolution with kernel size 3, 64 output channels, and stride 1 (a convolution kernel is the weighting function that, given an input image, maps a weighted average of a small pixel neighborhood onto each corresponding output pixel); Conv, ReLU, and LeakyReLU are standard deep learning operations, and "skip" denotes a skip connection. The original B-frame is passed through the motion compensation network, which outputs a motion-compensated picture. Specifically, the input of the motion compensation network has 16 channels; after the two branches are processed, the features are concatenated along the channel dimension, and a three-channel motion-compensated picture is output. See FIG. 5 for an example of motion compensation.
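The 16-channel network input can be illustrated as a channel concatenation. The particular composition below (two warped RGB frames, the two RGB reference frames, and two 2-channel flow fields, 3+3+3+3+2+2 = 16) is a hypothetical guess; the disclosure states only the channel counts and the two-branch, channel-concatenation structure:

```python
import numpy as np

# Hypothetical assembly of the 16-channel motion compensation input in
# channel-first layout. The choice and order of the parts is assumed.

def build_mc_input(xw_prev, xw_next, x_prev, x_next, flow_prev, flow_next):
    parts = [xw_prev, xw_next, x_prev, x_next, flow_prev, flow_next]
    return np.concatenate(parts, axis=0)  # stack along the channel axis

h, w = 8, 8
rgb = lambda: np.zeros((3, h, w))   # 3-channel image placeholder
flo = lambda: np.zeros((2, h, w))   # 2-channel (dx, dy) flow placeholder
x = build_mc_input(rgb(), rgb(), rgb(), rgb(), flo(), flo())
```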
After motion compensation is completed, the residual between the motion-compensated video frame and the original frame is computed. In mathematical statistics, a residual is the difference between an actual observed value and an estimated (fitted) value. The residual is computed as:
r_t = f_t - \bar{f}_t
where r_t denotes the residual. The encoded feature of the residual is denoted r'_t. The residual codec network (residual networks are easy to optimize and can gain accuracy from considerably increased depth) adopts an encoder-decoder structure containing GDN/IGDN layers. See FIG. 6 for an example of a residual.
For quantization, uniform noise is added to the encoded features during training; at test (inference) time, nearest-neighbor rounding is used.
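The quantization rule above is a standard trick in learned compression and can be sketched directly: additive uniform noise stands in for rounding during training so that gradients can flow, while inference uses nearest-neighbour rounding:

```python
import numpy as np

# Train-time: add uniform noise in [-0.5, 0.5) as a differentiable proxy
# for rounding. Test-time: round to the nearest integer.

def quantize(features, training, rng=None):
    if training:
        rng = rng or np.random.default_rng(0)
        return features + rng.uniform(-0.5, 0.5, size=features.shape)
    return np.round(features)

feat = np.array([0.2, 1.7, -0.6])
hard = quantize(feat, training=False)  # → [0., 2., -1.]
soft = quantize(feat, training=True)   # stays within ±0.5 of feat
```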
The decoded residual is denoted \hat{r}_t. See FIG. 7 for an example of B-frame reconstruction.
The B-frame is reconstructed as:
\hat{f}_t = \bar{f}_t + \hat{r}_t
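The residual path, computing the residual against the motion compensation result and adding the decoded residual back to it, can be sketched end to end. The toy fixed-step rounding below stands in for the GDN/IGDN encoder-decoder and is an assumption for illustration:

```python
import numpy as np

# r_t = f_t - f_bar is "coded" by a toy quantising codec into r_hat,
# and the B-frame is rebuilt as f_hat = f_bar + r_hat.

def reconstruct_b_frame(f_t, f_bar, codec=lambda r: np.round(r * 32) / 32):
    r_t = f_t - f_bar   # residual between original frame and compensation
    r_hat = codec(r_t)  # encode / quantise / decode (toy stand-in)
    return f_bar + r_hat

f_t = np.array([[0.5, 0.25], [0.75, 1.0]])
f_bar = np.array([[0.5, 0.25], [0.5, 1.0]])
f_hat = reconstruct_b_frame(f_t, f_bar)
```

Here the residual 0.25 is exactly representable at the toy step size, so the reconstruction matches the original; a real codec would leave small quantization error.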
Entropy coding: in the training stage, the distribution is estimated with a convolutional neural network (CNN, Convolutional Neural Networks, a class of deep learning network widely used in computer vision, capable of shift-invariant classification); in the inference stage, the probabilities are computed with the trained entropy-estimation model, and range coding is performed using the computed probabilities.
A loss function maps the values of a random event, or of its associated random variables, to non-negative real numbers representing the "risk" or "loss" of that event. In applications, the loss function is usually tied to the optimization problem as the learning criterion, i.e., the model is solved and evaluated by minimizing the loss.
The loss function (which treats separately the pixels whose reconstruction-to-original residual magnitude is below a certain threshold) is computed as:
loss = D(f_t - \hat{f}_t) + α · H(r'_t)
where loss denotes the loss, α is 0.01, and H(·) denotes the number of bits of the encoded representation; D(·) is defined as:
D(x) = 0.5x^2, if |x| < th; D(x) = |x| - 0.5, otherwise
where th denotes the threshold, set to 0.008, and x denotes the pixel value.
At pixels where the absolute residual between the reconstruction and the original is below th, the loss is computed as 0.5x^2; at the remaining positions it is computed as |x| - 0.5.
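The piecewise distortion D described above is a smooth-L1-style penalty: quadratic below the threshold th = 0.008 and linear elsewhere. A direct transcription of the two cases:

```python
import numpy as np

# Piecewise distortion as stated in the text: 0.5*x^2 for |x| < th,
# |x| - 0.5 otherwise, with th = 0.008.

def D(x, th=0.008):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < th, 0.5 * x**2, np.abs(x) - 0.5)

residuals = np.array([0.004, 0.5])
d = D(residuals)
```

As literally stated, the two pieces do not join continuously at |x| = th (the standard smooth-L1 form would use 0.5x^2/th and |x| - 0.5·th); the sketch transcribes the text rather than correcting it.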
Step S13: encode the forward frame and the backward frame.
Here, encoding and decoding refer to compressing and decompressing video (e.g., digital video). Encoding is the process of converting information from one form or format into another: text, numbers, or other objects are converted into a code by a predetermined method, or information and data are converted into prescribed electrical pulse signals. Encoding is widely used in computers, television, remote control, communications, and so on. Decoding is the inverse process of encoding.
The forward frame and the backward frame are I-frames or P-frames. In the present disclosure, the I-frame is encoded with a picture compression algorithm, and the P-frame is encoded with a distributed video coding (DVC) algorithm.
Specifically, the I-frame (key frame) is encoded first, for which any picture compression algorithm may be used; the P-frame is then encoded, for which the DVC algorithm may be used (distributed video coding uses independent encoding and joint decoding, moving the complex motion estimation from the encoder to the decoder, which greatly simplifies the encoder).
Step S14: encode the reconstructed B-frame according to the encoded forward frame and backward frame.
Specifically, after the forward frame and the backward frame are encoded, the reconstructed B-frame is encoded according to them, e.g., the B-frame is encoded according to the I-frame and the P-frame. Subsequent frames are encoded in this order.
Embodiment 2
In addition, as shown in FIG. 2, the video decompression method according to a preferred embodiment of the present disclosure includes the following steps:
Step S21: obtain an encoded and compressed group of pictures in the video, the group of pictures including the encoded B-frame and the forward and backward frames of the B-frame;
Step S22: decode the forward frame and the backward frame;
Step S23: decode the B-frame according to the decoded forward frame and backward frame, where the B-frame is an encoded B-frame obtained by the video compression method described above.
The video decompression method and the video compression method in the present disclosure form a corresponding pair of processes: the video is encoded and then decoded, and decoding is the inverse of encoding.
Specifically, as shown in FIG. 8, the encoded original B-frame f_t is obtained; the result of motion-compensating the original B-frame through the motion compensation network is \bar{f}_t, and the reconstructed B-frame is \hat{f}_t. The forward frame \hat{x}_{t-1} and the backward frame \hat{x}_{t+1} of the original B-frame are obtained according to the group of pictures; the forward frame \hat{x}_{t-1} and the backward frame \hat{x}_{t+1} are decoded, and the B-frame is then decoded according to the decoded forward frame \hat{x}_{t-1} and backward frame \hat{x}_{t+1}, completing the B-frame decoding process.
Here, the forward frame and the backward frame are I-frames or P-frames. In the present disclosure, the I-frame is decoded with a picture compression algorithm (matching the way it was encoded), and the P-frame is decoded with the distributed video coding (DVC) algorithm (matching the way it was encoded).
Specifically, the I-frame (key frame) is decoded first, for which any picture decompression algorithm may be used; the P-frame is then decoded, for which the DVC algorithm (independent encoding, joint decoding) may be used.
The present disclosure performs B-frame encoding and decoding based on deep learning, which simplifies the encoding and decoding flow and saves video coding bitrate.
Embodiment 3
Further, as shown in FIG. 10, based on the above video compression method, the present disclosure correspondingly provides an intelligent terminal, which includes a processor 10, a memory 20, and a display 30. FIG. 10 shows only some of the components of the intelligent terminal; it should be understood that implementing all of the shown components is not required, and more or fewer components may be implemented instead.
In some embodiments, the memory 20 may be an internal storage unit of the intelligent terminal, such as a hard disk or memory of the terminal. In other embodiments, the memory 20 may also be an external storage device of the intelligent terminal, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used to store the application software installed on the intelligent terminal and various kinds of data, such as the program code of the terminal, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, a video compression program or video decompression program 40 is stored in the memory 20 and can be executed by the processor 10, thereby implementing the video compression method or the video decompression method of this application.
In some embodiments, the processor 10 may be a central processing unit (CPU), microprocessor, or other data processing chip, used to run the program code stored in the memory 20 or to process data, for example to execute the video compression method or the video decompression method.
In some embodiments, the display 30 may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (organic light-emitting diode) touch display, or the like. The display 30 is used to display information on the intelligent terminal and to display a visual user interface. The components 10-30 of the intelligent terminal communicate with one another through a system bus.
In one embodiment, when the processor 10 executes the video compression program 40 in the memory 20, the following steps are implemented:
acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of the original B-frame according to the group of pictures;
acquiring the original B-frame, performing motion compensation on the original B-frame through a motion compensation network, and reconstructing the B-frame;
encoding the forward frame and the backward frame;
encoding the reconstructed B-frame according to the encoded forward frame and backward frame.
The forward frame is an I-frame or a P-frame; the backward frame is an I-frame or a P-frame.
After acquiring the forward frame and the backward frame of the original B-frame according to the group of pictures, the method further includes:
computing the forward optical flow and the backward optical flow of the forward frame through a spatial pyramid network;
computing the forward optical flow and the backward optical flow of the backward frame through a spatial pyramid network.
After computing the forward and backward optical flows of the forward frame and of the backward frame through the spatial pyramid network, the method further includes:
after the optical flow computation is completed, performing a spatial shifting (warp) operation on the forward frame and the backward frame to obtain the warped forward frame and the warped backward frame, respectively.
Encoding the forward frame and the backward frame specifically includes:
encoding the warped forward frame and the warped backward frame.
Performing motion compensation on the original B-frame through the motion compensation network specifically includes:
performing motion compensation on the original B-frame through the motion compensation network and outputting a motion-compensated picture.
The input of the motion compensation network has 16 channels; after the two branches are processed, the features are concatenated along the channel dimension, and a three-channel motion-compensated picture is output.
After performing motion compensation on the original B-frame through the motion compensation network, the method further includes:
after the motion compensation of the original B-frame is completed, computing the residual between the motion-compensated video frame and the original frame according to the motion compensation result.
Reconstructing the B-frame specifically includes:
obtaining the residual, and computing the reconstructed B-frame according to the residual and the motion compensation result.
The residual is the difference between the actual observed value and the estimated value.
The I-frame is encoded with a picture compression algorithm, and the P-frame is encoded with a distributed video coding algorithm.
Alternatively, in another embodiment, when the processor 10 executes the video decompression program 40 in the memory 20, the following steps are implemented:
obtaining an encoded and compressed group of pictures in the video, the group of pictures including the encoded B-frame and the forward and backward frames of the B-frame;
decoding the forward frame and the backward frame;
decoding the B-frame according to the decoded forward frame and backward frame, where the B-frame is an encoded B-frame obtained by the video compression method described above.
The forward frame is an I-frame or a P-frame; the backward frame is an I-frame or a P-frame.
The I-frame is decoded with a picture compression algorithm, and the P-frame is decoded with a distributed video coding algorithm.
Embodiment 4
The present disclosure further provides a storage medium storing a video compression program or a video decompression program; when executed by a processor, the video compression program implements the steps of the video compression method described above, or the video decompression program implements the steps of the video decompression method described above.
In summary, the present disclosure provides a video compression method, a video decompression method, an intelligent terminal, and a storage medium. The video compression method includes: acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of an original B-frame according to the group of pictures; acquiring the original B-frame, performing motion compensation on it through a motion compensation network, and reconstructing the B-frame; encoding the forward frame and the backward frame; and encoding the reconstructed B-frame according to the encoded forward frame and backward frame. The video decompression method includes: obtaining an encoded and compressed group of pictures in the video, the group of pictures including the encoded B-frame and the forward and backward frames of the B-frame; decoding the forward frame and the backward frame; and decoding the B-frame according to the decoded forward frame and backward frame. By encoding and decoding B-frames based on deep learning, the present disclosure can improve the effect of B-frame encoding and decoding and simplify the B-frame encoding and decoding process.
Of course, those of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by instructing relevant hardware (such as a processor or a controller) through a computer program; the program may be stored in a computer-readable storage medium, and when executed it may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, or the like.
It should be understood that the application of the present disclosure is not limited to the above examples. Those of ordinary skill in the art can make improvements or variations based on the above description, and all such improvements and variations shall fall within the protection scope of the claims appended to the present disclosure.

Claims (16)

  1. A video compression method, wherein the video compression method comprises:
    acquiring a group of pictures of a video, and acquiring the forward frame and the backward frame of an original B-frame according to the group of pictures;
    acquiring the original B-frame, performing motion compensation on the original B-frame through a motion compensation network, and reconstructing the B-frame;
    encoding the forward frame and the backward frame;
    encoding the reconstructed B-frame according to the encoded forward frame and backward frame.
  2. The video compression method of claim 1, wherein the forward frame is an I-frame or a P-frame, and the backward frame is an I-frame or a P-frame.
  3. The video compression method of claim 1, wherein after acquiring the forward frame and the backward frame of the original B-frame according to the group of pictures, the method further comprises:
    computing the forward optical flow and the backward optical flow of the forward frame through a spatial pyramid network;
    computing the forward optical flow and the backward optical flow of the backward frame through a spatial pyramid network.
  4. The video compression method of claim 3, wherein after computing the forward and backward optical flows of the forward frame and of the backward frame through the spatial pyramid network, the method further comprises:
    after the optical flow computation is completed, performing a spatial shifting (warp) operation on the forward frame and the backward frame to obtain the warped forward frame and the warped backward frame, respectively.
  5. The video compression method of claim 4, wherein encoding the forward frame and the backward frame specifically comprises:
    encoding the warped forward frame and the warped backward frame.
  6. The video compression method of claim 4, wherein performing motion compensation on the original B-frame through the motion compensation network specifically comprises:
    performing motion compensation on the original B-frame through the motion compensation network and outputting a motion-compensated picture.
  7. The video compression method of claim 6, wherein the input of the motion compensation network has 16 channels; after the two branches are processed, the features are concatenated along the channel dimension, and a three-channel motion-compensated picture is output.
  8. The video compression method of claim 6, wherein after performing motion compensation on the original B-frame through the motion compensation network, the method further comprises:
    after the motion compensation of the original B-frame is completed, computing the residual between the motion-compensated video frame and the original frame according to the motion compensation result.
  9. The video compression method of claim 8, wherein reconstructing the B-frame specifically comprises:
    obtaining the residual, and computing the reconstructed B-frame according to the residual and the motion compensation result.
  10. The video compression method of claim 9, wherein the residual is the difference between an actual observed value and an estimated value.
  11. The video compression method of claim 2, wherein the I-frame is encoded with a picture compression algorithm, and the P-frame is encoded with a distributed video coding algorithm.
  12. A video decompression method, wherein the video decompression method comprises:
    obtaining an encoded and compressed group of pictures in a video, the group of pictures including an encoded B-frame and the forward and backward frames of the B-frame;
    decoding the forward frame and the backward frame;
    decoding the B-frame according to the decoded forward frame and backward frame, wherein the B-frame is an encoded B-frame obtained by the video compression method of any one of claims 1-11.
  13. The video decompression method of claim 12, wherein the forward frame is an I-frame or a P-frame, and the backward frame is an I-frame or a P-frame.
  14. The video decompression method of claim 12, wherein the I-frame is decoded with a picture compression algorithm, and the P-frame is decoded with a distributed video coding algorithm.
  15. An intelligent terminal, wherein the intelligent terminal comprises: a memory, a processor, and a video compression program or video decompression program stored in the memory and executable on the processor, wherein the video compression program, when executed by the processor, implements the steps of the video compression method of any one of claims 1-11, or the video decompression program, when executed by the processor, implements the steps of the video decompression method of any one of claims 12-14.
  16. A storage medium, wherein the storage medium stores a video compression program or a video decompression program, wherein the video compression program, when executed by a processor, implements the steps of the video compression method of any one of claims 1-11, or the video decompression program, when executed by a processor, implements the steps of the video decompression method of any one of claims 12-14.
PCT/CN2020/125529 2020-03-31 2020-10-30 一种视频压缩方法、视频解压方法、智能终端及存储介质 WO2021196582A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010244040.6A CN113473145B (zh) 2020-03-31 2020-03-31 一种视频压缩方法、视频解压方法、智能终端及存储介质
CN202010244040.6 2020-03-31

Publications (1)

Publication Number Publication Date
WO2021196582A1 true WO2021196582A1 (zh) 2021-10-07

Family

ID=77865616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125529 WO2021196582A1 (zh) 2020-03-31 2020-10-30 一种视频压缩方法、视频解压方法、智能终端及存储介质

Country Status (2)

Country Link
CN (1) CN113473145B (zh)
WO (1) WO2021196582A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320020A (zh) * 2018-02-07 2018-07-24 深圳市唯特视科技有限公司 一种基于双向光流的端到端无监督学习方法
CN109151476A (zh) * 2018-09-21 2019-01-04 北京大学 一种基于双向预测的b帧图像的参考帧生成方法及装置
US20190297326A1 (en) * 2018-03-21 2019-09-26 Nvidia Corporation Video prediction using spatially displaced convolution
CN110741640A (zh) * 2017-08-22 2020-01-31 谷歌有限责任公司 用于视频代码化中的运动补偿预测的光流估计
CN110913218A (zh) * 2019-11-29 2020-03-24 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8054882B2 (en) * 2005-05-13 2011-11-08 Streaming Networks (Pvt.) Ltd. Method and system for providing bi-directionally predicted video coding
US20080260023A1 (en) * 2007-04-18 2008-10-23 Chih-Ta Star Sung Digital video encoding and decoding with refernecing frame buffer compression
US10659788B2 (en) * 2017-11-20 2020-05-19 Google Llc Block-based optical flow estimation for motion compensated prediction in video coding
US11284107B2 (en) * 2017-08-22 2022-03-22 Google Llc Co-located reference frame interpolation using optical flow estimation
WO2019168765A1 (en) * 2018-02-27 2019-09-06 Portland State University Context-aware synthesis for video frame interpolation
CN112997499A (zh) * 2018-09-14 2021-06-18 皇家Kpn公司 基于经全局运动补偿的运动矢量预测值的视频编码
EP3850845A1 (en) * 2018-09-14 2021-07-21 Koninklijke KPN N.V. Video coding based on global motion compensated motion vectors
CN109451308B (zh) * 2018-11-29 2021-03-09 北京市商汤科技开发有限公司 视频压缩处理方法及装置、电子设备及存储介质
CN109922231A (zh) * 2019-02-01 2019-06-21 重庆爱奇艺智能科技有限公司 一种用于生成视频的插帧图像的方法和装置
CN110572677B (zh) * 2019-09-27 2023-10-24 腾讯科技(深圳)有限公司 视频编解码方法和装置、存储介质及电子装置
CN110913219A (zh) * 2019-11-29 2020-03-24 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备


Also Published As

Publication number Publication date
CN113473145A (zh) 2021-10-01
CN113473145B (zh) 2024-05-31

Similar Documents

Publication Publication Date Title
Hu et al. Improving deep video compression by resolution-adaptive flow coding
Golinski et al. Feedback recurrent autoencoder for video compression
CN110798690B (zh) 视频解码方法、环路滤波模型的训练方法、装置和设备
WO2023016155A1 (zh) 图像处理方法、装置、介质及电子设备
CN101189882A (zh) 用于视频压缩的编码器辅助帧率上变换(ea-fruc)的方法和装置
CN110062239B (zh) 一种用于视频编码的参考帧选择方法及装置
CN103596010B (zh) 基于字典学习的压缩感知视频编解码***
CN111669588B (zh) 一种超低时延的超高清视频压缩编解码方法
Le et al. Mobilecodec: neural inter-frame video compression on mobile devices
WO2018120019A1 (zh) 用于神经网络数据的压缩/解压缩的装置和***
CN113874916A (zh) Ai辅助的可编程硬件视频编解码器
CN115956363A (zh) 用于后滤波的内容自适应在线训练方法及装置
TW202337211A (zh) 條件圖像壓縮
Pinkham et al. Algorithm-aware neural network based image compression for high-speed imaging
Sun et al. Spatiotemporal entropy model is all you need for learned video compression
WO2023024115A1 (zh) 编码方法、解码方法、编码器、解码器和解码***
WO2021196582A1 (zh) 一种视频压缩方法、视频解压方法、智能终端及存储介质
WO2021168827A1 (zh) 图像传输方法及装置
Hu et al. HDVC: Deep Video Compression with Hyperprior-Based Entropy Coding
JP7368639B2 (ja) ビデオ符号化のための方法、装置及びコンピュータプログラム
CN113709483B (zh) 一种插值滤波器系数自适应生成方法及装置
Yang et al. Graph-convolution network for image compression
US20220245449A1 (en) Method for training a single non-symmetric decoder for learning-based codecs
CN114189684A (zh) 一种基于jnd算法的视频编码方法、装置、介质及计算设备
TWI514851B (zh) 影像編碼/解碼系統與其方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928923

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928923

Country of ref document: EP

Kind code of ref document: A1