WO2021254139A1 - Video processing method, device, and storage medium (视频处理方法、设备及存储介质)

Info

Publication number
WO2021254139A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
resolution
key frame
key
video
Prior art date
Application number
PCT/CN2021/097686
Other languages
English (en)
French (fr)
Inventor
杨维
徐科
孔德辉
宋剑军
朱方
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to EP21826930.6A (published as EP4167570A4)
Priority to US18/010,356 (published as US20230269395A1)
Publication of WO2021254139A1

Classifications

    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/8451: Structuring of content, e.g. decomposing content into time segments, using Advanced Video Coding [AVC]
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/11: Region-based segmentation
    • H04N 19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/172: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H04N 19/33: Hierarchical techniques, e.g. scalability, in the spatial domain
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/513: Processing of motion vectors
    • H04N 19/59: Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N 21/234327: Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N 21/4402: Reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440227: Reformatting operations for household redistribution, storage or real-time display, by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N 21/440263: Reformatting operations for household redistribution, storage or real-time display, by altering the spatial resolution, e.g. for displaying on a connected PDA
    • G06T 2207/20221: Image fusion; Image merging

Definitions

  • The present disclosure relates to, but is not limited to, video processing technology.
  • Video transmission and compression are particularly important.
  • Although coding and decoding technologies continue to develop, in areas with poor conditions, such as mountainous regions or other harsh environments, insufficient base stations and broadband resources mean that video information may be lost and transmission delays may be large.
  • Video processing methods are therefore needed that can reduce the resources occupied by video data transmission under harsh environments, insufficient resources, or restricted traffic, while still decoding high-quality video.
  • An embodiment of the present disclosure provides a video processing method, including: obtaining a mixed-resolution video by encoding and decoding an original video, the mixed-resolution video including a first resolution frame and a second resolution frame corresponding to a key frame in the original video, and a third resolution frame corresponding to a non-key frame in the original video, wherein the resolution of the first resolution frame is higher than the resolutions of the second resolution frame and the third resolution frame; and amplifying the third resolution frame corresponding to the non-key frame in the mixed-resolution video according to the first resolution frame and the second resolution frame corresponding to the key frame, and outputting an amplified video, where the amplified video includes the first resolution frame corresponding to the key frame and an amplified target frame corresponding to the non-key frame.
  • An embodiment of the present disclosure also provides a device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above video processing method.
  • An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above video processing method.
  • FIG. 1 is a flowchart of a video processing method provided by the present disclosure.
  • FIG. 2 is a schematic diagram of a mixed-resolution video provided by the present disclosure.
  • FIG. 3 is a flowchart of a video processing method provided by the present disclosure.
  • FIG. 4 is a schematic diagram of the principle of a video amplification network provided by the present disclosure.
  • FIG. 5 is a schematic diagram of a motion estimation sub-network provided by the present disclosure.
  • FIG. 6 is a schematic diagram of a motion compensation sub-network provided by the present disclosure.
  • FIG. 7 is a schematic diagram of an amplification sub-network provided by the present disclosure.
  • FIG. 8 is a schematic structural diagram of a video processing device provided by the present disclosure.
  • FIG. 9 is a schematic structural diagram of a video processing system provided by the present disclosure.
  • FIG. 10 is a schematic diagram of the hardware structure of a device provided by the present disclosure.
  • In the embodiments of the present disclosure, a video processing method is provided.
  • For non-key frames, the method only needs to encode and decode low-resolution frames, and then amplifies those low-resolution frames with reference to the information of the key frames, so that an amplified high-resolution video is obtained from a smaller code stream. This reduces the bandwidth pressure of video transmission and improves the subjective and objective quality of video amplification.
  • FIG. 1 is a flowchart of a video processing method provided by the present disclosure. As shown in FIG. 1, the method provided by the present disclosure includes step 110 and step 120.
  • In step 110, a mixed-resolution video is obtained by encoding and decoding the original video. The mixed-resolution video includes a first resolution frame and a second resolution frame corresponding to the key frames in the original video, and a third resolution frame corresponding to the non-key frames, wherein the resolution of the first resolution frame is higher than the resolutions of the second resolution frame and the third resolution frame.
  • The original video refers to the high-resolution video to be processed or transmitted.
  • The original video is first encoded and decoded to obtain a mixed-resolution video in which, compared with the original video, the resolution of some frames has been reduced, so that transmission of the video data is completed with fewer resources.
  • The original video includes key frames and non-key frames. Key frames are, for example, scene transition frames, a representative frame selected from similar frames, intra-coded frames (I frames) within a group of pictures (GOP), and so on.
  • A key frame carries the key information of the video and can be used to summarize the scene events of the video. All frames except key frames are non-key frames.
  • Using Scalable Video Coding (SVC), the key frames and non-key frames in the original video can be processed differently, dividing the video into multiple resolutions to obtain a mixed-resolution video.
  • SVC is a hybrid codec framework in which different coding blocks are constructed to form the entire codec protocol.
  • For key frames, the first resolution frame (high-resolution frame) and the second resolution frame (low-resolution frame) are obtained through the video codec; for non-key frames, the third resolution frame (low-resolution frame) is obtained through downsampling, where the second resolution is the same as the third resolution, resulting in a mixed-resolution video.
  • FIG. 2 is a schematic diagram of a mixed-resolution video provided by the present disclosure.
  • In FIG. 2, the larger boxes represent key frames, and the smaller boxes represent non-key frames.
  • In the mixed-resolution video, the high-resolution frames of the key frames and the low-resolution frames of the non-key frames are carried.
  • the high-resolution frame of the key frame can provide a basis for the enlargement of the low-resolution frame of the non-key frame.
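  • To make the spatial layering concrete, the following is a minimal sketch (Python/PyTorch, not from the patent) of how such a mixed-resolution sequence could be assembled: key frames keep their full resolution while non-key frames are downsampled. The function name, the 4x factor, and the bicubic filter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def make_mixed_resolution(frames, key_mask, scale=4):
    """Keep key frames at full resolution; downsample non-key frames.

    frames:   list of (C, H, W) float tensors, one per video frame
    key_mask: list of bools, True where the frame is a key frame
    scale:    assumed downsampling factor for non-key frames
    """
    mixed = []
    for frame, is_key in zip(frames, key_mask):
        if is_key:
            mixed.append(frame)  # first resolution (high) frame
        else:
            lr = F.interpolate(frame.unsqueeze(0), scale_factor=1.0 / scale,
                               mode='bicubic', align_corners=False)
            mixed.append(lr.squeeze(0))  # third resolution (low) frame
    return mixed
```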
  • In step 120, the third resolution frame corresponding to the non-key frame in the mixed-resolution video is amplified according to the first resolution frame and the second resolution frame corresponding to the key frame, and the amplified video is output, where the amplified video includes the first resolution frame corresponding to the key frame and an amplified target frame corresponding to the non-key frame.
  • A mixed-resolution video can be obtained through encoding and decoding based on a spatial layering strategy (compressing with different spatial resolutions), which reduces the number of video transmission bits, reduces bandwidth pressure, and saves resources. Through the video enlargement process, the non-key frames whose resolution has been reduced can be enlarged to recover high-resolution frames, thereby obtaining a complete high-resolution video with complete high-quality video information.
  • The third resolution frame of each non-key frame is enlarged using the bidirectional intra-frame reference mode of the hybrid codec framework, that is, each intermediate non-key frame is enlarged with reference to the information of the key frames before and after it; the high-resolution frame obtained by enlarging the third resolution frame is the target frame.
  • The video processing method of the present disclosure obtains a mixed-resolution video based on SVC, in which, for non-key frames, only low-resolution frames need to be coded and decoded; the low-resolution frames are then amplified with reference to the information of the key frames, so that an amplified high-resolution video is obtained from a smaller code stream, which reduces the bandwidth pressure of video transmission and improves the subjective and objective quality of video amplification.
  • FIG. 3 is a flowchart of a video processing method provided by the present disclosure.
  • The encoding and decoding process of the original video and the video amplification process are described in detail with reference to FIG. 3.
  • The video encoding and decoding process uses a residual map to realize SVC of the original video, and the video amplification process uses a trained neural network; combining the two ensures both the efficiency of video transmission and the quality of amplification.
  • the video processing method provided by the present disclosure includes steps 210-250.
  • In step 210, key frames in the original video are detected.
  • the first frame (I frame) in the GOP in the H265 protocol can be used as the key frame.
  • The key frame is encoded and decoded to obtain the first resolution frame; the first resolution frame has a higher resolution and small coding and decoding loss, so it can be used as a reference frame in the video amplification process to provide a basis for video amplification.
  • In step 220, a residual map and a second resolution frame of the key frame are generated, and the non-key frame is downsampled to obtain a third resolution frame.
  • For the key frame, the first resolution frame and the second resolution frame need to be obtained after encoding and decoding.
  • To obtain the first resolution frame, the key frame can be downsampled to the second resolution frame and then upsampled, and the upsampled frame is subtracted from the original video frame to obtain a residual map.
  • The residual map clearly reflects the changed and unchanged areas between the original video key frame and the image obtained after downsampling and then upsampling (or after splitting and splicing the key frame), so that the range of pixel value changes is smaller than in the original frame; a smaller range can be used for quantization, thereby reducing the number of bits transmitted. The second resolution frame corresponding to the key frame can be obtained by downsampling, image splitting, or other methods, while the non-key frame is directly compressed and downsampled to obtain the lower-resolution third resolution frame.
  • In step 230, video encoding and decoding are performed on the residual map, the second resolution frame, and the third resolution frame to obtain a mixed-resolution video.
  • After encoding, a transmittable bit stream is obtained; the bit stream is then decoded to obtain the mixed-resolution video.
  • In step 240, for each non-key frame, the third resolution frame of the non-key frame (denoted I_t^LR), the third resolution frame of the previous non-key frame before enlargement (denoted I_{t-1}^LR) together with its enlarged high-resolution frame (denoted I_{t-1}^SR), and the first resolution frames corresponding to the previous and next key frames adjacent to the non-key frame (denoted I_{k1}^SR and I_{k2}^SR) together with the corresponding second resolution frames (denoted I_{k1}^LR and I_{k2}^LR) are input to the video amplification network.
  • The video amplification network is, for example, a trained recurrent neural network.
  • Based on the bidirectional intra-frame reference mode of the hybrid codec framework, the video amplification network can refer to the relationships and changes between the high-resolution and low-resolution frames of the key frames before and after each non-key frame, combined with the magnification result of the previous non-key frame, to enlarge the non-key frame.
  • For the previous non-key frame, both the third resolution frame before enlargement (I_{t-1}^LR) and the high-resolution frame after enlargement (I_{t-1}^SR) are used.
  • In step 250, through the video amplification network, the third resolution frame of the non-key frame (I_t^LR) is amplified according to the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) and its enlarged high-resolution frame (I_{t-1}^SR), together with the first resolution frames (I_{k1}^SR, I_{k2}^SR) and corresponding second resolution frames (I_{k1}^LR, I_{k2}^LR) of the previous and next key frames adjacent to the non-key frame, to obtain the amplified target frame corresponding to the non-key frame.
  • The video amplification network of the present disclosure can be a trained recurrent neural network. Its working principle is as follows: the first resolution frames (high-resolution frames) and second resolution frames (low-resolution frames) corresponding to two adjacent key frames are input as reference frames for the third resolution frames (low-resolution frames) of all the intermediate non-key frames; the third resolution frame of the previous non-key frame before enlargement and its enlarged high-resolution frame are also input, so that the third resolution frame of the current non-key frame can be enlarged.
  • Because the current non-key frame refers to reference frames in two directions, when the scene changes it can make full use of the information of the low-resolution and high-resolution frames in both the forward and backward directions, ensuring the zoom effect of the current non-key frame.
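  • The recurrent use of the previous non-key frame's result can be sketched as follows (illustrative Python; `net` stands for the video amplification network, and its call signature is an assumption). For the first non-key frame in the group, the previous key frame stands in for the missing previous non-key frame, matching the fallback described later in this text.

```python
def amplify_group(net, nonkey_lr_frames, k1_lr, k1_sr, k2_lr, k2_sr):
    """Enlarge all non-key frames between two adjacent key frames, in order."""
    prev_lr, prev_sr = k1_lr, k1_sr      # fallback: previous key frame
    enlarged = []
    for cur_lr in nonkey_lr_frames:      # I_t^LR, in temporal order
        cur_sr = net(cur_lr, prev_lr, prev_sr, k1_lr, k1_sr, k2_lr, k2_sr)
        enlarged.append(cur_sr)
        prev_lr, prev_sr = cur_lr, cur_sr  # feed the result forward
    return enlarged
```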
  • In step 220, generating the residual map and the second resolution frame of the key frame includes step 221 and step 222.
  • In step 221, the key frame is downsampled to obtain the second resolution frame (low-resolution frame) of the key frame.
  • The downsampling can adopt a bilinear or bicubic method, and the decoding process upsamples the key frame using the same method.
  • In step 222, the second resolution frame of the key frame is upsampled to obtain a temporary frame at the first resolution (a high-resolution frame), and the temporary frame is subtracted from the key frame of the original video (the ground-truth frame) to obtain the residual map of the key frame.
  • The residual map clearly reflects the changed and unchanged areas between the first resolution frame and the original video key frame, reducing the amount of coding and decoding processing and saving coding, decoding, and transmission resources.
  • the residual code stream can be obtained by encoding the residual image.
  • In the decoding process, the second resolution frame (low-resolution frame) of the key frame is first obtained by decoding; the enlarged image (denoted as the fourth resolution frame) is then obtained by upsampling; the residual code stream is decoded to obtain the residual map; and the enlarged image and the residual map are added to obtain the first resolution frame of the key frame.
  • the encoding and decoding of the residual image needs to be performed at high resolution.
  • the clocks for decoding high resolution frames and decoding low resolution frames need to be set separately.
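  • The residual path can be sketched as follows (illustrative Python/PyTorch; bicubic resampling and a 4x factor are assumptions, and the entropy coding of the streams is omitted): the encoder side produces the second resolution frame plus a residual map, and the decoder side rebuilds the first resolution frame by upsampling and adding the residual.

```python
def key_frame_residual(key_hr, scale=4):
    """Encoder side: second resolution frame + residual map of a key frame.

    key_hr: (1, C, H, W) high-resolution key frame tensor.
    """
    lr = F.interpolate(key_hr, scale_factor=1.0 / scale,
                       mode='bicubic', align_corners=False)    # second resolution frame
    tmp = F.interpolate(lr, scale_factor=float(scale),
                        mode='bicubic', align_corners=False)   # temporary frame
    return lr, key_hr - tmp                                    # residual map

def reconstruct_key_frame(lr, residual, scale=4):
    """Decoder side: upsample (fourth resolution frame) and add the residual."""
    fourth = F.interpolate(lr, scale_factor=float(scale),
                           mode='bicubic', align_corners=False)
    return fourth + residual                                   # first resolution frame
```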
  • Step 230 may include steps 231 to 233.
  • In step 231, the residual map, the second resolution frame, and the third resolution frame are encoded by an encoder to obtain a residual code stream, a code stream of the second resolution frame, and a code stream of the third resolution frame.
  • In step 232, the residual code stream, the code stream of the second resolution frame, and the code stream of the third resolution frame are decoded by a decoder to obtain the residual map, the second resolution frame, and the third resolution frame, respectively.
  • In step 233, the second resolution frame is upsampled to obtain an enlarged fourth resolution frame, and the fourth resolution frame is added to the residual map to obtain the first resolution frame of the key frame.
  • In the video encoding and decoding process, it is first necessary to encode the residual map to obtain the residual code stream, while also encoding (compressing) and transmitting the second resolution frame and the third resolution frame.
  • In the decoding process, the second resolution frame (low-resolution frame) of the key frame is first obtained by decoding; the enlarged image (the fourth resolution frame) is then obtained by upsampling; the residual map is obtained by decoding the residual code stream; and the enlarged image and the residual map are added to obtain the first resolution frame of the key frame.
  • the encoding and decoding of the residual image needs to be performed at high resolution.
  • the clocks for decoding high resolution frames and decoding low resolution frames need to be set separately.
  • In another embodiment, step 110 may include steps 111 to 115.
  • In step 111, key frames in the original video are detected.
  • In step 112, the key frame is equally divided into a plurality of sub-pictures, the resolution of each sub-picture being the same as the resolution of the second resolution frame, and the non-key frame is downsampled to obtain a third resolution frame.
  • In step 113, each sub-picture and the third resolution frames of the non-key frames are input to a video codec, the video codec including an encoder and a decoder.
  • In step 114, each sub-picture and the third resolution frame of the non-key frame are encoded by the encoder to obtain code streams corresponding to each sub-picture and to the non-key frame.
  • In step 115, the decoder splices the sub-pictures according to their code streams to obtain the first resolution frame of the key frame, and decodes the code stream of the third resolution frame to obtain the third resolution frame.
  • In this embodiment, downsampling plus encoding/decoding is used to obtain the third resolution frame, and the downsampling method is used to obtain the second resolution frame.
  • For the first resolution frame, the key frame is split into multiple sub-pictures, which are then spliced to obtain the first resolution frame; that is, the key frame in the high-resolution original video is split into several sub-pictures of the same size, and the sub-pictures, after passing through the codec, are spliced into a high-resolution image. The resulting image is the first resolution frame of the key frame, and its resolution is higher than the resolution of the key frame in the original video.
  • Both the splitting method and the super-resolution method can be used to obtain the first resolution frame of the key frame.
  • The splitting method is mainly suitable for video processing scenarios with low timeliness requirements, while the super-resolution method requires magnification.
  • When the splitting method is adopted, more sub-pictures need to be split, resulting in a longer decoding process and a relatively large delay.
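  • A minimal sketch of the splitting alternative (illustrative Python/PyTorch; the 2x2 grid is an assumption): the key frame is cut into equal sub-pictures whose resolution matches the second resolution frame, each sub-picture is passed through the codec, and the decoded sub-pictures are stitched back into the first resolution frame.

```python
def split_into_subpictures(frame, grid=2):
    """Split a (C, H, W) key frame into grid*grid equally sized sub-pictures."""
    C, H, W = frame.shape
    h, w = H // grid, W // grid
    return [frame[:, i * h:(i + 1) * h, j * w:(j + 1) * w]
            for i in range(grid) for j in range(grid)]

def stitch_subpictures(subs, grid=2):
    """Stitch decoded sub-pictures back into one high-resolution frame."""
    rows = [torch.cat(subs[i * grid:(i + 1) * grid], dim=2) for i in range(grid)]
    return torch.cat(rows, dim=1)
```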
  • In an embodiment, the video amplification network includes a motion estimation (ME) sub-network, a motion compensation (MC) sub-network, and an amplification sub-network.
  • In addition to estimating motion vectors from the low-resolution frames, motion compensation of the high-resolution frames is also considered, which obtains more sufficient information from the reference frames and improves the quality of video zooming.
  • Step 250 may include steps 251 to 253.
  • In step 251, based on the third resolution frame (I_t^LR) of the non-key frame, motion estimation is performed through the motion estimation sub-network with the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) and with the second resolution frames (I_{k1}^LR, I_{k2}^LR) corresponding to the previous and next key frames adjacent to the non-key frame.
  • In motion estimation, the non-key frame can be divided into several blocks or macroblocks, the position of each block or macroblock in the adjacent frame is searched, and the relative spatial offset between the two is obtained, that is, the motion vector.
  • In this way, inter-frame redundancy can be removed, so that the number of video transmission bits is greatly reduced.
  • The motion vectors between the low-resolution frame I_t^LR of the non-key frame and each of the other three low-resolution frames are estimated separately, that is, the motion vectors between I_t^LR and I_{t-1}^LR, between I_t^LR and I_{k1}^LR, and between I_t^LR and I_{k2}^LR.
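  • The patent's motion estimation sub-network is learned (see FIG. 5 below), but the block search described above is the classical formulation of a motion vector. A toy exhaustive block-matching routine (illustrative Python/PyTorch; the 8x8 block size and the +/-4 pixel search window are assumptions) looks like this:

```python
def block_match(cur, ref, block=8, search=4):
    """Exhaustive block matching between two grayscale frames.

    cur, ref: (H, W) tensors; H and W are assumed divisible by `block`.
    Returns integer motion vectors of shape (H//block, W//block, 2) as (dy, dx).
    """
    H, W = cur.shape
    mvs = torch.zeros(H // block, W // block, 2, dtype=torch.long)
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            blk = cur[y:y + block, x:x + block]
            best_cost, best_mv = float('inf'), (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= H - block and 0 <= xx <= W - block:
                        cost = (ref[yy:yy + block, xx:xx + block] - blk).abs().sum()
                        if cost < best_cost:
                            best_cost, best_mv = cost, (dy, dx)
            mvs[by, bx, 0], mvs[by, bx, 1] = best_mv
    return mvs
```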
  • In step 252, according to the enlarged high-resolution frame of the previous non-key frame (I_{t-1}^SR) and the first resolution frames (I_{k1}^SR, I_{k2}^SR) corresponding to the previous and next key frames adjacent to the non-key frame, motion compensation is performed on each motion-estimated frame and the upsampled motion vectors to obtain an intermediate high-resolution feature map, and the intermediate high-resolution feature map is transformed into a low-resolution feature map through a recombination operation.
  • the motion vector can be roughly estimated through motion estimation, and the estimation result can be refined through motion compensation, and the offset between frames can be more clearly defined, so that the amplifying sub-network can get more effective input.
  • In step 253, the third resolution frame is enlarged according to the low-resolution feature map through the amplification sub-network to obtain the enlarged target frame corresponding to the non-key frame.
  • the video processing method of the present disclosure uses the time-space relationship between frames to amplify the third resolution frame, so that the amplified video will not jitter, and the visual effect will be more consistent.
  • FIG. 4 is a schematic diagram of the principle of a video amplification network provided by the present disclosure.
  • the video amplification network adopts a convolutional recurrent neural network to achieve amplification of low-resolution frames in a mixed-resolution video.
  • the network structure of the video amplifying network is shown in Figure 4, including three parts: a motion estimation sub-network, a motion compensation sub-network, and an amplifying sub-network.
  • the amplifying sub-network can be a Super Resolution Network (SRNet).
  • In FIG. 4, I^LR represents a low-resolution frame in the mixed-resolution video after encoding and decoding, and I^SR represents a high-resolution frame in the mixed-resolution video.
  • I_{k1}^LR and I_{k2}^LR are the low-resolution frames (second resolution frames) corresponding to the previous key frame and the next key frame, respectively; I_{k1}^SR and I_{k2}^SR are the high-resolution frames (first resolution frames) corresponding to the previous key frame and the next key frame.
  • I_t^LR represents the low-resolution frame of the current non-key frame to be enlarged (the third resolution frame); I_{t-1}^LR and I_{t-1}^SR are the low-resolution frame of the previous non-key frame and its enlarged high-resolution frame, respectively.
  • the video zooming process may further include the following operations:
  • a Warp (pixel shift) operation, which moves a high-resolution pixel to the current pixel position plus the motion vector; and
  • a Shuffle operation, which recombines the feature map between the spatial and channel dimensions.
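  • Both operations have direct tensor counterparts. Below is an illustrative Python/PyTorch sketch: warp() implements backward pixel shifting via grid_sample, and pixel_unshuffle performs the space-to-channel recombination that turns a high-resolution feature map into a low-resolution one (the 4x factor is an assumption).

```python
def warp(frame, flow):
    """Backward warp: sample `frame` at (pixel position + motion vector).

    frame: (N, C, H, W); flow: (N, 2, H, W) holding (dx, dy) in pixels.
    """
    N, C, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys)).float().to(frame.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                      # sampling positions
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

# Shuffle: fold 4x4 spatial patches into channels, (N, C, 4H, 4W) -> (N, 16C, H, W):
# lr_feat = F.pixel_unshuffle(hr_feat, downscale_factor=4)
```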
  • FIG. 5 is a schematic diagram of a motion estimation sub-network provided by the present disclosure.
  • The motion estimation sub-network can include convolutional layers (Conv), where the number of channels of the convolution kernels can be 32, 64, 128, 256, and so on; activation layers, such as the Leaky Rectified Linear Unit (Leaky ReLU) and the hyperbolic tangent (Tanh) excitation layer; a max pooling layer (Max Pool); and a bilinear transformation layer (Bilinear).
  • FIG. 6 is a schematic diagram of a motion compensation sub-network provided by the present disclosure.
  • As shown in FIG. 6, the motion compensation sub-network includes a concatenation layer (Concat), which splices two or more feature maps along the channel dimension; convolutional layers (Conv), where the number of channels of the convolution kernels can be 64, 128, 256, and so on; residual blocks (Residual Block); and a down-sampling layer and an up-sampling layer (indicated by down and up arrows in the figure).
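  • For illustration only, here is a toy PyTorch module assembled from the same building blocks (concatenation, convolutions, residual blocks); the channel widths and the number of residual blocks are assumptions, and the down/up-sampling pair is omitted for brevity:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain two-convolution residual block."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class MotionCompensation(nn.Module):
    """Concatenate warped frames and upsampled motion vectors, then fuse them."""
    def __init__(self, in_ch, feat=64):
        super().__init__()
        self.head = nn.Conv2d(in_ch, feat, 3, padding=1)
        self.body = nn.Sequential(ResBlock(feat), ResBlock(feat), ResBlock(feat))
        self.tail = nn.Conv2d(feat, feat, 3, padding=1)

    def forward(self, *tensors):
        x = torch.cat(tensors, dim=1)  # the concatenation (Concat) layer
        return self.tail(self.body(self.head(x)))
```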
  • FIG. 7 is a schematic diagram of an amplification sub-network provided by the present disclosure.
  • As shown in FIG. 7, the amplification sub-network includes convolution layers (Conv), rectified linear unit excitation layers (ReLU), and deconvolution layers (Conv Transpose).
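  • A toy version of such an amplification sub-network (illustrative; the channel counts, kernel sizes, and the overall 4x factor from two 2x deconvolutions are assumptions):

```python
class SRNet(nn.Module):
    """Conv + ReLU feature extraction followed by two 2x deconvolutions (4x total)."""
    def __init__(self, in_ch, out_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)
```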
  • Step 252 may include steps 2521 to 2523.
  • In step 2521, the motion vectors MV_{t-1}^LR, MV_{k1}^LR, and MV_{k2}^LR corresponding to each motion-estimated frame are upsampled to obtain MV_{t-1}^SR, MV_{k1}^SR, and MV_{k2}^SR; here MV_{t-1}^LR, MV_{k1}^LR, and MV_{k2}^LR are the motion vectors obtained by motion estimation between the third resolution frame of the non-key frame and, respectively, the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) and the second resolution frames (I_{k1}^LR, I_{k2}^LR) corresponding to the previous and next key frames adjacent to the non-key frame.
  • In step 2522, based on MV_{t-1}^SR, MV_{k1}^SR, and MV_{k2}^SR, the pixel shift (Warp) operation is applied to the enlarged high-resolution frame of the previous non-key frame (I_{t-1}^SR) and to the first resolution frames (I_{k1}^SR, I_{k2}^SR) corresponding to the previous and next key frames adjacent to the non-key frame, obtaining the pixel-shifted frames I_{t-1->t}^SR, I_{k1->t}^SR, and I_{k2->t}^SR.
  • In step 2523, I_{t-1->t}^SR, I_{k1->t}^SR, I_{k2->t}^SR, MV_{t-1}^SR, MV_{k1}^SR, and MV_{k2}^SR are input to the motion compensation sub-network, and the intermediate high-resolution feature map is obtained after motion compensation.
  • In other words, based on I_t^LR, the motion estimation sub-network performs motion estimation with each of the three other low-resolution frames (I_{k1}^LR, I_{k2}^LR, I_{t-1}^LR) to obtain the corresponding motion vectors MV_{t-1}^LR, MV_{k1}^LR, and MV_{k2}^LR, which are then upsampled.
  • Finally, the third resolution frame of each non-key frame is magnified according to the intermediate high-resolution feature map to obtain the magnified target frame corresponding to that non-key frame.
  • If the non-key frame currently to be enlarged is the first non-key frame after the adjacent previous key frame, that is, there is no other non-key frame between the non-key frame and the adjacent previous key frame, then within the interval between the two key frames there is no previous non-key frame whose low-resolution frame (I_{t-1}^LR) can serve as a reference frame, and no corresponding enlarged high-resolution frame (I_{t-1}^SR). In this case, the low-resolution frame of the previous non-key frame is replaced with the low-resolution frame of the previous key frame (the second resolution frame, I_{k1}^LR), and the high-resolution frame of the previous non-key frame is replaced with the high-resolution frame of the previous key frame (the first resolution frame, I_{k1}^SR), so that when there is no previous non-key frame, the previous key frame is used instead.
  • In some embodiments, before the third resolution frame in the mixed-resolution video is amplified according to the first resolution frame and the second resolution frame corresponding to the key frame, the method further includes step 100: based on sample videos, training the video amplification network according to a first loss function and a second loss function.
  • The first loss function is determined by the mean square error between the original high-resolution frames of the non-key frames between two adjacent key frames in the original video and the frames obtained by downsampling those original high-resolution frames and then amplifying them with the video amplification network.
  • The second loss function is determined by the mean square errors between the original high-resolution frame of the non-key frame and, respectively: I_{t-1->t}^SR, obtained from the third resolution frame of the non-key frame and the third resolution frame of the previous non-key frame through motion estimation, motion vector upsampling, and pixel shifting; I_{k1->t}^SR, obtained from the third resolution frame of the non-key frame and the second resolution frame of the adjacent previous key frame through the same operations; and I_{k2->t}^SR, obtained from the third resolution frame of the non-key frame and the second resolution frame of the adjacent next key frame through the same operations.
  • the video amplification network is pre-trained and has good amplification capabilities.
  • high-resolution videos of different resolutions can be obtained from the database, and the mixed-resolution videos of each high-resolution video after encoding and decoding are used as sample videos for the training of the video amplification network.
  • the content and features contained should be rich enough.
  • The video amplification network is trained according to the first loss function and the second loss function, where the second loss function is mainly used to train one sub-network of the video amplification network (the motion estimation sub-network); the motion estimation sub-network can be trained separately, and the second loss function can be constructed in a targeted manner.
  • The first loss function is, for example, a mean square error averaged over the group, where G represents the total count of the two adjacent key frames and the intermediate non-key frames. It can be seen that the first loss function has nothing to do with the two key frames.
  • The second loss function is:
  • Loss_ME = MSE(I_t^HR, W_b(I_{k1}^SR, f_{t->k1})) + MSE(I_t^HR, W_b(I_{k2}^SR, f_{t->k2})) + MSE(I_t^HR, W_b(I_{t-1}^SR, f_{t->t-1})), where I_t^HR represents the original high-resolution frame of the non-key frame currently to be amplified, I_{t-1}^SR represents the enlarged high-resolution frame of the previous non-key frame, f_{t->t-1} represents the motion vector from the non-key frame to the previous non-key frame, f_{t->k1} represents the motion vector from the non-key frame to the previous key frame, f_{t->k2} represents the motion vector from the non-key frame to the next key frame, and W_b (backward warp) represents the reverse pixel shift (Warp) operation.
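  • In LaTeX form the two losses can be written as follows; the exact formula for the first loss is not reproduced in the source text, so Loss_SR below is a plausible reconstruction (a per-group mean of frame-wise MSEs over the non-key frames), while Loss_ME follows the definition above.

```latex
\mathrm{Loss_{SR}} = \frac{1}{G}\sum_{t \in \mathcal{G}} \mathrm{MSE}\!\left(I_t^{HR},\, I_t^{SR}\right)

\mathrm{Loss_{ME}} = \mathrm{MSE}\!\left(I_t^{HR},\, W_b\!\left(I_{k1}^{SR}, f_{t \to k1}\right)\right)
                   + \mathrm{MSE}\!\left(I_t^{HR},\, W_b\!\left(I_{k2}^{SR}, f_{t \to k2}\right)\right)
                   + \mathrm{MSE}\!\left(I_t^{HR},\, W_b\!\left(I_{t-1}^{SR}, f_{t \to t-1}\right)\right)
```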
  • the principle of the video amplifying network for amplifying the mixed-resolution video can be referred to the foregoing embodiment.
  • After the training of the video amplification network is completed, the network has a reliable amplification capability and can be applied to amplify the low-resolution frames in the mixed-resolution videos corresponding to other videos.
  • the selected sample video is real, effective and diverse, and the sample video is input to the video codec to obtain the training data of the amplified network, which is also true and reliable, rather than artificially generated.
  • the training data of the video amplification network is obtained from the decoder, which is true and effective.
  • The trained video amplification network is thus more targeted, which ensures the training effect and improves both the amplification ability of the network and the quality of the amplified video.
  • In an embodiment, the video amplification network includes a motion estimation sub-network, a motion compensation sub-network, and an amplification sub-network, and step 100 includes: training the video amplification network according to the first loss function, and training the motion estimation sub-network according to the second loss function, until the first loss function and the second loss function meet a set condition.
  • The set condition means that the first loss function and the second loss function each stabilize near a value, for example, that the fluctuation range of their values is less than a certain threshold.
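  • A minimal training-loop sketch (illustrative Python/PyTorch; `video_sr_net`, `motion_estimation_loss`, and `train_loader` are hypothetical names, and the two losses are simply summed here for brevity, whereas the text above trains the motion estimation sub-network with its own loss):

```python
optimizer = torch.optim.Adam(video_sr_net.parameters(), lr=1e-4)

for lr_frames, hr_frames in train_loader:        # mixed-resolution input, ground truth
    sr_frames, flows = video_sr_net(lr_frames)   # enlarged frames + estimated flows
    loss_sr = F.mse_loss(sr_frames, hr_frames)              # first loss function
    loss_me = motion_estimation_loss(hr_frames, flows)      # second loss function
    optimizer.zero_grad()
    (loss_sr + loss_me).backward()
    optimizer.step()
```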
  • In an embodiment, before the video amplification network is trained according to the first loss function and the second loss function, the method further includes step 101: preprocessing the sample video to obtain a sample video with a preset resolution.
  • Part of the content of the obtained high-resolution video can be cropped to meet the preset resolution (for example, 256*256) and then input to the video codec; a mixed-resolution code stream is obtained after encoding and is decoded to obtain mixed-resolution frames.
  • The video processing method of the present disclosure uses the residual map to perform encoding and decoding, so only the changed areas of the key frame need to be processed, which reduces the code rate of key frame encoding and decoding; in the video amplification process, the bidirectional key frames are used as reference frames.
  • FIG. 8 is a schematic structural diagram of a video processing device provided by the present disclosure. As shown in FIG. 8, the video processing device includes: a coding and decoding module 310 and an amplifying module 320.
  • The codec module 310 is configured to obtain a mixed-resolution video by coding and decoding the original video, the mixed-resolution video including a first resolution frame and a second resolution frame corresponding to the key frames in the original video, and a third resolution frame corresponding to the non-key frames, wherein the resolution of the first resolution frame is higher than the resolutions of the second resolution frame and the third resolution frame.
  • The amplification module 320 is configured to amplify the third resolution frame corresponding to the non-key frame in the mixed-resolution video according to the first resolution frame and the second resolution frame corresponding to the key frame, and to output the amplified video, wherein the amplified video includes the first resolution frame corresponding to the key frame and an amplified target frame corresponding to the non-key frame.
  • The video processing device of the present disclosure obtains a mixed-resolution video by encoding and decoding the original video, and enlarges the third resolution frames of the non-key frames in the mixed-resolution video according to the first resolution frame and the second resolution frame corresponding to the key frame.
  • For non-key frames, only low-resolution frames need to be coded and decoded; the low-resolution frames are then enlarged according to the key frames to obtain an enlarged high-resolution video from a smaller bit stream, which reduces the bandwidth pressure of video transmission and improves the subjective and objective quality of video amplification.
  • In an embodiment, the codec module 310 includes: a detection unit configured to detect key frames in the original video; a frame processing unit configured to generate a residual map and a second resolution frame of the key frame, and to downsample the non-key frames to obtain third resolution frames; and an encoding and decoding unit configured to perform video encoding and decoding on the residual map, the second resolution frame, and the third resolution frame to obtain the mixed-resolution video.
  • Generating the residual map and the second resolution frame of the key frame includes: downsampling the key frame to obtain the second resolution frame of the key frame; upsampling the second resolution frame to obtain a temporary frame at the first resolution; and subtracting the temporary frame from the key frame to obtain the residual map of the key frame.
  • The encoding and decoding unit may be configured to: encode the residual map, the second resolution frame, and the third resolution frame with an encoder to obtain a residual code stream, a code stream of the second resolution frame, and a code stream of the third resolution frame; decode the residual code stream, the code stream of the second resolution frame, and the code stream of the third resolution frame with a decoder to obtain the residual map, the second resolution frame, and the third resolution frame, respectively; and upsample the second resolution frame to obtain an enlarged fourth resolution frame, adding the fourth resolution frame and the residual map to obtain the first resolution frame of the key frame.
  • In another embodiment, the codec module 310 includes: a detection unit configured to detect key frames in the original video; a frame processing unit configured to equally divide the key frame into a plurality of sub-pictures, the resolution of each sub-picture being the same as the resolution of the second resolution frame, and to downsample the non-key frames to obtain third resolution frames; an encoding unit configured to input each sub-picture and the third resolution frames of the non-key frames to a video codec including an encoder and a decoder, where each sub-picture and the third resolution frame of the non-key frame are encoded by the encoder to obtain code streams corresponding to each sub-picture and to the non-key frame; and a decoding unit configured to splice the sub-pictures according to their code streams through the decoder to obtain the first resolution frame of the key frame, and to decode the code stream of the third resolution frame to obtain the third resolution frame.
  • In an embodiment, the amplifying module 320 includes: an input unit configured to, for each non-key frame, input to the video amplification network the third resolution frame of the non-key frame (I_t^LR), the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) together with its enlarged high-resolution frame (I_{t-1}^SR), and the first resolution frames (I_{k1}^SR, I_{k2}^SR) and corresponding second resolution frames (I_{k1}^LR, I_{k2}^LR) of the previous and next key frames adjacent to the non-key frame; and an amplification unit configured to amplify, through the video amplification network, the third resolution frame of the non-key frame according to these inputs to obtain the amplified target frame corresponding to the non-key frame.
  • The video amplification network includes a motion estimation sub-network, a motion compensation sub-network, and an amplification sub-network. The amplification unit may be configured to: based on the third resolution frame (I_t^LR) of the non-key frame, perform motion estimation through the motion estimation sub-network with the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) and with the second resolution frames (I_{k1}^LR, I_{k2}^LR) corresponding to the adjacent previous and next key frames; according to the enlarged high-resolution frame of the previous non-key frame (I_{t-1}^SR) and the first resolution frames (I_{k1}^SR, I_{k2}^SR) corresponding to the adjacent previous and next key frames, perform motion compensation on each motion-estimated frame and the upsampled motion vectors to obtain an intermediate high-resolution feature map, and transform the intermediate high-resolution feature map into a low-resolution feature map through a recombination operation; and enlarge the third resolution frame according to the low-resolution feature map through the amplification sub-network to obtain the amplified target frame corresponding to the non-key frame.
  • Performing motion compensation on each motion-estimated frame and the upsampled motion vectors to obtain the intermediate high-resolution feature map includes: upsampling the motion vectors MV_{t-1}^LR, MV_{k1}^LR, and MV_{k2}^LR corresponding to each motion-estimated frame to obtain MV_{t-1}^SR, MV_{k1}^SR, and MV_{k2}^SR, where MV_{t-1}^LR, MV_{k1}^LR, and MV_{k2}^LR are the motion vectors obtained by motion estimation between the third resolution frame of the non-key frame and, respectively, the third resolution frame of the previous non-key frame before enlargement (I_{t-1}^LR) and the second resolution frames (I_{k1}^LR, I_{k2}^LR) corresponding to the adjacent previous and next key frames.
  • If the non-key frame is the first non-key frame after the adjacent previous key frame, the second resolution frame of the previous key frame is used as the third resolution frame of the previous non-key frame before enlargement, and the first resolution frame of the previous key frame is used as the enlarged high-resolution frame of the previous non-key frame.
  • the device further includes: a training module, configured to perform an evaluation of the third resolution in the mixed-resolution video according to the first resolution frame and the second resolution frame corresponding to the key frame.
  • the video amplification network is trained based on the sample video according to the first loss function and the second loss function; wherein, the first loss function is based on the non-interval between two adjacent key frames in the original video.
  • the original high-resolution frame of the key frame and the original high-resolution frame are down-sampled and then amplified by the video amplification network to determine the frame mean square error; the second loss function is based on the original high-resolution frame of the non-key frame and the non-key frame.
  • the third resolution frame of the key frame and the third resolution frame of the previous non-key frame have undergone motion estimation, motion vector upsampling, and pixel shifted, It -1->t SR
  • the first non-key frame I k1->t SR after the three-resolution frame and the second resolution frame of the adjacent previous key frame undergo motion estimation, motion vector upsampling, and pixel offset
  • the third resolution frame of the non-key frame The second resolution frame adjacent to the next key frame is determined by the mean square error of I k2->t SR after motion estimation, motion vector up-sampling, and pixel offset.
  • the video amplification network includes a motion estimation sub-network, a motion compensation sub-network, and an amplification sub-network;
  • the training of the video amplification network according to the first loss function and the second loss function includes: The first loss function trains the video amplification network, and the motion estimation sub-network is trained according to the second loss function, until the first loss function and the second loss function meet a set condition.
  • the device further includes: a preprocessing module configured to preprocess the sample video before training the video amplification network according to the first loss function and the second loss function to obtain a preset resolution Sample video.
  • a preprocessing module configured to preprocess the sample video before training the video amplification network according to the first loss function and the second loss function to obtain a preset resolution Sample video.
  • the video processing device proposed in the present disclosure belongs to the same concept as the above-mentioned video processing method.
  • the video processing device is equipped with the same method as the implementation of the video processing method. Beneficial effect.
  • FIG. 9 is a schematic structural diagram of a video processing system provided by the present disclosure.
  • the system includes: a video codec 410 for processing an original video to obtain a mixed-resolution video, the mixed-resolution video including a first resolution frame corresponding to a key frame in the original video And the second resolution frame, further including a third resolution frame corresponding to the non-key frame in the original video, wherein the resolution of the first resolution frame is higher than the resolution of the second resolution frame And the resolution of the third resolution frame; the video amplification network 420 is configured to perform processing on the non-key frames in the mixed-resolution video according to the first resolution frame and the second resolution frame corresponding to the key frame
  • the corresponding third resolution frame is enlarged, and the enlarged video is output, wherein the enlarged video includes the first resolution frame corresponding to the key frame and the enlarged target frame corresponding to the non-key frame .
  • the video codec 410 may be used to: detect key frames in the original video; generate a residual map of the key frames and a second resolution frame, and download the non-key frames Sampling to obtain a third resolution frame; after performing video encoding and decoding on the residual map, the second resolution frame, and the third resolution frame, a mixed resolution video is obtained.
  • generating the residual map of the key frame and the second resolution frame includes: down-sampling the key frame to obtain the second resolution frame of the key frame; Up-sampling is performed to obtain the temporary frame of the first resolution frame of the key frame, and the temporary frame of the first resolution frame is subtracted from the key frame to obtain the residual map of the key frame.
  • performing video encoding and decoding on the residual map, the second resolution frame, and the third resolution frame to obtain a mixed-resolution video includes: Encoding a residual map, the second resolution frame, and the third resolution frame to obtain a residual code stream, a code stream of the second resolution frame, and a code stream of the third resolution frame; The residual code stream, the code stream of the second resolution frame, and the code stream of the third resolution frame are decoded by the decoder to obtain the residual image and the second resolution frame respectively. Rate frame and the third resolution frame; up-sampling the second resolution frame to obtain an enlarged fourth resolution frame, adding the fourth resolution frame to the residual map, Obtain the first resolution frame of the key frame.
  • the video amplification network 420 may be used to: for each non-key frame, respectively amplify the third resolution frame (I t LR ) of the non-key frame and the previous non-key frame of the non-key frame
  • the first resolution frame (I k1 SR , I k2 SR ) and the corresponding second resolution frame (I k1 LR , I k2 LR ) are input to the video amplification network; the amplification unit is set to pass through the video amplification network, according to the The previous non-key frame of the non-key frame
  • the video amplification network includes a motion estimation sub-network, a motion compensation sub-network, and an amplification sub-network; according to the non-key frame before the non-key frame, the third resolution frame (I t-1 LR ) and the enlarged high-resolution frame (I t-1 SR ), the first resolution frame (I k1 SR , I k2 SR) corresponding to the previous key frame and the next key frame adjacent to the non-key frame ) And the corresponding second resolution frame (I k1 LR , I k2 LR ), amplify the third resolution frame (I t LR ) of the non-key frame to obtain the amplified target frame corresponding to the non-key frame , Including a third resolution frame (I t LR ) based on the non-key frame, through the motion estimation sub-network, the third resolution frame (I t -1 LR ) and the second resolution frame (I k1 LR , I k2 ), a
  • the high resolution frame (I t-1 SR ) enlarged according to the previous non-key frame and the first resolution corresponding to the previous key frame and the next key frame adjacent to the non-key frame are respectively Rate frames (I k1 SR , I k2 SR ), perform motion compensation on each frame after motion estimation and the up-sampled motion vector to obtain an intermediate high-resolution feature map, including: respectively corresponding to each frame after motion estimation
  • the motion vectors MV t-1 LR , MV k1 LR and MV k2 LR are up-sampled to obtain MV t-1 SR , MV k1 SR and MV k2 SR ; among them, MV t-1 LR , MV k1 LR and MV k2 LR are respectively
  • the third resolution frame (I t-1 LR ) before the previous non-key frame is enlarged and the second resolution frame (I k1 LR , I k1 LR, I k2 LR )
  • the second resolution frame of the previous key frame is used as the previous non-key frame
  • the third resolution frame before the non-key frame is enlarged, and the first resolution frame of the previous key frame is used as the high-resolution frame after the previous non-key frame of the non-key frame is enlarged.
  • a loss function and a second loss function train the video amplification network; wherein, the first loss function is based on the original high-resolution frame and the original high-resolution frame of the non-key frame between two adjacent key frames in the original video.
  • the resolution is down-sampled and then amplified by the video amplification network to determine the mean square error of the frame; the second loss function is based on the original high-resolution frame of the non-key frame, the third resolution frame of the non-key frame, and the previous one.
  • the third resolution frame of the non-key frame is subjected to motion estimation, motion vector upsampling, and pixel shifted, It -1->t SR , the third resolution frame of the non-key frame and the adjacent previous key I k1->t SR of the second resolution frame of the frame after motion estimation, motion vector upsampling, and pixel offset, the third resolution frame of the non-key frame and the second adjacent key frame of the next key frame
  • the resolution frame is determined by the mean square error of I k2->t SR after motion estimation, motion vector up-sampling, and pixel offset.
  • the video amplification network 420 includes a motion estimation sub-network, a motion compensation sub-network, and an amplification sub-network; the training of the video amplification network according to the first loss function and the second loss function includes: A loss function trains the video amplification network, and trains the motion estimation sub-network according to a second loss function, until the first loss function and the second loss function meet a set condition.
  • the sample video is preprocessed to obtain a sample video with a preset resolution.
  • the video processing system provided by the present disclosure and the video processing method proposed above belong to the same concept.
  • the video processing system has the same capabilities as the implementation of the video processing method. The beneficial effects.
  • the present disclosure also provides a device.
  • the video processing method may be executed by a video processing device, which may be implemented by software and/or hardware, and integrated in the device.
  • FIG. 10 is a schematic diagram of the hardware structure of a device provided by the present disclosure.
  • a device provided by the present disclosure includes: a processor 610 and a storage device 520. There may be one or more processors in the device.
  • one processor 610 is taken as an example.
  • the processor 610 and the storage device 620 in the device may be connected through a bus or other methods.
  • FIG. Connect as an example.
  • the storage device 620 in the device serves as a computer-readable storage medium and can be used to store one or more programs.
  • the one or more programs are executed by the one or more processors 610, so that the one or more processors implement the video processing method described in any one of the foregoing embodiments.
  • the method includes: obtaining a mixed-resolution video by encoding and decoding an original video, the mixed-resolution video including a first resolution frame and a second resolution frame corresponding to a key frame in the original video, and all The third resolution frame corresponding to the non-key frame in the original video, wherein the resolution of the first resolution frame is higher than the resolution of the second resolution frame and the resolution of the third resolution frame Rate; according to the first resolution frame and the second resolution frame corresponding to the key frame, amplify the third resolution frame corresponding to the non-key frame in the mixed-resolution video, and output the amplified video,
  • the enlarged video includes a first resolution frame corresponding to the key frame and an enlarged target frame corresponding to the non-key frame.
  • the program may be a software program, a computer executable program, and a module, such as the program instructions/modules corresponding to the video processing method in the present disclosure (for example, the module in the video processing device shown in FIG. 8 includes: a codec module 310 And amplifying module 320).
  • the processor 610 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the storage device 620, that is, realizes the video processing method in the foregoing method embodiment.
  • the storage device 620 mainly includes a storage program area and a storage data area.
  • the storage program area can store an operating system and an application program required by at least one function; the storage data area can store data created according to the use of the device, etc. Mixed-resolution video, third-resolution frame, etc. in the mode).
  • the storage device 620 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the storage device 620 may further include a memory remotely provided with respect to the processor 610, and these remote memories may be connected to the device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • a mixed-resolution video is obtained by encoding and decoding the original video
  • the mixed-resolution video It includes the first resolution frame and the second resolution frame corresponding to the key frame in the original video, and also includes the third resolution frame corresponding to the non-key frame in the original video, wherein the first resolution The resolution of the frame is higher than the resolution of the second resolution frame and the resolution of the third resolution frame; according to the first resolution frame and the second resolution frame corresponding to the key frame, the The third resolution frame corresponding to the non-key frame in the mixed-resolution video is amplified, and the amplified video is output.
  • the amplified video includes the first resolution frame corresponding to the key frame and the non-key frame.
  • the zoomed target frame corresponding to the key frame is a mixed-resolution video obtained by encoding and decoding the original video
  • the mixed-resolution video It includes the first resolution frame and the second resolution frame corresponding to the key frame in the original video, and also includes the third resolution frame corresponding to the non-key frame in
  • the device proposed in the present disclosure and the video processing method proposed above belong to the same concept.
  • the device has the same beneficial effects as the implementation of the video processing method.
  • the present disclosure also provides a storage medium containing computer-executable instructions.
  • the computer-executable instructions are used to execute a video processing method when executed by a computer processor.
  • the method includes: obtaining a mixed-resolution video by encoding and decoding an original video, the mixed-resolution video including a first resolution frame and a second resolution frame corresponding to a key frame in the original video, and all The third resolution frame corresponding to the non-key frame in the original video, wherein the resolution of the first resolution frame is higher than the resolution of the second resolution frame and the resolution of the third resolution frame Rate; according to the first resolution frame and the second resolution frame corresponding to the key frame, amplify the third resolution frame corresponding to the non-key frame in the mixed-resolution video, and output the amplified video,
  • the enlarged video includes a first resolution frame corresponding to the key frame and an enlarged target frame corresponding to the non-key frame.
  • the present disclosure can be implemented by software and general-purpose hardware, and can also be implemented by hardware.
  • the technical solution of the present disclosure can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), Random Access Memory (RAM), flash memory (FLASH), hard disk or optical disk, etc., including multiple instructions to make a computer device (which can be a personal computer, server, or network device, etc.) execute any of the disclosures Implement the method described in the mode.
  • the block diagram of any logic flow in the drawings of the present disclosure may represent program steps, or may represent interconnected logic circuits, modules, and functions, or may represent a combination of program steps and logic circuits, modules, and functions.
  • the computer program can be stored on the memory.
  • the memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), optical storage devices and systems (digital multi-function discs) DVD or CD disc) etc.
  • Computer-readable media may include non-transitory storage media.
  • the data processor can be any type suitable for the local technical environment, such as but not limited to general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic devices (FGPA) And processors based on multi-core processor architecture.
  • DSP digital signal processors
  • ASIC application-specific integrated circuits
  • FGPA programmable logic devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

本申请提供一种视频处理方法、设备及存储介质。该方法通过对原始视频进行编解码,得到混合分辨率视频,混合分辨率视频包括关键帧对应的第一分辨率帧和第二分辨率帧,还包括非关键帧对应的第三分辨率帧,第一分辨率帧的分辨率高于第二分辨率帧的分辨率和第三分辨率帧的分辨率;根据第一分辨率帧和第二分辨率帧,对非关键帧对应的第三分辨率帧进行放大并输出放大后的视频,放大后的视频包括关键帧对应的第一分辨率帧以及非关键帧对应的放大后的目标帧。

Description

视频处理方法、设备及存储介质
相关申请的交叉引用
本申请要求2020年6月15日提交给中国专利局的第202010544611.8号专利申请的优先权,其全部内容通过引用合并于此。
技术领域
本公开涉及但不限于视频处理技术。
背景技术
在如今的信息时代,视频信息占有比例越来越高,视频数据量越来越大,视频的传输与压缩尤为重要。虽然编解码技术不断发展,但在条件较差的地区,比如山区或者其他环境恶劣的区域,由于搭建的基站不够多、宽带资源不足,可能会导致视频信息丢失、传输时延大等,而如果对整个视频进行大幅度的压缩,解码后的视频分辨率低,主客观的质量和放大效果差,严重影响观看感受。目前缺乏有效的视频处理方法,能够在恶劣环境、资源不足或者限制流量等场景下,既减少视频数据传输占用的资源,又可以解码得到高质量的视频。
发明内容
本公开实施例提供一种视频处理方法,包括:通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
本公开实施例还提供了一种设备,包括:一个或多个处理器; 以及存储装置,用于存储一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现上述的视频处理方法。
本公开实施例还提供了一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,该程序被处理器执行时实现上述的视频处理方法。
附图说明
图1为本公开提供的一种视频处理方法的流程图;
图2为本公开提供的一种混合分辨率视频的示意图;
图3为本公开提供的一种视频处理方法的流程图;
图4为本公开提供的一种视频放大网络的原理示意图;
图5为本公开提供的一种运动估计子网络的示意图;
图6为本公开提供的一种运动补偿子网络的示意图;
图7为本公开提供的一种放大子网络的示意图;
图8为本公开提供的一种视频处理装置的结构示意图;
图9为本公开提供的一种视频处理***的结构示意图;
图10为本公开提供的一种设备的硬件结构示意图。
具体实施方式
下面结合附图和实施例对本公开进行说明。可以理解的是,此处所描述的具体实施例仅仅用于解释本公开,而非对本公开的限定。需要说明的是,在不冲突的情况下,本公开中的实施例及实施例中的特征可以相互任意组合。另外还需要说明的是,为了便于描述,附图中仅示出了与本公开相关的部分而非全部结构。
在本公开中,提供一种视频处理方法,该方法对于非关键帧只需编解码得到低分辨率帧,然后参考关键帧的信息对低分辨率帧进行放大,从而以较少的码流得到放大的高分辨率视频,降低了视频传输的带宽压力,提高了视频放大的主客观质量。
图1为本公开提供的一种视频处理方法的流程图,如图1所示, 本公开提供的方法包括步骤110和步骤120。
在步骤110中,通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率。
本公开中,原始视频是指待处理或待传输的高分辨率视频,原始视频首先经过编解码处理得到混合分辨率视频,相比于原始视频,部分帧的分辨率已经降低,实现了视频数据的压缩,在此过程中也以较少的资源完成了传输。原始视频中包括关键帧和非关键帧,关键帧例如为场景转换帧、从相似帧中选取出的某个帧、图像组(Group of Pictures,GOP)的内部编码帧(I帧)等,关键帧携带了视频帧的关键信息,可用于概括视频的场景事件,除关键帧以外的所有帧即为非关键帧。
通过视频编解码器可基于分层视频编解码(Scaled Video Coding,SVC)技术,对原始视频中的关键帧和非关键帧分别进行不同的处理,将视频分割为多种分辨率,从而得到混合分辨率视频。SVC是一种混合编解码框架,通过将视频帧帧内的空间冗余和帧间的时间冗余相结合,构建不同的编码块形成整个编解码协议。具体的,对于关键帧,通过视频编解码器得到第一分辨率帧(高分辨率帧)和第二分辨率帧(低分辨率帧);对于非关键帧,通过下采样得到第三分辨率帧(低分辨率帧),其中,第二分辨率与第三分辨率相同,从而得到了混合分辨率视频。
图2为本公开提供的一种混合分辨率视频的示意图。如图2所示,较大的方框代表的是关键帧,较小的方框代表非关键帧。经过编解码处理得到的混合分辨率视频中,携带了关键帧的高分辨率帧,以及非关键帧的低分辨率帧。关键帧的高分辨率帧可为非关键帧的低分辨率帧放大提供依据。
在步骤120中,根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进 行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
本公开中,通过编解码可以基于空间分级策略(使用不同的空间分辨率进行压缩)得到混合分辨率视频,能够减少视频传输的比特数,降低带宽压力,节约资源;混合分辨率视频再经过视频放大处理,可以将被降低分辨率的非关键帧放大,重新得到高分辨率帧,从而得到完整的高分辨率视频,获取完整的高质量的视频信息。本公开中,基于混合编解码框架的双向参考的帧内参考模式对非关键帧的第三分辨率帧进行放大,即,参考非关键前后的关键帧的信息,对中间的每个非关键帧进行放大,第三分辨率帧经过放大后的高分辨率帧即为目标帧。
本公开的视频处理方法,基于SVC得到混合分辨率视频,其中,对于非关键帧只需编解码得到低分辨率帧,然后参考关键帧的信息对低分辨率帧进行放大,从而以较少的码流得到放大的高分辨率视频,降低了视频传输的带宽压力,提高了视频放大的主客观质量。
图3为本公开提供的一种视频处理方法的流程图。参照图3对原始视频的编解码过程和视频放大过程进行具体描述,其中,视频编解码过程利用残差图实现对原始视频的SVC,视频放大过程了利用了经过训练的神经网络,两者结合起来,保证了视频传输的效率和放大的质量。如图3所示,本公开提供的视频处理方法包括步骤210-250。
在步骤210中,检测所述原始视频中的关键帧。
在对高分辨率的原始视频进行编解码器之前,首先检测关键帧,本公开中,可以将H265协议中GOP中的第一帧(I帧)作为关键帧,关键帧经过编解码得到第一分辨率帧,具有较高分辨率,编解码损失很小,从而可以在视频放大过程中作为参考帧,为视频放大提供依据。
在步骤220中,生成所述关键帧的残差图和第二分辨率帧,并对所述非关键帧进行下采样得到第三分辨率帧。
对于关键帧,需要在编解码之后得到第一分辨率帧和第二分辨率帧,对于第一分辨率帧,可以对关键帧进行下采样到第二分辨率帧,然后再进行上采样并将上采样后的帧与原始视频帧相减,得到残差图, 残差图可清晰反映出下采样再上采样后的(或者是对关键帧经过拆分和拼接后的)图像与原始视频关键帧之间的变化区域与非变化区域,从而像素值变化范围比原来小,可以用更小的变化范围做量化,从而降低传输的比特位数;对于第二分辨率帧,可以通过下采样、图像拆分等方法得到关键帧对应的较低分辨率的第二分辨率帧;而对于非关键帧直接进行压缩下采样,得到较低分辨率的第三分辨率帧。
在步骤230中,对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码,得到混合分辨率视频。
通过对残差图、第二分辨率帧以及第三分辨率帧进行编码,得到可传输的码流,然后再对码流进行解码,得到混合分辨率视频。
在步骤240中,对于每个非关键帧,分别将该非关键帧的第三分辨率帧(表示为(I t LR)、该非关键帧的前一非关键帧放大前的第三分辨率帧(表示为I t-1 LR)和放大后的高分辨率帧(表示为I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(分别表示为I k1 SR、I k2 SR)和对应的第二分辨率帧(分别表示为(I k1 LR、I k2 LR)输入所述视频放大网络。
本公开中,可以通过视频放大网络(例如为经过训练的循环神经网络)基于混合编解码框架中的双向参考的帧内参考模式,参考每个非关键帧前后的关键帧的高分辨率帧和低分辨率帧(第一分辨率帧和第二分辨率帧)之间的关系和变化,结合前一非关键帧的放大结果,对该非关键帧进行放大。这种情况下,在对一个非关键帧进行放大时,视频放大网络的输入共有七个:
该非关键帧的第三分辨率帧(I t LR);
与该非关键帧相邻的前一关键帧对应的第一分辨率帧(I k1 SR)和对应的第二分辨率帧(I k1 LR);
与该非关键帧相邻的后一关键帧对应的第一分辨率帧(I k2 SR)和第二分辨率帧(I k2 LR);
该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)。
在步骤250中,通过所述视频放大网络,根据该非关键帧的前 一非关键帧放大前的第三分辨率帧(即I t-1 LR)和放大后的高分辨率帧(即I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(即I k1 SR、I k2 SR)和对应的第二分辨率帧(即I k1 LR、I k2 LR),对该非关键帧的第三分辨率帧(即I t LR)进行放大,得到非关键帧对应的放大后的目标帧。
本公开的视频放大网络可以为一个经过训练的循环神经网络,其工作原理是:通过输入相邻两个关键帧对应的第一分辨率帧(高分辨率帧)和第二分辨率帧(低分辨率帧)作为整个中间非关键帧的第三分辨率帧(低分辨率帧)的参考帧,同时还输入前一非关键帧放大前的第三分辨率帧和放大后的高分辨率帧,通过循环神经网络,根据各参考帧与当前帧时空之间的关系,可以对当前的非关键帧的第三分辨率帧进行放大。其中,当前的非关键帧参考了两个方向的参考帧,在场景发生变化的情况下,能够充分利用前向和后向的低分辨率帧与高分辨率帧的信息,保证对当前非关键帧的放大效果。
在一实施方式中,步骤220中,生成所述关键帧的残差图和第二分辨率帧,包括步骤221和步骤222。
在步骤221,对所述关键帧进行下采样,得到关键帧的第二分辨率帧。
本实施方式中,对关键帧进行下采样以得到关键帧的第二分辨率帧(低分辨率帧),下采样可以采用双线性或者双三次的方法,且解码过程对关键帧采用相同的方法上采样。
在步骤222,对所述关键帧的第二分辨率帧进行上采样,得到关键帧的第一分辨率帧的临时帧,将第一分辨率帧的临时帧与关键帧相减,得到关键帧的残差图。
本实施方式中,对关键帧的第二分辨率帧进行上采样以得到关键帧的第一分辨率帧的临时帧(高分辨率帧),第一分辨率帧的临时帧和原始视频中真实的关键帧(Ground Truth Frame)相减得到残差图。残差图可清晰反映出第一分辨率帧与原始视频关键帧之间的变化区域与非变化区域,降低编解码的处理量,节约编解码和传输资源。通过对残差图进行编码可以得到残差码流。在解码过程中,首先通过 解码得到关键帧的第二分辨率帧(低分辨率帧),然后通过上采样方法得到放大后的图(记为第四分辨率帧),通过解码残差码流得到残差图,然后将该放大后的图与残差图相加即可得到关键帧的第一分辨率帧。本实施方式中,对于残差图的编解码需要在高分辨率下进行,在设计芯片时,解码高分辨率帧和解码低分辨率帧的时钟需要分别设置。
在一实施方式中,步骤230可以包括步骤231至步骤233。
在步骤231,通过编码器对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行编码,得到残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流。
在步骤232,通过解码器对所述残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流进行解码,分别得到所述残差图、所述第二分辨率帧以及所述第三分辨率帧。
在步骤233,对所述第二分辨率帧进行上采样,得到放大后的第四分辨率帧,将所述第四分辨率帧与所述残差图相加,得到所述关键帧的第一分辨率帧。
本实施方式中,视频编解码的过程中,首先需要对残差图进行编码得到残差码流,同时也对第二分辨率帧以及第三分辨率帧进行编码(压缩)和传输;而在解码过程中,首先通过解码得到关键帧的第二分辨率帧(低分辨率帧),然后通过上采样方法得到放大后的图(记为第四分辨率帧),通过解码残差码流得到残差图,然后将该放大后的图与残差图相加即可得到关键帧的第一分辨率帧。本实施方式中,对于残差图的编解码需要在高分辨率下进行,在设计芯片时,解码高分辨率帧和解码低分辨率帧的时钟需要分别设置。
在一实施方式中,步骤110,可以包括步骤111至步骤115。
在步骤111,检测所述原始视频中的关键帧。
在步骤112,将所述关键帧等分为多个子图,各所述子图的分辨率与所述第二分辨率帧的分辨率相同,并对所述非关键帧进行下采样得到第三分辨率帧。
在步骤113,将各所述子图与所述非关键帧的第三分辨率帧输入 视频编解码器,所述视频编解码器包括编码器和解码器。
在步骤114,通过所述编码器对各所述子图与所述非关键帧的第三分辨率帧进行编码,得到各所述子图以及所述非关键帧对应的码流。
在步骤115,通过所述解码器根据各所述子图的码流对各所述子图进行拼接,得到所述关键帧的第一分辨率帧,并对所述第三分辨率帧的码流进行解码,得到所述第三分辨率帧。
本实施方式中,对于非关键帧采用下采样和编解码得到第三分辨率帧,而对于关键帧,采用下采样的方法得到第二分辨率帧,此外,通过将关键帧拆分为多个子图,然后再对多个子图进行拼接得到第一分辨率帧,即,将高分辨率的原始视频中的关键帧拆分为若干个相同大小的子图,通过编解码器再将各子图拼接成一个高分辨率图,得到的图即为该关键帧的第一分辨率帧,其分辨率高于原始视频中关键帧的分辨率。采用拆分方法与上采样方法均可得到关键帧的第一分辨率帧,本实施方式中,拆分方法主要适用于时效性要求不高的视频处理场景,而在超分辨率方法需要放大倍数比较大的情况下,如果采用拆分方法,需要拆分的子图较多,导致解码过程较长,延迟相对较大。
在一实施方式中,视频放大网络包括运动估计(Motion Estimation,ME)子网络、运动补偿(Motion Compansation,MC)子网络和放大子网络。
本实施方式中,视频放大网络包括三种子网络:运动估计子网络、运动补偿子网络和放大子网络,在放大过程中,也考虑了低分辨帧的运动矢量的估计和高分辨帧的运动补偿,能够获取参考帧更充分的信息,提高视频放大的质量。
在一实施方式中,步骤250可以包括步骤251至步骤253。
在步骤251,基于该非关键帧的第三分辨率帧(I t LR),通过所述运动估计子网络,分别对该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)进行运动估计。
在帧间预测过程中,由于邻近帧中的场景存在着一定的相关性,可以将该非关键帧分成若干块或宏块,并设法搜索出每个块或宏块在 邻近帧中的位置,并得出两者之间的空间位置的相对偏移量,即运动矢量,通过矢量运动估计可以去除帧间冗余度,使得视频传输的比特数大为减少。
本实施方式中,对于每个非关键帧,分别估计该非关键帧的低分辨率帧I t LR与另外三种低分辨率帧之间的运动矢量,即,估计I t LR与I t-1 LR、I t LR与I k1 LR、I t LR与I k2 LR之间的运动矢量。
在步骤252,分别根据前一非关键帧放大后的高分辨率帧(I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR),对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,并通过重组操作将所述中间高分辨率特征图转化为低分辨率特征图。
本实施方式中,对于每个非关键帧,对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,然后将通过重组操作将所述中间高分辨率特征图转化为低分辨率特征图。通过运动估计可粗略估计得到运动矢量,通过运动补偿可以将估计结果细化,更加明确帧间之间的偏移,使放大子网络得到更加有效的输入。
在步骤253,通过所述放大子网络根据所述低分辨率特征图对所述第三分辨率帧进行放大,得到非关键帧对应的放大后的目标帧。
本公开的视频处理方法,利用帧间的时空关系对第三分辨率帧进行放大,使得放大后的视频不会抖动,视觉效果更为连贯。
图4为本公开提供的一种视频放大网络的原理示意图。本公开中,视频放大网络采用卷积循环神经网络实现对混合分辨率视频中低分辨率帧的放大。视频放大网络的网络结构如图4所示,包括三个部分:运动估计子网络、运动补偿子网络和放大子网络,放大子网络可以为一个超分辨率网络(Super Resolution Network,SRNet)。其中,运动估计子网络共有三个,分别用于估计I t LR与I t-1 LR、I t LR与I k1 LR、I t LR与I k2 LR之间的运动矢量,三个运动估计子网络的参数共享。I LR表示经过编解码后的混合分辨率视频中的低分辨率帧,I SR表示混合分辨率视频中的高分辨率帧,例如,I k1 LR和I k2 LR分别为前一关键帧 和后一关键帧对应的低分辨率帧(第二分辨率帧),I k1 SR和I k2 SR为前一关键帧和后一关键帧对应的高分辨率帧(第一分辨率帧),I t LR表示需要放大的当前的非关键帧的低分辨率帧(第三分辨率帧),I t-1 LR和I t-1 SR分别为前一非关键帧的低分辨率帧和经过放大后的高分辨率帧。
在一实施方式中,视频放大过程还可以包括以下操作:
“↑”,表示上采样操作;
像素偏移(Warp)操作,表示将高分辨率的像素移动到当前像素位置加上运动矢量后的位置;
重组(Shuffle)操作,表示空间向通道(Channel)特征图的重组操作。
图5为本公开提供的一种运动估计子网络的示意图。如图5所示,运动估计子网络可以包括卷积层(Conv),卷积核的通道数可以为32、64、128、256等;激活层,如带泄露修正线性单元(Leaky Rectified Linear Unit)激励层、双曲(Tanh)激励层;最大池化层(Max Pool)以及双线性(Bilinear)变换层。
图6为本公开提供的一种运动补偿子网络的示意图。如图6所示,运动估计子网络包括级联层(Concate),用于将两个及以上的特征图在channel上进行拼接;卷积层(Conv),卷积核的通道数可以为64、128、256等;残差块(Residual Block);下采样层(表示为“↓”)和上采样层(表示为“↑”)。
图7为本公开提供的一种放大子网络的示意图。如图7所示,运动估计子网络包括卷积层(Conv)、修正线性单元激励层(ReLU)、反卷积层(Conv Transpose)。
在一实施方式中,步骤252可以包括步骤2521至步骤2523。
在步骤2521,分别对经过运动估计的各帧对应的运动矢量MV t-1 LR、MV k1 LR和MV k2 LR进行上采样得到MV t-1 SR、MV k1 SR和MV k2 SR;其中,MV t-1 LR、MV k1 LR和MV k2 LR分别为前一非关键帧放大前的第三分辨率帧(I t-1 LR)、该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)分别与该非关键帧的第三分辨率帧经过运 动估计后对应的放大后的运动矢量;
在步骤2522,基于MV t-1 SR、MV k1 SR和MV k2 SR,通过像素偏移操作,分别对前一非关键帧放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)进行像素偏移,分别得到像素偏移后的I t-1->t SR、I k1->t SR和I k2->t SR
在步骤2523,将I t-1->t SR、I k1->t SR、I k2->t SR、MV t-1 SR、MV k1 SR和MV k2 SR输入所述补偿子网络,经过所述补偿子网络的运动补偿后得到所述中间高分辨率特征图。
如图4所示,通过运动估计子网络,基于I t LR,分别对除当前非关键帧的低分辨率帧以外的三种低分辨率帧(I k1 LR、I k2 LR、I t-1 LR)分别进行运动估计,得到对应的运动矢量MV t-1 LR、MV k1 LR和MV k2 LR;对经过运动估计的各运动矢量MV t-1 LR、MV k1 LR和MV k2 LR分别进行上采样,得到放大的MV t-1 SR、MV k1 SR和MV k2 SR;然后使用MV t-1 SR、MV k1 SR和MV k2 SR,通过像素偏移操作,分别对I t-1 SR、I k1 SR、I k2 SR进行像素偏移,得到像素偏移后的I t-1->t SR、I k1->t SR和I k2->t SR;将I t-1->t SR、I k1->t SR、I k2->t SR、MV t-1 SR、MV k1 SR和MV k2 SR输入所述补偿子网络,经过所述补偿子网络的运动补偿后得到所述中间高分辨率特征图。在此基础上,通过放大子网络,根据中间高分辨率特征图对非关键帧的第三分辨率帧进行放大,得到非关键帧对应的放大后的目标帧。对于相邻的两个关键帧之间的每个非关键帧,都按此过程执行得到放大后的目标帧,最终实现所有非关键帧的第三分辨率帧的放大,得到高质量的超分辨率视频。
在一实施方式中,若该非关键帧为相邻的前一关键帧后的第一个非关键帧,则将所述前一关键帧的第二分辨率帧作为该非关键帧的前一非关键帧放大前的第三分辨率帧,并将所述前一关键帧的第一分辨率帧作为该非关键帧的前一非关键帧放大后的高分辨率帧。
本实施方式中,如果当前需要放大的非关键帧为其相邻的前一关键帧后的第一个非关键帧,即,该非关键帧与相邻的前一关键帧之间没有其他非关键帧,这种情况下,在该非关键帧所处的两个关键帧的区间内,不存在可用于作为参考帧的前一非关键帧的低分辨率帧 (I t-1 LR),也不存在对应的放大后的高分辨率帧(I t-1 SR),则可以将前一非关键帧的低分辨率帧(I t-1 LR)替换为前一关键帧的低分辨率帧(第二分辨率帧,I k1 LR),并将前一非关键帧的高分辨率帧(I t-1 SR)替换为前一关键帧的高分辨率帧(第一分辨率帧,I k1 SR),从而在不存在前一非关键帧的情况下,利用前关键帧作为参考帧,为放大算法提供完整依据,确保算法的可行性。
在一实施方式中,在根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大之前,所述方法还包括:步骤100:基于样本视频,根据第一损失函数和第二损失函数训练所述视频放大网络;其中,第一损失函数根据所述原始视频中两个相邻关键帧之间的非关键帧的原始高分辨率帧与原始高分辨率经过下采样然后通过视频放大网络放大后的帧的均方误差确定;第二损失函数根据所述非关键帧的原始高分辨率帧、所述非关键帧的第三分辨率帧与前一非关键帧的第三分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I t-1->t SR、所述非关键帧的第三分辨率帧与相邻的前一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k1->t SR、所述非关键帧的第三分辨率帧与相邻的后一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k2->t SR的均方误差确定。
视频放大网络预先经过训练,具有较好的放大能力。本实施方式中,可以从数据库中获取不同分辨率的高分辨率视频,将各高分辨率视频经过编解码处理后的混合分辨率视频分别作为样本视频用于视频放大网络的训练,样本视频中包含的内容和特征应足够丰富。
本实施方式中,根据第一损失函数和第二损失函数训练视频放大网络,其中,第二损失函数主要用于训练视频放大网络中的一个子网络(运动估计子网络),对于运动估计子网络可进行单独训练,并有针对性地构建第二损失函数。
第一损失函数例如为:
Figure PCTCN2021097686-appb-000001
其中,G表示相邻的两个关键帧和中间的非关键帧的总和,可以看出, 第一损失函数与两帧关键帧无关。
第二损失函数例如为:
Loss ME=MSE(I t HR,W b(I k1 SR,f t→k1))+MSE(I t HR,W b(I k1 SR,f t→k2))+MSE(I t HR,W b(I t-1 SR,f t→t-1)),其中,I t HR表示当前待放大的非关键帧的原始高分辨率帧,I t-1 SR表示前一非关键帧放大后的高分辨率帧,f t→t-1表示该非关键帧到前一非关键帧之间的运动矢量,f t→k1表示该非关键帧到前一关键帧之间的运动矢量,f t→k2表示该非关键帧到后一关键帧之间的运动矢量,W b(Backward Warp)表示反向像素偏移(Warp)操作。视频放大网络对于混合分辨率视频进行放大的原理可参见上述实施方式。当第一损失函数以及第二损失函数分别稳定在一个值附近时,视频放大网络训练完成,此时的视频放大网络具有可靠的放大能力,可应用于对其他视频对应的混合分辨率视频中的低分辨率帧进行放大。
本实施方式中,选取的样本视频是真实有效的且具有多样性,样本视频输入到视频编解码器中得到放大网络的训练数据也是真实可靠的,而不是人为生成的。将视频编解码器作为一个整体,视频放大网络的训练数据从解码端获得,真实有效,训练得到的视频放大网络也更具针对性,从而保证训练效果、提高视频放大网络的放大能力和放大后的视频质量。
在一实施方式中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;步骤100包括:根据所述第一损失函数训练所述视频放大网络,并根据第二损失函数对所述运动估计子网络进行训练,直至第一损失函数和第二损失函数满足设定条件。其中,设定条件是指,第一损失函数和第二损失函数分别稳定在一个值附近,例如,第一损失函数和第二损失函数的值的波动范围小于一定阈值。
在一实施方式中,在根据第一损失函数和第二损失函数训练所述视频放大网络之前,所述方法还包括:步骤101:对样本视频进行预处理,得到预设分辨率的样本视频。
在一些实施方式中,可以将获取到的高分辨率视频截取部分内容,使其符合预设分辨率(例如为256*256),然后输入至视频编解码器,经过编码得到混合分辨率的码流,经过解码得到混合分辨率帧。
本公开的视频处理方法,利用残差图进行编解码,只需处理关键帧的变化区域,能够降低关键帧编解码的码率;在视频放大过程中,采用双向关键帧作为参考帧,在场景发生变化的情况下,能够充分利用前向和后向的低分辨率帧与高分辨率帧的信息,保证对当前非关键帧的放大效果;对关键帧采用下采样的方法得到第二分辨率帧,减少延迟,提高时效性;在放大过程中,也考虑了低分辨帧的运动矢量的估计和运动补偿,能够获取参考帧更充分的信息,提高视频放大的质量;通过矢量运动估计可以去除帧间冗余度,使得视频传输的比特数大为减少;此外,视频放大网络的训练数据从解码端获得,真实有效,训练得到的视频放大网络也更具针对性,从而保证训练效果、提高视频放大网络的放大能力和放大后的视频质量。
本公开还提供一种视频处理装置。图8为本公开提供的一种视频处理装置的结构示意图。如图8所示,所述视频处理装置包括:编解码模块310和放大模块320。
编解码模块310,设置为通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率。
放大模块320,设置为根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
本公开的视频处理装置,通过对原始视频进行编解码,得到混合分辨率视频;根据关键帧对应的第一分辨率帧和第二分辨率帧,对混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,对于非关键帧只需编解码得到低分辨率帧,然后根据关键帧对低分辨率帧进行放大,以较少的码流得到放大的高分辨率视频,降低了视频传输的 带宽压力,提高了视频放大的主客观质量。
在一实施方式中,编解码模块310,包括:检测单元,设置为检测所述原始视频中的关键帧;帧处理单元,设置为生成所述关键帧的残差图和第二分辨率帧,并对所述非关键帧进行下采样得到第三分辨率帧;编解码单元,设置为对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码后,得到混合分辨率视频。
在一实施方式中,生成所述关键帧的残差图和第二分辨率帧,包括:对所述关键帧进行下采样,得到所述关键帧的第二分辨率帧;对所述关键帧进行上采样,得到所述关键帧的第一分辨率帧的临时帧,将所述第一分辨率帧的临时帧与所述关键帧相减,得到所述关键帧的残差图。
在一实施方式中,编解码单元,可设置为:通过得到编码器对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行编码,得到残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流;通过所述解码器对所述残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流进行解码,分别得到所述残差图、所述第二分辨率帧以及所述第三分辨率帧;对所述第二分辨率帧进行上采样,得到放大后的第四分辨率帧,将所述第四分辨率帧与所述残差图相加,得到所述关键帧的第一分辨率帧。
在一实施方式中,编解码模块310,包括:检测单元,设置为检测所述原始视频中的关键帧;帧处理单元,设置为将所述关键帧等分为多个子图,各所述子图的分辨率与所述第二分辨率帧的分辨率相同,并对所述非关键帧进行下采样得到第三分辨率帧;编码单元,设置为将各所述子图与所述非关键帧的第三分辨率帧输入视频编解码器,所述视频编解码器包括编码器和解码器;通过所述编码器对各所述子图与所述非关键帧的第三分辨率帧进行编码,得到各所述子图以及所述非关键帧对应的码流;解码单元,设置为通过所述解码器根据各所述子图的码流对各所述子图进行拼接,得到所述关键帧的第一分辨率帧,并对所述第三分辨率帧的码流进行解码,得到所述第三分辨率帧。
在一实施方式中,放大模块320,包括:输入单元,设置为对于 每个非关键帧,分别将该非关键帧的第三分辨率帧(I t LR)、该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)、以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)和对应的第二分辨率帧(I k1 LR、I k2 LR)输入视频放大网络;放大单元,设置为通过所述视频放大网络,根据该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)和对应的第二分辨率帧(I k1 LR、I k2 LR),对该非关键帧的第三分辨率帧(I t LR)进行放大,得到该非关键帧对应的放大后的目标帧。
在一实施方式中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;放大单元,可设置为:基于该非关键帧的第三分辨率帧(I t LR),通过所述运动估计子网络,分别对该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)进行运动估计;分别根据前一非关键帧放大后的高分辨率帧(I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR),对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,并通过重组操作将所述中间高分辨率特征图转化为低分辨率特征图;通过所述放大子网络根据所述低分辨率特征图对所述第三分辨率帧进行放大,得到非关键帧对应的放大后的目标帧。
在一实施方式中,分别根据前一非关键帧放大后的高分辨率帧(I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR),对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,包括:分别对经过运动估计的各帧对应的运动矢量MV t-1 LR、MV k1 LR和MV k2 LR进行上采样得到MV t-1 SR、MV k1 SR和MV k2 SR;其中,MV t-1 LR、MV k1 LR和MV k2 LR分别为前一非关键帧放大前的第三分辨率帧(I t-1 LR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)分别 与该非关键帧的第三分辨率帧经过运动估计后对应的放大后的运动矢量;基于MV t-1 SR、MV k1 SR和MV k2 SR,通过像素偏移操作,分别对前一非关键帧放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)进行像素偏移,分别得到像素偏移后的I t-1->t SR、I k1->t SR和I k2->t SR;将I t-1->t SR、I k1->t SR、I k2->t SR、MV t-1 SR、MV k1 SR和MV k2 SR输入所述补偿子网络,经过所述补偿子网络的运动补偿后得到所述中间高分辨率特征图。
在一实施方式中,若该非关键帧为相邻的前一关键帧后的第一个非关键帧,则将所述前一关键帧的第二分辨率帧作为该非关键帧的前一非关键帧放大前的第三分辨率帧,并将所述前一关键帧的第一分辨率帧作为该非关键帧的前一非关键帧放大后的高分辨率帧。
在一实施方式中,所述装置还包括:训练模块,设置为在根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大之前,基于样本视频,根据第一损失函数和第二损失函数训练所述视频放大网络;其中,所述第一损失函数根据所述原始视频中两个相邻关键帧之间的非关键帧的原始高分辨率帧与原始高分辨率经过下采样然后通过视频放大网络放大后的帧均方误差确定;第二损失函数根据所述非关键帧的原始高分辨率帧、所述非关键帧的第三分辨率帧与前一非关键帧的第三分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I t-1->t SR、所述非关键帧的第三分辨率帧与相邻的前一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k1->t SR、所述非关键帧的第三分辨率帧与相邻的后一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k2->t SR的均方误差确定。
在一实施方式中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;所述根据第一损失函数和第二损失函数训练所述视频放大网络,包括:根据所述第一损失函数训练所述视频放大网络,并根据第二损失函数对所述运动估计子网络进行训练,直至所述第一损失函数和所述第二损失函数满足设定条件。
在一实施方式中,所述装置还包括:预处理模块,设置为在根 据第一损失函数和第二损失函数训练所述视频放大网络之前,对样本视频进行预处理,得到预设分辨率的样本视频。
本公开提出的视频处理装置与上述的视频处理方法属于同一构思,未在视频处理装置的实施方式中详尽描述的技术细节可参见上述任意实施方式,并且视频处理装置具备与执行视频处理方法相同的有益效果。
本公开还提供一种视频处理***。图9为本公开提供的一种视频处理***的结构示意图。如图9所示,该***包括:视频编解码器410,用于处理原始视频,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;视频放大网络420,用于根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
在一实施方式中,视频编解码器410,可用于:检测所述原始视频中的关键帧;生成所述关键帧的残差图和第二分辨率帧,并对所述非关键帧进行下采样得到第三分辨率帧;对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码后,得到混合分辨率视频。
在一实施方式中,生成所述关键帧的残差图和第二分辨率帧,包括:对所述关键帧进行下采样,得到所述关键帧的第二分辨率帧;对所述关键帧进行上采样,得到所述关键帧的第一分辨率帧的临时帧,将所述第一分辨率帧的临时帧与所述关键帧相减,得到所述关键帧的残差图。
在一实施方式中,对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码后,得到混合分辨率视频,包括:通过所 述编码器对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行编码,得到残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流;通过所述解码器对所述残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流进行解码,分别得到所述残差图、所述第二分辨率帧以及所述第三分辨率帧;对所述第二分辨率帧进行上采样,得到放大后的第四分辨率帧,将所述第四分辨率帧与所述残差图相加,得到所述关键帧的第一分辨率帧。
在一实施方式中,视频放大网络420,可用于:对于每个非关键帧,分别将该非关键帧的第三分辨率帧(I t LR)、该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)、以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)和对应的第二分辨率帧(I k1 LR、I k2 LR)输入视频放大网络;放大单元,设置为通过所述视频放大网络,根据该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)和对应的第二分辨率帧(I k1 LR、I k2 LR),对该非关键帧的第三分辨率帧(I t LR)进行放大,得到该非关键帧对应的放大后的目标帧。
在一实施方式中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;根据该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)和放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)和对应的第二分辨率帧(I k1 LR、I k2 LR),对该非关键帧的第三分辨率帧(I t LR)进行放大,得到该非关键帧对应的放大后的目标帧,包括基于该非关键帧的第三分辨率帧(I t LR),通过所述运动估计子网络,分别对该非关键帧的前一非关键帧放大前的第三分辨率帧(I t-1 LR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)进行运动估计;分别根据前一非关键帧放大后的高分辨率帧(I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR),对经过运动估计的各帧以及上采样后的运 动矢量进行运动补偿,得到中间高分辨率特征图,并通过重组操作将所述中间高分辨率特征图转化为低分辨率特征图;通过所述放大子网络根据所述低分辨率特征图对所述第三分辨率帧进行放大,得到非关键帧对应的放大后的目标帧。
在一实施方式中,分别根据前一非关键帧放大后的高分辨率帧(I t-1 SR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR),对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,包括:分别对经过运动估计的各帧对应的运动矢量MV t-1 LR、MV k1 LR和MV k2 LR进行上采样得到MV t-1 SR、MV k1 SR和MV k2 SR;其中,MV t-1 LR、MV k1 LR和MV k2 LR分别为前一非关键帧放大前的第三分辨率帧(I t-1 LR)以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧(I k1 LR、I k2 LR)分别与该非关键帧的第三分辨率帧经过运动估计后对应的放大后的运动矢量;基于MV t-1 SR、MV k1 SR和MV k2 SR,通过像素偏移Warp操作,分别对前一非关键帧放大后的高分辨率帧(I t-1 SR)、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧(I k1 SR、I k2 SR)进行像素偏移,分别得到像素偏移后的I t-1->t SR、I k1->t SR和I k2->t SR;将I t-1->t SR、I k1->t SR、I k2->t SR、MV t-1 SR、MV k1 SR和MV k2 SR输入所述补偿子网络,经过所述补偿子网络的运动补偿后得到所述中间高分辨率特征图。
在一实施方式中,若该非关键帧为相邻的前一关键帧后的第一个非关键帧,则将所述前一关键帧的第二分辨率帧作为该非关键帧的前一非关键帧放大前的第三分辨率帧,并将所述前一关键帧的第一分辨率帧作为该非关键帧的前一非关键帧放大后的高分辨率帧。
在一实施方式中,在根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大之前,基于样本视频,根据第一损失函数和第二损失函数训练所述视频放大网络;其中,所述第一损失函数根据所述原始视频中两个相邻关键帧之间的非关键帧的原始高分辨率帧与原始高分辨率经过下采样然后通过视频放大网络放大后的帧均方误差确定;第二损失函数根据所述 非关键帧的原始高分辨率帧、所述非关键帧的第三分辨率帧与前一非关键帧的第三分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I t-1->t SR、所述非关键帧的第三分辨率帧与相邻的前一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k1->t SR、所述非关键帧的第三分辨率帧与相邻的后一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的I k2->t SR的均方误差确定。
在一实施方式中,视频放大网络420包括运动估计子网络、运动补偿子网络和放大子网络;所述根据第一损失函数和第二损失函数训练所述视频放大网络,包括:根据所述第一损失函数训练所述视频放大网络,并根据第二损失函数对所述运动估计子网络进行训练,直至所述第一损失函数和所述第二损失函数满足设定条件。
在一实施方式中,在根据第一损失函数和第二损失函数训练所述视频放大网络之前,对样本视频进行预处理,得到预设分辨率的样本视频。
本公开提供的视频处理***与上述提出的视频处理方法属于同一构思,未在视频处理***的实施方式中详尽描述的技术细节可参见上述任意实施方式,并且视频处理***具备与执行视频处理方法相同的有益效果。
本公开还提供一种设备。所述视频处理方法可以由视频处理装置执行,该视频处理装置可以通过软件和/或硬件的方式实现,并集成在所述设备中。
图10为本公开提供的一种设备的硬件结构示意图。如图10所示,本公开提供的一种设备,包括:处理器610和存储装置520。该设备中的处理器可以是一个或多个,图10中以一个处理器610为例,所述设备中的处理器610和存储装置620可以通过总线或其他方式连接,图10中以通过总线连接为例。该设备中的存储装置620作为一种计算机可读存储介质,可用于存储一个或多个程序。
所述一个或多个程序被所述一个或多个处理器610执行,使得 所述一个或多个处理器实现上述任一实施方式所述的视频处理方法。该方法包括:通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
所述程序可以是软件程序、计算机可执行程序以及模块,如本公开中视频处理方法对应的程序指令/模块(例如,附图8所示的视频处理装置中的模块,包括:编解码模块310和放大模块320)。处理器610通过运行存储在存储装置620中的软件程序、指令以及模块,从而执行设备的各种功能应用以及数据处理,即实现上述方法实施方式中的视频处理方法。
存储装置620主要包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需的应用程序;存储数据区可存储根据设备的使用所创建的数据等(如上述实施方式中的混合分辨率视频、第三分辨率帧等)。此外,存储装置620可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中,存储装置620可进一步包括相对于处理器610远程设置的存储器,这些远程存储器可以通过网络连接至设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
并且,当上述设备中所包括一个或者多个程序被所述一个或者多个处理器610执行时,实现如下操作:通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高 于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
本公开提出的设备与上述提出的视频处理方法属于同一构思,未在设备的实施方式中详尽描述的技术细节可参见上述任意实施方式,并且本设备具备与执行视频处理方法相同的有益效果。
本公开还提供一种包含计算机可执行指令的存储介质,计算机可执行指令在由计算机处理器执行时用于执行一种视频处理方法。该方法包括:通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
通过以上关于实施方式的描述,所属领域的技术人员可以了解到,本公开可借助软件及通用硬件来实现,也可以通过硬件实现。基于这样的理解,本公开的技术方案可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如计算机的软盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、闪存(FLASH)、硬盘或光盘等,包括多个指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开任意实施方式所述的方法。
以上所述,仅为本公开的示例性实施方式而已,并非用于限定本公开的保护范围。
本公开附图中的任何逻辑流程的框图可以表示程序步骤,或者 可以表示相互连接的逻辑电路、模块和功能,或者可以表示程序步骤与逻辑电路、模块和功能的组合。计算机程序可以存储在存储器上。存储器可以具有任何适合于本地技术环境的类型并且可以使用任何适合的数据存储技术实现,例如但不限于只读存储器(ROM)、随机访问存储器(RAM)、光存储器装置和***(数码多功能光碟DVD或CD光盘)等。计算机可读介质可以包括非瞬时性存储介质。数据处理器可以是任何适合于本地技术环境的类型,例如但不限于通用计算机、专用计算机、微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、可编程逻辑器件(FGPA)以及基于多核处理器架构的处理器。
通过示范性和非限制性的示例,上文已提供了对本公开的示范实施方式的详细描述。但结合附图和权利要求来考虑,对以上实施方式的多种修改和调整对本领域技术人员来说是显而易见的,但不偏离本公开的范围。因此,本公开的恰当范围将根据权利要求确定。

Claims (14)

  1. 一种视频处理方法,包括:
    通过对原始视频进行编解码,得到混合分辨率视频,所述混合分辨率视频包括所述原始视频中的关键帧对应的第一分辨率帧和第二分辨率帧,还包括所述原始视频中的非关键帧对应的第三分辨率帧,其中,所述第一分辨率帧的分辨率高于所述第二分辨率帧的分辨率和所述第三分辨率帧的分辨率;
    根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的非关键帧对应的第三分辨率帧进行放大,并输出放大后的视频,其中,所述放大后的视频包括所述关键帧对应的第一分辨率帧以及所述非关键帧对应的放大后的目标帧。
  2. 根据权利要求1所述的方法,其中,所述通过对原始视频进行编解码,得到混合分辨率视频,包括:
    检测所述原始视频中的关键帧;
    生成所述关键帧的残差图和第二分辨率帧,并对所述非关键帧进行下采样得到第三分辨率帧;
    对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码后,得到混合分辨率视频。
  3. 根据权利要求2所述的方法,其中,生成所述关键帧的残差图和第二分辨率帧,包括:
    对所述关键帧进行下采样,得到所述关键帧的第二分辨率帧;
    对所述关键帧进行上采样,得到所述关键帧的第一分辨率帧的临时帧,将所述第一分辨率帧的临时帧与所述关键帧相减,得到所述关键帧的残差图。
  4. 根据权利要求2所述的方法,其中,对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行视频编解码,得到混合分辨率视频,包括:
    通过编码器对所述残差图、所述第二分辨率帧以及所述第三分辨率帧进行编码,得到残差码流、所述第二分辨率帧的码流以及所述第 三分辨率帧的码流;
    通过解码器对所述残差码流、所述第二分辨率帧的码流以及所述第三分辨率帧的码流进行解码,分别得到所述残差图、所述第二分辨率帧以及所述第三分辨率帧;
    对所述第二分辨率帧进行上采样,得到放大后的第四分辨率帧,将所述第四分辨率帧与所述残差图相加,得到所述关键帧的第一分辨率帧。
  5. 根据权利要求1所述的方法,其中,所述通过对原始视频进行编解码,得到混合分辨率视频,包括:
    检测所述原始视频中的关键帧;
    将所述关键帧等分为多个子图,各所述子图的分辨率与所述第二分辨率帧的分辨率相同,并对所述非关键帧进行下采样得到第三分辨率帧;
    将各所述子图与所述非关键帧的第三分辨率帧输入视频编解码器,所述视频编解码器包括编码器和解码器;
    通过所述编码器对各所述子图与所述非关键帧的第三分辨率帧进行编码,得到各所述子图以及所述非关键帧对应的码流;
    通过所述解码器根据各所述子图的码流对各所述子图进行拼接,得到所述关键帧的第一分辨率帧,并对所述第三分辨率帧的码流进行解码,得到所述第三分辨率帧。
  6. 根据权利要求1所述的方法,其中,根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大,包括:
    对于每个非关键帧,分别将该非关键帧的第三分辨率帧、该非关键帧的前一非关键帧放大前的第三分辨率帧和放大后的高分辨率帧、以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧和对应的第二分辨率帧输入视频放大网络;
    通过所述视频放大网络,根据该非关键帧的前一非关键帧放大前的第三分辨率帧和放大后的高分辨率帧、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧和对应的第二分辨率帧,对该 非关键帧的第三分辨率帧进行放大,得到该非关键帧对应的放大后的目标帧。
  7. 根据权利要求6所述的方法,其中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;
    通过所述视频放大网络,根据该非关键帧的前一非关键帧放大前的第三分辨率帧和放大后的高分辨率帧、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧和对应的第二分辨率帧,对该非关键帧的第三分辨率帧进行放大,得到该非关键帧对应的放大后的目标帧,包括:
    基于该非关键帧的第三分辨率帧,通过所述运动估计子网络,分别对该非关键帧的前一非关键帧放大前的第三分辨率帧以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧进行运动估计;
    分别根据前一非关键帧放大后的高分辨率帧以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧,对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,并通过重组操作将所述中间高分辨率特征图转化为低分辨率特征图;
    通过所述放大子网络根据所述低分辨率特征图对所述第三分辨率帧进行放大,得到非关键帧对应的放大后的目标帧。
  8. 根据权利要求7所述的方法,其中,分别根据前一非关键帧放大后的高分辨率帧以及与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧,对经过运动估计的各帧以及上采样后的运动矢量进行运动补偿,得到中间高分辨率特征图,包括:
    分别对经过运动估计的各帧对应的运动矢量MV t-1 LR、MV k1 LR和MV k2 LR进行上采样得到MV t-1 SR、MV k1 SR和MV k2 SR,其中,MV t-1 LR、MV k1 LR和MV k2 LR分别为前一非关键帧放大前的第三分辨率帧、该非关键帧相邻的前一关键帧和后一关键帧对应的第二分辨率帧分别与该非关键帧的第三分辨率帧经过运动估计后对应的放大后的运动矢量;
    基于MV t-1 SR、MV k1 SR和MV k2 SR,通过像素偏移操作,分别对前一非关键帧放大后的高分辨率帧、与该非关键帧相邻的前一关键帧和后一关键帧对应的第一分辨率帧进行像素偏移,分别得到像素偏移后的各帧I t-1->t SR、I k1->t SR和I k2->t SR
    将I t-1->t SR、I k1->t SR、I k2->t SR、MV t-1 SR、MV k1 SR和MV k2 SR输入所述补偿子网络,经过所述补偿子网络的运动补偿后得到所述中间高分辨率特征图。
  9. 根据权利要求6所述的方法,其中,若该非关键帧为相邻的前一关键帧后的第一个非关键帧,则将所述前一关键帧的第二分辨率帧作为该非关键帧的前一非关键帧放大前的第三分辨率帧,并将所述前一关键帧的第一分辨率帧作为该非关键帧的前一非关键帧放大后的高分辨率帧。
  10. 根据权利要求8所述的方法,其中,在根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大之前,所述方法还包括:
    基于样本视频,根据第一损失函数和第二损失函数训练视频放大网络,所述视频放大网络用于根据所述关键帧对应的第一分辨率帧和第二分辨率帧,对所述混合分辨率视频中的第三分辨率帧进行放大;
    其中,所述第一损失函数根据所述原始视频中两个相邻关键帧之间的非关键帧的原始高分辨率帧与原始高分辨率经过下采样然后通过视频放大网络放大后的帧的均方误差确定;
    所述第二损失函数根据所述非关键帧的原始高分辨率帧、所述非关键帧的第三分辨率帧与前一非关键帧的第三分辨率帧经过运动估计、运动矢量上采样、像素偏移后的帧I t-1->t SR、所述非关键帧的第三分辨率帧与相邻的前一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的帧I k1->t SR、所述非关键帧的第三分辨率帧与相邻的后一关键帧的第二分辨率帧经过运动估计、运动矢量上采样、像素偏移后的帧I k2->t SR的均方误差确定。
  11. 根据权利要求10所述的方法,其中,所述视频放大网络包括运动估计子网络、运动补偿子网络和放大子网络;
    所述根据第一损失函数和第二损失函数训练所述视频放大网络,包括:
    根据所述第一损失函数训练所述视频放大网络,并根据所述第二损失函数对所述运动估计子网络进行训练,直至所述第一损失函数和所述第二损失函数满足设定条件。
  12. 根据权利要求10所述的方法,其中,在根据第一损失函数和第二损失函数训练所述视频放大网络之前,还包括:
    对样本视频进行预处理,得到预设分辨率的样本视频。
  13. 一种设备,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-12中任一所述的视频处理方法。
  14. 一种计算机可读存储介质,其上存储有计算机程序,其中,该程序被处理器执行时实现如权利要求1-12中任一所述的视频处理方法。
PCT/CN2021/097686 2020-06-15 2021-06-01 视频处理方法、设备及存储介质 WO2021254139A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21826930.6A EP4167570A4 (en) 2020-06-15 2021-06-01 VIDEO PROCESSING METHOD AND DEVICE AND RECORDING MEDIUM
US18/010,356 US20230269395A1 (en) 2020-06-15 2021-06-01 Video processing method, device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010544611.8 2020-06-15
CN202010544611.8A CN113810763A (zh) 2020-06-15 2020-06-15 一种视频处理方法、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021254139A1 true WO2021254139A1 (zh) 2021-12-23

Family

ID=78944414

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097686 WO2021254139A1 (zh) 2020-06-15 2021-06-01 视频处理方法、设备及存储介质

Country Status (4)

Country Link
US (1) US20230269395A1 (zh)
EP (1) EP4167570A4 (zh)
CN (1) CN113810763A (zh)
WO (1) WO2021254139A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523758A (zh) * 2023-07-03 2023-08-01 清华大学 基于关键帧的端云联合超分辨率视频重建方法及***

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4145383A4 (en) * 2020-07-14 2023-06-21 Guangdong Oppo Mobile Telecommunications Corp., Ltd. VIDEO PROCESSING METHOD AND APPARATUS, DEVICE, SET-TOP, SYSTEM AND STORAGE MEDIA
CN113347421B (zh) * 2021-06-02 2023-07-14 黑芝麻智能科技(上海)有限公司 视频编码和解码方法、装置和计算机设备
CN114827669B (zh) * 2022-03-31 2023-08-18 杭州网易智企科技有限公司 一种视频数据的传输方法、装置、介质及设备
CN115208959B (zh) * 2022-05-30 2023-12-12 武汉市水务集团有限公司 一种物联网安全通信***
CN115361582B (zh) * 2022-07-19 2023-04-25 鹏城实验室 一种视频实时超分辨率处理方法、装置、终端及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006078109A1 (en) * 2005-01-21 2006-07-27 Samsung Electronics Co., Ltd. Method of multi-layer based scalable video encoding and decoding and apparatus for the same
CN1939065A (zh) * 2004-03-31 2007-03-28 皇家飞利浦电子股份有限公司 视频数据的运动估计和分段
CN102196256A (zh) * 2010-03-11 2011-09-21 中国科学院微电子研究所 一种视频编码方法及装置
CN102714726A (zh) * 2010-01-15 2012-10-03 杜比实验室特许公司 使用元数据的用于时间缩放的边缘增强
CN103607591A (zh) * 2013-10-28 2014-02-26 四川大学 结合超分辨率重建的图像压缩方法
CN103813174A (zh) * 2012-11-12 2014-05-21 腾讯科技(深圳)有限公司 一种混合分辨率编解码方法和装置
WO2019197661A1 (en) * 2018-04-13 2019-10-17 Koninklijke Kpn N.V. Frame-level super-resolution-based video coding

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1939065A (zh) * 2004-03-31 2007-03-28 皇家飞利浦电子股份有限公司 视频数据的运动估计和分段
WO2006078109A1 (en) * 2005-01-21 2006-07-27 Samsung Electronics Co., Ltd. Method of multi-layer based scalable video encoding and decoding and apparatus for the same
CN102714726A (zh) * 2010-01-15 2012-10-03 杜比实验室特许公司 使用元数据的用于时间缩放的边缘增强
CN102196256A (zh) * 2010-03-11 2011-09-21 中国科学院微电子研究所 一种视频编码方法及装置
CN103813174A (zh) * 2012-11-12 2014-05-21 腾讯科技(深圳)有限公司 一种混合分辨率编解码方法和装置
CN103607591A (zh) * 2013-10-28 2014-02-26 四川大学 结合超分辨率重建的图像压缩方法
WO2019197661A1 (en) * 2018-04-13 2019-10-17 Koninklijke Kpn N.V. Frame-level super-resolution-based video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4167570A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523758A (zh) * 2023-07-03 2023-08-01 清华大学 基于关键帧的端云联合超分辨率视频重建方法及***
CN116523758B (zh) * 2023-07-03 2023-09-19 清华大学 基于关键帧的端云联合超分辨率视频重建方法及***

Also Published As

Publication number Publication date
EP4167570A4 (en) 2024-04-24
CN113810763A (zh) 2021-12-17
EP4167570A1 (en) 2023-04-19
US20230269395A1 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
WO2021254139A1 (zh) 视频处理方法、设备及存储介质
TWI624804B (zh) 利用超解析重建法生成高解析度影像的方法與系統
WO2019242491A1 (zh) 视频编码、解码方法、装置、计算机设备和存储介质
TWI647946B (zh) 一種圖像編解碼方法及裝置
US8804821B2 (en) Adaptive video processing of an interactive environment
ES2532829T3 (es) Sistema de codificación de vídeo escalable
WO2019242499A1 (zh) 视频编码、解码方法、装置、计算机设备和存储介质
JP7114153B2 (ja) ビデオエンコーディング、デコーディング方法、装置、コンピュータ機器及びコンピュータプログラム
KR100985464B1 (ko) 이미지 및 비디오 프로세싱을 위한 스케일러 구조
WO2021036795A1 (zh) 视频超分辨率处理方法及装置
US20050140674A1 (en) System and method for scalable portrait video
CN110049336B (zh) 视频编码方法和视频解码方法
JP2008541653A (ja) スムージング予測を用いた多階層基盤のビデオエンコーディング方法、デコーディング方法、ビデオエンコーダ及びビデオデコーダ
WO2021057481A1 (zh) 视频编解码方法和相关装置
WO2022068682A1 (zh) 图像处理方法及装置
US20120257675A1 (en) Scalable video codec encoder device and methods thereof
WO2021057705A1 (zh) 视频编解码方法和相关装置
US20220239939A1 (en) Temporal Prediction Shifting for Scalable Video Coding
CN113747242B (zh) 图像处理方法、装置、电子设备及存储介质
CN111860363A (zh) 一种视频图像的处理方法及装置、电子设备、存储介质
US20200120349A1 (en) Optimization of downsampling occurring before encoding images during compression
CN114363617A (zh) 一种网络轻量级视频流传输方法、***及设备
WO2021057686A1 (zh) 视频解码方法和装置、视频编码方法和装置、存储介质及电子装置
KR100996364B1 (ko) 동영상 압축을 위한 서브샘플링기반 화면내 예측 부호화/복호화 장치 및 방법
US10157447B2 (en) Multi-level spatial resolution increase of video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21826930

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021826930

Country of ref document: EP

Effective date: 20230116