CN110177282B - Interframe prediction method based on SRCNN - Google Patents

Interframe prediction method based on SRCNN

Info

Publication number
CN110177282B
Authority
CN
China
Prior art keywords
image
frame
resolution
super
psnr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910388829.6A
Other languages
Chinese (zh)
Other versions
CN110177282A (en)
Inventor
颜成钢
黄智坤
李志胜
孙垚棋
张继勇
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201910388829.6A
Publication of CN110177282A
Application granted
Publication of CN110177282B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an SRCNN-based interframe prediction method in which a super-resolution convolutional neural network is used for interframe prediction of an image sequence. After motion estimation and motion compensation are performed on the image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in this model, super-resolution reconstruction is performed on an image while motion estimation and motion compensation are applied, yielding an image consistent with the frame that follows the current image. The invention applies deep learning to interframe prediction in video coding, using a convolutional neural network to extract features from, and learn, the motion estimation and motion compensation operations between image sequences. Because a super-resolution network is used, image quality is also enhanced during image reconstruction.

Description

Interframe prediction method based on SRCNN
Technical Field
The invention belongs to the field of interframe prediction in video coding and mainly aims to improve video transmission efficiency; in particular, it relates to an interframe prediction method based on SRCNN.
Background
Super-resolution (Super-Resolution) means converting a low-resolution (Low-Resolution) image into a high-resolution (High-Resolution) image, generally improving image quality and definition. The Super-Resolution Convolutional Neural Network (SRCNN) is a convolutional neural network applied to image super-resolution reconstruction: it extracts features from image patches, applies a nonlinear mapping to those features, and reconstructs a high-resolution image. Convolutional neural networks have been widely used since they were proposed, and their accuracy and reliability are well verified.
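As an illustration of the three-layer structure described above, here is a minimal PyTorch sketch of SRCNN; the 9-1-5 kernel sizes and 64/32 channel widths follow the original SRCNN paper and are assumptions, since this document does not state the exact configuration used:

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN: patch extraction -> non-linear mapping -> reconstruction.

    Kernel sizes (9-1-5) and channel widths (64/32) follow the original SRCNN
    paper; the exact configuration used here is not specified by this document.
    """
    def __init__(self, channels: int = 1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input: bicubic-upsampled luma image, shape (N, 1, H, W); output same size.
        return self.layers(x)
```

The padding keeps the spatial size unchanged, so the network maps an interpolated low-resolution image to a same-sized, sharper one, which is the role it plays in the reconstruction procedure described later.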
In today's information age, research and statistical data show that roughly 75% of the information humans acquire from the outside world comes through the eyes, where the visual system converts it into images that are transmitted to the brain. As living standards rise, people's requirements for image and video quality keep increasing, and the continuously rising resolution of images and videos poses great challenges for information transmission: sharper images and videos mean larger data volumes and require higher transmission rates. To ensure viewing comfort, the frame rate of movies and similar videos is nowadays generally above 24 frames per second; if every frame were stored and played back frame by frame, the demands on hard-disk capacity would be enormous, and the transmission and display rates of playback equipment would be severely challenged. Played back this way, high-definition video such as 2K or 4K could not exist because of transmission-rate limitations. Video coding technology eliminates redundancy among image sequences as far as possible, greatly compressing the data volume of video; together with existing hardware technology, it has brought ultra-high-definition video into everyday life and satisfies people's visual demands to the greatest extent.
Interframe prediction is one of the most important links in video coding. It achieves image compression by exploiting the correlation between video frames, i.e., temporal correlation, and is widely used in the compression coding of broadcast television, videoconferencing, video telephony and high-definition television. In image transmission, moving images, particularly television images, are the main object of interest. A moving image is a temporal image sequence of successive frames spaced one frame period apart in time; its correlation in time is greater than its correlation in space. Most television images show little detail change between adjacent frames, that is, video images have strong inter-frame correlation, and inter-frame coding that exploits this correlation can achieve a higher compression ratio than intra-frame coding.
In inter-frame predictive coding, there is a certain correlation between the scenes in adjacent frames of a moving picture. The moving image can therefore be divided into blocks or macroblocks; for each block or macroblock, its position in the adjacent frame is searched for, and the relative spatial offset between the two is obtained. This relative offset is commonly called a motion vector, and the process of obtaining it is called motion estimation. The motion vector and the prediction error obtained after motion matching are sent together to the decoder, which locates the corresponding block or macroblock in the decoded adjacent reference frame at the position indicated by the motion vector and adds the prediction error to obtain the block or macroblock of the current frame. Motion estimation removes inter-frame redundancy and thus greatly reduces the number of bits needed for video transmission, which makes it an important component of any video compression system. Starting from the general method of motion estimation, three key issues are discussed: how to parameterize the motion field, how to define the matching criterion, and how to find the best match.
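To make the motion-estimation step concrete, the sketch below performs an exhaustive block-matching search; the block size, search range, and SAD matching criterion are illustrative assumptions, since the text above fixes none of them:

```python
import numpy as np

def block_matching(cur: np.ndarray, ref: np.ndarray,
                   block: int = 16, search: int = 8) -> np.ndarray:
    """Full-search motion estimation: for each block of the current frame,
    find the best match in the reference frame within +/-search pixels,
    using the sum of absolute differences (SAD) as the matching criterion.
    Returns one motion vector (dy, dx) per block."""
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block].astype(np.int64)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    sad = np.abs(cur_blk - ref[y:y + block, x:x + block]).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs
```

Motion compensation then copies, for each block, the reference block displaced by its motion vector; the difference between this prediction and the actual current frame is the prediction error (residual) sent to the decoder.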
Disclosure of Invention
The invention aims to provide an inter-frame prediction method based on SRCNN that differs from the mainstream HEVC video coding approach. The aim of the invention is to perform inter-frame prediction on an image sequence using a super-resolution convolutional neural network. After motion estimation and motion compensation are performed on the image sequence, a feature model is trained in combination with the super-resolution convolutional neural network. Using the parameters in this model, super-resolution reconstruction can be performed on an image while motion estimation and motion compensation are applied to it, yielding an image essentially consistent with the frame that follows the current image.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: collect a large number of video files of different scenes, and compress the videos with different quantization parameters (QPs);
step 2: extract image sequences from the videos, where the time interval between two consecutive frames is set as t, with t less than 0.1 second;
step 3: set aside part of the image sequences as the verification set. Read the remaining images frame by frame; for every frame of a read sequence except the first, compute the residual between the current frame and the previous frame, combine the previous frame with this residual, and perform motion compensation on the previous frame to obtain its predicted frame. Store the resulting predicted-frame sequence and split it into a training set and a test set at a ratio of 4:1.
step 4: input the training set and the test set, set appropriate hyperparameters, and train a parameter model using the super-resolution convolutional neural network (SRCNN);
step 5: for each image sequence in the verification set, calculate the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame, denoted PSNR1; read the parameters in the parameter model and process the i-th frame of the acquired image sequence to obtain a reconstructed image I; calculate the PSNR between the reconstructed image I and the i-th frame of the image sequence in the verification set, denoted PSNR2;
compare the two calculated PSNR values: if PSNR2 ≥ PSNR1, the model is considered effective;
if PSNR2 < PSNR1, the model is considered ineffective; let ERR = PSNR1 - PSNR2. If ERR < 5, the training hyperparameters are considered to be the problem: return to step 4, adjust the learning-rate hyperparameter, and retrain the parameter model. If ERR ≥ 5, the partitioning strategy of the dataset is considered to be the problem: return to step 3, expand the dataset so that it covers more scenes, re-split the training and test sets, and train and verify again;
if the difference between the two images is large and the PSNR value exceeds the lowest preset threshold, adjust the training and test sets;
if the difference between the two images is small and the PSNR value lies between the optimal preset threshold and the lowest preset threshold, return to step 4 to adjust the parameters of the super-resolution convolutional neural network and retrain the parameter model (this decision rule is sketched in code below).
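The decision rule of step 5 can be condensed into a short sketch; the 5 dB threshold and the PSNR1/PSNR2 comparison are taken directly from the text, while the function name and the returned labels are illustrative:

```python
def validate_model(psnr1: float, psnr2: float) -> str:
    """Decision rule of step 5: psnr1 = PSNR(frame i, frame i+1),
    psnr2 = PSNR(reconstructed image I, frame i)."""
    if psnr2 >= psnr1:
        return "model effective"
    err = psnr1 - psnr2
    if err < 5:
        # hyperparameter problem: back to step 4, adjust learning rate, retrain
        return "return to step 4: adjust learning rate and retrain"
    # dataset partitioning problem: back to step 3, expand dataset, re-split
    return "return to step 3: expand dataset, re-split, retrain"
```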
Reconstruction of an image using the parameter model is specifically realized as follows (a code sketch follows these four steps):
1. Convert the input low-resolution image to the YCbCr color space to obtain a grayscale (luma) image, which serves as the input i of the image reconstruction operation. Downsample image i with a downsampling step of k to obtain a low-dimensional image;
2. Apply bicubic interpolation to the low-dimensional image to enlarge it to the target size, i.e., the size of the input low-resolution image;
3. Read the parameters in the parameter model, including the weights and biases of each network node. Apply a nonlinear mapping to the interpolated image through the three-layer convolutional network to obtain the reconstructed result, image I;
4. Convert image I back to an RGB color image to obtain the reconstructed high-resolution image.
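A minimal sketch of these four steps, using Pillow and a trained three-layer network such as the SRCNN sketch above, might read as follows; the scale factor k, the 0-1 normalization, and the reuse of the original chroma planes when converting back to RGB are assumptions not fixed by the text:

```python
import numpy as np
import torch
from PIL import Image

def reconstruct(path: str, model: torch.nn.Module, k: int = 2) -> Image.Image:
    """Steps 1-4 above: YCbCr conversion, downsampling by k, bicubic
    enlargement back to the target size, nonlinear mapping of the luma
    channel, and conversion back to RGB. k and the chroma handling are
    illustrative assumptions."""
    rgb = Image.open(path).convert("RGB")
    y, cb, cr = rgb.convert("YCbCr").split()          # step 1: to YCbCr (luma = input i)
    w, h = y.size
    low = y.resize((w // k, h // k), Image.BICUBIC)   # step 1: downsample with step k
    up = low.resize((w, h), Image.BICUBIC)            # step 2: bicubic back to target size
    x = torch.from_numpy(np.asarray(up, dtype=np.float32) / 255.0)[None, None]
    with torch.no_grad():
        out = model(x).clamp(0.0, 1.0)                # step 3: three-layer conv mapping
    y_rec = Image.fromarray((out[0, 0].numpy() * 255.0).astype(np.uint8))
    return Image.merge("YCbCr", (y_rec, cb, cr)).convert("RGB")  # step 4: back to RGB
```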
The invention has the following beneficial effects:
the invention has the innovativeness that deep learning is applied to interframe prediction of video coding, and a convolutional neural network is used for carrying out feature extraction and training learning on motion estimation and motion compensation operation among image sequences. Meanwhile, the super-resolution neural network is used, so that the image quality of the image can be enhanced during image reconstruction.
Drawings
FIG. 1 is a schematic diagram of a super-resolution convolutional neural network SRCNN;
FIG. 2 is a feature model training flow diagram of the present invention.
Detailed Description
The invention mainly concerns algorithmic innovation in the interframe prediction stage of video coding. The training flow of the whole model is introduced in detail below, and the specific implementation steps of the invention are explained with reference to the drawings, so that the purpose and effect of the invention become clear.
Fig. 1 is a schematic diagram of the super-resolution convolutional neural network SRCNN. As the figure shows, the network has a simple structure and enhances image quality through nonlinear mapping and image reconstruction. Using this network, the resolution of the images can be improved while inter-frame prediction is performed on the image sequence.
FIG. 2 is a flowchart of feature model training according to the present invention, wherein the specific operations include:
1. Collect a large number of video files in YUV format covering various scenes.
2. Compress the video files using different quantization parameters; the higher the quantization parameter, the stronger the compression. The focus is mainly on quantization parameters between 28 and 42.
3. Extract image sequences from the video files, taking different numbers of images from videos of different durations so that the sampling interval stays consistent (see the sampling sketch after this list). To ensure that consecutive frames do not change much, the extraction interval is kept small and is set according to the length of the video.
4. Perform motion estimation and motion compensation on each extracted image: input the current frame and the next frame, and derive the estimate and compensation for the current frame by comparing the two.
5. Organize the training set and test set from the processed image sequences. The verification set used to validate the model must consist of image sequences that have not undergone motion estimation and motion compensation.
6. Input the training set and test set, set appropriate parameters, and train the model using the super-resolution convolutional neural network SRCNN.
7. Verify whether the trained model is effective by comparing the originally extracted next frame with the image reconstructed from the model parameters. If the two images are almost indistinguishable, the model can be considered effective. If there is an obvious difference, adjust according to the situation: if the difference is large, adjust the dataset and retrain the model; if the difference is small but the imaging quality needs improvement, adjust the network parameters and retrain until the model meets the requirements.
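As referenced in item 3, a minimal frame-sampling sketch (using OpenCV; the 0.04 s default interval is an assumption chosen to stay under the 0.1 s bound of step 2) could look like:

```python
import cv2

def extract_frames(video_path: str, interval_s: float = 0.04) -> list:
    """Sample frames at a fixed time interval so the spacing stays
    consistent across videos of different durations."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, round(fps * interval_s))  # frames to skip between samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```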
When comparing the generated image with the next frame of the original image, subjective visual judgment must be combined with objective numerical analysis. Subjectively, the two frames are inspected by eye; if there is little visible difference, the model can provisionally be considered effective. However, since adjacent frames of the original video differ little to begin with, a mathematical tool is also needed to compare the two images. The reconstruction quality can be evaluated objectively with the peak signal-to-noise ratio (PSNR), an objective criterion for image evaluation, expressed as follows:
$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$$
where MSE is the mean squared error between the two images and MAX_I is the maximum possible pixel value (255 for 8-bit images). PSNR values between the original image and its next frame, and between the original image and the reconstructed image, are calculated separately. If the two values are close, the model works well and has reconstructed essentially the same picture as the next frame of the original. If the latter PSNR is higher, the procedure is considered to have improved image quality in addition to performing inter-frame prediction.
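For reference, a direct implementation of the formula above might look like the following sketch (assuming 8-bit images, so MAX_I = 255; the function name is illustrative):

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(MAX_I^2 / MSE)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

PSNR1 and PSNR2 from step 5 would then be psnr(frame_i, frame_i_plus_1) and psnr(reconstructed_I, frame_i), respectively.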
With PSNR, the accuracy of the model can additionally be verified objectively, which reduces the workload and helps ensure that the scheme is implemented efficiently.

Claims (1)

1. An interframe prediction method based on SRCNN is characterized in that a super-resolution convolutional neural network is used for interframe prediction of an image sequence; after motion estimation and motion compensation operations are carried out on the image sequence, a characteristic model is trained by combining a super-resolution convolutional neural network; performing super-resolution reconstruction on the image by using parameters in the model, and performing motion estimation and motion compensation on the image to obtain an image consistent with the next frame of image of the current image;
the specific implementation comprises the following steps:
step 1: collecting a large number of video files of different scenes, and compressing the video according to different quantization parameters;
step 2: extracting image sequences from a video, wherein the time interval between two consecutive frames is set as t, and t is less than 0.1 second;
step 3: dividing part of the image sequences into a verification set; reading the remaining image sequences frame by frame; for each frame of a read image sequence except the first frame, calculating a residual between the current frame and the previous frame, combining the previous frame with the residual, and performing motion compensation on the previous frame to obtain a predicted frame of the previous frame; storing the calculated predicted-frame image sequence and dividing it into a training set and a test set at a ratio of 4:1;
step 4: inputting the training set and the test set, setting hyperparameters, and training a parameter model using the super-resolution convolutional neural network;
step 5: calculating the peak signal-to-noise ratio (PSNR) between the i-th frame and the (i+1)-th frame of each image sequence in the verification set, denoted PSNR1; reading the parameters in the parameter model to process the i-th frame of the acquired image sequence, obtaining a reconstructed image I; calculating the PSNR between the reconstructed image I and the i-th frame of the image sequence in the verification set, denoted PSNR2;
comparing the two calculated PSNR values: if PSNR2 ≥ PSNR1, the model is considered effective;
if PSNR2 < PSNR1, the model is considered ineffective; letting ERR = PSNR1 - PSNR2: if ERR < 5, the training hyperparameters are considered to be the problem, and the method returns to step 4, adjusts the learning-rate hyperparameter, and retrains the parameter model; if ERR ≥ 5, the partitioning strategy of the dataset is considered to be the problem, and the method returns to step 3, expands the dataset to cover more scenes, re-divides the training set and the test set, and trains and verifies again;
if the difference between the two images is large and the PSNR value exceeds the lowest preset threshold, adjusting the training set and the test set;
if the difference between the two images is small and the PSNR value lies between the optimal preset threshold and the lowest preset threshold, returning to step 4 to adjust the parameters of the super-resolution convolutional neural network and retrain the parameter model;
the reconstruction of an image using a parametric model is specifically realized as follows:
1. converting the input low-resolution image into the YCbCr color space to obtain a grayscale image, which serves as the input image i of the image reconstruction operation; downsampling the input image i with a downsampling step of k to obtain a low-dimensional image;
2. applying bicubic interpolation to the low-dimensional image to enlarge it to the target size, namely the size of the input low-resolution image;
3. reading the parameters in the parameter model, including the weights and biases of each network node; applying a nonlinear mapping to the interpolated image through the three-layer convolutional network to obtain the reconstructed image I;
4. converting the image I back into an RGB color image to obtain the reconstructed high-resolution image.
CN201910388829.6A 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN Active CN110177282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Publications (2)

Publication Number Publication Date
CN110177282A CN110177282A (en) 2019-08-27
CN110177282B (en) 2021-06-04

Family

ID=67690836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388829.6A Active CN110177282B (en) 2019-05-10 2019-05-10 Interframe prediction method based on SRCNN

Country Status (1)

Country Link
CN (1) CN110177282B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112155511B (en) * 2020-09-30 2023-06-30 广东唯仁医疗科技有限公司 Method for compensating human eye shake in OCT acquisition process based on deep learning
CN112601095B (en) * 2020-11-19 2023-01-10 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN113191945B (en) * 2020-12-03 2023-10-27 陕西师范大学 Heterogeneous platform-oriented high-energy-efficiency image super-resolution system and method thereof
CN113592719B (en) * 2021-08-14 2023-11-28 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment
CN117313818A (en) * 2023-09-28 2023-12-29 四川大学 Method for training lightweight convolutional neural network and terminal equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method for the convolutional neural networks of Video coding fractional pixel interpolation
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 A kind of video super-resolution generation method generating confrontation network based on depth convolution

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733714B2 (en) * 2017-11-09 2020-08-04 Samsung Electronics Co., Ltd Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation

Also Published As

Publication number Publication date
CN110177282A (en) 2019-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant