CN114640815A - Video processing method and device, electronic equipment and storage medium - Google Patents

Video processing method and device, electronic equipment and storage medium

Info

Publication number
CN114640815A
CN114640815A
Authority
CN
China
Prior art keywords
image
video
target image
quality
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210268679.7A
Other languages
Chinese (zh)
Inventor
磯部駿
陶鑫
戴宇荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210268679.7A priority Critical patent/CN114640815A/en
Publication of CN114640815A publication Critical patent/CN114640815A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0127Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/64Circuits for processing colour signals
    • H04N9/646Circuits for processing colour signals for image enhancement, e.g. vertical detail restoration, cross-colour elimination, contour correction, chrominance trapping filters

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to a video processing method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring quality scores of images in a video; determining a target image from the images according to the quality scores; performing noise reduction processing on the target image, and replacing the target image with the target image subjected to noise reduction processing to obtain a preprocessed video; and inputting the preprocessed video into a video super-resolution model for super-resolution processing, and outputting to obtain a processed video. According to the embodiment of the disclosure, the problem of noise of low-resolution images in a real scene is considered, the noise is selectively processed according to the quality of the images in the video, and the high-resolution video with higher quality can be obtained after the obtained preprocessed video is input into the video super-resolution model.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a method and an apparatus for processing a video, an electronic device, and a storage medium.
Background
With the development of computer vision technology, video super-resolution technology has emerged. The purpose of video super-resolution is to restore a video sequence from low resolution to high resolution, filling in the details lost at the lower resolution. In recent years, video super-resolution technology has been widely applied in fields such as mobile phone photography, medical imaging, and short videos.
In the related art, video super-resolution models based on machine learning have been proposed for restoring high-resolution images from low-resolution videos. However, for various reasons, the quality of the video recovered by the video super-resolution technology in the related art is not high.
Disclosure of Invention
The present disclosure provides a video processing method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low quality of a video recovered by a video super-resolution technology in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method comprising:
acquiring quality scores of images in a video;
determining a target image from the images according to the quality scores;
performing noise reduction processing on the target image, and replacing the target image with the target image subjected to noise reduction processing to obtain a preprocessed video;
and inputting the preprocessed video into a video super-resolution model for super-resolution processing, and outputting to obtain a processed video.
In one possible implementation, the acquiring a quality score of an image in a video includes:
acquiring images in a video;
and inputting the image into an image quality evaluation model, and outputting, through the image quality evaluation model, the noise category and the quality score of the image, wherein the image quality evaluation model is obtained by training using the corresponding relation between the sample image and the noise category and the quality score.
In a possible implementation manner, the image quality evaluation model is obtained by training using a corresponding relationship between a sample image and a noise category and a quality score, and includes:
acquiring a sample image set, wherein the sample image set comprises a plurality of sample images marked with noise categories and quality scores;
inputting the sample image into an initial image quality evaluation model to generate a prediction result;
and iteratively adjusting the training parameters of the initial image quality evaluation model based on the difference between the prediction result and the labeled noise category and quality score until the difference meets the preset requirement to obtain the image quality evaluation model.
In one possible implementation, the determining a target image from the images according to the quality scores includes:
acquiring an image with a quality score lower than a preset threshold value from the image;
and taking the image with the quality score lower than a preset threshold value as a target image.
In one possible implementation, the determining a target image from the images according to the quality scores includes:
and acquiring a target image from the images, wherein the quality score of the target image is determined to be respectively lower than the quality score of the image in the frame before the target image and the quality score of the image in the frame after the target image.
In a possible implementation manner, the performing noise reduction processing on the target image and replacing the target image with the target image after the noise reduction processing to obtain a pre-processed video includes:
and smoothing the target image, and replacing the target image with the smoothed target image to obtain a preprocessed video.
In one possible implementation manner, the obtaining manner of the video super-resolution model includes:
obtaining a sample video set, wherein the sample video set comprises a plurality of sample video pairs, the sample video pairs comprise a first sample video and a second sample video matched with the first sample video, and the resolution of the first sample video is lower than that of the second sample video;
inputting the first sample video into an initial video super-resolution model to generate a prediction result;
and iteratively adjusting the training parameters of the initial video super-resolution model based on the difference between the prediction result and the second sample video until the difference meets the preset requirement to obtain the video super-resolution model.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus, including:
the acquisition module is used for acquiring the quality score of the image in the video;
a determining module for determining a target image from the images according to the quality scores;
the noise reduction module is used for carrying out noise reduction processing on the target image and replacing the target image with the target image subjected to noise reduction processing to obtain a preprocessed video;
and the processing module is used for inputting the preprocessed video into a video super-resolution model for super-resolution processing and outputting the processed video.
In one possible implementation manner, the obtaining module includes:
the first acquisition submodule is used for acquiring images in a video;
and the evaluation submodule is used for inputting the image into an image quality evaluation model, outputting the image through the image quality evaluation model, and obtaining the noise category and the quality score of the image, wherein the image quality evaluation model is obtained by utilizing the corresponding relation between the sample image and the noise category and the quality score through training.
In a possible implementation manner, the method further includes a first generation sub-module, where the first generation sub-module includes:
the device comprises an acquisition unit, a quality evaluation unit and a processing unit, wherein the acquisition unit is used for acquiring a sample image set, and the sample image set comprises a plurality of sample images marked with noise categories and quality scores;
the prediction unit is used for inputting the sample image into an initial image quality evaluation model and generating a prediction result;
and the generating unit is used for iteratively adjusting the training parameters of the initial image quality evaluation model based on the difference between the prediction result and the labeled noise category and quality score until the difference meets the preset requirement, so as to obtain the image quality evaluation model.
In one possible implementation, the determining module includes:
the second obtaining submodule is used for obtaining an image with the quality score lower than a preset threshold value from the image;
and the first determining submodule is used for taking the image with the quality score lower than a preset threshold value as a target image.
In one possible implementation, the determining module includes:
and the second determining submodule is used for acquiring a target image from the image, wherein the quality score of the target image is determined to be respectively lower than the quality score of the image in the previous frame of the target image and the quality score of the image in the next frame of the target image.
In one possible implementation, the noise reduction module includes:
and the smoothing module is used for smoothing the target image and replacing the target image with the smoothed target image to obtain a preprocessed video.
In a possible implementation manner, the method further includes a generating module, where the generating module includes:
a third obtaining sub-module, configured to obtain a sample video set, where the sample video set includes multiple sample video pairs, where the sample video pairs include a first sample video and a second sample video that matches content of the first sample video, and a resolution of the first sample video is lower than a resolution of the second sample video;
the prediction submodule is used for inputting the first sample video into an initial video super-resolution model to generate a prediction result;
and the second generation submodule is used for iteratively adjusting the training parameters of the initial video super-resolution model based on the difference between the prediction result and the second sample video until the difference meets the preset requirement, so that the video super-resolution model is obtained.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video processing method according to any one of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions of the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video processing method according to any one of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product including instructions which, when executed by a processor of an electronic device, enable the electronic device to execute the video processing method according to any one of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: according to the embodiment of the disclosure, before the low-resolution video is input into the video super-resolution model, the target image with a low quality score is selectively determined according to the quality score of the image in the video. The noise reduction processing is carried out on the target image, so that the noise information in the target image frame can be eliminated; and the target image with higher quality score is not subjected to noise reduction processing, so that more useful image texture information is reserved. According to the embodiment of the disclosure, the problem of noise of low-resolution images in a real scene is considered, the noise is selectively processed according to the quality of the images in the video, and the high-resolution video with higher quality can be obtained after the obtained preprocessed video is input into the video super-resolution model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flow diagram illustrating a method of processing video according to an exemplary embodiment.
Fig. 2 is a schematic diagram illustrating a structure of an image quality evaluation model according to an exemplary embodiment.
Fig. 3 is a comparison diagram illustrating quality scoring results according to an example embodiment.
Fig. 4 is a schematic structural diagram illustrating a video super-resolution model according to an exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of processing video according to an exemplary embodiment.
Fig. 6(a) is an effect diagram of a video processing method in the prior art.
Fig. 6(b) is an effect diagram illustrating a method of processing a video according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating a video processing apparatus according to an example embodiment.
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 9 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
It should also be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are both information and data that are authorized by the user or sufficiently authorized by various parties.
Fig. 1 is a flow diagram illustrating a method of processing video according to an exemplary embodiment. As shown in fig. 1, the method is used in a terminal or a server, and includes the following steps.
Step S101: acquiring the quality scores of the images in the video.
In the embodiment of the present disclosure, the video is a video to be processed and may be a low-resolution video. The image may comprise each frame of the video, or several frames of the video. The method for evaluating the quality of an image to obtain a quality score may include multiple methods, such as subjective evaluation methods and objective evaluation methods. A subjective evaluation method determines image quality by normalizing the scores given by observers, for example: the double-stimulus impairment scale method, the double-stimulus continuous quality scale method, and the single-stimulus continuous quality scale method. Objective evaluation methods may include: full-reference image quality evaluation, which analyzes the error obtained by comparing the image to be evaluated with the original image, for example: Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR); reduced-reference image quality evaluation, which extracts only partial information of an image as a reference, for example: methods based on original image features, methods based on wavelet-domain statistical models, and methods based on digital watermarking; and no-reference image quality evaluation, for example: algorithms for specific distortion types or algorithms based on machine learning.
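Of the objective metrics named above, MSE and PSNR are simple enough to sketch directly. The following is a minimal pure-Python illustration over grayscale images represented as lists of rows (the sample images are made up; real pipelines would use array libraries):

```python
import math

def mse(img_a, img_b):
    """Mean Squared Error between two equally sized grayscale images."""
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count

def psnr(img_a, img_b, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    err = mse(img_a, img_b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)

reference = [[100, 100], [100, 100]]
distorted = [[100, 110], [100, 100]]
print(mse(reference, distorted))               # 25.0
print(round(psnr(reference, distorted), 2))    # 34.15
```

Lower MSE and higher PSNR indicate that the evaluated image is closer to the reference image.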
Step S103: determining a target image from the images according to the quality scores.
In the embodiment of the present disclosure, the target image may be determined from the images in multiple ways. In one example, the image with the lowest quality score may be selected from the images as the target image. In another example, images with quality scores lower than a preset value may be selected from the images as target images. For example, the quality scores may be projected into a coordinate system with the frame number on the abscissa and the quality score on the ordinate, so that the scores form a fluctuating curve; by adjusting the preset value, the minimum points at the troughs of the curve, or points near those minima, can be screened out, and the corresponding images taken as target images. In another example, images whose quality scores are lower than both the quality score of the previous frame and the quality score of the next frame can be screened out by comparison and taken as target images.
Step S104: performing noise reduction processing on the target image, and replacing the target image with the noise-reduced target image to obtain a preprocessed video.
In the embodiment of the disclosure, the method for performing noise reduction processing on the target image may include spatial-domain denoising algorithms and transform-domain denoising algorithms. The spatial-domain denoising algorithms may include arithmetic mean filtering, Gaussian filtering, bilateral filtering, guided filtering, the NLM (Non-Local Means) algorithm, and the like. The transform-domain denoising algorithms may include wavelet-transform-domain algorithms, the BM3D denoising algorithm, and the like. In one example, several of the above denoising algorithms can be combined to achieve a better denoising effect.
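As a minimal sketch of the arithmetic mean filtering named above (one of the spatial-domain options; the nested-list image representation is illustrative only, and border pixels here simply average the neighbors that exist):

```python
def mean_filter_3x3(img):
    """Arithmetic-mean smoothing: each pixel becomes the average of its
    3x3 neighborhood; border pixels average only the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        acc += img[ny][nx]
                        n += 1
            out[y][x] = acc / n
    return out

noisy = [
    [10, 10, 10],
    [10, 90, 10],  # a single noisy spike at the centre
    [10, 10, 10],
]
smoothed = mean_filter_3x3(noisy)
print(round(smoothed[1][1], 2))  # 18.89 -- the spike is strongly attenuated
```

Gaussian or bilateral filtering would weight the neighborhood instead of averaging it uniformly, trading a little more computation for better edge preservation.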
Step S105: inputting the preprocessed video into a video super-resolution model for super-resolution processing, and outputting the processed video.
In the embodiment of the disclosure, the video super-resolution model is a model used for restoring one or more low-resolution images into a high-resolution image. The video super-resolution model may be built with alignment methods or non-alignment methods. In one example of an alignment method, motion estimation and motion compensation techniques may be employed: the purpose of motion estimation is to extract inter-frame motion information, and motion compensation performs an inter-frame warping operation based on that motion information, thereby restoring a high-resolution image. In another example, a video super-resolution model may be trained using a machine learning method, such as a convolutional-neural-network-based method, in which the input frames are first aligned through a cascaded deformable alignment module, and the aligned frames are then fused through a spatio-temporal attention fusion module. The fused result is input into a reconstruction module for feature extraction, a residual image is obtained through up-sampling, and the residual image is added to a directly up-sampled target frame to obtain the final high-resolution image. The non-alignment methods may include spatial non-alignment methods and spatio-temporal non-alignment methods: a spatial non-alignment method performs super-resolution reconstruction by letting the network itself learn the relevant information within frames, while the spatio-temporal non-alignment methods may include three-dimensional convolution methods, recurrent convolutional neural networks, non-local methods, and the like.
According to the embodiment of the disclosure, before the low-resolution video is input into the video super-resolution model, target images with low quality scores are selectively determined according to the quality scores of the images in the video. Performing noise reduction on these target images eliminates the noise information in the target image frames, while images with higher quality scores are not denoised, so that more useful image texture information is retained. Considering the noise present in low-resolution images in real scenes, noise is processed selectively according to the image quality in the video, and a higher-quality high-resolution video can be obtained after the resulting preprocessed video is input into the video super-resolution model.
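The selective pre-processing flow of steps S101 to S105 can be sketched as follows; `score_image`, `denoise`, and `super_resolve` are hypothetical stand-ins for the image quality evaluation model, the chosen denoising algorithm, and the video super-resolution model:

```python
def process_video(frames, score_image, denoise, super_resolve, threshold):
    """Selective pre-processing before super-resolution (steps S101-S105)."""
    # S101: acquire a quality score for every image in the video
    scores = [score_image(f) for f in frames]
    # S103: images scoring below the preset threshold become target images
    targets = {i for i, s in enumerate(scores) if s < threshold}
    # S104: denoise only the target images; higher-quality frames keep
    # their texture information untouched
    preprocessed = [denoise(f) if i in targets else f
                    for i, f in enumerate(frames)]
    # S105: feed the pre-processed video to the super-resolution model
    return super_resolve(preprocessed)

# Toy stand-ins: a "frame" is just a number, its score is its value,
# "denoising" adds 1, and "super-resolution" doubles every frame.
result = process_video(
    [9, 2, 8],
    score_image=lambda f: f,
    denoise=lambda f: f + 1,
    super_resolve=lambda fs: [f * 2 for f in fs],
    threshold=4,
)
print(result)  # [18, 6, 16] -- only the low-quality middle frame was denoised
```

The point of the sketch is the control flow: denoising is applied per frame, conditioned on the score, before the whole sequence enters the super-resolution model.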
In one possible implementation, the acquiring a quality score of an image in a video includes:
acquiring images in a video;
and inputting the image into an image quality evaluation model, and outputting, through the image quality evaluation model, the noise category and the quality score of the image, wherein the image quality evaluation model is obtained by training using the corresponding relation between the sample image and the noise category and the quality score.
In the embodiment of the present disclosure, the noise categories may include video coding and decoding noise, Gaussian noise, Poisson noise, salt-and-pepper noise, dim-light noise, and the like. Adding noise of more categories to the sample images makes them closer to images in real scenes. In the embodiment of the disclosure, the image quality evaluation model is trained by a deep learning method using the correspondence between the sample images and their noise categories and quality scores. The sample images may include sample images to which noise of multiple categories has been added. The annotated quality score of a sample image may be obtained in advance, for example manually, e.g. by rating quality on a scale of 1 to 10.
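Two of the noise categories named above are easy to synthesize directly; the following is a hedged sketch using only Python's standard library (codec and dim-light noise would need a real encoder or a camera noise model and are omitted):

```python
import random

def add_gaussian_noise(img, sigma=10.0, rng=None):
    """Adds zero-mean Gaussian noise and clamps pixels to [0, 255]."""
    rng = rng or random.Random(0)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma)))
             for p in row] for row in img]

def add_salt_pepper_noise(img, prob=0.05, rng=None):
    """Flips roughly a fraction `prob` of pixels to pure black or white."""
    rng = rng or random.Random(0)
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < prob / 2:
                new_row.append(0)     # pepper
            elif r < prob:
                new_row.append(255)   # salt
            else:
                new_row.append(p)
        out.append(new_row)
    return out

clean = [[128] * 8 for _ in range(8)]
gauss_noisy = add_gaussian_noise(clean, sigma=20.0)
sp_noisy = add_salt_pepper_noise(clean, prob=0.2)
```

Pairing such synthetically degraded images with their annotated noise category and quality score yields training samples of the kind the correspondence described above requires.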
Fig. 2 is a schematic diagram illustrating a structure of an image quality evaluation model according to an exemplary embodiment. Referring to fig. 2, in the embodiment of the present disclosure, the image quality evaluation model may be based on a VGG-16 network structure, which includes 13 convolutional layers, represented by the dark rectangular boxes in fig. 2, 5 pooling layers, represented by the white rectangular boxes, and 3 fully connected layers, which are not shown in fig. 2. Since the training task of the model is to predict the image quality score, the last layer of the network can output the probability of each quality score (e.g., 1-10 points) for the sample image. Based on the predicted result and the ground truth, the difference between the two is calculated through a loss function, and the training parameters in the image quality evaluation model are adjusted. It should be noted that the network structure of the image quality evaluation model is not limited to the above example: an Inception-v2 network structure, a MobileNet network structure, or a ResNet-18 network structure may also be used as the network of the image quality evaluation model, and other modifications may be made by those skilled in the art within the spirit of the present application; as long as the functions and effects achieved are the same as or similar to those of the present application, they should be covered by the scope of the present application.
The image quality evaluation model in the embodiment of the disclosure can be obtained by a deep learning method, and in the process of model training, more noise-type sample images are used for training the image quality evaluation model, so that the generalization capability of the model can be improved, and more accurate image quality scores can be obtained.
In a possible implementation manner, the image quality evaluation model is obtained by training using a corresponding relationship between a sample image and a noise category and a quality score, and includes:
acquiring a sample image set, wherein the sample image set comprises a plurality of sample images marked with noise categories and quality scores;
inputting the sample image into an initial image quality evaluation model to generate a prediction result;
and iteratively adjusting the training parameters of the initial image quality evaluation model based on the difference between the prediction result and the labeled noise category and quality score until the difference meets the preset requirement to obtain the image quality evaluation model.
In the embodiment of the present disclosure, the sample set includes a plurality of sample images labeled with noise categories and quality scores, for example, sample image a: Gaussian noise, 3 points; sample image b: coding and decoding noise, 6 points; sample image c: motion estimation noise, 7 points. In one example, the sample set may be divided into a training set, a validation set, and a test set in a certain proportion. The sample images in the training set are input to an initial image quality evaluation model, which may use any of the network structures in the above embodiments. A sample image undergoes feature extraction, pooling, normalization, and other processing in the initial image quality evaluation model, and finally the probabilities of its noise category and of its quality score are predicted. The training parameters in the initial image quality evaluation model are iteratively adjusted based on the difference between the predicted results and the labels in the training set until the difference meets the preset requirement. In one example, the hyper-parameters of the initial image quality model may be tuned using the sample images in the validation set, resulting in the final image quality evaluation model. In one example, the sample images in the test set can be used to test the performance of the image quality evaluation model and measure its accuracy.
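The iterative scheme described above (predict, measure the difference against the label, adjust the parameters, stop once the difference meets a preset requirement) can be illustrated with a deliberately tiny stand-in model; a one-parameter linear scorer trained by gradient descent, not the actual evaluation network:

```python
def train_quality_scorer(samples, lr=0.01, tol=1e-4, max_iter=10000):
    """Toy illustration of iterative parameter adjustment: predict
    score = w * feature, compute the mean squared difference between the
    prediction and the annotated quality score, and update w until the
    difference falls below the preset requirement `tol`."""
    w = 0.0
    for _ in range(max_iter):
        loss, grad = 0.0, 0.0
        for feature, label in samples:
            pred = w * feature
            loss += (pred - label) ** 2
            grad += 2 * (pred - label) * feature
        loss /= len(samples)
        if loss < tol:  # the difference meets the preset requirement
            break
        w -= lr * grad / len(samples)
    return w

# Hypothetical (feature, annotated quality score) pairs, where the
# "true" relation is score = 2 * feature.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_quality_scorer(samples)
print(round(w, 2))
```

A real image quality evaluation model replaces the single weight with the millions of parameters of a convolutional network and the squared difference with a task-appropriate loss, but the stopping criterion and update loop have the same shape.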
Fig. 3 is a comparison diagram illustrating quality scoring results according to an example embodiment. Referring to fig. 3, the horizontal axis indicates the frame number of an image in the video, and the vertical axis represents the quality score. The data curve 301 is the result of adding a single kind of noise to the sample images and evaluating the quality of the processed sample images. The data curve 302 is the result of up-sampling the processed sample images by interpolation to obtain high-resolution sample images and evaluating the quality of those high-resolution images. The data curve 303 is the result of adding multiple kinds of noise to the sample images and evaluating the quality of the processed sample images. The quality changes along the data curves 301 and 302 are relatively smooth, while the quality of different frames in the data curve 303 varies greatly, which is closer to a real scene.
The embodiment of the disclosure provides a method for training an image quality evaluation model that, through deep learning, can predict the noise category to which an image belongs and accurately predict the quality score under that noise category. Because different noise categories are introduced, the image quality evaluation model has stronger generalization capability and its prediction results are more accurate.
In one possible implementation, the determining a target image from the images according to the quality scores includes:
acquiring an image with a quality score lower than a preset threshold value from the image;
and taking the image with the quality score lower than a preset threshold value as a target image.
In the embodiment of the disclosure, according to the images in the video and their corresponding quality scores, images with lower quality scores can be screened out by setting a preset threshold value. In one example, by adjusting the size of the preset threshold, different numbers of target images may be obtained. For example: when the preset value is 0.4, a group A of target images is obtained, and when the preset value is 0.3, a group B of target images is obtained, where the number of images in group A may differ from that in group B. In one example, the group A and group B target images are each subjected to noise reduction processing and input into the video super-resolution model to obtain processed videos, that is, a high-resolution video corresponding to group A and a high-resolution video corresponding to group B, and the higher-quality one of the two is selected as the recovered high-resolution video.
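The threshold-based screening, including the use of two different preset values to obtain the groups A and B described above, can be sketched as follows (the per-frame scores are illustrative):

```python
def select_targets_by_threshold(scores, threshold):
    """Returns the frame indices whose quality score is below the
    preset threshold; these frames become the target images."""
    return [i for i, s in enumerate(scores) if s < threshold]

# Hypothetical quality scores for six consecutive frames.
scores = [0.8, 0.35, 0.9, 0.25, 0.7, 0.38]
group_a = select_targets_by_threshold(scores, 0.4)  # preset value 0.4
group_b = select_targets_by_threshold(scores, 0.3)  # preset value 0.3
print(group_a)  # [1, 3, 5]
print(group_b)  # [3]
```

Lowering the preset value shrinks the set of frames that will be denoised, which is exactly the lever used to produce candidate groups and compare the resulting high-resolution videos.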
According to the embodiment of the disclosure, target images with lower quality scores are screened out by setting a threshold, which has the advantages of simple operation and easy implementation.
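The threshold-based screening described above can be sketched as follows; the score values are illustrative only, echoing the 0.4/0.3 example:

```python
def select_targets_by_threshold(quality_scores, threshold):
    """Return indices of frames whose quality score falls below the
    preset threshold; these frames become the target images."""
    return [i for i, s in enumerate(quality_scores) if s < threshold]

scores = [0.8, 0.35, 0.6, 0.25, 0.9, 0.38]
group_a = select_targets_by_threshold(scores, 0.4)  # frames 1, 3 and 5
group_b = select_targets_by_threshold(scores, 0.3)  # frame 3 only
```

As in the description, lowering the threshold from 0.4 to 0.3 shrinks the set of target images, so the two groups can lead to different recovered videos.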
In one possible implementation, the determining a target image from the images according to the quality scores includes:
and acquiring a target image from the image, wherein the quality score of the target image is determined to be respectively lower than the quality score of the image in the frame before the target image and the quality score of the image in the frame after the target image.
In the embodiment of the disclosure, target images whose quality scores are at minimum points can be screened out by comparison. In one example, the frames of the video are numbered frame 1, frame 2, frame 3, and so on. Among these frames, frames No. 3, No. 8, No. 12, No. 15 and No. 19 meet the condition for a target image. Taking frame No. 12 as an example, its quality score is lower than the quality scores of both frame No. 11 and frame No. 13. In another example, referring to fig. 3, there are 9 minimum points in the data curve 303. In one example, the target images satisfying the above condition are subjected to noise reduction processing and input into the video super-resolution model for super-resolution processing to obtain the processed video.
According to the embodiment of the disclosure, the quality score of each selected target image is lower than both the quality score of the previous frame and the quality score of the next frame, so that target images with lower quality scores can be screened out while ensuring that no two screened target images are adjacent. Since adjacent images carry much inter-frame information, performing noise processing on them simultaneously easily removes useful image information. Therefore, the embodiment of the disclosure can greatly improve the recovery quality of the high-resolution video.
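A minimal sketch of the minimum-point screening described above (the scores are illustrative):

```python
def select_local_minimum_targets(quality_scores):
    """Return indices of frames whose quality score is strictly lower
    than the scores of both the previous and the next frame.  The first
    and last frames are excluded since each lacks one neighbour."""
    return [
        i for i in range(1, len(quality_scores) - 1)
        if quality_scores[i] < quality_scores[i - 1]
        and quality_scores[i] < quality_scores[i + 1]
    ]

scores = [0.9, 0.7, 0.8, 0.4, 0.6, 0.5, 0.9]
targets = select_local_minimum_targets(scores)  # -> [1, 3, 5]
```

By construction no two selected frames are adjacent, which matches the motivation given above for preserving inter-frame information.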
In a possible implementation manner, the performing noise reduction processing on the target image, and replacing the target image with the target image after the noise reduction processing to obtain a pre-processed video includes:
and smoothing the target image, and replacing the target image with the smoothed target image to obtain a preprocessed video.
In the embodiment of the present disclosure, the smoothing process may include processing the target image by using a low-pass filtering algorithm. In one example, the following algorithm is used to smooth the target image:
x̂_i = C(x_i)  (1)

wherein C represents the smoothing module, x_i represents the image before smoothing, and x̂_i represents the image after smoothing, that is, smoothing x_i with C yields x̂_i. The target image is sent into the smoothing module to obtain the smoothed image, and the absolute error between the smoothed image and each pixel of the input image is calculated. If the absolute error is greater than or equal to a preset value θ, the smoothed image is smoothed again, for example: if |x̂_i − x_i| ≥ θ in formula (1), then x̂_i ← C(x̂_i) = C(C(x_i)). When the absolute error is less than the preset value θ, the smoothing ends, and the obtained smoothing result x̂_i replaces the pre-processed image x_i.
In the noise reduction process, the smoothing approach is selected because it is simple to operate and easy to implement. By smoothing the target image, and further smoothing the result whenever it does not yet meet the preset requirement, the effectiveness of the smoothing result is ensured and the super-resolution effect is ultimately improved.
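The iterative smoothing described above — re-applying the smoothing module while the absolute error is still at least θ — can be sketched on a one-dimensional signal. The 3-tap mean filter merely stands in for whatever low-pass filter C is actually used, and the iteration cap is an added safeguard, since repeated smoothing can drift further from the input rather than converge:

```python
def box_smooth(signal):
    """A 3-tap mean filter standing in for the smoothing module C."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def iterative_smooth(signal, theta, max_rounds=10):
    """Apply C once, then re-apply it while the maximum absolute error
    between the smoothed result and the original input is >= theta,
    mirroring the loop around formula (1)."""
    smoothed = box_smooth(signal)
    for _ in range(max_rounds - 1):
        if max(abs(a - b) for a, b in zip(smoothed, signal)) < theta:
            break
        smoothed = box_smooth(smoothed)
    return smoothed
```

On a real target image the same loop would run over 2-D pixel arrays with a 2-D low-pass kernel; the 1-D version only shows the control flow.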
Fig. 4 is a schematic structural diagram illustrating a video super-resolution model according to an exemplary embodiment. Referring to fig. 4, the obtaining manner of the video super-resolution model includes:
obtaining a sample video set, wherein the sample video set comprises a plurality of sample video pairs, each sample video pair comprising a first sample video and a second sample video matched with the first sample video, and the resolution of the first sample video being lower than that of the second sample video;
inputting the first sample video into an initial video super-resolution model to generate a prediction result;
and iteratively adjusting the training parameters of the initial video super-resolution model based on the difference between the prediction result and the second sample video until the difference meets the preset requirement to obtain the video super-resolution model.
In the embodiment of the present disclosure, the sample video set includes a plurality of sample video pairs, for example a sample video a1 and a sample video a2; a sample video b1 and a sample video b2; a sample video c1 and a sample video c2; and so on. The sample video a1 and the sample video a2 have the same content, that is, their images correspond one to one in content, and the resolution of the sample video a1 is lower than that of the sample video a2; likewise, the sample video b1 and the sample video b2 have the same content, with the resolution of the sample video b1 lower than that of the sample video b2; and the sample video c1 and the sample video c2 have the same content, with the resolution of the sample video c1 lower than that of the sample video c2.
In one example, the sample video set may be divided into a training set, a validation set, and a test set in a certain proportion. The first sample videos in the training set are input into an initial video super-resolution model, which may include a bidirectional cyclic convolution network as shown in fig. 4. In fig. 4, step 401 indicates inputting a low-resolution video, step 402 indicates performing noise reduction processing on the low-resolution video, step 403 indicates inputting the noise-reduced video into the video super-resolution model based on the bidirectional cyclic convolution network, and step 404 indicates obtaining a high-resolution video.
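The proportional split mentioned above might be implemented as follows; the 8:1:1 ratio is an assumption for illustration, not a value specified by the disclosure:

```python
def split_samples(pairs, train_ratio=0.8, val_ratio=0.1):
    """Split sample video pairs into training, validation, and test
    sets by proportion; whatever remains after the first two splits
    becomes the test set."""
    n_train = int(len(pairs) * train_ratio)
    n_val = int(len(pairs) * val_ratio)
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test

pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2"), ("e1", "e2"),
         ("f1", "f2"), ("g1", "g2"), ("h1", "h2"), ("i1", "i2"), ("j1", "j2")]
train, val, test = split_samples(pairs)  # 8 / 1 / 1 with the default ratios
```

In practice the pairs would usually be shuffled before splitting so that the three sets are drawn from the same distribution.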
In the embodiment of the disclosure, during model training the first sample video undergoes feature extraction, pooling, normalization and other processing in the initial video super-resolution model, which finally predicts a video with higher resolution. The training parameters in the initial video super-resolution model are iteratively adjusted based on the difference between the prediction result and the second sample video until the difference meets the preset requirement. In one example, the hyperparameters of the initial video super-resolution model can be adjusted by using the sample videos in the validation set to obtain the final video super-resolution model. In one example, the sample videos in the test set can be used to evaluate the performance of the video super-resolution model and test the accuracy of the model.
The embodiment of the disclosure provides a method for training a video super-resolution model, which is trained by a deep learning method based on a bidirectional cyclic convolution network. Because a noise reduction step is added before the super-resolution processing, the complexity of training the video super-resolution model can be reduced, the model converges quickly, and the prediction result is more accurate.
Fig. 5 is a flow chart illustrating a method of processing video according to an exemplary embodiment. Referring to fig. 5, the method includes:
step S501, an image in a video is acquired.
In the embodiment of the present disclosure, the video is a video to be processed and may be a low-resolution video. The images may include every frame of the video, or only some frames of the video.
Step S503, inputting the images into an image quality evaluation model, and obtaining the noise category and the quality score of each image output by the image quality evaluation model.
In the embodiment of the present disclosure, the noise categories may include video coding and decoding noise, Gaussian noise, Poisson noise, salt-and-pepper noise, dim-light noise, and the like. Adding noise of more categories to the sample images makes them closer to images in real scenes. In the embodiment of the disclosure, the image quality evaluation model is trained by a deep learning method using the correspondence between the sample images and their noise categories and quality scores. The sample images may include sample images to which noise of multiple categories has been added.
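Degrading sample images with several noise categories, as described above, can be sketched with plain pixel lists; the noise parameters (σ = 10, 5% salt-and-pepper probability) are illustrative assumptions:

```python
import random

def add_gaussian_noise(pixels, sigma, rng):
    """Additive Gaussian noise, clipped to the 8-bit range."""
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

def add_salt_pepper_noise(pixels, prob, rng):
    """Flip roughly a `prob` fraction of pixels to pure black or white."""
    out = []
    for p in pixels:
        r = rng.random()
        if r < prob / 2:
            out.append(0.0)      # pepper
        elif r < prob:
            out.append(255.0)    # salt
        else:
            out.append(p)
    return out

rng = random.Random(0)           # fixed seed for reproducibility
clean = [128.0] * 1000           # a flat grey "image" as a stand-in
noisy = add_salt_pepper_noise(add_gaussian_noise(clean, 10.0, rng), 0.05, rng)
```

Codec, Poisson, and dim-light degradations would be composed the same way, so a single sample image can carry a mixture of noise categories as in the data curve 303 of fig. 3.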
Step S505, acquiring a target image from the image, wherein the quality score of the target image is determined to be lower than the quality score of the image in the frame before the target image and the quality score of the image in the frame after the target image respectively.
In the embodiment of the disclosure, target images whose quality scores are at minimum points can be screened out by comparison. In one example, the frames of the video are numbered frame 1, frame 2, frame 3, and so on. Among these frames, frames No. 3, No. 8, No. 12, No. 15 and No. 19 meet the condition for a target image. Taking frame No. 12 as an example, its quality score is lower than the quality scores of both frame No. 11 and frame No. 13.
Step S507, smoothing the target image, and replacing the target image with the smoothed target image to obtain a preprocessed video.
In the embodiment of the present disclosure, the smoothing process may include processing the target image by using a low-pass filtering algorithm. And replacing the target image with the target image subjected to low-pass filtering processing to obtain a preprocessed video.
Step S509, inputting the preprocessed video into a video super-resolution model for super-resolution processing, and outputting the processed video.
In the embodiment of the disclosure, the video super-resolution model is a model for restoring one or more low-resolution images into high-resolution images. Referring to fig. 4, step 401 represents inputting a low-resolution video, step 402 represents performing noise reduction processing on the low-resolution video, step 403 represents inputting the noise-reduced video into the video super-resolution model based on the bidirectional cyclic convolution network, and step 404 represents obtaining a high-resolution video.
Fig. 6(a) is an effect diagram of a video processing method in the prior art. Fig. 6(b) is an effect diagram illustrating a video processing method according to an exemplary embodiment. Referring to fig. 6(a), the image is a high-resolution video restored from the input low-resolution video by interpolation up-sampling. Referring to fig. 6(b), the image is the result of processing the low-resolution video by the video processing method of the embodiment of the disclosure. It can be seen that the white vertical lines are sharper in the result of the disclosed method, while they appear more blurred in fig. 6(a).
It should be understood that, although the steps in the flowchart are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It is understood that the same/similar parts between the embodiments of the method described above in this specification can be referred to each other, and each embodiment focuses on the differences from the other embodiments, and it is sufficient that the relevant points are referred to the descriptions of the other method embodiments.
Fig. 7 is a block diagram illustrating a video processing apparatus according to an example embodiment. A video processing apparatus, comprising:
an obtaining module 701, configured to obtain a quality score of an image in a video;
a determining module 703, configured to determine a target image from the images according to the quality score;
a denoising module 705, configured to perform denoising processing on the target image, and replace the target image with the denoised target image to obtain a preprocessed video;
and the processing module 707 is configured to input the preprocessed video to a video super-resolution model for super-resolution processing, and output the processed video.
In one possible implementation manner, the obtaining module includes:
the first acquisition submodule is used for acquiring images in a video;
and the evaluation sub-module is used for inputting the image into an image quality evaluation model, outputting the image through the image quality evaluation model, and obtaining the noise type and the quality score of the image, wherein the image quality evaluation model is obtained by utilizing the corresponding relation between the sample image and the noise type and the quality score.
In a possible implementation manner, the method further includes a first generation sub-module, where the first generation sub-module includes:
the device comprises an acquisition unit, a quality evaluation unit and a processing unit, wherein the acquisition unit is used for acquiring a sample image set, and the sample image set comprises a plurality of sample images marked with noise categories and quality scores;
the prediction unit is used for inputting the sample image into an initial image quality evaluation model and generating a prediction result;
and the generating unit is used for iteratively adjusting the training parameters of the initial image quality evaluation model based on the difference between the prediction result and the labeled noise category and quality score until the difference meets the preset requirement, so as to obtain the image quality evaluation model.
In one possible implementation, the determining module includes:
the second obtaining submodule is used for obtaining an image with the quality score lower than a preset threshold value from the image;
and the first determining submodule is used for taking the image with the quality score lower than a preset threshold value as a target image.
In one possible implementation, the determining module includes:
and the second determining submodule is used for acquiring a target image from the image, wherein the quality score of the target image is determined to be respectively lower than the quality score of the image in the previous frame of the target image and the quality score of the image in the next frame of the target image.
In one possible implementation, the noise reduction module includes:
and the smoothing module is used for smoothing the target image and replacing the target image with the smoothed target image to obtain a preprocessed video.
In a possible implementation manner, the method further includes a generating module, where the generating module includes:
a third obtaining sub-module, configured to obtain a sample video set, where the sample video set includes a plurality of sample video pairs, each sample video pair includes a first sample video and a second sample video matching the first sample video, and the resolution of the first sample video is lower than the resolution of the second sample video;
the prediction submodule is used for inputting the first sample video into an initial video super-resolution model to generate a prediction result;
and the second generation submodule is used for iteratively adjusting the training parameters of the initial video super-resolution model based on the difference between the prediction result and the second sample video until the difference meets the preset requirement, so that the video super-resolution model is obtained.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 8 is a block diagram illustrating an electronic device 800 for a method of processing video according to an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so forth.
Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive an external audio signal when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or components of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided that includes instructions executable by the processor 820 of the electronic device 800 to perform the above-described method.
Fig. 9 is a block diagram illustrating an electronic device 900 for a method of processing video in accordance with an example embodiment. For example, the electronic device 900 may be a server. Referring to fig. 9, electronic device 900 includes a processing component 920 that further includes one or more processors and memory resources, represented by memory 922, for storing instructions, such as applications, that are executable by processing component 920. The application programs stored in memory 922 may include one or more modules that each correspond to a set of instructions. Further, the processing component 920 is configured to execute instructions to perform the above-described methods.
The electronic device 900 may further include: the power component 924 is configured to perform power management of the electronic device 900, the wired or wireless network interface 926 is configured to connect the electronic device 900 to a network, and the input-output (I/O) interface 928. The electronic device 900 may operate based on an operating system stored in memory 922, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 922 comprising instructions, executable by a processor of the electronic device 900 to perform the above-described method is also provided. The storage medium may be a computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided that includes instructions executable by a processor of the electronic device 900 to perform the above-described method.
It should be noted that the descriptions of the above-mentioned apparatus, the electronic device, the computer-readable storage medium, the computer program product, and the like according to the method embodiments may also include other embodiments, and specific implementations may refer to the descriptions of the related method embodiments, which are not described in detail herein.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for processing video, comprising:
acquiring quality scores of images in a video;
determining a target image from the images according to the quality scores;
carrying out noise reduction processing on the target image, and replacing the target image with the target image subjected to noise reduction processing to obtain a preprocessed video;
and inputting the preprocessed video into a video super-resolution model for super-resolution processing, and outputting to obtain a processed video.
2. The method of claim 1, wherein obtaining the quality score of the image in the video comprises:
acquiring an image in a video;
and inputting the image into an image quality evaluation model, and outputting the image through the image quality evaluation model to obtain the noise category and the quality score of the image, wherein the image quality evaluation model is obtained by utilizing the corresponding relation between the sample image and the noise category and the quality score.
3. The method according to claim 2, wherein the image quality evaluation model is obtained by training using a corresponding relationship between the sample image and the noise category and the quality score, and comprises:
acquiring a sample image set, wherein the sample image set comprises a plurality of sample images marked with noise categories and quality scores;
inputting the sample image into an initial image quality evaluation model to generate a prediction result;
and iteratively adjusting the training parameters of the initial image quality evaluation model based on the difference between the prediction result and the labeled noise category and quality score until the difference meets the preset requirement to obtain the image quality evaluation model.
4. The method of claim 1, wherein determining a target image from the images based on the quality scores comprises:
acquiring an image with a quality score lower than a preset threshold value from the image;
and taking the image with the quality score lower than a preset threshold value as a target image.
5. The method of claim 1, wherein determining a target image from the images based on the quality scores comprises:
and acquiring a target image from the image, wherein the quality score of the target image is determined to be respectively lower than the quality score of the image in the frame before the target image and the quality score of the image in the frame after the target image.
6. The method according to claim 1, wherein the performing noise reduction processing on the target image and replacing the target image with the target image after noise reduction processing to obtain a pre-processed video comprises:
and smoothing the target image, and replacing the target image with the smoothed target image to obtain a preprocessed video.
7. An apparatus for processing video, comprising:
the acquisition module is used for acquiring the quality score of the image in the video;
a determining module for determining a target image from the images according to the quality scores;
the noise reduction module is used for carrying out noise reduction processing on the target image and replacing the target image with the target image subjected to noise reduction processing to obtain a preprocessed video;
and the processing module is used for inputting the preprocessed video into the video super-resolution model for super-resolution processing and outputting the processed video.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of processing video of any of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of processing video of any of claims 1 to 6.
10. A computer program product comprising instructions which, when executed by a processor of an electronic device, enable the electronic device to carry out the method of processing video according to any one of claims 1 to 6.
CN202210268679.7A 2022-03-18 2022-03-18 Video processing method and device, electronic equipment and storage medium Pending CN114640815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210268679.7A CN114640815A (en) 2022-03-18 2022-03-18 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210268679.7A CN114640815A (en) 2022-03-18 2022-03-18 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114640815A true CN114640815A (en) 2022-06-17

Family

ID=81950302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210268679.7A Pending CN114640815A (en) 2022-03-18 2022-03-18 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114640815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309274A (en) * 2022-12-12 2023-06-23 湖南红普创新科技发展有限公司 Method and device for detecting small target in image, computer equipment and storage medium
CN116309274B (en) * 2022-12-12 2024-01-30 湖南红普创新科技发展有限公司 Method and device for detecting small target in image, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107798669B (en) Image defogging method and device and computer readable storage medium
CN110084775B (en) Image processing method and device, electronic equipment and storage medium
CN110060215B (en) Image processing method and device, electronic equipment and storage medium
CN108154465B (en) Image processing method and device
CN105809704A (en) Method and device for identifying image definition
US11580327B2 (en) Image denoising model training method, imaging denoising method, devices and storage medium
CN108241855B (en) Image generation method and device
CN109784164B (en) Foreground identification method and device, electronic equipment and storage medium
CN111819837B (en) Method and system for identifying static video
CN112258404A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111985281A (en) Image generation model generation method and device and image generation method and device
CN111968052B (en) Image processing method, image processing apparatus, and storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
CN112634160A (en) Photographing method and device, terminal and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN113160277A (en) Image processing method and device, electronic equipment and storage medium
CN114640815A (en) Video processing method and device, electronic equipment and storage medium
CN111553865B (en) Image restoration method and device, electronic equipment and storage medium
CN112750081A (en) Image processing method, device and storage medium
CN116805282A (en) Image super-resolution reconstruction method, model training method, device and electronic equipment
CN111741187A (en) Image processing method, device and storage medium
US20230063201A1 (en) Image processing device and super-resolution processing method
CN115953339A (en) Image fusion processing method, device, equipment, storage medium and chip
CN113592733A (en) Image processing method, image processing device, storage medium and electronic equipment
CN110910304B (en) Image processing method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination