CN114205642A - Video image processing method and device - Google Patents


Info

Publication number
CN114205642A
Authority
CN
China
Prior art keywords
frame
similarity coefficient
similarity
current video
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010898421.6A
Other languages
Chinese (zh)
Other versions
CN114205642B (en)
Inventor
张海斌
蔡媛
樊鸿飞
汪贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010898421.6A
Publication of CN114205642A
Application granted
Publication of CN114205642B
Legal status: Active
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Television Systems (AREA)

Abstract

The application relates to a video image processing method and device. The method includes: acquiring an initial frame sequence corresponding to a current video frame to be processed; calculating a first similarity coefficient between the current video frame and a preceding frame set and a second similarity coefficient between the current video frame and a subsequent frame set; converting the initial frame sequence into a target frame sequence according to the first and second similarity coefficients, so that the similarity between the target frame sequence and the current video frame is higher than a similarity threshold; and inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, the model performing video image processing on the current video frame according to the target frame sequence. The method and device address the technical problem of low video image processing efficiency in the related art.

Description

Video image processing method and device
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for processing a video image.
Background
With the development of deep learning, image processing techniques based on deep learning are increasingly applied to practical tasks. Compared with video processing based on single-frame image processing, video processing based on continuous multi-frame processing can use not only the spatial information of the current frame but also the information of reference frames, i.e., frames adjacent to the current frame in the time sequence.
In the prior art, whether in model training or model inference, the reference frames are generally chosen as multiple frames immediately before and/or after the current frame, including but not limited to: selecting multiple consecutive backward frames as reference frames; selecting multiple forward frames as reference frames; or selecting both backward and forward frames as reference frames. Here, "backward" refers to frames that have already appeared in the time sequence, and "forward" refers to frames that have not yet appeared.
A video is a moving picture composed of many frames that are continuous in time. In general, the content difference between adjacent frames is small and their similarity is very high. However, when a scene change occurs in the video, or the motion of the content is severe, the content of adjacent frames can differ greatly or be completely different. If, under these conditions, adjacent frames with very different content are selected as reference frames during model inference, the model cannot handle them, because the samples used in model training are all sequences from the same scene; artifacts then appear in the predicted frames, and the processing efficiency of the video image suffers.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The application provides a video image processing method and device, which are used for at least solving the technical problem of low video image processing efficiency in the related art.
According to an aspect of the embodiments of the present application, there is provided a method for processing a video image, including:
acquiring an initial frame sequence corresponding to a current video frame to be processed, wherein the initial frame sequence comprises a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set comprises a first number of preceding video frames before the current video frame, and the subsequent frame set comprises a second number of subsequent video frames after the current video frame;
calculating a first similarity coefficient between the current video frame and the preceding frame set and a second similarity coefficient between the current video frame and the subsequent frame set, wherein the first similarity coefficient is used for indicating the similarity between the current video frame and the preceding frame set, and the second similarity coefficient is used for indicating the similarity between the current video frame and the subsequent frame set;
converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, wherein the similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
Optionally, converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient comprises:
comparing the first and second similarity coefficients with a similarity coefficient threshold, respectively;
and determining the target frame sequence according to the relation between the first similarity coefficient and a similarity coefficient threshold value and the relation between the second similarity coefficient and a similarity coefficient threshold value.
Optionally, determining the target frame sequence according to the relationship between the first similarity coefficient and the similarity coefficient threshold and the relationship between the second similarity coefficient and the similarity coefficient threshold comprises:
determining the initial frame sequence as a target frame sequence if the first similarity coefficient and the second similarity coefficient are both greater than the similarity coefficient threshold;
and under the condition that at least one of the first similarity coefficient and the second similarity coefficient is smaller than the similarity coefficient threshold value, replacing the frame set corresponding to the smaller similarity coefficient in the initial frame sequence with the frame set corresponding to the larger similarity coefficient in the first similarity coefficient and the second similarity coefficient to obtain the target frame sequence.
Optionally, calculating a first similarity coefficient between the current video frame and the preceding frame set, and a second similarity coefficient between the current video frame and the subsequent frame set comprises:
calculating the similarity between the current video frame and each preceding video frame in the preceding frame set;
determining the average value of the similarities between the current video frame and all the preceding video frames in the preceding frame set as the first similarity coefficient;
calculating the similarity between the current video frame and each subsequent video frame in the subsequent frame set;
and determining the average value of the similarity of the current video frame and all subsequent video frames in the subsequent frame set as the second similarity coefficient.
Optionally, before the target frame sequence is input into a video image processing model to obtain a processing result corresponding to the current video frame, the method further includes:
determining a plurality of similarity coefficient ranges;
obtaining a plurality of frame sequence samples with similarity coefficients falling into each similarity coefficient range of the plurality of similarity coefficient ranges and an original image corresponding to each frame sequence sample, wherein each frame sequence sample comprises a preceding frame set sample, a current frame sample and a subsequent frame set sample, the preceding frame set sample comprises the first number of preceding frame samples before the current frame sample, the subsequent frame set sample comprises the second number of subsequent frame samples after the current frame sample, and the original image is the original image corresponding to the current frame sample;
and training an initial video image processing model by using the frame sequence samples with the corresponding relation and the original image to obtain the video image processing model.
Optionally, obtaining a plurality of samples of frame sequences with similarity coefficients falling within each of the plurality of similarity coefficient ranges comprises:
determining a sample proportion corresponding to each of the plurality of similarity coefficient ranges, wherein the sample proportion is proportional to the magnitude of each similarity coefficient range;
and acquiring a plurality of frame sequence samples falling into each similarity coefficient range according to the sample proportion corresponding to each similarity coefficient range.
Optionally, obtaining a plurality of frame sequence samples falling into each similarity coefficient range according to the sample proportion corresponding to each similarity coefficient range includes:
determining a total number of frame sequence samples;
determining the product of the sample proportion corresponding to each similarity coefficient range and the total number as the number of samples corresponding to each similarity coefficient range;
obtaining, for each similarity coefficient range, the determined number of frame sequence samples falling within that range.
According to another aspect of the embodiments of the present application, there is also provided a video image processing apparatus, including:
a first obtaining module, configured to obtain an initial frame sequence corresponding to a current video frame to be processed, where the initial frame sequence includes a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set includes a first number of preceding video frames before the current video frame, and the subsequent frame set includes a second number of subsequent video frames after the current video frame;
a calculating module, configured to calculate a first similarity coefficient between the current video frame and the preceding frame set, and a second similarity coefficient between the current video frame and the subsequent frame set, where the first similarity coefficient is used to indicate the similarity between the current video frame and the preceding frame set, and the second similarity coefficient is used to indicate the similarity between the current video frame and the subsequent frame set;
a conversion module, configured to convert the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, where a similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and the input module is used for inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program which, when executed, performs the above-described method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method through the computer program.
In the embodiments of the application, an initial frame sequence corresponding to a current video frame to be processed is acquired, where the initial frame sequence includes a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set includes a first number of preceding video frames before the current video frame, and the subsequent frame set includes a second number of subsequent video frames after the current video frame. A first similarity coefficient between the current video frame and the preceding frame set and a second similarity coefficient between the current video frame and the subsequent frame set are calculated, the first indicating the similarity between the current video frame and the preceding frame set and the second indicating the similarity between the current video frame and the subsequent frame set. The initial frame sequence is converted into a target frame sequence according to the two coefficients, so that the similarity between the target frame sequence and the current video frame is higher than a similarity threshold, and the target frame sequence is input into a video image processing model to obtain a processing result corresponding to the current video frame, the model performing video image processing on the current video frame according to the target frame sequence. In this way, the preceding and subsequent frame sets of the current video frame serve as its initial reference frames, and since the similarity between the frames of the target frame sequence and the current video frame meets the required threshold, the reference frames in the target frame sequence are guaranteed to be similar to the current video frame. Processing the current video frame with such reference frames better matches its actual content and eliminates large differences between the reference frames and the current video frame, which achieves the technical effect of improving video image processing efficiency and thus solves the technical problem of low video image processing efficiency in the related art.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic diagram of a hardware environment of a method of processing a video image according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method of processing video images according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative video image processing apparatus according to an embodiment of the present application;
fig. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of embodiments of the present application, there is provided an embodiment of a method of processing a video image.
Alternatively, in the present embodiment, the video image processing method can be applied to a hardware environment formed by the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide services (such as game services, application services, etc.) for the terminal or for a client installed on the terminal. A database may be provided on the server, or separately from it, to provide data storage services for the server 103; the network may be wired or wireless, and the terminal 101 may be, but is not limited to, a PC, a mobile phone, a tablet computer, or the like. The video image processing method of the embodiments of the present application may be executed by the server 103, by the terminal 101, or by both together. When executed by the terminal 101, the method may be performed by a client installed on it.
Fig. 2 is a flow chart of an alternative video image processing method according to an embodiment of the present application, which may include the following steps, as shown in fig. 2:
step S202, obtaining an initial frame sequence corresponding to a current video frame to be processed, wherein the initial frame sequence comprises a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set comprises a first number of preceding video frames before the current video frame, and the subsequent frame set comprises a second number of subsequent video frames after the current video frame;
step S204, calculating a first similarity coefficient between the current video frame and the preceding frame set and a second similarity coefficient between the current video frame and the subsequent frame set, wherein the first similarity coefficient is used for indicating the similarity between the current video frame and the preceding frame set, and the second similarity coefficient is used for indicating the similarity between the current video frame and the subsequent frame set;
step S206, converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, wherein the similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
step S208, inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for performing video image processing on the current video frame according to the target frame sequence.
Through the above steps S202 to S208, the preceding frame set and the subsequent frame set of the current video frame are acquired as its initial reference frames, the first similarity coefficient between the current video frame and the preceding frame set and the second similarity coefficient between the current video frame and the subsequent frame set are calculated, and the initial frame sequence is converted into a target frame sequence according to the two coefficients, so that the similarity between the target frame sequence and the current video frame is higher than the similarity threshold. Because the target frame sequence used for video image processing satisfies this similarity requirement, the reference frames in the target frame sequence are guaranteed to be similar to the current video frame, and processing the current video frame with these reference frames better matches its actual content. Large differences between the reference frames and the current video frame are thereby eliminated, which achieves the technical effect of improving video image processing efficiency and solves the technical problem of low video image processing efficiency in the related art.
In the technical solution provided in step S202, the first number and the second number may be the same or different. For example, the 15 preceding video frames of the current video frame may be taken as the preceding frame set and the 10 subsequent video frames as the subsequent frame set; alternatively, 10 preceding frames and 10 subsequent frames may be used.
Optionally, in this embodiment, the N frames before and the N frames after the current frame in the time sequence are taken as the reference frames, so the input to the video image processing model consists of 2N + 1 channels.
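For illustration, a minimal Python sketch of assembling the 2N + 1 initial frame sequence is given below. The helper name and the clamping of indices at the start and end of the video are assumptions of this sketch; the embodiment does not specify boundary handling.

```python
def get_initial_sequence(frames, t, n):
    # Gather the 2N+1 initial frame sequence centered on frame index t.
    # `frames` is a list of per-frame arrays (e.g., numpy images).
    # Clamping out-of-range indices to the valid range is an assumption.
    last = len(frames) - 1
    indices = [min(max(i, 0), last) for i in range(t - n, t + n + 1)]
    return [frames[i] for i in indices]
```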
Optionally, in this embodiment, both the preceding frame set and the following frame set may be referred to as a reference frame set of the current video frame. The current video frame can be processed using the information in the set of reference frames.
In the technical solution provided in step S204, the larger the similarity coefficient obtained by the similarity calculation, the higher the similarity between the two pictures: their contents are more alike and their differences smaller.
Alternatively, in the present embodiment, the calculation algorithm of the similarity coefficient may include, but is not limited to, PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity), and the like.
In the above step S204, the first similarity coefficient and the second similarity coefficient may be calculated in the following manner, but not limited thereto:
s11, calculating the similarity between the current video frame and each preceding video frame in the preceding frame set;
s12, determining the average value of the similarities between the current video frame and all the preceding video frames in the preceding frame set as the first similarity coefficient;
s13, calculating the similarity between the current video frame and each subsequent video frame in the subsequent frame set;
s14, determining an average value of the similarity between the current video frame and all subsequent video frames in the subsequent frame set as the second similarity coefficient.
Alternatively, in this embodiment, the similarity coefficient between the current video frame and a reference frame set may be obtained by first calculating the similarity between the current video frame and each frame in that set, and then taking the average of these similarities as the similarity coefficient between the current video frame and that reference frame set.
Optionally, in this embodiment, the average value of the similarity may include, but is not limited to: arithmetic mean, geometric mean, squared mean (root mean square, rms), harmonic mean, weighted mean, and the like.
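As a sketch of steps s11 to s14, the following Python code computes the two similarity coefficients, assuming PSNR as the per-frame similarity and the arithmetic mean as the average; both choices, and the function names, are illustrative. The normalized example coefficients quoted later (0.6 to 1) would instead suggest a metric such as SSIM, in which case the `sim` argument would be swapped accordingly.

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    # Peak signal-to-noise ratio between two frames of equal shape.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def similarity_coefficients(preceding, current, subsequent, sim=psnr):
    # s11/s12: first coefficient = mean similarity to the preceding set.
    c1 = float(np.mean([sim(current, f) for f in preceding]))
    # s13/s14: second coefficient = mean similarity to the subsequent set.
    c2 = float(np.mean([sim(current, f) for f in subsequent]))
    return c1, c2
```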
In the technical solution provided in step S206, the initial frame sequence may be converted into the target frame sequence according to a first similarity coefficient between the current video frame and the set of preceding frames and a second similarity coefficient between the current video frame and the set of following frames, so that a similarity between the target frame sequence and the current video frame is higher than a similarity threshold.
In the above step S206, the initial frame sequence may be converted into the target frame sequence by, but not limited to:
s21, comparing the relationship between the first and second similarity coefficients and a similarity coefficient threshold value, respectively;
s22, determining the target frame sequence according to the relation between the first similarity coefficient and the similarity coefficient threshold value and the relation between the second similarity coefficient and the similarity coefficient threshold value.
Optionally, in this embodiment, the similarity threshold thresh may be adjusted dynamically for different application scenarios. In one example, this parameter can be regarded as indicating whether a content switch or a scene with intense motion occurs in the sequence. If the average similarity coefficient between the current predicted frame (i.e., the current video frame) and a reference frame sequence is larger than thresh, the sequence is highly correlated in content; conversely, a coefficient smaller than thresh indicates that the sequence has low content similarity and large differences.
As an alternative embodiment, the target frame sequence may be determined, but is not limited to, by:
s31, determining the initial frame sequence as a target frame sequence under the condition that the first similarity coefficient and the second similarity coefficient are both larger than the similarity coefficient threshold value;
s32, when at least one of the first similarity coefficient and the second similarity coefficient is smaller than the similarity coefficient threshold, replacing the set of frames corresponding to the smaller similarity coefficient in the initial frame sequence with the set of frames corresponding to the larger similarity coefficient in the first similarity coefficient and the second similarity coefficient to obtain the target frame sequence.
Optionally, in this embodiment, if both the first similarity coefficient and the second similarity coefficient are greater than the similarity coefficient threshold, which indicates that the content of the preceding frame set and the subsequent frame set is highly related to the current video frame, the initial frame sequence is not adjusted, and the initial frame sequence is directly determined as the target frame sequence.
Optionally, in this embodiment, if at least one of the first similarity coefficient and the second similarity coefficient is smaller than the similarity coefficient threshold, it indicates that one of the preceding frame set and the subsequent frame set has a lower content correlation with the current video frame, or both of the two frame sets have a lower content correlation with the current video frame, and then the frame set corresponding to the greater similarity coefficient of the first similarity coefficient and the second similarity coefficient is used to replace the frame set corresponding to the smaller similarity coefficient of the initial frame sequence, so as to obtain the target frame sequence.
In an alternative embodiment, a process for determining the target frame sequence is provided. A similarity threshold thresh is set, and for a video sequence, inference is performed for each frame according to the following steps:
Step 11: for the current frame, select the N preceding frames and the N following frames in the time sequence to form an initial frame sequence of length 2N + 1;
Step 12: calculate the similarity coefficient c1 between the current (predicted) frame and the N preceding frames and the similarity coefficient c2 between the current frame and the N following frames, and compare c1 and c2 with the similarity threshold thresh:
Step 121: if c1 and c2 are both greater than thresh, the content similarity within the 2N + 1 frame sequence is high, the differences are small, and no content switch or severe motion occurs within the sequence, so the initial frame sequence is used directly as the target frame sequence.
Step 122: if at least one of c1 and c2 is less than thresh (this covers both the case where the coefficients on both sides are below thresh and the case where only one side is), the 2N + 1 frame sequence contains a content switch or a scene with severe motion. In that case, compare c1 and c2, and replace the reference frames corresponding to the smaller similarity coefficient with the reference frames corresponding to the larger one, forming a new frame sequence in which the reference frames are closer in content to the predicted frame, with smaller differences.
For example, the initial frame sequence is written in time order as [t-N, t-(N-1), …, t-1, t, t+1, …, t+(N-1), t+N], where t is the current predicted frame. If the similarity coefficients of the reference frames on both sides are less than the threshold thresh, the two coefficients are compared; if the coefficient of the left-side reference frames is greater than that of the right-side reference frames, the right-side reference frames are replaced by the left-side ones, and the new frame sequence in time order is [t-N, t-(N-1), …, t-1, t, t-1, …, t-(N-1), t-N].
Step 13: repeat Step 11 and Step 12 until all frames of the video have been processed.
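For illustration, the sketch below strings these steps together, reusing `get_initial_sequence` and `similarity_coefficients` from the earlier sketches; `model` is a stand-in callable for the trained 2N + 1-frame network, and its interface is an assumption.

```python
def to_target_sequence(preceding, current, subsequent, c1, c2, thresh):
    # Step 121: both sides similar enough, keep the initial sequence.
    if c1 > thresh and c2 > thresh:
        return preceding + [current] + subsequent
    # Step 122: mirror the more similar side over the less similar one.
    if c1 >= c2:
        # New order: [t-N, ..., t-1, t, t-1, ..., t-N]
        return preceding + [current] + preceding[::-1]
    # Symmetric case: [t+N, ..., t+1, t, t+1, ..., t+N]
    return subsequent[::-1] + [current] + subsequent

def infer_video(frames, model, n, thresh, sim=psnr):
    results = []
    for t in range(len(frames)):  # Step 13: every frame of the video
        seq = get_initial_sequence(frames, t, n)
        preceding, current, subsequent = seq[:n], seq[n], seq[n + 1:]
        c1, c2 = similarity_coefficients(preceding, current, subsequent, sim)
        target = to_target_sequence(preceding, current, subsequent, c1, c2, thresh)
        results.append(model(target))
    return results
```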
In the technical solution provided in step S208, the video image processing model may use different model structures, and the same purpose can be achieved by using the above solution when selecting the reference frame.
Optionally, in this embodiment, besides denoising, the model training process and the model input improvement provided by the present invention can also be used in the fields of video enhancement and video super-resolution. Only the model required by the corresponding task needs to be changed; with the same form of input, the goal of the corresponding task can be achieved.
Optionally, in this embodiment, the video image processing model may be used for, but is not limited to, denoising, enhancement and super-resolution of video images. Video processing may handle the video frame by frame directly, or it may also consult temporally adjacent frames when processing the current frame; in the latter case the input is usually a sequence of multiple pictures, comprising the current frame to be processed and multiple reference frames from the time sequence. The video image processing model can therefore be built to exploit not only the relations between spatial pixels but also temporal information.
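As a sketch of how a target frame sequence might be packed into one model input, the code below stacks the 2N + 1 frames along a leading axis; the single-channel (grayscale) frames, the normalization, and the layout are assumptions rather than anything prescribed by the embodiment.

```python
import numpy as np

def to_model_input(target_sequence):
    # Stack 2N+1 single-channel frames so the model receives spatial
    # and temporal context in one tensor of shape (1, 2N+1, H, W).
    x = np.stack(target_sequence, axis=0).astype(np.float32) / 255.0
    return x[np.newaxis, ...]
```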
As an alternative embodiment, before the step S208, the video image processing model may also be trained, but not limited to, by the following process:
s41, determining a plurality of similarity coefficient ranges;
s42, obtaining a plurality of frame sequence samples with similarity coefficients falling into each similarity coefficient range of the plurality of similarity coefficient ranges and an original image corresponding to each frame sequence sample, where each frame sequence sample includes a preceding frame set sample, a current frame sample and a subsequent frame set sample, the preceding frame set sample includes the first number of preceding frame samples before the current frame sample, the subsequent frame set sample includes the second number of subsequent frame samples after the current frame sample, and the original image is the original image corresponding to the current frame sample;
and S43, training an initial video image processing model by using the frame sequence samples with the corresponding relation and the original image to obtain the video image processing model.
Optionally, in this embodiment, using multiple similarity coefficient ranges enriches the diversity of the similarity coefficients in the training samples, so that the similarities in the training set are neither uniformly high nor uniformly low. For example, the plurality of similarity coefficient ranges may include, but is not limited to, [0.95, 1], [0.8, 0.95] and [0.6, 0.8], which ensures that the training samples are distributed fairly evenly over similarity coefficients between 0.6 and 1. The ranges may be continuous or discontinuous, and may overlap.
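A small helper, using the illustrative ranges above and treating them as closed intervals (an assumption), reports which configured range an average similarity coefficient falls into:

```python
# Example ranges from the text, treated as closed intervals.
SIMILARITY_RANGES = [(0.95, 1.0), (0.80, 0.95), (0.60, 0.80)]

def range_index(avg_similarity, ranges=SIMILARITY_RANGES):
    # Return the index of the first range containing the coefficient,
    # or None if it falls outside every configured range.
    for i, (lo, hi) in enumerate(ranges):
        if lo <= avg_similarity <= hi:
            return i
    return None
```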
Optionally, in this embodiment, the frame sequence samples and their corresponding original images are obtained according to the similarity coefficient ranges; the original image corresponding to each frame sequence sample may be used as its label, and the labeled frame sequence samples are used as training samples to train the initial video image processing model, yielding the video image processing model.
In the above step S42, a plurality of frame sequence samples corresponding to each similarity coefficient range may be obtained, but is not limited to, by the following method:
s51, determining a sample proportion corresponding to each similarity coefficient range in the similarity coefficient ranges, wherein the sample proportion is in direct proportion to the numerical value of each similarity coefficient range;
and S52, obtaining a plurality of frame sequence samples falling into each similarity coefficient range according to the sample proportion corresponding to each similarity coefficient range.
Optionally, in this embodiment, a sample proportion is determined for each similarity coefficient range, proportional to the magnitude of the values in that range. That is, the larger the values in a similarity coefficient range, the larger its sample proportion and the more frame sequence samples falling into that range are obtained, which avoids excessive differences in the training set. For example, for the similarity coefficient ranges [0.95, 1], [0.8, 0.95] and [0.6, 0.8] above, the allocated sample proportions may be, but are not limited to, 45%, 30% and 25%, respectively.
In the above step S52, the obtaining of the plurality of frame sequence samples falling within each similarity coefficient range may include, but is not limited to, the following:
s61, determining the total number of frame sequence samples;
s62, determining the product of the sample proportion corresponding to each similarity coefficient range and the total number as the sample number corresponding to each similarity coefficient range;
s63, obtaining, for each similarity coefficient range, the determined number of frame sequence samples falling into that range.
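Steps s61 to s63 reduce to one multiplication per range, as in the minimal sketch below; truncating to an integer count is an assumption.

```python
def samples_per_range(proportions, total):
    # s62: the sample count for each range is its proportion times the
    # total, e.g. proportions [0.45, 0.30, 0.25] with total 1000
    # give counts [450, 300, 250].
    return [int(p * total) for p in proportions]
```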
Optionally, in this embodiment, each training sample for model training includes an input sample (i.e., a frame sequence sample) composed of 2N + 1 pictures and the real original picture (i.e., the original image) corresponding to the intermediate frame (i.e., the current frame sample). The frame in the middle of the input sample is the predicted frame, and the N frames on each side of it are reference frames. The purpose of model training is to adjust the model parameters so that the model infers the middle predicted frame from the 2N reference frames, with an inference result as close to the real picture as possible. When constructing input samples of 2N + 1 channels, sequences with different degrees of similarity can be selected. For example, without loss of generality, 3 groups of similarity coefficients a1, a2 and a3 are selected, all lying in [0, 1] and satisfying a1 >= a2 >= a3. The larger the similarity coefficient, the more similar the 2N + 1 pictures in the sample and the smaller the content differences. The similarity of each sample is the average of the similarities between the current frame and the 2N reference frames within that sample. Samples corresponding to the different similarity coefficients can be added to the training set in different proportions b1, b2 and b3, where the proportions sum to 1 and satisfy b1 >= b2 >= b3. The training steps may be as follows:
Step 21: according to the 3 groups of similarity coefficients, construct the corresponding samples by calculating the average similarity coefficient of each subsequence of length 2N + 1; each sample includes an input sample consisting of 2N + 1 frames and the real original image corresponding to the predicted frame;
Step 22: select the sample sets according to the proportions b1, b2 and b3 of the corresponding similarity coefficients. That is, if the total number of samples in the training set is M, the number of samples corresponding to similarity coefficient a1 is b1 × M, the number corresponding to a2 is b2 × M, and the number corresponding to a3 is b3 × M.
Step 23: set the initial parameters and the loss function of the model and start training. After each round of training, the model parameters are adjusted so as to minimize the training loss, finally yielding the trained model.
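A sketch of Steps 21 and 22 under stated assumptions: candidate subsequences have already been scored with their average similarity coefficient, each coefficient group a_i is represented by a lower bound, and samples are taken in order rather than at random; none of these details is fixed by the embodiment.

```python
def build_training_set(candidates, group_bounds, proportions, total_m):
    # candidates: list of (frame_sequence, avg_similarity, original_image).
    # group_bounds: descending lower bounds representing a1 >= a2 >= a3.
    # proportions: b1, b2, b3 with b1 + b2 + b3 = 1.
    bins = [[] for _ in group_bounds]
    for seq, avg_sim, original in candidates:
        for i, bound in enumerate(group_bounds):
            if avg_sim >= bound:  # assign to the highest group reached
                bins[i].append((seq, original))
                break
    training_set = []
    for group, b in zip(bins, proportions):
        training_set.extend(group[: int(b * total_m)])  # Step 22: b_i x M
    return training_set
```

With group_bounds such as [0.95, 0.8, 0.6] and proportions [0.45, 0.30, 0.25], this mirrors the example figures given earlier.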
Optionally, in this embodiment, multiple groups of similarity coefficients are used to construct the training samples, which increases the diversity of the training set and lets the model adapt as far as possible to differences between the predicted frame and the reference frames; samples with smaller similarity coefficients are given a smaller proportion, so that the training set does not contain so many differences that training is harmed.
Through the above process, first, when the model training data set is selected, sample diversity is built into the training samples as far as possible by selecting samples with different similarity coefficients, so that the trained model can tolerate such differences. Second, during model inference, the reference frames are dynamically adjusted according to the similarity of the frames adjacent to the predicted frame on both sides, so that the reference frames and the predicted frame are in the same or a similar scene as far as possible (this can be captured by the similarity coefficient: the larger the coefficient, the more likely the frames belong to the same scene), which ensures the usefulness of the reference frames and avoids motion artifacts.
In this embodiment, the data set for model training is selected by judging the content difference between the predicted frame and the reference frames; whether a scene change occurs on the left or right side is judged from the content difference, reference frames with large content differences are replaced by reference frames with small content differences, and model inference is then performed.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling an electronic device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a video image processing apparatus for implementing the above video image processing method. Fig. 3 is a schematic diagram of an alternative video image processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the apparatus may include:
a first obtaining module 32, configured to obtain an initial frame sequence corresponding to a current video frame to be processed, where the initial frame sequence includes a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set includes a first number of preceding video frames before the current video frame, and the subsequent frame set includes a second number of subsequent video frames after the current video frame;
a calculating module 34, configured to calculate a first similarity coefficient between the current video frame and the preceding frame set, and a second similarity coefficient between the current video frame and the subsequent frame set, where the first similarity coefficient is used to indicate the similarity between the current video frame and the preceding frame set, and the second similarity coefficient is used to indicate the similarity between the current video frame and the subsequent frame set;
a conversion module 36, configured to convert the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, where a similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
an input module 38, configured to input the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, where the video image processing model is configured to perform video image processing on the current video frame according to the target frame sequence.
It should be noted that the first obtaining module 32 in this embodiment may be configured to execute step S202 in this embodiment, the calculating module 34 in this embodiment may be configured to execute step S204 in this embodiment, the converting module 36 in this embodiment may be configured to execute step S206 in this embodiment, and the input module 38 in this embodiment may be configured to execute step S208 in this embodiment.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.
Through the above modules, the preceding frame set and the subsequent frame set of the current video frame are obtained as its initial reference frames, the first and second similarity coefficients are calculated, and the initial frame sequence is converted into a target frame sequence according to these coefficients, so that the similarity between the target frame sequence and the current video frame is higher than the similarity threshold. Because the target frame sequence used for video image processing satisfies this similarity requirement, the reference frames in the target frame sequence are guaranteed to be similar to the current video frame, processing the current video frame with these reference frames better matches its actual content, and large differences between the reference frames and the current video frame are eliminated. This achieves the technical effect of improving video image processing efficiency and solves the technical problem of low video image processing efficiency in the related art.
As an alternative embodiment, the conversion module comprises:
a comparison unit for comparing the relationship between the first and second similarity coefficients and a similarity coefficient threshold value, respectively;
a first determining unit, configured to determine the target frame sequence according to a relationship between the first similarity coefficient and a similarity coefficient threshold and a relationship between the second similarity coefficient and a similarity coefficient threshold.
As an alternative embodiment, the first determining unit is configured to:
determining the initial frame sequence as a target frame sequence if the first similarity coefficient and the second similarity coefficient are both greater than the similarity coefficient threshold;
and under the condition that at least one of the first similarity coefficient and the second similarity coefficient is smaller than the similarity coefficient threshold value, replacing the frame set corresponding to the smaller similarity coefficient in the initial frame sequence with the frame set corresponding to the larger similarity coefficient in the first similarity coefficient and the second similarity coefficient to obtain the target frame sequence.
As an alternative embodiment, the calculation module comprises:
a first calculating unit, configured to calculate the similarity between the current video frame and each preceding video frame in the preceding frame set;
a second determining unit, configured to determine an average value of the similarities between the current video frame and all preceding video frames in the preceding frame set as the first similarity coefficient;
a second calculating unit, configured to calculate a similarity between the current video frame and each subsequent video frame in the subsequent frame set;
a third determining unit, configured to determine an average value of similarities between the current video frame and all subsequent video frames in the subsequent frame set as the second similarity coefficient.
As an alternative embodiment, the apparatus further comprises:
a determining module, configured to determine a plurality of similarity coefficient ranges before inputting the target frame sequence into a video image processing model and obtaining a processing result corresponding to the current video frame;
a second obtaining module, configured to obtain a plurality of frame sequence samples whose similarity coefficients fall within each similarity coefficient range of the plurality of similarity coefficient ranges, and an original image corresponding to each frame sequence sample, where each frame sequence sample includes a preceding frame set sample, a current frame sample, and a subsequent frame set sample, the preceding frame set sample includes the first number of preceding frame samples before the current frame sample, the subsequent frame set sample includes the second number of subsequent frame samples after the current frame sample, and the original image is the original image corresponding to the current frame sample;
and the training module is used for training an initial video image processing model by using the frame sequence samples with the corresponding relation and the original image to obtain the video image processing model.
As an alternative embodiment, the second obtaining module includes:
a fourth determining unit, configured to determine a sample proportion corresponding to each of the similarity coefficient ranges, where the sample proportion is proportional to the numerical value of each similarity coefficient range;
and the acquisition unit is used for acquiring a plurality of frame sequence samples falling into each similarity coefficient range according to the sample proportion corresponding to each similarity coefficient range.
As an alternative embodiment, the obtaining unit is configured to:
determining a total number of frame sequence samples;
determining the product of the sample proportion corresponding to each similarity coefficient range and the total number as the number of samples corresponding to each similarity coefficient range;
obtaining, for each similarity coefficient range, the determined number of frame sequence samples falling within that range.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to a further aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the video image processing method, as shown in fig. 4, the electronic device includes a memory 402 and a processor 404, the memory 402 stores a computer program, and the processor 404 is configured to execute the steps in any one of the method embodiments through the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
acquiring an initial frame sequence corresponding to a current video frame to be processed, wherein the initial frame sequence comprises a preceding frame set, the current video frame and a subsequent frame set, the preceding frame set comprises a first number of preceding video frames before the current video frame, and the subsequent frame set comprises a second number of subsequent video frames after the current video frame;
calculating a first similarity coefficient between the current video frame and the preceding frame set and a second similarity coefficient between the current video frame and the subsequent frame set, wherein the first similarity coefficient is used for indicating the similarity between the current video frame and the preceding frame set, and the second similarity coefficient is used for indicating the similarity between the current video frame and the subsequent frame set;
converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, wherein the similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
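For illustration only, the four steps above may be sketched as follows. This is a minimal sketch under stated assumptions: frames are NumPy arrays, the per-frame similarity metric (a mean-absolute-difference-based score) and the threshold value 0.8 are hypothetical, and the conversion rule follows the averaging of claim 4 and the replacement logic of claims 2 and 3.

    import numpy as np

    def frame_similarity(a, b):
        # Hypothetical per-frame similarity in (0, 1]; the metric is not fixed here.
        diff = np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64)))
        return 1.0 / (1.0 + diff)

    def build_target_sequence(preamble_set, current, subsequent_set, threshold=0.8):
        # First/second similarity coefficients: average similarity between the
        # current video frame and the preamble / subsequent frame sets.
        first = np.mean([frame_similarity(current, f) for f in preamble_set])
        second = np.mean([frame_similarity(current, f) for f in subsequent_set])
        if first > threshold and second > threshold:
            # Both coefficients exceed the threshold: keep the initial sequence.
            return list(preamble_set) + [current] + list(subsequent_set)
        if first >= second:
            # Replace the subsequent frame set (smaller coefficient) with the
            # preamble frame set (larger coefficient).
            return list(preamble_set) + [current] + list(preamble_set)
        # Replace the preamble frame set with the subsequent frame set.
        return list(subsequent_set) + [current] + list(subsequent_set)

    # The target sequence is then input into the video image processing model, e.g.:
    # result = model(build_target_sequence(preamble_frames, current_frame, subsequent_frames))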
Optionally, as can be understood by those skilled in the art, the structure shown in fig. 4 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID). Fig. 4 does not limit the structure of the above electronic device; for example, the electronic device may also include more or fewer components (e.g., a network interface) than shown in fig. 4, or have a different configuration from that shown in fig. 4.
The memory 402 may be used to store software programs and modules, such as the program instructions/modules corresponding to the video image processing method and apparatus in the embodiments of the present invention, and the processor 404 executes various functional applications and data processing by running the software programs and modules stored in the memory 402, that is, implements the above-described video image processing method. The memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 402 may further include memory located remotely from the processor 404, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 402 may be specifically configured, but is not limited, to store information such as the frame sequences and processing results used in the above method. As an example, as shown in fig. 4, the memory 402 may include, but is not limited to, the first obtaining module 4022, the calculating module 4024, the converting module 4026, and the input module 4028 of the above video image processing apparatus. In addition, the memory 402 may further include, but is not limited to, other module units of the above video image processing apparatus, which are not described in detail in this example.
Optionally, the transmission device 406 is used for receiving or sending data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 406 includes a network adapter (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 406 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 408, configured to display the processing result corresponding to the current video frame; and a connection bus 410, configured to connect the module components of the above electronic device.
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may store program code for executing the video image processing method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
acquiring an initial frame sequence corresponding to a current video frame to be processed, wherein the initial frame sequence comprises a preamble frame set, the current video frame and a subsequent frame set, the preamble frame set comprises a first number of preamble video frames before the current video frame, and the subsequent frame set comprises a second number of subsequent video frames after the current video frame;
calculating a first similarity coefficient between the current video frame and the preamble frame set and a second similarity coefficient between the current video frame and the subsequent frame set, wherein the first similarity coefficient is used for indicating the similarity between the current video frame and the preamble frame set, and the second similarity coefficient is used for indicating the similarity between the current video frame and the subsequent frame set;
converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, wherein the similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments; details are not repeated here.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application, and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as falling within the protection scope of the present application.

Claims (10)

1. A video image processing method, comprising:
acquiring an initial frame sequence corresponding to a current video frame to be processed, wherein the initial frame sequence comprises a preamble frame set, the current video frame and a subsequent frame set, the preamble frame set comprises a first number of preamble video frames before the current video frame, and the subsequent frame set comprises a second number of subsequent video frames after the current video frame;
calculating a first similarity coefficient between the current video frame and the preamble frame set and a second similarity coefficient between the current video frame and the subsequent frame set, wherein the first similarity coefficient is used for indicating the similarity between the current video frame and the preamble frame set, and the second similarity coefficient is used for indicating the similarity between the current video frame and the subsequent frame set;
converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, wherein the similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
2. The method of claim 1, wherein converting the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient comprises:
comparing the first similarity coefficient and the second similarity coefficient with a similarity coefficient threshold, respectively;
and determining the target frame sequence according to the relationship between the first similarity coefficient and the similarity coefficient threshold and the relationship between the second similarity coefficient and the similarity coefficient threshold.
3. The method of claim 2, wherein determining the target frame sequence according to the relationship between the first similarity coefficient and the similarity coefficient threshold and the relationship between the second similarity coefficient and the similarity coefficient threshold comprises:
determining the initial frame sequence as the target frame sequence in a case where the first similarity coefficient and the second similarity coefficient are both greater than the similarity coefficient threshold;
and in a case where at least one of the first similarity coefficient and the second similarity coefficient is smaller than the similarity coefficient threshold, replacing, in the initial frame sequence, the frame set corresponding to the smaller of the first similarity coefficient and the second similarity coefficient with the frame set corresponding to the larger of the two, to obtain the target frame sequence.
4. The method of claim 1, wherein calculating a first similarity coefficient between the current video frame and the preamble frame set and a second similarity coefficient between the current video frame and the subsequent frame set comprises:
calculating the similarity between the current video frame and each preamble video frame in the preamble frame set;
determining the average value of the similarities between the current video frame and all the preamble video frames in the preamble frame set as the first similarity coefficient;
calculating the similarity between the current video frame and each subsequent video frame in the subsequent frame set;
and determining the average value of the similarities between the current video frame and all the subsequent video frames in the subsequent frame set as the second similarity coefficient.
5. The method of claim 1, wherein before inputting the target frame sequence into the video image processing model to obtain the processing result corresponding to the current video frame, the method further comprises:
determining a plurality of similarity coefficient ranges;
obtaining, for each of the plurality of similarity coefficient ranges, a plurality of frame sequence samples whose similarity coefficients fall within that range, and an original image corresponding to each frame sequence sample, wherein the frame sequence samples include preamble frame set samples, current frame samples, and subsequent frame set samples, the preamble frame set samples include the first number of preamble frame samples before the current frame samples, the subsequent frame set samples include the second number of subsequent frame samples after the current frame samples, and the original image is the original image corresponding to the current frame samples;
and training an initial video image processing model by using the frame sequence samples and their corresponding original images, to obtain the video image processing model.
6. The method of claim 5, wherein obtaining a plurality of frame sequence samples whose similarity coefficients fall within each of the plurality of similarity coefficient ranges comprises:
determining a sample proportion corresponding to each of the plurality of similarity coefficient ranges, wherein the sample proportion is proportional to the numerical magnitude of each similarity coefficient range;
and obtaining a plurality of frame sequence samples falling within each similarity coefficient range according to the sample proportion corresponding to that range.
7. The method of claim 6, wherein obtaining a plurality of frame sequence samples falling within each similarity coefficient range according to the sample proportion corresponding to each similarity coefficient range comprises:
determining a total number of frame sequence samples;
determining the product of the sample proportion corresponding to each similarity coefficient range and the total number as the number of samples corresponding to that range;
and obtaining that number of frame sequence samples falling within each similarity coefficient range.
8. A video image processing apparatus, comprising:
a first obtaining module, configured to obtain an initial frame sequence corresponding to a current video frame to be processed, where the initial frame sequence includes a preamble frame set, the current video frame and a subsequent frame set, the preamble frame set includes a first number of preamble video frames before the current video frame, and the subsequent frame set includes a second number of subsequent video frames after the current video frame;
a calculating module, configured to calculate a first similarity coefficient between the current video frame and the preamble frame set, and a second similarity coefficient between the current video frame and the subsequent frame set, where the first similarity coefficient is used to indicate a similarity between the current video frame and the preamble frame set, and the second similarity coefficient is used to indicate a similarity between the current video frame and the subsequent frame set;
a conversion module, configured to convert the initial frame sequence into a target frame sequence according to the first similarity coefficient and the second similarity coefficient, where a similarity between the target frame sequence and the current video frame is higher than a similarity threshold;
and the input module is used for inputting the target frame sequence into a video image processing model to obtain a processing result corresponding to the current video frame, wherein the video image processing model is used for carrying out video image processing on the current video frame according to the target frame sequence.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when executed, performs the method of any one of claims 1 to 7.
10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the method of any one of claims 1 to 7 through the computer program.
CN202010898421.6A 2020-08-31 2020-08-31 Video image processing method and device Active CN114205642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010898421.6A CN114205642B (en) 2020-08-31 2020-08-31 Video image processing method and device

Publications (2)

Publication Number Publication Date
CN114205642A true CN114205642A (en) 2022-03-18
CN114205642B CN114205642B (en) 2024-04-26

Family

ID=80644375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010898421.6A Active CN114205642B (en) 2020-08-31 2020-08-31 Video image processing method and device

Country Status (1)

Country Link
CN (1) CN114205642B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060062307A1 (en) * 2004-08-13 2006-03-23 David Drezner Method and apparatus for detecting high level white noise in a sequence of video frames
WO2018068300A1 (en) * 2016-10-14 2018-04-19 Fujitsu Limited Image processing method and device
CN109145991A (en) * 2018-08-24 2019-01-04 Beijing Horizon Robotics Technology Research and Development Co., Ltd. Image group generation method, image group generation apparatus, and electronic device
CN109461167A (en) * 2018-11-02 2019-03-12 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Training method of an image processing model, matting method, apparatus, medium, and terminal
CN111210467A (en) * 2018-12-27 2020-05-29 Shanghai SenseTime Intelligent Technology Co., Ltd. Image processing method, image processing apparatus, electronic device, and computer-readable storage medium
CN111383201A (en) * 2018-12-29 2020-07-07 Shenzhen TCL New Technology Co., Ltd. Scene-based image processing method and apparatus, intelligent terminal, and storage medium
US20200210707A1 (en) * 2019-01-02 2020-07-02 Boe Technology Group Co., Ltd. Sample extraction method and device targeting video classification problem
CN109934275A (en) * 2019-03-05 2019-06-25 Shenzhen SenseTime Technology Co., Ltd. Image processing method and apparatus, electronic device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J. Chen; Y. Wang; Y. X. Zou: "An adaptive redundant image elimination for Wireless Capsule Endoscopy review based on temporal correlation and color-texture feature similarity", 2015 IEEE International Conference on Digital Signal Processing (DSP), 10 September 2015 (2015-09-10) *
李如春; 俞楷; 虞露: "Research on Tone Mapping of High Dynamic Range Video Based on Inter-Frame Correlation" (基于帧间相关性的高动态视频色调映射研究), 高技术通讯 (High Technology Letters), no. 07, 15 July 2018 (2018-07-15) *

Also Published As

Publication number Publication date
CN114205642B (en) 2024-04-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant