CN110650370B - Video coding parameter determination method and device, electronic equipment and storage medium - Google Patents

Video coding parameter determination method and device, electronic equipment and storage medium

Info

Publication number
CN110650370B
CN110650370B
Authority
CN
China
Prior art keywords
video
sample
sample video
segment
complexity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910995614.0A
Other languages
Chinese (zh)
Other versions
CN110650370A (en)
Inventor
赵明菲
于冰
郑云飞
闻兴
王晓楠
黄晓政
张元尊
陈敏
陈宇聪
黄跃
郭磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910995614.0A priority Critical patent/CN110650370B/en
Publication of CN110650370A publication Critical patent/CN110650370A/en
Application granted granted Critical
Publication of CN110650370B publication Critical patent/CN110650370B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a method, an apparatus, an electronic device and a storage medium for determining video encoding parameters. The method comprises: slicing a video to be encoded into a plurality of video segments; calculating complexity features of each video segment, the complexity features characterizing the temporal complexity and spatial complexity of the segment; and inputting the complexity features of each video segment, together with the current network bandwidth, into a pre-trained neural network model to obtain the encoding parameters of each video segment. With the technical solution provided by the embodiments of the present disclosure, each video segment obtained by slicing the video to be encoded is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded segment is high; moreover, the network bandwidth required to upload the encoded video is less than the current network bandwidth, so the encoded video can be uploaded successfully.

Description

Video coding parameter determination method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of video technologies, and in particular, to a method and an apparatus for determining video encoding parameters, an electronic device, and a storage medium.
Background
With the rapid development of video technology, a video distribution platform receives a large number of videos uploaded from terminals every day. Before uploading, a terminal must encode the video to be uploaded and then upload the encoded video to the video distribution platform.
In the related art, to guarantee the video quality of the encoded video, a high encoding bitrate is determined for every video to be uploaded. That bitrate may exceed the terminal's current network bandwidth while the encoded video is being uploaded, causing the upload to fail.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device and a storage medium for determining video encoding parameters, and the technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method for determining video encoding parameters, the method comprising:
cutting a video to be encoded into a plurality of video segments;
calculating complexity characteristics of each video clip, wherein the complexity characteristics are used for representing the time complexity and the space complexity of the video clip;
inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip;
wherein the pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
Optionally, the process of training the pre-trained neural network model includes:
slicing the sample video into a plurality of first sample video segments;
decoding the sample videos encoded with a plurality of preset encoding parameters, and slicing each decoded sample video into a plurality of second sample video segments, wherein the number of the first sample video segments is the same as the number of the second sample video segments;
for each second sample video segment, calculating the video quality of the second sample video segment according to the second sample video segment and the first sample video segment corresponding to the second sample video segment;
determining a video quality difference matrix corresponding to each preset coding parameter according to the video quality of each second sample video clip;
determining convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix and the current network bandwidth;
for each first sample video segment, inputting the complexity features and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the encoding parameters output by the model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that segment.
Optionally, the determining, according to the video quality of each second sample video segment, the video quality difference matrix corresponding to each preset encoding parameter includes:
for a target second sample video segment, calculating the difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to it;
determining the vector formed by all such video quality difference values as the video quality difference matrix corresponding to the target preset encoding parameter;
wherein the target preset encoding parameter of the target second sample video segment and the preset encoding parameter of its corresponding second sample video segment are adjacent in magnitude.
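A minimal Python sketch of this construction (function and variable names are illustrative, not from the text): segments encoded at adjacent preset parameters are compared, yielding one difference vector per step of the parameter ladder.

```python
def quality_diff_matrix(quality):
    """quality[p][s]: video quality of segment s encoded with the p-th
    preset encoding parameter, parameters sorted in ascending order.
    Row p of the result holds, per segment, the quality change when
    stepping from parameter p to the adjacent parameter p + 1."""
    n_params, n_segs = len(quality), len(quality[0])
    return [[quality[p + 1][s] - quality[p][s] for s in range(n_segs)]
            for p in range(n_params - 1)]
```

For example, with three preset parameters and two segments, the result has two rows, one per adjacent parameter pair.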
Optionally, the determining, according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth, the convex hull coding parameter corresponding to each first sample video segment includes:
for each video quality difference matrix, comparing each video quality difference value in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant characterizes the minimum acceptable video quality in the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset encoding parameters to obtain a comparison value vector;
determining, as the target encoding parameter, the preset encoding parameter corresponding to the first entry in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target encoding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
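The selection above can be sketched in Python as follows. The thresholding convention (the comparison value fires when the quality difference first drops below the quality constant, i.e. extra bitrate no longer buys the minimum acceptable quality gain) and the fallback for segments that never fire are assumptions; the text does not spell them out.

```python
def convex_hull_param(diffs, params, quality_const, bandwidth):
    """diffs[i]: quality difference for this segment at preset
    parameter params[i], with params sorted in ascending order.
    The first parameter whose difference falls below the quality
    constant becomes the target; the result is clamped to the
    current network bandwidth."""
    target = params[-1]  # assumed fallback: the largest preset parameter
    for diff, param in zip(diffs, params):
        if diff < quality_const:
            target = param
            break
    return min(target, bandwidth)
```

With bitrates as the preset parameters, clamping to the bandwidth implements the final "minimum of the target encoding parameter and the current network bandwidth" step.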
Optionally, the calculating the complexity characteristic of each video segment includes:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the calculating the complexity characteristic of each video segment includes:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
Optionally, the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate.
According to a second aspect of the embodiments of the present disclosure, there is provided a video coding parameter determination apparatus, the apparatus including:
a video slicing module configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculation module configured to perform calculating the complexity features of each video segment, wherein the complexity features characterize the temporal complexity and spatial complexity of the video segments;
an encoding parameter acquisition module configured to perform inputting the complexity features and the current network bandwidth of each video segment into a pre-trained neural network model to obtain the encoding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
Optionally, the apparatus further includes a model training module, which includes:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
the second video segmentation unit is configured to decode the sample video coded by the preset coding parameters and segment the decoded sample video into a plurality of second sample video segments respectively, wherein the number of the first sample video segments is the same as that of the second sample video segments;
a video quality calculation unit configured to perform, for each second sample video segment, calculating a video quality of the second sample video segment from the second sample video segment and a first sample video segment corresponding to the second sample video segment;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to perform, for each first sample video segment, inputting the complexity features and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the encoding parameters output by the model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that segment.
Optionally, the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating the difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to it;
determining the vector formed by all such video quality difference values as the video quality difference matrix corresponding to the target preset encoding parameter;
wherein the target preset encoding parameter of the target second sample video segment and the preset encoding parameter of its corresponding second sample video segment are adjacent in magnitude.
Optionally, the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference value in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant characterizes the minimum acceptable video quality in the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset encoding parameters to obtain a comparison value vector;
determining, as the target encoding parameter, the preset encoding parameter corresponding to the first entry in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target encoding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
Optionally, the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video coding parameter determination method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the video coding parameter determination method of the first aspect.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the video coding parameter determination method of the first aspect.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity features of the sample video segments, the current network bandwidth, and the convex hull coding parameters of the sample video segments. Encoding a sample video segment with its convex hull coding parameters ensures both that the video quality of the encoded segment is high and that the network bandwidth required to upload it is less than the current network bandwidth. Therefore, each video segment obtained by slicing the video to be encoded is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded segment is high; moreover, the network bandwidth required to upload the encoded video is less than the current network bandwidth, so the encoded video can be uploaded successfully.
Drawings
Fig. 1 is a flow diagram illustrating a method of video coding parameter determination in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a process of training a neural network model in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating one particular implementation of S25, according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating the acquisition of complexity characteristics of a sample video in accordance with an exemplary embodiment;
FIG. 5 is a diagram illustrating obtaining a video quality difference matrix according to an exemplary embodiment;
fig. 6 is a block diagram illustrating a video encoding parameter determination apparatus according to an example embodiment;
FIG. 7 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
fig. 8 is a block diagram illustrating a video encoding parameter determination apparatus according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a video coding parameter determination method according to an exemplary embodiment. The method is used in a video coding parameter determination apparatus, which runs on an electronic device having a video encoding function; the embodiment of the present disclosure does not specifically limit the electronic device.
As shown in fig. 1, the method may include the following steps.
In step S11, the video to be encoded is sliced into a plurality of video segments.
The video to be encoded may be sliced into a plurality of video segments before it is encoded. For example, the video to be encoded may be sliced into a plurality of video segments, each having a duration of 3 seconds. Of course, the number of video segments into which the video to be encoded is specifically divided may be determined according to actual situations, and this is not specifically limited in the embodiment of the present disclosure.
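The slicing step can be sketched independently of any particular container or decoder; the fixed 3-second duration follows the example above, and the function name is illustrative.

```python
def segment_boundaries(duration_s, segment_len_s=3.0):
    """Return (start, end) times in seconds for fixed-length slices of
    a video; the last slice may be shorter than segment_len_s."""
    bounds, start = [], 0.0
    while start < duration_s:
        bounds.append((start, min(start + segment_len_s, duration_s)))
        start += segment_len_s
    return bounds
```

A 10-second video, for instance, yields three full 3-second segments plus a trailing 1-second segment.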
In step S12, the complexity characteristics of each video segment are calculated.
Wherein the complexity feature is used to characterize the temporal complexity and spatial complexity of the video segment.
After the video to be encoded is sliced into a plurality of video segments, the complexity characteristics of each video segment may be calculated. The complexity feature may include a temporal complexity feature for characterizing a temporal complexity of the video segment and a spatial complexity feature for characterizing a spatial complexity of the video segment.
In particular, the complexity feature may comprise at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding. Of course, the embodiments of the present disclosure do not specifically limit the complexity feature.
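The temporal features can be sketched as below. MaxTI and AvgTI are taken here as the maximum and mean variance of adjacent-frame differences, a common reading of these temporal-complexity measures; the exact definition used in the text is not spelled out, so treat this as an assumption.

```python
import numpy as np

def ti_features(frames):
    """frames: sequence of 2-D grayscale arrays of equal shape.
    Returns (MaxTI, AvgTI): the maximum and mean variance of the
    pixel-wise differences between adjacent frames."""
    diffs = [float(np.var(b.astype(np.float64) - a.astype(np.float64)))
             for a, b in zip(frames, frames[1:])]
    return max(diffs), sum(diffs) / len(diffs)
```

A static segment (identical frames) yields (0.0, 0.0), i.e. zero temporal complexity.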
In one embodiment, calculating the complexity characteristic of each video segment may include:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
In this embodiment, when calculating the complexity feature of each video segment, the complexity feature of each video frame of the video segment may be calculated, so that the calculated complexity feature of the video segment is more accurate.
In another embodiment, calculating the complexity characteristic of each video segment may include:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
In this embodiment, when calculating the complexity feature of each video segment, only the complexity features of the key video frames of the segment are calculated. That is, instead of computing a complexity feature for every video frame in the segment, only the key video frames are used, which reduces the amount of computation.
In step S13, the complexity characteristics and the current network bandwidth of each video segment are input into a pre-trained neural network model to obtain the encoding parameters of each video segment.
The pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
The pre-trained neural network model is obtained by training the convex hull coding parameters of the sample video clip based on the complexity characteristics and the current network bandwidth of the sample video, and the sample video clip is coded by using the convex hull coding parameters, so that the video quality of the coded sample video clip is high, namely the video quality is higher than the preset video quality, and the network bandwidth required by uploading the coded sample video clip is less than the current network bandwidth. It can be understood that the preset video quality may be determined according to actual situations, and the preset video quality is not specifically limited in the embodiments of the present disclosure.
Therefore, after the complexity characteristics of each video segment are obtained through calculation, the complexity characteristics of each video segment and the current network bandwidth can be input into the pre-trained neural network model, and the coding parameters of each video segment are output from the pre-trained neural network model, so that the coding parameters of each video segment can be obtained. For example, the complexity characteristic of the ith video segment and the current network bandwidth are input into a pre-trained neural network model, so as to obtain the coding parameters of the (i + 1) th video segment.
As can be seen, each video segment is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded video segment is high, that is, higher than the preset video quality, and the network bandwidth required to upload the encoded video segment is less than the current network bandwidth. Therefore, the video quality of the encoded video is guaranteed, the bandwidth required to upload it stays below the current network bandwidth, and the encoded video can be uploaded successfully.
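The per-segment inference flow can be sketched as follows. `predict(feats, bandwidth)` stands in for the pre-trained neural network model; following the example above, segment i's complexity features yield the encoding parameters of segment i + 1, so the very first segment falls back to a default (how segment 0 is handled is an assumption, not stated in the text).

```python
def choose_encoding_params(features, bandwidth, predict, default_param):
    """features: per-segment complexity features, in segment order.
    Returns one encoding parameter per segment; segment i + 1 gets the
    model's prediction from segment i's features."""
    params = [default_param]
    for feats in features[:-1]:
        params.append(predict(feats, bandwidth))
    return params
```

A trivial stand-in model makes the offset visible: the prediction from segment 0's features lands on segment 1.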
It should be noted that the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate. Of course, the encoding parameters are not specifically limited in the embodiments of the present disclosure.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
For clarity of description of the scheme, the process of training the above-mentioned pre-trained neural network model will be explained in detail in the following embodiments.
In one embodiment, the process of training the pre-trained neural network model may include the following steps, as shown in fig. 2.
In step S21, the sample video is cut into a plurality of first sample video segments.
In step S22, the sample video encoded by the preset encoding parameters is decoded, and the decoded sample video is respectively cut into a plurality of second sample video segments.
Wherein the number of the first sample video segments is the same as the number of the second sample video segments.
Specifically, the sample video may be encoded with each of a plurality of preset encoding parameters, so as to obtain a plurality of encoded sample videos. For example, assuming that the preset encoding parameters are preset encoding rates, the sample video may be encoded at different preset encoding rates; specifically, the sample video may be encoded at 450k (kbps, kilobits per second), 500k, 550k, 600k, 650k, ……, 1000k.
The sample video is sliced into a plurality of first sample video segments. Moreover, after a plurality of sample videos coded by using the preset coding parameters are obtained, the sample videos coded by using the preset coding parameters can be decoded, and the decoded sample videos are respectively divided into a plurality of second sample video segments, wherein the number of the first sample video segments is the same as that of the second sample video segments.
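The bookkeeping of steps S21–S22 can be sketched as follows, under the assumption that segments are formed by consecutive frame ranges (the text does not fix the segmentation criterion) and with the 450k–1000k, 50k-step ladder from the example above. Actual encoding and decoding would be done by a real codec; only the slicing arithmetic is shown.

```python
PRESET_BITRATES_KBPS = list(range(450, 1001, 50))  # 450k, 500k, ..., 1000k

def slice_indices(num_frames, seg_len):
    """Split [0, num_frames) into consecutive segments of seg_len frames,
    returned as (start, end) index pairs."""
    return [(s, min(s + seg_len, num_frames))
            for s in range(0, num_frames, seg_len)]

# First sample segments come from the original video; one set of second
# sample segments is produced per preset bitrate, so for each bitrate the
# segment count matches the first-segment count.
first_segments = slice_indices(300, 50)            # e.g. 6 first sample segments
second_segments = {r: slice_indices(300, 50) for r in PRESET_BITRATES_KBPS}
```

The illustrative frame counts (300 frames, 50 per segment) are assumptions, not values from the source.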
In step S23, for each second sample video segment, the video quality of the second sample video segment is calculated according to the second sample video segment and the first sample video segment corresponding to the second sample video segment.
The sample video is segmented to obtain a plurality of first sample video segments, and each decoded sample video is segmented to obtain a plurality of second sample video segments. Since the number of first sample video segments is the same as the number of second sample video segments, each first sample video segment corresponds to one second sample video segment.
For example, there are three first sample video clips, which are: a first sample video clip 1, a first sample video clip 2, and a first sample video clip 3. The second sample video clip has three video clips: a second sample video segment 1, a second sample video segment 2, and a second sample video segment 3. Then, the first sample video segment 1 and the second sample video segment 1 correspond; the first sample video segment 2 corresponds to the second sample video segment 2; the first sample video segment 3 corresponds to the second sample video segment 3.
For each second sample video segment, the video quality of the second sample video segment may be calculated based on the second sample video segment and the first sample video segment corresponding to the second sample video segment. The video quality can be measured by VMAF (Video Multimethod Assessment Fusion). Of course, the embodiments of the present disclosure are not particularly limited in this respect.
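Step S23 can be illustrated as follows. Computing real VMAF requires the libvmaf library, so PSNR is used here as a stand-in full-reference quality metric; frames are treated as flat lists of pixel values, and the segment score is the mean over its frames. These simplifications are assumptions for illustration only.

```python
import math

def psnr(frame_a, frame_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size frames,
    used here as a stand-in for VMAF."""
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

def segment_quality(first_seg_frames, second_seg_frames):
    """Average per-frame quality of a decoded (second) segment against its
    original (first) segment, mirroring step S23."""
    scores = [psnr(a, b) for a, b in zip(first_seg_frames, second_seg_frames)]
    return sum(scores) / len(scores)
```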
In step S24, a video quality difference matrix corresponding to each preset encoding parameter is determined according to the video quality of each second sample video segment.
In one embodiment, determining the video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment may include the following two steps, namely step a1 and step a2:
step a1, for a target second sample video segment, calculating a video quality difference value between the video quality of the target second sample video segment and the video quality of a second sample video segment corresponding to the target second sample video segment.
And the target preset coding parameter of the target second sample video segment and the preset coding parameter of the second sample video segment corresponding to the target second sample video segment are adjacent in size.
Step a2, determining the vector formed by the video quality difference values as the video quality difference value matrix corresponding to the target preset coding parameter.
Specifically, for each second sample video segment, that is, each target second sample video segment, there is a second sample video segment whose preset encoding parameter is adjacent in size to the target preset encoding parameter of the target second sample video segment; this is the second sample video segment corresponding to the target second sample video segment. The video quality of the target second sample video segment is differenced with the video quality of this corresponding second sample video segment, so that each target second sample video segment corresponds to one video quality difference. The vector formed by the video quality differences corresponding to the target second sample video segments is determined as the video quality difference matrix corresponding to the target preset encoding parameter.
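A minimal sketch of step S24, collecting per-segment quality differences between adjacent preset bitrates. The text leaves the sign convention implicit; here, as an assumption, the quality at the adjacent higher bitrate minus the quality at the current bitrate is used, and each matrix (a plain list here) is attributed to the lower bitrate of the pair.

```python
def quality_diff_matrices(vmaf_by_rate, rates):
    """vmaf_by_rate maps a preset bitrate to the list of per-segment
    qualities of its second sample segments; rates is sorted ascending.
    Returns {lower_rate: [q_higher[i] - q_lower[i] for each segment i]}."""
    matrices = {}
    for lo, hi in zip(rates, rates[1:]):
        matrices[lo] = [b - a
                        for a, b in zip(vmaf_by_rate[lo], vmaf_by_rate[hi])]
    return matrices
```

With 12 preset rates this yields 11 difference matrices, one per adjacent pair.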
In step S25, convex hull coding parameters corresponding to each first sample video segment are determined according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth.
In one embodiment, determining the convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth may include the following steps, as shown in fig. 3, which are step S251 to step S254:
s251, for each video quality difference matrix, comparing each video quality in the video quality difference matrix with the video quality constant corresponding to the video quality difference matrix to obtain a comparison value.
Wherein the video quality constant is used to characterize: the minimum acceptable video quality difference in the video quality difference matrix.
It should be noted that each preset encoding parameter corresponds to a video quality constant, which is used to characterize the minimum acceptable value of each video quality difference in the video quality difference matrix corresponding to that preset encoding parameter. The video quality constant is obtained by training on a predetermined training set; that is, the video quality constant is an empirical value.
For each video quality difference matrix, each video quality difference value in the matrix is compared with the video quality constant corresponding to the matrix to obtain a comparison value. Specifically, if a video quality difference value in the matrix is smaller than the video quality constant corresponding to the matrix, the comparison value may be determined to be 0; if it is greater than the video quality constant, the comparison value may be determined to be 1.
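Step S251 reduces to elementwise thresholding. The text only distinguishes "smaller" and "greater"; mapping the boundary case to 1 is an assumption of this sketch.

```python
def compare_to_constant(diff_matrix, quality_constant):
    """Map each quality difference to 0 (below the video quality constant)
    or 1 (at or above it, the latter by assumption for the exact boundary)."""
    return [1 if d >= quality_constant else 0 for d in diff_matrix]
```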
And S252, for each first sample video segment, sequencing the comparison values of the second sample video segment corresponding to the first sample video segment according to the sequence of the corresponding preset coding parameters from small to large to obtain a comparison value vector.
The number of the first sample video clips is the same as that of the second sample video clips, and the first sample video clips have correspondence with the second sample video clips. It can be understood that, when there are a plurality of preset encoding parameters, the first sample video segment corresponds to a plurality of second sample video segments, and each second sample video segment corresponds to a comparison value. The comparison values of the second sample video segments corresponding to the first sample video segments can be sequenced according to the sequence of the corresponding preset coding parameters from small to large, so as to obtain comparison value vectors.
For example, the comparison value vector may be [0, 0, 0, 1, 1, 1, ……, 1]^T.
S253, determining the preset coding parameter corresponding to the first occurrence of the preset comparison value in the comparison value vector as the target coding parameter.
Wherein the preset comparison value may be 1. For example, the comparison value vector is [0, 0, 0, 1, 1, 1, ……, 1]^T, and the preset coding parameter is a preset coding rate. Assuming that the preset coding rate corresponding to the first 0 is 450k, the preset coding rate corresponding to the second 0 is 500k, the preset coding rate corresponding to the third 0 is 550k, and the preset coding rate corresponding to the first 1 is 600k, then the target coding rate corresponding to the first occurrence of the comparison value 1 is 600k.
And S254, determining the minimum value of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
After the target encoding parameter is determined, the target encoding parameter may be compared with the current network bandwidth, and a minimum value between the target encoding parameter and the current network bandwidth is determined as a convex hull encoding parameter corresponding to the first sample video segment.
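Steps S252–S254 can be sketched together: order the comparison values ascending by their preset rates, take the rate at the first preset comparison value 1, and clamp by the current bandwidth.

```python
def convex_hull_bitrate(comparison_vector, rates, bandwidth_kbps):
    """comparison_vector[i] is the 0/1 comparison value attributed to
    rates[i], with rates sorted ascending (step S252). The target parameter
    is the rate at the first 1 (step S253), and the convex hull parameter is
    its minimum with the current bandwidth (step S254)."""
    target = next(r for v, r in zip(comparison_vector, rates) if v == 1)
    return min(target, bandwidth_kbps)
```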
In step S26, for each first sample video segment, the complexity characteristics and the current network bandwidth of the first sample video segment are input into the neural network model, and the neural network model is trained until the coding parameters output from the neural network model are: convex hull coding parameters corresponding to a next first sample video segment adjacent to the first sample video segment.
Specifically, the complexity characteristics of the i-th first sample video segment and the current network bandwidth are input into the neural network model, and the neural network model is trained until the encoding parameters it outputs are as close as possible to the convex hull coding parameters of the (i+1)-th first sample video segment.
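The disclosure does not specify the network architecture or training procedure, so the stand-in below uses a single linear layer fitted by SGD purely to illustrate the pairing: inputs of segment i, target = convex hull rate of segment i+1. Inputs are assumed pre-normalized; all names are illustrative, not from the source.

```python
def train(features, hull_rates, bandwidth, lr=0.1, epochs=2000):
    """Fit a linear stand-in model: (features of segment i, bandwidth)
    -> convex hull rate of segment i+1."""
    dim = len(features[0]) + 1                    # complexity features + bandwidth
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for i in range(len(features) - 1):
            x = features[i] + [bandwidth]
            y = hull_rates[i + 1]                 # next segment's convex hull rate
            pred = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = pred - y
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, feature, bandwidth):
    x = feature + [bandwidth]
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

A real implementation would of course use a proper neural network framework; the point here is only the input/target alignment between adjacent segments.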
For clarity of description, the embodiments of the present disclosure will be described below with reference to specific examples. In the specific example, the coding parameter is taken as an example of the coding rate.
First, the methods for obtaining the complexity characteristics and the convex hull coding rates required by the pre-training process of the neural network model are elaborated in detail.
As shown in fig. 4, any sample video may be divided into n+1 sample video segments, where n is a natural number and the size of n may be determined according to the actual situation. The n+1 sample video segments are: OrgChunk_0, OrgChunk_1, ……, OrgChunk_n.
Respectively calculating the complexity characteristics of the n +1 sample video clips, wherein the complexity characteristics of each sample video clip comprise: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
For example, for OrgChunk_0, the calculated complexity characteristics of the sample video segment include: MaxTI_0, AvgTI_0, IntraComplexity_0 (characterizing the STAD value of intra-frame coding), InterComplexity_0 (characterizing the STAD value of inter-frame coding), BComplexity_0 (characterizing the STAD value when encoding B frames), and PComplexity_0 (characterizing the STAD value when encoding P frames), which together form the complexity characteristic Feature_0 of OrgChunk_0.
For OrgChunk_n, the calculated complexity characteristics of the sample video segment include: MaxTI_n, AvgTI_n, IntraComplexity_n, InterComplexity_n, BComplexity_n, and PComplexity_n, which together form the complexity characteristic Feature_n of OrgChunk_n.
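The two temporal features above can be illustrated as follows, treating each frame as a flat list of luma samples. Following the wording of this document, TI is taken as the variance of the adjacent-frame difference (ITU-T P.910 defines TI via the standard deviation; the variance form is kept here to match the text, as an interpretation).

```python
def frame_diff_variance(frame_a, frame_b):
    """Variance of the elementwise difference between two adjacent frames."""
    diffs = [b - a for a, b in zip(frame_a, frame_b)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)

def max_avg_ti(frames):
    """MaxTI and AvgTI over all adjacent frame pairs of a segment."""
    tis = [frame_diff_variance(frames[i], frames[i + 1])
           for i in range(len(frames) - 1)]
    return max(tis), sum(tis) / len(tis)
```

The STAD-based features (IntraComplexity, InterComplexity, etc.) come from the encoder's residuals and are not reproduced here.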
As shown in fig. 5, the sample video is encoded at different preset encoding rates, which are: rateA, rateB, rateC, rateD, ……, rateL. The difference between two adjacent preset encoding rates is 50k: rateA = 450k, rateB = 500k, rateC = 550k, rateD = 600k, ……, rateL = 1000k.
The rateA-encoded sample video is streamA, the rateB-encoded sample video is streamB, the rateC-encoded sample video is streamC, the rateD-encoded sample video is streamD, ……, and the rateL-encoded sample video is streamL.
Decoding streamA yields the decoded sample video recA, decoding streamB yields recB, decoding streamC yields recC, decoding streamD yields recD, ……, and decoding streamL yields recL.
Each decoded video is cut into n+1 second sample video segments. The n+1 second sample video segments into which recA is cut are: recChunkA_0, recChunkA_1, ……, recChunkA_n. The n+1 second sample video segments into which recB is cut are: recChunkB_0, recChunkB_1, ……, recChunkB_n. The n+1 second sample video segments into which recC is cut are: recChunkC_0, recChunkC_1, ……, recChunkC_n. The n+1 second sample video segments into which recD is cut are: recChunkD_0, recChunkD_1, ……, recChunkD_n. By analogy, the n+1 second sample video segments into which recL is cut are: recChunkL_0, recChunkL_1, ……, recChunkL_n.
Next, for each second sample video segment, the video quality of the second sample video segment is calculated according to the second sample video segment and the first sample video segment corresponding to it. For example, for recChunkA_0, the corresponding first sample video segment is OrgChunk_0, and the calculated video quality of recChunkA_0 is VMAFA_0. For recChunkA_n, the corresponding first sample video segment is OrgChunk_n, and the calculated video quality of recChunkA_n is VMAFA_n. By analogy, the video qualities of recChunkB_0, ……, recChunkB_n; recChunkC_0, ……, recChunkC_n; ……; recChunkL_0, ……, recChunkL_n can be calculated.
The video qualities VMAFA_0, ……, VMAFA_n are differenced with VMAFB_0, ……, VMAFB_n respectively, yielding the video quality difference matrix corresponding to rateA: [Vdiff_AB_0, …, Vdiff_AB_n]. By analogy, the video quality difference matrix corresponding to rateB is [Vdiff_BC_0, …, Vdiff_BC_n], and the video quality difference matrix corresponding to rateK is [Vdiff_KL_0, …, Vdiff_KL_n].
For each video quality difference matrix, each value in the matrix is compared with the corresponding video quality constant to obtain a comparison value of 0 or 1, and each first sample video segment corresponds to a Boolean (BOOl) vector. For example, the BOOl vector corresponding to OrgChunk_0 is [BOOl_AB_0, BOOl_BC_0, BOOl_CD_0, ……, BOOl_KL_0]^T, denoted BOOlVec_0. By analogy, BOOlVec_1, ……, BOOlVec_n can be obtained.
After the BOOl vector corresponding to each first sample video segment is determined, the coding rate corresponding to the first non-zero subscript in the vector may be compared with the current network bandwidth, and the minimum of the two is the convex hull rate corresponding to that first sample video segment.
For example, if BOOlVec = [0, 0, 0, 1, 1, ……, 1]^T, the coding rate corresponding to the first non-zero subscript is 600k. Denoting the current network bandwidth by Budget, the convex hull rate is Convex = min(600k, Budget). As can be seen, each first sample video segment corresponds to a convex hull rate: OrgChunk_0 corresponds to Convex_0, ……, OrgChunk_n corresponds to Convex_n.
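The worked example can be run end to end as follows, assuming, for illustration, that the i-th entry of a BOOlVec is attributed to the i-th rate of the 450k–1000k ladder.

```python
RATES_KBPS = list(range(450, 1001, 50))       # 450k ... 1000k in 50k steps

def hull_rates(bool_vecs, budget_kbps):
    """One convex hull rate per first sample segment OrgChunk_i: the rate at
    the first non-zero subscript of its BOOl vector, clamped by Budget."""
    return [min(RATES_KBPS[v.index(1)], budget_kbps) for v in bool_vecs]
```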
After the convex hull rate corresponding to each first sample video segment is obtained, the complexity characteristic of OrgChunk_i and the current network bandwidth are input into the neural network model, and the neural network model is trained until the coding rate output by the neural network model is the convex hull coding rate corresponding to OrgChunk_(i+1).
After the neural network model is trained, it can be used online. Specifically, a video to be encoded is divided into a plurality of video segments; the complexity characteristic of each video segment is calculated, and the complexity characteristic of each video segment and the current network bandwidth are input into the pre-trained neural network model to obtain the coding rate of each video segment, and thus the coding rate of the video to be encoded.
Fig. 6 is a block diagram illustrating a video coding parameter determination apparatus according to an example embodiment, the apparatus comprising:
a video slicing module 610 configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculating module 620 configured to perform calculating complexity features of each video segment, wherein the complexity features are used for representing the temporal complexity and the spatial complexity of the video segments;
a coding parameter obtaining module 630, configured to input the complexity characteristic and the current network bandwidth of each video segment into a pre-trained neural network model to obtain a coding parameter of each video segment;
wherein the pre-trained neural network model is obtained by training on the complexity characteristics of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are used to characterize that: when a sample video segment is encoded with the convex hull coding parameters, the video quality of the encoded sample video segment is greater than the preset video quality, and the network bandwidth required for uploading the encoded sample video segment is less than the current network bandwidth.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
Optionally, the apparatus further includes a model training module, which includes:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
the second video segmentation unit is configured to decode the sample video coded by the preset coding parameters and segment the decoded sample video into a plurality of second sample video segments respectively, wherein the number of the first sample video segments is the same as that of the second sample video segments;
a video quality calculation unit configured to perform, for each second sample video segment, calculating a video quality of the second sample video segment from the second sample video segment and a first sample video segment corresponding to the second sample video segment;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to perform, for each first sample video segment, inputting the complexity characteristics and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the coding parameters output from the neural network model are: convex hull coding parameters corresponding to a next first sample video segment adjacent to the first sample video segment.
Optionally, the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating a video quality difference value of the video quality of the target second sample video segment and the video quality of a second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality difference values as a video quality difference value matrix corresponding to the target preset coding parameter;
and the target preset coding parameter of the target second sample video segment and the preset coding parameter of the second sample video segment corresponding to the target second sample video segment are adjacent in size.
Optionally, the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference value in the video quality difference matrix with the video quality constant corresponding to the video quality difference matrix to obtain a comparison value, wherein the video quality constant is used to characterize: the minimum acceptable video quality difference in the video quality difference matrix;
for each first sample video segment, sequencing comparison values of a second sample video segment corresponding to the first sample video segment according to the sequence of the corresponding preset coding parameters from small to large to obtain a comparison value vector;
determining the preset coding parameter corresponding to the first occurrence of the preset comparison value in the comparison value vector as the target coding parameter;
and determining the minimum value of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
FIG. 7 is a block diagram of an electronic device shown in accordance with an example embodiment. Referring to fig. 7, the electronic device includes:
a processor 710;
a memory 720 for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video coding parameter determination method provided by the present disclosure.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
Fig. 8 is a block diagram illustrating an apparatus 800 for use in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800, the relative positioning of the components, such as a display and keypad of the apparatus 800, the sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. Alternatively, for example, the storage medium may be a non-transitory computer-readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
According to the technical solution provided by the embodiments of the disclosure, when a video is to be encoded, the video to be encoded is sliced into a plurality of video segments; a complexity feature is calculated for each video segment; and the complexity feature of each video segment, together with the current network bandwidth, is input into a pre-trained neural network model to obtain the coding parameters of each video segment.
The pre-trained neural network model is obtained by training with the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments. Encoding a sample video segment with its convex hull coding parameters ensures both that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, when each segment obtained by slicing the video to be encoded is encoded with the coding parameters output by the pre-trained neural network model, the encoded video segment has high video quality, and the network bandwidth required to upload it is less than the current network bandwidth, so that the encoded video can be uploaded successfully.
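The inference pipeline described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `complexity_features` computes only two illustrative temporal features (loosely analogous to the MaxTI/AvgTI features named in the claims), and `toy_model` is a hypothetical stand-in for the pre-trained neural network model.

```python
import numpy as np

def complexity_features(segment):
    """Two illustrative temporal-complexity features: the maximum and
    mean variance of adjacent-frame differences (cf. MaxTI / AvgTI)."""
    diffs = np.diff(segment.astype(np.float64), axis=0)
    ti = diffs.var(axis=(1, 2))            # one variance per frame pair
    return [ti.max(), ti.mean()]

def encoding_params_for(segments, bandwidth, model):
    """Return one set of coding parameters per video segment."""
    return [model(complexity_features(s) + [bandwidth]) for s in segments]

# Hypothetical stand-in for the trained model: derives a bitrate from the
# first feature and never exceeds the available bandwidth.
def toy_model(inputs):
    *features, bw = inputs
    return min(100.0 + 0.1 * features[0], bw)

# Three toy "segments" of 8 frames of 4x4 luma samples each.
rng = np.random.default_rng(0)
segments = [rng.integers(0, 256, size=(8, 4, 4)) for _ in range(3)]
params = encoding_params_for(segments, bandwidth=2000.0, model=toy_model)
```

In a real system, `toy_model` would be replaced by the trained neural network, and the feature vector would include all complexity features used during training.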
In yet another aspect, the embodiments of the disclosure also provide a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to execute the video coding parameter determination method provided by the embodiments of the disclosure.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the video coding parameter determination method of the first aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
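As a rough sketch, the convex-hull parameter selection summarized in the description above (choose the cheapest preset whose quality is acceptable, then cap the result by the available bandwidth) might look like the following. The quality values, bitrate ladder, and threshold below are illustrative assumptions, not values from the disclosure.

```python
def convex_hull_bitrate(qualities, preset_bitrates, quality_threshold, bandwidth):
    """Pick the smallest preset bitrate whose measured quality meets the
    threshold, then cap the result at the current network bandwidth.

    qualities[i] is the measured quality (e.g. a PSNR-like score) of the
    segment encoded with preset_bitrates[i]; both lists are sorted by
    ascending bitrate.
    """
    for quality, bitrate in zip(qualities, preset_bitrates):
        if quality >= quality_threshold:        # first acceptable preset
            return min(bitrate, bandwidth)      # never exceed bandwidth
    return min(preset_bitrates[-1], bandwidth)  # fall back to best preset

# Example: the 800 kbps preset is the first to reach the threshold,
# and it fits within the 1200 kbps bandwidth.
chosen = convex_hull_bitrate(
    qualities=[30.0, 34.0, 38.5, 41.0],
    preset_bitrates=[400, 800, 1600, 3200],
    quality_threshold=34.0,
    bandwidth=1200,
)
# chosen == 800
```

If the bandwidth were lower than every acceptable preset (say 600 kbps), the cap would dominate and 600 would be returned, mirroring the "minimum of the target coding parameter and the current network bandwidth" step in the claims.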

Claims (18)

1. A method for video coding parameter determination, the method comprising:
cutting a video to be encoded into a plurality of video segments;
calculating a complexity feature of each video segment, wherein the complexity feature characterizes the temporal complexity and the spatial complexity of the video segment;
inputting the complexity feature of each video segment and the current network bandwidth into a pre-trained neural network model to obtain the coding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training with the complexity features of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
wherein the convex hull coding parameters are such that, when a sample video segment is encoded using the convex hull coding parameters, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
2. The method of claim 1, wherein training the pre-trained neural network model comprises:
slicing the sample video into a plurality of first sample video segments;
decoding the sample video encoded with each of a plurality of preset coding parameters, and slicing each decoded sample video into a plurality of second sample video segments, wherein the number of first sample video segments is the same as the number of second sample video segments;
for each second sample video segment, calculating the video quality of the second sample video segment from the second sample video segment and the first sample video segment corresponding to it;
determining a video quality difference matrix corresponding to each preset coding parameter according to the video quality of each second sample video clip;
determining convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix and the current network bandwidth;
for each first sample video segment, inputting the complexity feature of the first sample video segment and the current network bandwidth into a neural network model, and training the neural network model until the coding parameters output by the neural network model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that first sample video segment.
3. The method of claim 2, wherein determining the video quality difference matrix corresponding to each of the predetermined coding parameters according to the video quality of each of the second sample video segments comprises:
for a target second sample video segment, calculating the video quality difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality differences as the video quality difference matrix corresponding to a target preset coding parameter;
wherein the target preset coding parameter of the target second sample video segment and the preset coding parameter of the corresponding second sample video segment are adjacent in magnitude.
4. The method of claim 2, wherein determining the convex hull coding parameters for each first sample video segment according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth comprises:
for each video quality difference matrix, comparing each video quality difference in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant represents the minimum acceptable video quality for the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset coding parameters to obtain a comparison value vector;
determining, as a target coding parameter, the preset coding parameter corresponding to the first comparison value in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
5. The method according to any one of claims 1 to 4, wherein the calculating the complexity characteristic of each video segment comprises:
and for each video segment, calculating the complexity feature of each video frame of the video segment, and determining the complexity features of the video frames as the complexity feature of the video segment.
6. The method according to any one of claims 1 to 4, wherein the calculating the complexity characteristic of each video segment comprises:
and for each video segment, calculating the complexity features of the key video frames of the video segment, and determining the complexity features of the key video frames as the complexity feature of the video segment.
7. The method of any of claims 1 to 4, wherein the complexity feature comprises at least one of the following: the maximum variance MaxTI of adjacent-frame differences, the average variance AvgTI of adjacent-frame differences, the sum of absolute residuals (STAD) of intra-frame coding, the STAD of inter-frame coding, the STAD of B-frame coding, and the STAD of P-frame coding.
8. The method according to any of claims 1 to 4, wherein the encoding parameters comprise at least one of the following parameters: code rate, resolution, and frame rate.
9. An apparatus for video coding parameter determination, the apparatus comprising:
a video slicing module configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculation module configured to calculate a complexity feature of each video segment, wherein the complexity feature characterizes the temporal complexity and the spatial complexity of the video segment;
a coding parameter acquisition module configured to input the complexity feature of each video segment and the current network bandwidth into a pre-trained neural network model to obtain the coding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training with the complexity features of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
wherein the convex hull coding parameters are such that, when a sample video segment is encoded using the convex hull coding parameters, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
10. The apparatus of claim 9, wherein the model training module comprises:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
a second video slicing unit configured to decode the sample video encoded with each of the plurality of preset coding parameters and slice each decoded sample video into a plurality of second sample video segments, wherein the number of first sample video segments is the same as the number of second sample video segments;
a video quality calculation unit configured to, for each second sample video segment, calculate the video quality of the second sample video segment from the second sample video segment and the first sample video segment corresponding to it;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to, for each first sample video segment, input the complexity feature of the first sample video segment and the current network bandwidth into a neural network model, and train the neural network model until the coding parameters output by the neural network model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that first sample video segment.
11. The apparatus of claim 10, wherein the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating the video quality difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality differences as the video quality difference matrix corresponding to a target preset coding parameter;
wherein the target preset coding parameter of the target second sample video segment and the preset coding parameter of the corresponding second sample video segment are adjacent in magnitude.
12. The apparatus according to claim 10, wherein the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant represents the minimum acceptable video quality for the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset coding parameters to obtain a comparison value vector;
determining, as a target coding parameter, the preset coding parameter corresponding to the first comparison value in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
13. The apparatus of any of claims 9 to 12, wherein the feature calculation module is configured to perform:
and for each video segment, calculating the complexity feature of each video frame of the video segment, and determining the complexity features of the video frames as the complexity feature of the video segment.
14. The apparatus of any of claims 9 to 12, wherein the feature calculation module is configured to perform:
and for each video segment, calculating the complexity features of the key video frames of the video segment, and determining the complexity features of the key video frames as the complexity feature of the video segment.
15. The apparatus of any of claims 9 to 12, wherein the complexity feature comprises at least one of: the maximum variance MaxTI of adjacent-frame differences, the average variance AvgTI of adjacent-frame differences, the sum of absolute residuals (STAD) of intra-frame coding, the STAD of inter-frame coding, the STAD of B-frame coding, and the STAD of P-frame coding.
16. The apparatus according to any one of claims 9 to 12, wherein the encoding parameters comprise at least one of the following parameters: code rate, resolution, and frame rate.
17. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 8.
18. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-8.
CN201910995614.0A 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium Active CN110650370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910995614.0A CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910995614.0A CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110650370A CN110650370A (en) 2020-01-03
CN110650370B true CN110650370B (en) 2021-09-24

Family

ID=68994401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910995614.0A Active CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110650370B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111246209B (en) * 2020-01-20 2022-08-02 北京字节跳动网络技术有限公司 Adaptive encoding method, apparatus, electronic device, and computer storage medium
CN111510740B (en) * 2020-04-03 2022-08-30 咪咕文化科技有限公司 Transcoding method, transcoding device, electronic equipment and computer readable storage medium
CN112383777B (en) * 2020-09-28 2023-09-05 北京达佳互联信息技术有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN112672157B (en) * 2020-12-22 2022-08-05 广州博冠信息科技有限公司 Video encoding method, device, equipment and storage medium
CN113014922B (en) * 2021-02-23 2023-04-07 北京百度网讯科技有限公司 Model training method, video coding method, device, equipment and storage medium
CN113573101B (en) * 2021-07-09 2023-11-28 百果园技术(新加坡)有限公司 Video coding method, device, equipment and storage medium
CN116320529A (en) * 2021-12-10 2023-06-23 深圳市中兴微电子技术有限公司 Video code rate control method and device and computer readable storage medium
CN115225911B (en) * 2022-08-19 2022-12-06 腾讯科技(深圳)有限公司 Code rate self-adaption method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101345867A (en) * 2008-08-22 2009-01-14 四川长虹电器股份有限公司 Code rate control method based on frame complexity
CN101854526A (en) * 2009-03-30 2010-10-06 国际商业机器公司 Code rate control method and code controller
CN105208390A (en) * 2014-06-30 2015-12-30 杭州海康威视数字技术股份有限公司 Code rate control method of video coding and system thereof
CN107371028A (en) * 2017-08-22 2017-11-21 南京惟初信息科技有限公司 A kind of high-quality video coding method for adapting to bandwidth
CN109286825A (en) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for handling video
CN109660795A (en) * 2018-11-09 2019-04-19 建湖云飞数据科技有限公司 A kind of information coding method based on down-sampling
CN109754077A (en) * 2017-11-08 2019-05-14 杭州海康威视数字技术股份有限公司 Network model compression method, device and the computer equipment of deep neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6950463B2 (en) * 2001-06-13 2005-09-27 Microsoft Corporation Non-compensated transcoding of a video stream
US8976857B2 (en) * 2011-09-23 2015-03-10 Microsoft Technology Licensing, Llc Quality-based video compression


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of H.264 Bit Rate/Resolution Down-sampling Transcoding; Wang Xiaonan; China Master's Theses Full-text Database; 2010-03-31; full text *
Quantization Prediction in Video Coding and Correlation Measurement of Multiple Modules; Zhu Jianying; Wanfang Data; 2013-12-10; full text *

Also Published As

Publication number Publication date
CN110650370A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN110650370B (en) Video coding parameter determination method and device, electronic equipment and storage medium
CN110827253A (en) Training method and device of target detection model and electronic equipment
CN110536168B (en) Video uploading method and device, electronic equipment and storage medium
CN106559712B (en) Video playing processing method and device and terminal equipment
CN110708559B (en) Image processing method, device and storage medium
CN106454413B (en) Code switching method, device and equipment is broadcast live
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN108881952B (en) Video generation method and device, electronic equipment and storage medium
CN109275029B (en) Video stream processing method and device, mobile terminal and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN115052150A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN111862995A (en) Code rate determination model training method, code rate determination method and device
CN112948704A (en) Model training method and device for information recommendation, electronic equipment and medium
CN110941727A (en) Resource recommendation method and device, electronic equipment and storage medium
CN108629814B (en) Camera adjusting method and device
CN105392056A (en) Method and device for determining television scene modes
CN104539497B (en) Method for connecting network and device
CN111953980B (en) Video processing method and device
CN109120929A (en) A kind of Video coding, coding/decoding method, device, electronic equipment and system
CN109068138B (en) Video image processing method and device, electronic equipment and storage medium
CN111860552A (en) Model training method and device based on nuclear self-encoder and storage medium
CN108024005B (en) Information processing method and device, intelligent terminal, server and system
CN110798721B (en) Episode management method and device and electronic equipment
CN114422854A (en) Data processing method and device, electronic equipment and storage medium
CN108154092B (en) Face feature prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant