CN110650370B - Video coding parameter determination method and device, electronic equipment and storage medium - Google Patents

Video coding parameter determination method and device, electronic equipment and storage medium

Info

Publication number
CN110650370B
CN110650370B
Authority
CN
China
Prior art keywords
video
sample
sample video
segment
complexity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910995614.0A
Other languages
Chinese (zh)
Other versions
CN110650370A (en)
Inventor
赵明菲
于冰
郑云飞
闻兴
王晓楠
黄晓政
张元尊
陈敏
陈宇聪
黄跃
郭磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910995614.0A priority Critical patent/CN110650370B/en
Publication of CN110650370A publication Critical patent/CN110650370A/en
Application granted granted Critical
Publication of CN110650370B publication Critical patent/CN110650370B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a method, an apparatus, an electronic device and a storage medium for determining video encoding parameters. The method comprises: slicing a video to be encoded into a plurality of video segments; calculating complexity features of each video segment, the complexity features characterizing the temporal complexity and spatial complexity of the segment; and inputting the complexity features of each video segment, together with the current network bandwidth, into a pre-trained neural network model to obtain the encoding parameters of each video segment. With the technical solution provided by the embodiments of the present disclosure, each video segment obtained by slicing the video to be encoded is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded segment is high; moreover, the network bandwidth required to upload the encoded video is less than the current network bandwidth, so the encoded video can be uploaded successfully.

Description

Video coding parameter determination method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of video technologies, and in particular, to a method and an apparatus for determining video encoding parameters, an electronic device, and a storage medium.
Background
With the rapid development of video technology, a video distribution platform receives a large number of videos uploaded from terminals every day. Before uploading, a terminal must encode the video to be uploaded and then upload the encoded video to the video distribution platform.
In the related art, to guarantee the video quality of the encoded video, a high encoding bitrate is determined for every video to be uploaded. That bitrate may exceed the terminal's current network bandwidth while the encoded video is being uploaded, causing the upload to fail.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device and a storage medium for determining video encoding parameters, and the technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a method for determining video encoding parameters, the method comprising:
cutting a video to be encoded into a plurality of video segments;
calculating complexity characteristics of each video clip, wherein the complexity characteristics are used for representing the time complexity and the space complexity of the video clip;
inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip;
wherein the pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
Optionally, the process of training the pre-trained neural network model includes:
slicing the sample video into a plurality of first sample video segments;
decoding the sample videos encoded with a plurality of preset encoding parameters, and slicing each decoded sample video into a plurality of second sample video segments, wherein the number of the first sample video segments is the same as the number of the second sample video segments;
for each second sample video segment, calculating the video quality of the second sample video segment according to the second sample video segment and the first sample video segment corresponding to the second sample video segment;
determining a video quality difference matrix corresponding to each preset coding parameter according to the video quality of each second sample video clip;
determining convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix and the current network bandwidth;
for each first sample video segment, inputting the complexity features and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the encoding parameters output by the model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that segment.
Optionally, the determining, according to the video quality of each second sample video segment, the video quality difference matrix corresponding to each preset encoding parameter includes:
for a target second sample video segment, calculating the difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to it;
determining the vector formed by all such video quality difference values as the video quality difference matrix corresponding to the target preset encoding parameter;
wherein the target preset encoding parameter of the target second sample video segment and the preset encoding parameter of its corresponding second sample video segment are adjacent in magnitude.
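A minimal Python sketch of this construction (function and variable names are illustrative, not from the text): segments encoded at adjacent preset parameters are compared, yielding one difference vector per step of the parameter ladder.

```python
def quality_diff_matrix(quality):
    """quality[p][s]: video quality of segment s encoded with the p-th
    preset encoding parameter, parameters sorted in ascending order.
    Row p of the result holds, per segment, the quality change when
    stepping from parameter p to the adjacent parameter p + 1."""
    n_params, n_segs = len(quality), len(quality[0])
    return [[quality[p + 1][s] - quality[p][s] for s in range(n_segs)]
            for p in range(n_params - 1)]
```

For example, with three preset parameters and two segments, the result has two rows, one per adjacent parameter pair.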
Optionally, the determining, according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth, the convex hull coding parameter corresponding to each first sample video segment includes:
for each video quality difference matrix, comparing each video quality difference value in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant characterizes the minimum acceptable video quality in the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset encoding parameters to obtain a comparison value vector;
determining, as the target encoding parameter, the preset encoding parameter corresponding to the first entry in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target encoding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
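The selection above can be sketched in Python as follows. The thresholding convention (the comparison value fires when the quality difference first drops below the quality constant, i.e. extra bitrate no longer buys the minimum acceptable quality gain) and the fallback for segments that never fire are assumptions; the text does not spell them out.

```python
def convex_hull_param(diffs, params, quality_const, bandwidth):
    """diffs[i]: quality difference for this segment at preset
    parameter params[i], with params sorted in ascending order.
    The first parameter whose difference falls below the quality
    constant becomes the target; the result is clamped to the
    current network bandwidth."""
    target = params[-1]  # assumed fallback: the largest preset parameter
    for diff, param in zip(diffs, params):
        if diff < quality_const:
            target = param
            break
    return min(target, bandwidth)
```

With bitrates as the preset parameters, clamping to the bandwidth implements the final "minimum of the target encoding parameter and the current network bandwidth" step.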
Optionally, the calculating the complexity characteristic of each video segment includes:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the calculating the complexity characteristic of each video segment includes:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
Optionally, the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate.
According to a second aspect of the embodiments of the present disclosure, there is provided a video coding parameter determination apparatus, the apparatus including:
a video slicing module configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculation module configured to perform calculating the complexity features of each video segment, wherein the complexity features characterize the temporal complexity and spatial complexity of the video segments;
an encoding parameter acquisition module configured to perform inputting the complexity features and the current network bandwidth of each video segment into a pre-trained neural network model to obtain the encoding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
Optionally, the apparatus further includes a model training module, which includes:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
the second video segmentation unit is configured to decode the sample video coded by the preset coding parameters and segment the decoded sample video into a plurality of second sample video segments respectively, wherein the number of the first sample video segments is the same as that of the second sample video segments;
a video quality calculation unit configured to perform, for each second sample video segment, calculating a video quality of the second sample video segment from the second sample video segment and a first sample video segment corresponding to the second sample video segment;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to perform, for each first sample video segment, inputting the complexity features and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the encoding parameters output by the model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that segment.
Optionally, the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating the difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to it;
determining the vector formed by all such video quality difference values as the video quality difference matrix corresponding to the target preset encoding parameter;
wherein the target preset encoding parameter of the target second sample video segment and the preset encoding parameter of its corresponding second sample video segment are adjacent in magnitude.
Optionally, the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference value in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant characterizes the minimum acceptable video quality in the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset encoding parameters to obtain a comparison value vector;
determining, as the target encoding parameter, the preset encoding parameter corresponding to the first entry in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target encoding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
Optionally, the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video coding parameter determination method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the video coding parameter determination method of the first aspect.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the video coding parameter determination method of the first aspect.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity features of the sample video segments, the current network bandwidth, and the convex hull coding parameters of the sample video segments. Encoding a sample video segment with its convex hull coding parameters ensures both that the video quality of the encoded segment is high and that the network bandwidth required to upload it is less than the current network bandwidth. Therefore, each video segment obtained by slicing the video to be encoded is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded segment is high; moreover, the network bandwidth required to upload the encoded video is less than the current network bandwidth, so the encoded video can be uploaded successfully.
Drawings
Fig. 1 is a flow diagram illustrating a method of video coding parameter determination in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a process of training a neural network model in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating one particular implementation of S25, according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating the acquisition of complexity characteristics of a sample video in accordance with an exemplary embodiment;
FIG. 5 is a diagram illustrating obtaining a video quality difference matrix according to an exemplary embodiment;
fig. 6 is a block diagram illustrating a video encoding parameter determination apparatus according to an example embodiment;
FIG. 7 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
fig. 8 is a block diagram illustrating a video encoding parameter determination apparatus according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a video coding parameter determination method according to an exemplary embodiment. The method is used in a video coding parameter determination apparatus, which runs on an electronic device having a video encoding function; the embodiment of the present disclosure does not specifically limit the electronic device.
As shown in fig. 1, the method may include the following steps.
In step S11, the video to be encoded is sliced into a plurality of video segments.
The video to be encoded may be sliced into a plurality of video segments before it is encoded. For example, the video to be encoded may be sliced into a plurality of video segments, each having a duration of 3 seconds. Of course, the number of video segments into which the video to be encoded is specifically divided may be determined according to actual situations, and this is not specifically limited in the embodiment of the present disclosure.
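The slicing step can be sketched independently of any particular container or decoder; the fixed 3-second duration follows the example above, and the function name is illustrative.

```python
def segment_boundaries(duration_s, segment_len_s=3.0):
    """Return (start, end) times in seconds for fixed-length slices of
    a video; the last slice may be shorter than segment_len_s."""
    bounds, start = [], 0.0
    while start < duration_s:
        bounds.append((start, min(start + segment_len_s, duration_s)))
        start += segment_len_s
    return bounds
```

A 10-second video, for instance, yields three full 3-second segments plus a trailing 1-second segment.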
In step S12, the complexity characteristics of each video segment are calculated.
Wherein the complexity feature is used to characterize the temporal complexity and spatial complexity of the video segment.
After the video to be encoded is sliced into a plurality of video segments, the complexity characteristics of each video segment may be calculated. The complexity feature may include a temporal complexity feature for characterizing a temporal complexity of the video segment and a spatial complexity feature for characterizing a spatial complexity of the video segment.
In particular, the complexity feature may comprise at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding. Of course, the embodiments of the present disclosure do not specifically limit the complexity feature.
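The temporal features can be sketched as below. MaxTI and AvgTI are taken here as the maximum and mean variance of adjacent-frame differences, a common reading of these temporal-complexity measures; the exact definition used in the text is not spelled out, so treat this as an assumption.

```python
import numpy as np

def ti_features(frames):
    """frames: sequence of 2-D grayscale arrays of equal shape.
    Returns (MaxTI, AvgTI): the maximum and mean variance of the
    pixel-wise differences between adjacent frames."""
    diffs = [float(np.var(b.astype(np.float64) - a.astype(np.float64)))
             for a, b in zip(frames, frames[1:])]
    return max(diffs), sum(diffs) / len(diffs)
```

A static segment (identical frames) yields (0.0, 0.0), i.e. zero temporal complexity.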
In one embodiment, calculating the complexity characteristic of each video segment may include:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
In this embodiment, when calculating the complexity feature of each video segment, the complexity feature of each video frame of the video segment may be calculated, so that the calculated complexity feature of the video segment is more accurate.
In another embodiment, calculating the complexity characteristic of each video segment may include:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
In this embodiment, when calculating the complexity feature of each video segment, only the complexity features of the key video frames of the segment are calculated. That is, instead of computing a complexity feature for every video frame in the segment, only the key video frames are used, which reduces the amount of computation.
In step S13, the complexity characteristics and the current network bandwidth of each video segment are input into a pre-trained neural network model to obtain the encoding parameters of each video segment.
The pre-trained neural network model is obtained by training on the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are such that, when a sample video segment is encoded with them, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
The pre-trained neural network model is obtained by training the convex hull coding parameters of the sample video clip based on the complexity characteristics and the current network bandwidth of the sample video, and the sample video clip is coded by using the convex hull coding parameters, so that the video quality of the coded sample video clip is high, namely the video quality is higher than the preset video quality, and the network bandwidth required by uploading the coded sample video clip is less than the current network bandwidth. It can be understood that the preset video quality may be determined according to actual situations, and the preset video quality is not specifically limited in the embodiments of the present disclosure.
Therefore, after the complexity characteristics of each video segment are obtained through calculation, the complexity characteristics of each video segment and the current network bandwidth can be input into the pre-trained neural network model, and the coding parameters of each video segment are output from the pre-trained neural network model, so that the coding parameters of each video segment can be obtained. For example, the complexity characteristic of the ith video segment and the current network bandwidth are input into a pre-trained neural network model, so as to obtain the coding parameters of the (i + 1) th video segment.
As can be seen, each video segment is encoded with the encoding parameters output by the pre-trained neural network model, so the video quality of the encoded video segment is high, that is, higher than the preset video quality, and the network bandwidth required to upload the encoded video segment is less than the current network bandwidth. Therefore, the video quality of the encoded video is guaranteed, the bandwidth required to upload it stays below the current network bandwidth, and the encoded video can be uploaded successfully.
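The per-segment inference flow can be sketched as follows. `predict(feats, bandwidth)` stands in for the pre-trained neural network model; following the example above, segment i's complexity features yield the encoding parameters of segment i + 1, so the very first segment falls back to a default (how segment 0 is handled is an assumption, not stated in the text).

```python
def choose_encoding_params(features, bandwidth, predict, default_param):
    """features: per-segment complexity features, in segment order.
    Returns one encoding parameter per segment; segment i + 1 gets the
    model's prediction from segment i's features."""
    params = [default_param]
    for feats in features[:-1]:
        params.append(predict(feats, bandwidth))
    return params
```

A trivial stand-in model makes the offset visible: the prediction from segment 0's features lands on segment 1.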
It should be noted that the encoding parameter includes at least one of the following parameters: code rate, resolution, and frame rate. Of course, the encoding parameters are not specifically limited in the embodiments of the present disclosure.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
For clarity of description of the scheme, the process of training the above-mentioned pre-trained neural network model will be explained in detail in the following embodiments.
In one embodiment, the process of training the pre-trained neural network model may include the following steps, as shown in fig. 2.
In step S21, the sample video is cut into a plurality of first sample video segments.
In step S22, the sample video encoded by the preset encoding parameters is decoded, and the decoded sample video is respectively cut into a plurality of second sample video segments.
Wherein the number of the first sample video segments is the same as the number of the second sample video segments.
Specifically, the sample video may be encoded with each of a plurality of preset encoding parameters, so as to obtain a plurality of encoded sample videos. For example, assuming that the preset encoding parameters are preset encoding rates, the sample video may be encoded at different preset encoding rates; specifically, the sample video may be encoded at 450k (kbps, kilobits per second), 500k, 550k, 600k, 650k, ……, 1000k.
The sample video is sliced into a plurality of first sample video segments. Moreover, after a plurality of sample videos coded by using the preset coding parameters are obtained, the sample videos coded by using the preset coding parameters can be decoded, and the decoded sample videos are respectively divided into a plurality of second sample video segments, wherein the number of the first sample video segments is the same as that of the second sample video segments.
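The bookkeeping of steps S21–S22 can be sketched as follows, under the assumption that segments are formed by consecutive frame ranges (the text does not fix the segmentation criterion) and with the 450k–1000k, 50k-step ladder from the example above. Actual encoding and decoding would be done by a real codec; only the slicing arithmetic is shown.

```python
PRESET_BITRATES_KBPS = list(range(450, 1001, 50))  # 450k, 500k, ..., 1000k

def slice_indices(num_frames, seg_len):
    """Split [0, num_frames) into consecutive segments of seg_len frames,
    returned as (start, end) index pairs."""
    return [(s, min(s + seg_len, num_frames))
            for s in range(0, num_frames, seg_len)]

# First sample segments come from the original video; one set of second
# sample segments is produced per preset bitrate, so for each bitrate the
# segment count matches the first-segment count.
first_segments = slice_indices(300, 50)            # e.g. 6 first sample segments
second_segments = {r: slice_indices(300, 50) for r in PRESET_BITRATES_KBPS}
```

The illustrative frame counts (300 frames, 50 per segment) are assumptions, not values from the source.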
In step S23, for each second sample video segment, the video quality of the second sample video segment is calculated according to the second sample video segment and the first sample video segment corresponding to the second sample video segment.
The sample video is segmented to obtain a plurality of first sample video segments, and each decoded sample video is segmented to obtain a plurality of second sample video segments. Since the number of first sample video segments is the same as the number of second sample video segments, each first sample video segment corresponds to one second sample video segment.
For example, there are three first sample video clips, which are: a first sample video clip 1, a first sample video clip 2, and a first sample video clip 3. The second sample video clip has three video clips: a second sample video segment 1, a second sample video segment 2, and a second sample video segment 3. Then, the first sample video segment 1 and the second sample video segment 1 correspond; the first sample video segment 2 corresponds to the second sample video segment 2; the first sample video segment 3 corresponds to the second sample video segment 3.
For each second sample video segment, the video quality of the second sample video segment may be calculated based on the second sample video segment and the first sample video segment corresponding to the second sample video segment. The video quality can be measured by VMAF (Video Multimethod Assessment Fusion). Of course, the embodiments of the present disclosure are not particularly limited in this respect.
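Step S23 can be illustrated as follows. Computing real VMAF requires the libvmaf library, so PSNR is used here as a stand-in full-reference quality metric; frames are treated as flat lists of pixel values, and the segment score is the mean over its frames. These simplifications are assumptions for illustration only.

```python
import math

def psnr(frame_a, frame_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size frames,
    used here as a stand-in for VMAF."""
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

def segment_quality(first_seg_frames, second_seg_frames):
    """Average per-frame quality of a decoded (second) segment against its
    original (first) segment, mirroring step S23."""
    scores = [psnr(a, b) for a, b in zip(first_seg_frames, second_seg_frames)]
    return sum(scores) / len(scores)
```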
In step S24, a video quality difference matrix corresponding to each preset encoding parameter is determined according to the video quality of each second sample video segment.
In one embodiment, determining the video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment may include the following two steps, namely step a1 and step a2:
step a1, for a target second sample video segment, calculating a video quality difference value between the video quality of the target second sample video segment and the video quality of a second sample video segment corresponding to the target second sample video segment.
And the target preset coding parameter of the target second sample video segment and the preset coding parameter of the second sample video segment corresponding to the target second sample video segment are adjacent in size.
Step a2, determining the vector formed by the video quality difference values as the video quality difference value matrix corresponding to the target preset coding parameter.
Specifically, for each second sample video segment, that is, each target second sample video segment, there is a second sample video segment whose preset encoding parameter is adjacent in size to the target preset encoding parameter of the target second sample video segment; this is the second sample video segment corresponding to the target second sample video segment. The video quality of the target second sample video segment is differenced with the video quality of this corresponding second sample video segment, so that each target second sample video segment corresponds to one video quality difference. The vector formed by the video quality differences corresponding to the target second sample video segments is determined as the video quality difference matrix corresponding to the target preset encoding parameter.
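A minimal sketch of step S24, collecting per-segment quality differences between adjacent preset bitrates. The text leaves the sign convention implicit; here, as an assumption, the quality at the adjacent higher bitrate minus the quality at the current bitrate is used, and each matrix (a plain list here) is attributed to the lower bitrate of the pair.

```python
def quality_diff_matrices(vmaf_by_rate, rates):
    """vmaf_by_rate maps a preset bitrate to the list of per-segment
    qualities of its second sample segments; rates is sorted ascending.
    Returns {lower_rate: [q_higher[i] - q_lower[i] for each segment i]}."""
    matrices = {}
    for lo, hi in zip(rates, rates[1:]):
        matrices[lo] = [b - a
                        for a, b in zip(vmaf_by_rate[lo], vmaf_by_rate[hi])]
    return matrices
```

With 12 preset rates this yields 11 difference matrices, one per adjacent pair.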
In step S25, convex hull coding parameters corresponding to each first sample video segment are determined according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth.
In one embodiment, determining the convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth may include the following steps, as shown in fig. 3, which are step S251 to step S254:
s251, for each video quality difference matrix, comparing each video quality in the video quality difference matrix with the video quality constant corresponding to the video quality difference matrix to obtain a comparison value.
Wherein the video quality constant is used to characterize: the minimum acceptable video quality difference in the video quality difference matrix.
It should be noted that each preset encoding parameter corresponds to a video quality constant, which is used to characterize the minimum acceptable value of each video quality difference in the video quality difference matrix corresponding to that preset encoding parameter. The video quality constant is obtained by training on a predetermined training set; that is, the video quality constant is an empirical value.
For each video quality difference matrix, each video quality difference value in the matrix is compared with the video quality constant corresponding to the matrix to obtain a comparison value. Specifically, if a video quality difference value in the matrix is smaller than the video quality constant corresponding to the matrix, the comparison value may be determined to be 0; if it is greater than the video quality constant, the comparison value may be determined to be 1.
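Step S251 reduces to elementwise thresholding. The text only distinguishes "smaller" and "greater"; mapping the boundary case to 1 is an assumption of this sketch.

```python
def compare_to_constant(diff_matrix, quality_constant):
    """Map each quality difference to 0 (below the video quality constant)
    or 1 (at or above it, the latter by assumption for the exact boundary)."""
    return [1 if d >= quality_constant else 0 for d in diff_matrix]
```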
And S252, for each first sample video segment, sequencing the comparison values of the second sample video segment corresponding to the first sample video segment according to the sequence of the corresponding preset coding parameters from small to large to obtain a comparison value vector.
The number of the first sample video clips is the same as that of the second sample video clips, and the first sample video clips have correspondence with the second sample video clips. It can be understood that, when there are a plurality of preset encoding parameters, the first sample video segment corresponds to a plurality of second sample video segments, and each second sample video segment corresponds to a comparison value. The comparison values of the second sample video segments corresponding to the first sample video segments can be sequenced according to the sequence of the corresponding preset coding parameters from small to large, so as to obtain comparison value vectors.
For example, the comparison value vector may be [0, 0, 0, 1, 1, 1, ……, 1]^T.
S253, determining the preset coding parameter corresponding to the first occurrence of the preset comparison value in the comparison value vector as the target coding parameter.
Wherein the preset comparison value may be 1. For example, the comparison value vector is [0, 0, 0, 1, 1, 1, ……, 1]^T, and the preset coding parameter is a preset coding rate. Assuming that the preset coding rate corresponding to the first 0 is 450k, the preset coding rate corresponding to the second 0 is 500k, the preset coding rate corresponding to the third 0 is 550k, and the preset coding rate corresponding to the first 1 is 600k, then the target coding rate corresponding to the first occurrence of the comparison value 1 is 600k.
And S254, determining the minimum value of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
After the target encoding parameter is determined, the target encoding parameter may be compared with the current network bandwidth, and a minimum value between the target encoding parameter and the current network bandwidth is determined as a convex hull encoding parameter corresponding to the first sample video segment.
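Steps S252–S254 can be sketched together: order the comparison values ascending by their preset rates, take the rate at the first preset comparison value 1, and clamp by the current bandwidth.

```python
def convex_hull_bitrate(comparison_vector, rates, bandwidth_kbps):
    """comparison_vector[i] is the 0/1 comparison value attributed to
    rates[i], with rates sorted ascending (step S252). The target parameter
    is the rate at the first 1 (step S253), and the convex hull parameter is
    its minimum with the current bandwidth (step S254)."""
    target = next(r for v, r in zip(comparison_vector, rates) if v == 1)
    return min(target, bandwidth_kbps)
```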
In step S26, for each first sample video segment, the complexity characteristics and the current network bandwidth of the first sample video segment are input into the neural network model, and the neural network model is trained until the coding parameters output from the neural network model are: convex hull coding parameters corresponding to a next first sample video segment adjacent to the first sample video segment.
Specifically, the complexity characteristics of the i-th first sample video segment and the current network bandwidth are input into the neural network model, and the neural network model is trained until the encoding parameters it outputs are as close as possible to the convex hull coding parameters of the (i+1)-th first sample video segment.
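The disclosure does not specify the network architecture or training procedure, so the stand-in below uses a single linear layer fitted by SGD purely to illustrate the pairing: inputs of segment i, target = convex hull rate of segment i+1. Inputs are assumed pre-normalized; all names are illustrative, not from the source.

```python
def train(features, hull_rates, bandwidth, lr=0.1, epochs=2000):
    """Fit a linear stand-in model: (features of segment i, bandwidth)
    -> convex hull rate of segment i+1."""
    dim = len(features[0]) + 1                    # complexity features + bandwidth
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for i in range(len(features) - 1):
            x = features[i] + [bandwidth]
            y = hull_rates[i + 1]                 # next segment's convex hull rate
            pred = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = pred - y
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, feature, bandwidth):
    x = feature + [bandwidth]
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

A real implementation would of course use a proper neural network framework; the point here is only the input/target alignment between adjacent segments.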
For clarity of description, the embodiments of the present disclosure will be described below with reference to specific examples. In the specific example, the coding parameter is taken as an example of the coding rate.
First, the methods for obtaining the complexity characteristics and the convex hull coding rates required by the pre-training process of the neural network model are elaborated in detail.
As shown in fig. 4, any sample video may be divided into n+1 sample video segments, where n is a natural number and the size of n may be determined according to the actual situation. The n+1 sample video segments are: OrgChunk_0, OrgChunk_1, ……, OrgChunk_n.
Respectively calculating the complexity characteristics of the n +1 sample video clips, wherein the complexity characteristics of each sample video clip comprise: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
For example, for OrgChunk_0, the calculated complexity characteristics of the sample video segment include: MaxTI_0, AvgTI_0, IntraComplexity_0 (characterizing the STAD value of intra-frame coding), InterComplexity_0 (characterizing the STAD value of inter-frame coding), BComplexity_0 (characterizing the STAD value when encoding B frames), and PComplexity_0 (characterizing the STAD value when encoding P frames), which together form the complexity characteristic Feature_0 of OrgChunk_0.
For OrgChunk_n, the calculated complexity characteristics of the sample video segment include: MaxTI_n, AvgTI_n, IntraComplexity_n, InterComplexity_n, BComplexity_n, and PComplexity_n, which together form the complexity characteristic Feature_n of OrgChunk_n.
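The two temporal features above can be illustrated as follows, treating each frame as a flat list of luma samples. Following the wording of this document, TI is taken as the variance of the adjacent-frame difference (ITU-T P.910 defines TI via the standard deviation; the variance form is kept here to match the text, as an interpretation).

```python
def frame_diff_variance(frame_a, frame_b):
    """Variance of the elementwise difference between two adjacent frames."""
    diffs = [b - a for a, b in zip(frame_a, frame_b)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)

def max_avg_ti(frames):
    """MaxTI and AvgTI over all adjacent frame pairs of a segment."""
    tis = [frame_diff_variance(frames[i], frames[i + 1])
           for i in range(len(frames) - 1)]
    return max(tis), sum(tis) / len(tis)
```

The STAD-based features (IntraComplexity, InterComplexity, etc.) come from the encoder's residuals and are not reproduced here.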
As shown in fig. 5, the sample video is encoded at different preset encoding rates, which are: rateA, rateB, rateC, rateD, ……, rateL. The difference between two adjacent preset encoding rates is 50k: rateA = 450k, rateB = 500k, rateC = 550k, rateD = 600k, ……, rateL = 1000k.
The rateA-encoded sample video is streamA, the rateB-encoded sample video is streamB, the rateC-encoded sample video is streamC, the rateD-encoded sample video is streamD, ……, and the rateL-encoded sample video is streamL.
Decoding streamA yields the decoded sample video recA, decoding streamB yields recB, decoding streamC yields recC, decoding streamD yields recD, ……, and decoding streamL yields recL.
Each decoded video is cut into n+1 second sample video segments. The n+1 second sample video segments into which recA is cut are: recChunkA_0, recChunkA_1, ……, recChunkA_n. The n+1 second sample video segments into which recB is cut are: recChunkB_0, recChunkB_1, ……, recChunkB_n. The n+1 second sample video segments into which recC is cut are: recChunkC_0, recChunkC_1, ……, recChunkC_n. The n+1 second sample video segments into which recD is cut are: recChunkD_0, recChunkD_1, ……, recChunkD_n. By analogy, the n+1 second sample video segments into which recL is cut are: recChunkL_0, recChunkL_1, ……, recChunkL_n.
Next, for each second sample video segment, the video quality of the second sample video segment is calculated according to the second sample video segment and the first sample video segment corresponding to it. For example, for recChunkA_0, the corresponding first sample video segment is OrgChunk_0, and the calculated video quality of recChunkA_0 is VMAFA_0. For recChunkA_n, the corresponding first sample video segment is OrgChunk_n, and the calculated video quality of recChunkA_n is VMAFA_n. By analogy, the video qualities of recChunkB_0, ……, recChunkB_n; recChunkC_0, ……, recChunkC_n; ……; recChunkL_0, ……, recChunkL_n can be calculated.
The video qualities VMAFA_0, ……, VMAFA_n are differenced with VMAFB_0, ……, VMAFB_n respectively, yielding the video quality difference matrix corresponding to rateA: [Vdiff_AB_0, …, Vdiff_AB_n]. By analogy, the video quality difference matrix corresponding to rateB is [Vdiff_BC_0, …, Vdiff_BC_n], and the video quality difference matrix corresponding to rateK is [Vdiff_KL_0, …, Vdiff_KL_n].
For each video quality difference matrix, each value in the matrix is compared with the corresponding video quality constant to obtain a comparison value of 0 or 1, and each first sample video segment corresponds to a Boolean (BOOl) vector. For example, the BOOl vector corresponding to OrgChunk_0 is [BOOl_AB_0, BOOl_BC_0, BOOl_CD_0, ……, BOOl_KL_0]^T, denoted BOOlVec_0. By analogy, BOOlVec_1, ……, BOOlVec_n can be obtained.
After the BOOl vector corresponding to each first sample video segment is determined, the coding rate corresponding to the first non-zero subscript in the vector may be compared with the current network bandwidth, and the minimum of the two is the convex hull rate corresponding to that first sample video segment.
For example, if BOOlVec = [0, 0, 0, 1, 1, ……, 1]^T, the coding rate corresponding to the first non-zero subscript is 600k. Denoting the current network bandwidth by Budget, the convex hull rate is Convex = min(600k, Budget). As can be seen, each first sample video segment corresponds to a convex hull rate: OrgChunk_0 corresponds to Convex_0, ……, OrgChunk_n corresponds to Convex_n.
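The worked example can be run end to end as follows, assuming, for illustration, that the i-th entry of a BOOlVec is attributed to the i-th rate of the 450k–1000k ladder.

```python
RATES_KBPS = list(range(450, 1001, 50))       # 450k ... 1000k in 50k steps

def hull_rates(bool_vecs, budget_kbps):
    """One convex hull rate per first sample segment OrgChunk_i: the rate at
    the first non-zero subscript of its BOOl vector, clamped by Budget."""
    return [min(RATES_KBPS[v.index(1)], budget_kbps) for v in bool_vecs]
```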
After the convex hull rate corresponding to each first sample video segment is obtained, the complexity characteristic of OrgChunk_i and the current network bandwidth are input into the neural network model, and the neural network model is trained until the coding rate output by the neural network model is the convex hull coding rate corresponding to OrgChunk_(i+1).
After the neural network model is trained, it can be used online. Specifically, a video to be encoded is divided into a plurality of video segments; the complexity characteristic of each video segment is calculated, and the complexity characteristic of each video segment and the current network bandwidth are input into the pre-trained neural network model to obtain the coding rate of each video segment, and thus the coding rate of the video to be encoded.
Fig. 6 is a block diagram illustrating a video coding parameter determination apparatus according to an example embodiment, the apparatus comprising:
a video slicing module 610 configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculating module 620 configured to perform calculating complexity features of each video segment, wherein the complexity features are used for representing the temporal complexity and the spatial complexity of the video segments;
a coding parameter obtaining module 630, configured to input the complexity characteristic and the current network bandwidth of each video segment into a pre-trained neural network model to obtain a coding parameter of each video segment;
wherein the pre-trained neural network model is obtained by training on the complexity characteristics of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
the convex hull coding parameters are used to characterize that: when a sample video segment is encoded with the convex hull coding parameters, the video quality of the encoded sample video segment is greater than the preset video quality, and the network bandwidth required for uploading the encoded sample video segment is less than the current network bandwidth.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
Optionally, the apparatus further includes a model training module, which includes:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
the second video segmentation unit is configured to decode the sample video coded by the preset coding parameters and segment the decoded sample video into a plurality of second sample video segments respectively, wherein the number of the first sample video segments is the same as that of the second sample video segments;
a video quality calculation unit configured to perform, for each second sample video segment, calculating a video quality of the second sample video segment from the second sample video segment and a first sample video segment corresponding to the second sample video segment;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to perform, for each first sample video segment, inputting the complexity characteristics and the current network bandwidth of the first sample video segment into a neural network model, and training the neural network model until the coding parameters output from the neural network model are: convex hull coding parameters corresponding to a next first sample video segment adjacent to the first sample video segment.
Optionally, the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating a video quality difference value of the video quality of the target second sample video segment and the video quality of a second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality difference values as a video quality difference value matrix corresponding to the target preset coding parameter;
and the target preset coding parameter of the target second sample video segment and the preset coding parameter of the second sample video segment corresponding to the target second sample video segment are adjacent in size.
Optionally, the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference value in the video quality difference matrix with the video quality constant corresponding to the video quality difference matrix to obtain a comparison value, wherein the video quality constant is used to characterize: the minimum acceptable video quality difference in the video quality difference matrix;
for each first sample video segment, sequencing comparison values of a second sample video segment corresponding to the first sample video segment according to the sequence of the corresponding preset coding parameters from small to large to obtain a comparison value vector;
determining the preset coding parameter corresponding to the first occurrence of the preset comparison value in the comparison value vector as the target coding parameter;
and determining the minimum value of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristic of each video frame of the video clip, and determining the complexity characteristic of each video frame as the complexity characteristic of the video clip.
Optionally, the feature calculating module is configured to perform:
and for each video clip, calculating the complexity characteristics of the key video frames of the video clip, and determining the complexity characteristics of the key video frames as the complexity characteristics of the video clip.
Optionally, the complexity feature comprises at least one of the following features: the maximum variance value MaxTI of the adjacent frame difference, the average variance value AvgTI of the adjacent frame difference, the sum of absolute values of residuals STAD of intra-frame coding, STAD of inter-frame coding, STAD of B-frame coding and STAD of P-frame coding.
FIG. 7 is a block diagram of an electronic device shown in accordance with an example embodiment. Referring to fig. 7, the electronic device includes:
a processor 710;
a memory 720 for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video coding parameter determination method provided by the present disclosure.
According to the technical scheme provided by the embodiment of the disclosure, when a video to be coded is coded, the video to be coded is divided into a plurality of video segments; calculating the complexity characteristics of each video clip; and inputting the complexity characteristics and the current network bandwidth of each video clip into a pre-trained neural network model to obtain the coding parameters of each video clip.
The pre-trained neural network model is obtained by training on the complexity characteristics of the sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments; encoding a sample video segment with its convex hull coding parameters ensures that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, for each video segment obtained by segmenting the video to be encoded, when the video segment is encoded with the encoding parameters output by the pre-trained neural network model, the video quality of the encoded video segment is high, and the network bandwidth required for uploading the encoded video segment is smaller than the current network bandwidth, so that the encoded video can be successfully uploaded.
Fig. 8 is a block diagram illustrating an apparatus 800 for use in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800, the relative positioning of the components, such as a display and keypad of the apparatus 800, the sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. Alternatively, for example, the storage medium may be a non-transitory computer-readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
According to the technical solution provided by the embodiments of the disclosure, when a video is to be encoded, the video to be encoded is sliced into a plurality of video segments; a complexity feature is calculated for each video segment; and the complexity feature of each video segment, together with the current network bandwidth, is input into a pre-trained neural network model to obtain the coding parameters of each video segment.
The pre-trained neural network model is obtained by training with the complexity features of the sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments. Encoding a sample video segment with its convex hull coding parameters ensures both that the video quality of the encoded sample video segment is high and that the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth. Therefore, when each segment obtained by slicing the video to be encoded is encoded with the coding parameters output by the pre-trained neural network model, the encoded video segment has high video quality, and the network bandwidth required to upload it is less than the current network bandwidth, so that the encoded video can be uploaded successfully.
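The inference pipeline described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `complexity_features` computes only two illustrative temporal features (loosely analogous to the MaxTI/AvgTI features named in the claims), and `toy_model` is a hypothetical stand-in for the pre-trained neural network model.

```python
import numpy as np

def complexity_features(segment):
    """Two illustrative temporal-complexity features: the maximum and
    mean variance of adjacent-frame differences (cf. MaxTI / AvgTI)."""
    diffs = np.diff(segment.astype(np.float64), axis=0)
    ti = diffs.var(axis=(1, 2))            # one variance per frame pair
    return [ti.max(), ti.mean()]

def encoding_params_for(segments, bandwidth, model):
    """Return one set of coding parameters per video segment."""
    return [model(complexity_features(s) + [bandwidth]) for s in segments]

# Hypothetical stand-in for the trained model: derives a bitrate from the
# first feature and never exceeds the available bandwidth.
def toy_model(inputs):
    *features, bw = inputs
    return min(100.0 + 0.1 * features[0], bw)

# Three toy "segments" of 8 frames of 4x4 luma samples each.
rng = np.random.default_rng(0)
segments = [rng.integers(0, 256, size=(8, 4, 4)) for _ in range(3)]
params = encoding_params_for(segments, bandwidth=2000.0, model=toy_model)
```

In a real system, `toy_model` would be replaced by the trained neural network, and the feature vector would include all complexity features used during training.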
In yet another aspect, the embodiments of the disclosure also provide a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to execute the video coding parameter determination method provided by the embodiments of the disclosure.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to implement the video coding parameter determination method of the first aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
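As a rough sketch, the convex-hull parameter selection summarized in the description above (choose the cheapest preset whose quality is acceptable, then cap the result by the available bandwidth) might look like the following. The quality values, bitrate ladder, and threshold below are illustrative assumptions, not values from the disclosure.

```python
def convex_hull_bitrate(qualities, preset_bitrates, quality_threshold, bandwidth):
    """Pick the smallest preset bitrate whose measured quality meets the
    threshold, then cap the result at the current network bandwidth.

    qualities[i] is the measured quality (e.g. a PSNR-like score) of the
    segment encoded with preset_bitrates[i]; both lists are sorted by
    ascending bitrate.
    """
    for quality, bitrate in zip(qualities, preset_bitrates):
        if quality >= quality_threshold:        # first acceptable preset
            return min(bitrate, bandwidth)      # never exceed bandwidth
    return min(preset_bitrates[-1], bandwidth)  # fall back to best preset

# Example: the 800 kbps preset is the first to reach the threshold,
# and it fits within the 1200 kbps bandwidth.
chosen = convex_hull_bitrate(
    qualities=[30.0, 34.0, 38.5, 41.0],
    preset_bitrates=[400, 800, 1600, 3200],
    quality_threshold=34.0,
    bandwidth=1200,
)
# chosen == 800
```

If the bandwidth were lower than every acceptable preset (say 600 kbps), the cap would dominate and 600 would be returned, mirroring the "minimum of the target coding parameter and the current network bandwidth" step in the claims.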

Claims (18)

1. A method for video coding parameter determination, the method comprising:
cutting a video to be encoded into a plurality of video segments;
calculating a complexity feature of each video segment, wherein the complexity feature characterizes the temporal complexity and the spatial complexity of the video segment;
inputting the complexity feature of each video segment and the current network bandwidth into a pre-trained neural network model to obtain the coding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training with the complexity features of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
wherein the convex hull coding parameters are such that, when a sample video segment is encoded using the convex hull coding parameters, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
2. The method of claim 1, wherein training the pre-trained neural network model comprises:
slicing the sample video into a plurality of first sample video segments;
decoding the sample video encoded with each of a plurality of preset coding parameters, and slicing each decoded sample video into a plurality of second sample video segments, wherein the number of first sample video segments is the same as the number of second sample video segments;
for each second sample video segment, calculating the video quality of the second sample video segment from the second sample video segment and the first sample video segment corresponding to it;
determining a video quality difference matrix corresponding to each preset coding parameter according to the video quality of each second sample video clip;
determining convex hull coding parameters corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix and the current network bandwidth;
for each first sample video segment, inputting the complexity feature of the first sample video segment and the current network bandwidth into a neural network model, and training the neural network model until the coding parameters output by the neural network model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that first sample video segment.
3. The method of claim 2, wherein determining the video quality difference matrix corresponding to each of the predetermined coding parameters according to the video quality of each of the second sample video segments comprises:
for a target second sample video segment, calculating the video quality difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality differences as the video quality difference matrix corresponding to a target preset coding parameter;
wherein the target preset coding parameter of the target second sample video segment and the preset coding parameter of the corresponding second sample video segment are adjacent in magnitude.
4. The method of claim 2, wherein determining the convex hull coding parameters for each first sample video segment according to each video quality difference matrix, the video quality constant corresponding to the video quality difference matrix, and the current network bandwidth comprises:
for each video quality difference matrix, comparing each video quality difference in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant represents the minimum acceptable video quality for the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset coding parameters to obtain a comparison value vector;
determining, as a target coding parameter, the preset coding parameter corresponding to the first comparison value in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
5. The method according to any one of claims 1 to 4, wherein the calculating the complexity characteristic of each video segment comprises:
and for each video segment, calculating the complexity feature of each video frame of the video segment, and determining the complexity features of the video frames as the complexity feature of the video segment.
6. The method according to any one of claims 1 to 4, wherein the calculating the complexity characteristic of each video segment comprises:
and for each video segment, calculating the complexity features of the key video frames of the video segment, and determining the complexity features of the key video frames as the complexity feature of the video segment.
7. The method of any of claims 1 to 4, wherein the complexity feature comprises at least one of the following: the maximum variance MaxTI of adjacent-frame differences, the average variance AvgTI of adjacent-frame differences, the sum of absolute residuals (STAD) of intra-frame coding, the STAD of inter-frame coding, the STAD of B-frame coding, and the STAD of P-frame coding.
8. The method according to any of claims 1 to 4, wherein the encoding parameters comprise at least one of the following parameters: code rate, resolution, and frame rate.
9. An apparatus for video coding parameter determination, the apparatus comprising:
a video slicing module configured to perform slicing of a video to be encoded into a plurality of video segments;
a feature calculation module configured to calculate a complexity feature of each video segment, wherein the complexity feature characterizes the temporal complexity and the spatial complexity of the video segment;
a coding parameter acquisition module configured to input the complexity feature of each video segment and the current network bandwidth into a pre-trained neural network model to obtain the coding parameters of each video segment;
wherein the pre-trained neural network model is obtained by training with the complexity features of sample video segments of a sample video, the current network bandwidth, and the convex hull coding parameters of the sample video segments;
wherein the convex hull coding parameters are such that, when a sample video segment is encoded using the convex hull coding parameters, the video quality of the encoded sample video segment is greater than a preset video quality, and the network bandwidth required to upload the encoded sample video segment is less than the current network bandwidth.
10. The apparatus of claim 9, wherein the model training module comprises:
a first video slicing unit configured to perform slicing of the sample video into a plurality of first sample video segments;
a second video slicing unit configured to decode the sample video encoded with each of the plurality of preset coding parameters and slice each decoded sample video into a plurality of second sample video segments, wherein the number of first sample video segments is the same as the number of second sample video segments;
a video quality calculation unit configured to, for each second sample video segment, calculate the video quality of the second sample video segment from the second sample video segment and the first sample video segment corresponding to it;
a difference matrix determining unit configured to determine a video quality difference matrix corresponding to each preset encoding parameter according to the video quality of each second sample video segment;
a convex hull coding parameter determining unit configured to determine a convex hull coding parameter corresponding to each first sample video segment according to each video quality difference matrix, a video quality constant corresponding to the video quality difference matrix, and a current network bandwidth;
a model training unit configured to, for each first sample video segment, input the complexity feature of the first sample video segment and the current network bandwidth into a neural network model, and train the neural network model until the coding parameters output by the neural network model are the convex hull coding parameters corresponding to the next first sample video segment adjacent to that first sample video segment.
11. The apparatus of claim 10, wherein the difference matrix determining unit is configured to perform:
for a target second sample video segment, calculating the video quality difference between the video quality of the target second sample video segment and the video quality of the second sample video segment corresponding to the target second sample video segment;
determining a vector formed by all the video quality differences as the video quality difference matrix corresponding to a target preset coding parameter;
wherein the target preset coding parameter of the target second sample video segment and the preset coding parameter of the corresponding second sample video segment are adjacent in magnitude.
12. The apparatus according to claim 10, wherein the convex hull coding parameter determining unit is configured to perform:
for each video quality difference matrix, comparing each video quality difference in the matrix with the video quality constant corresponding to that matrix to obtain a comparison value, wherein the video quality constant represents the minimum acceptable video quality for the video quality difference matrix;
for each first sample video segment, sorting the comparison values of the second sample video segments corresponding to the first sample video segment in ascending order of their preset coding parameters to obtain a comparison value vector;
determining, as a target coding parameter, the preset coding parameter corresponding to the first comparison value in the comparison value vector that equals a preset comparison value;
and determining the minimum of the target coding parameter and the current network bandwidth as the convex hull coding parameter corresponding to the first sample video segment.
13. The apparatus of any of claims 9 to 12, wherein the feature calculation module is configured to perform:
and for each video segment, calculating the complexity feature of each video frame of the video segment, and determining the complexity features of the video frames as the complexity feature of the video segment.
14. The apparatus of any of claims 9 to 12, wherein the feature calculation module is configured to perform:
and for each video segment, calculating the complexity features of the key video frames of the video segment, and determining the complexity features of the key video frames as the complexity feature of the video segment.
15. The apparatus of any of claims 9 to 12, wherein the complexity feature comprises at least one of: the maximum variance MaxTI of adjacent-frame differences, the average variance AvgTI of adjacent-frame differences, the sum of absolute residuals (STAD) of intra-frame coding, the STAD of inter-frame coding, the STAD of B-frame coding, and the STAD of P-frame coding.
16. The apparatus according to any one of claims 9 to 12, wherein the encoding parameters comprise at least one of the following parameters: code rate, resolution, and frame rate.
17. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 8.
18. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-8.
CN201910995614.0A 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium Active CN110650370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910995614.0A CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910995614.0A CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110650370A CN110650370A (en) 2020-01-03
CN110650370B true CN110650370B (en) 2021-09-24

Family

ID=68994401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910995614.0A Active CN110650370B (en) 2019-10-18 2019-10-18 Video coding parameter determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110650370B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111246209B (en) * 2020-01-20 2022-08-02 北京字节跳动网络技术有限公司 Adaptive encoding method, apparatus, electronic device, and computer storage medium
CN111510740B (en) * 2020-04-03 2022-08-30 咪咕文化科技有限公司 Transcoding method, transcoding device, electronic equipment and computer readable storage medium
CN112383777B (en) * 2020-09-28 2023-09-05 北京达佳互联信息技术有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN112672157B (en) * 2020-12-22 2022-08-05 广州博冠信息科技有限公司 Video encoding method, device, equipment and storage medium
CN113014922B (en) * 2021-02-23 2023-04-07 北京百度网讯科技有限公司 Model training method, video coding method, device, equipment and storage medium
CN113573101B (en) * 2021-07-09 2023-11-28 百果园技术(新加坡)有限公司 Video coding method, device, equipment and storage medium
CN116320529A (en) * 2021-12-10 2023-06-23 深圳市中兴微电子技术有限公司 Video code rate control method and device and computer readable storage medium
CN115225911B (en) * 2022-08-19 2022-12-06 腾讯科技(深圳)有限公司 Code rate self-adaption method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101345867A (en) * 2008-08-22 2009-01-14 四川长虹电器股份有限公司 Code rate control method based on frame complexity
CN101854526A (en) * 2009-03-30 2010-10-06 国际商业机器公司 Code rate control method and code controller
CN105208390A (en) * 2014-06-30 2015-12-30 杭州海康威视数字技术股份有限公司 Code rate control method of video coding and system thereof
CN107371028A (en) * 2017-08-22 2017-11-21 南京惟初信息科技有限公司 A kind of high-quality video coding method for adapting to bandwidth
CN109286825A (en) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for handling video
CN109660795A (en) * 2018-11-09 2019-04-19 建湖云飞数据科技有限公司 A kind of information coding method based on down-sampling
CN109754077A (en) * 2017-11-08 2019-05-14 杭州海康威视数字技术股份有限公司 Network model compression method, device and the computer equipment of deep neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6950463B2 (en) * 2001-06-13 2005-09-27 Microsoft Corporation Non-compensated transcoding of a video stream
US8976857B2 (en) * 2011-09-23 2015-03-10 Microsoft Technology Licensing, Llc Quality-based video compression


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of H.264 Bit Rate/Resolution Down-sampling Transcoding; Wang Xiaonan; China Master's Theses Full-text Database; 2010-03-31; full text *
Quantization Prediction in Video Coding and Correlation Measurement of Multiple Modules; Zhu Jianying; Wanfang Data; 2013-12-10; full text *

Also Published As

Publication number Publication date
CN110650370A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN110650370B (en) Video coding parameter determination method and device, electronic equipment and storage medium
CN110827253A (en) Training method and device of target detection model and electronic equipment
CN110536168B (en) Video uploading method and device, electronic equipment and storage medium
CN106559712B (en) Video playing processing method and device and terminal equipment
CN110708559B (en) Image processing method, device and storage medium
CN106454413B (en) Code switching method, device and equipment is broadcast live
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN108881952B (en) Video generation method and device, electronic equipment and storage medium
CN109275029B (en) Video stream processing method and device, mobile terminal and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN115052150A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN111862995A (en) Code rate determination model training method, code rate determination method and device
CN112948704A (en) Model training method and device for information recommendation, electronic equipment and medium
CN110941727A (en) Resource recommendation method and device, electronic equipment and storage medium
CN108629814B (en) Camera adjusting method and device
CN105392056A (en) Method and device for determining television scene modes
CN104539497B (en) Method for connecting network and device
CN111953980B (en) Video processing method and device
CN109120929A (en) A kind of Video coding, coding/decoding method, device, electronic equipment and system
CN109068138B (en) Video image processing method and device, electronic equipment and storage medium
CN111860552A (en) Model training method and device based on nuclear self-encoder and storage medium
CN108024005B (en) Information processing method and device, intelligent terminal, server and system
CN110798721B (en) Episode management method and device and electronic equipment
CN114422854A (en) Data processing method and device, electronic equipment and storage medium
CN108154092B (en) Face feature prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant