CN114554211A - Content adaptive video coding method, device, equipment and storage medium - Google Patents

Content adaptive video coding method, device, equipment and storage medium

Info

Publication number
CN114554211A
CN114554211A (application CN202210043241.9A)
Authority
CN
China
Prior art keywords
coding
video
encoding
parameter
code rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210043241.9A
Other languages
Chinese (zh)
Inventor
刘芳
袁子逸
洪旭东
崔同兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202210043241.9A priority Critical patent/CN114554211A/en
Publication of CN114554211A publication Critical patent/CN114554211A/en
Priority to PCT/CN2023/070555 priority patent/WO2023134523A1/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/149Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/114Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the invention disclose a content adaptive video coding method, apparatus, device and storage medium. The method includes: acquiring video data to be coded and dividing the video data into a plurality of image sets containing continuous frame images; determining the coding features of the image set, and inputting the coding features and a set video picture evaluation parameter into a pre-trained machine learning model to output a code rate control parameter; and coding the image set according to the coding features and the code rate control parameter. The scheme improves video coding efficiency and is suitable for real-time video scenarios.

Description

Content adaptive video coding method, device, equipment and storage medium
Technical Field
Embodiments of the present application relate to the technical field of video processing, and in particular to a content adaptive video coding method, apparatus, device and storage medium.
Background
With the rapid development of mobile internet technology, video has become the mainstream medium for users; live streaming, video on demand, short video and video chat have become part of everyday life. However, compared with text and pictures, the amount of video data is very large, and the transmission and storage of video face great challenges. Video encoding and decoding technology therefore aims to achieve the highest possible compression ratio and video reconstruction quality within the available computing resources, so as to meet the requirements on storage capacity and bandwidth. Early video service providers typically processed almost all video content with a predetermined, universal coding configuration: for video with intense motion the coding bitrate may be insufficient and the coding quality low, while for video with little motion the coding bitrate is wasted. Content adaptive coding sets different coding configurations for different videos according to the video content, and finds, for each video or video segment, the lowest bitrate that still meets the requirements on clarity and subjective perception, thereby saving bandwidth.
In an existing approach to video coding, training video data is encoded in advance, the coded data is extracted as features, and a machine learning model is trained with the corresponding constant rate factor values. In a production environment the model predicts the coding parameter from the video features, and encoding is then performed with the predicted value, so as to balance coding bitrate and coding quality and improve the viewing experience of most viewers. However, this coding method encodes the whole video, extracts features from it, and then uses the machine learning model to predict a single constant rate factor value for the entire video; for a long video with complex and mixed content, this results in poor coding quality in the complex parts and wasted bitrate in the simple parts. Moreover, the whole video must first be encoded to extract features and predict the constant rate factor value, and then encoded again according to the predicted value, which consumes a large amount of time, so the method is not suitable for live broadcast scenarios.
Disclosure of Invention
Embodiments of the invention provide a content adaptive video coding method, apparatus, device and storage medium, which solve the problem in the prior art that the coding effect of video coding on complex scenes is not ideal, improve video coding efficiency, and are at the same time suitable for real-time video scenarios.
In a first aspect, an embodiment of the present invention provides a content adaptive video coding method, where the method includes:
acquiring video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining the coding characteristics of the image set, and inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
and coding the image set according to the coding characteristics and the code rate control parameters.
In a second aspect, an embodiment of the present invention further provides a content adaptive video coding apparatus, including:
an image set determining module, used for acquiring video data to be coded and dividing the video data into a plurality of image sets containing continuous frame images;
the code rate parameter determination module is used for determining the coding characteristics of the image set, inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model and outputting code rate control parameters;
and the coding module is used for coding the image set according to the coding characteristics and the code rate control parameters.
In a third aspect, an embodiment of the present invention further provides a content adaptive video coding apparatus, where the apparatus includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the content adaptive video coding method according to the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, perform the content adaptive video coding method according to the embodiment of the present invention.
In the embodiments of the invention, video data to be coded is acquired and divided into a plurality of image sets containing continuous frame images; the coding features of each image set are determined, and the coding features and a set video picture evaluation parameter are input into a pre-trained machine learning model to output a code rate control parameter; the image set is then coded according to the coding features and the code rate control parameter. This solves the problem in the prior art that the coding effect of video coding on complex scenes is not ideal, improves video coding efficiency, and is suitable for real-time video scenarios.
Drawings
Fig. 1 is a flowchart of a content adaptive video coding method according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention;
FIG. 3 is a flow chart of another content adaptive video coding method according to an embodiment of the present invention;
FIG. 4 is a flow chart of another content adaptive video coding method according to an embodiment of the present invention;
fig. 5 is a block diagram illustrating a content adaptive video coding apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a content adaptive video coding apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should be further noted that, for convenience of description, only some of the structures relating to the embodiments of the present invention, rather than all of them, are shown in the drawings.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar elements and do not necessarily describe a particular sequence or chronological order. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the application can be practiced in orders other than those illustrated or described herein. In addition, "first", "second" and the like generally denote a class of objects and do not limit their number; for example, a first object may be one object or more than one object. Furthermore, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a flowchart of a content adaptive video coding method according to an embodiment of the present invention. The method can be applied to coding video data and can be executed by a computing device such as a notebook computer, a desktop computer, a smartphone, a server or a tablet computer. The method specifically includes the following steps:
step S101, video data to be coded are obtained, and the video data are divided into a plurality of image sets containing continuous frame images.
The video data to be encoded includes recorded video data and video data generated in real time and required to be transmitted and displayed, such as live video data.
In one embodiment, when a segment of video data is encoded, the video data is first divided into a plurality of image sets containing consecutive frame images, and video coding is then performed separately for each image set. Illustratively, the video data may be divided into a plurality of consecutive GOPs (Groups of Pictures), each GOP representing a group of consecutive pictures in the encoded video stream. For example, each GOP may contain 15 or 20 frames of pictures; that is, the video data to be coded is divided into a plurality of consecutive image sets of 15 to 20 frames each, and the video data is encoded with the GOP as the coding unit.
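As a minimal illustration of this step (not part of the patent text), the following Python sketch splits a decoded frame sequence into fixed-size image sets; the GOP length of 15 frames is taken from the example above, and a real encoder may additionally cut a GOP at scene changes.

    # Minimal sketch: split a sequence of decoded frames into fixed-size GOPs.
    # GOP_SIZE = 15 follows the example in the description above.
    from typing import List, Sequence

    GOP_SIZE = 15

    def split_into_gops(frames: Sequence, gop_size: int = GOP_SIZE) -> List[list]:
        """Group consecutive frames into image sets (GOPs) of at most gop_size frames."""
        return [list(frames[i:i + gop_size]) for i in range(0, len(frames), gop_size)]

    # Example: 100 dummy frames -> 7 image sets (6 x 15 frames + 1 x 10 frames).
    gops = split_into_gops(list(range(100)))
    assert len(gops) == 7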
Step S102, determining the coding characteristics of the image set, and inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
In one embodiment, the coding features of the image set may be obtained by pre-encoding, for example by encoding the image set with an encoder and collecting the corresponding coding features.
In one embodiment, the coding features of the image set are obtained by performing feature extraction and analysis on each frame image in the image set. Optionally, the coding features include motion vector features, a distortion parameter, a complexity parameter and the like describing each frame image in the image set. The motion vector features characterize the degree of change between images: the more intense the change between frame images, the larger the motion vectors; if each frame image is essentially a still picture, the motion vectors are correspondingly small. The distortion parameter characterizes the degree of distortion of an image: the greater the distortion, the higher the parameter value; conversely, if the distortion is low, the corresponding parameter value is relatively low. The complexity parameter characterizes the complexity of an image; for example, if an image contains several different objects, the complexity is higher when the pixel differences between the objects are larger. Alternatively, the identification and determination of the coding features can be implemented by an existing encoder module, image processing algorithms and the like.
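Purely for illustration (the feature set and field names below are assumptions rather than definitions from the patent), a per-image-set feature container and the vector handed to the machine learning model might look as follows.

    # Illustrative container for the per-image-set coding features described above.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class GopCodingFeatures:
        avg_motion_vector_magnitude: float  # degree of change between frame images
        distortion: float                   # distortion parameter from the pre-encoding
        complexity: float                   # e.g. spatial texture / intra cost
        frame_types: List[str]              # per-frame type, e.g. ['I', 'P', 'B', ...]

    def as_model_input(features: GopCodingFeatures, target_vmaf: float) -> List[float]:
        """Flatten the coding features plus the set video picture evaluation
        parameter (the VMAF target) into the vector fed to the model."""
        return [features.avg_motion_vector_magnitude,
                features.distortion,
                features.complexity,
                target_vmaf]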
The video picture evaluation parameter is a comprehensive evaluation index used to characterize image quality. Optionally, the video picture evaluation parameter may be characterized by VMAF (Video Multimethod Assessment Fusion). VMAF is an objective evaluation metric proposed by Netflix that combines human visual modeling with machine learning: a large amount of subjective data is used as the training set, and algorithms covering different evaluation dimensions are fused by means of machine learning, making VMAF one of the mainstream objective evaluation metrics at present. In general, the higher the VMAF score, the better the video quality; however, from the perspective of human perception, once the VMAF score of a given video rises above a certain threshold, the human eye can no longer perceive the improvement in picture quality. Different VMAF values can therefore be designed for different videos to save coding bitrate without changing the subjective quality of the video.
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input into a pre-trained machine learning model to output the code rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices and the like, and the set value can also be adjusted. The machine learning model is a pre-trained neural network model; given the coding features of the image set and the set video picture evaluation parameter as input, it outputs the corresponding code rate control parameter. Optionally, the code rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is a form of rate control: the smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the video compression ratio but the lower the video quality. Optionally, different CRF values correspond to different bitrates; the CRF values and their corresponding bitrates may be recorded in a mapping table, or the relationship between CRF and bitrate may be represented by a function curve.
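The CRF-to-bitrate mapping table mentioned above can be sketched as follows; the table values are placeholders for illustration, not data from the patent, and in practice such a table would be built per resolution and content class from encoding statistics.

    # Illustrative CRF-to-bitrate mapping table with linear interpolation.
    import bisect

    # (CRF, approximate bitrate in kbps), sorted by CRF; higher CRF -> lower bitrate.
    CRF_BITRATE_TABLE = [(18, 4500), (23, 2600), (28, 1400), (33, 800), (38, 450)]

    def estimate_bitrate(crf: float) -> float:
        """Linearly interpolate an approximate bitrate for a given CRF value."""
        crfs = [c for c, _ in CRF_BITRATE_TABLE]
        if crf <= crfs[0]:
            return CRF_BITRATE_TABLE[0][1]
        if crf >= crfs[-1]:
            return CRF_BITRATE_TABLE[-1][1]
        i = bisect.bisect_left(crfs, crf)
        (c0, b0), (c1, b1) = CRF_BITRATE_TABLE[i - 1], CRF_BITRATE_TABLE[i]
        return b0 + (b1 - b0) * (crf - c0) / (c1 - c0)

    print(round(estimate_bitrate(25.0)))  # 2120 kbps with these placeholder values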
Step S103, coding the image set according to the coding features and the code rate control parameters.
In one embodiment, after the code rate control parameter is obtained from the machine learning model, the image set is finally encoded a second time based on the code rate control parameter and the coding features determined in step S102, so as to output the code stream data.
Specifically, fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention, and as shown in fig. 2, the method specifically includes:
and step S1031, determining frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, for example the division into I frames, P frames and B frames. Different frame types require different coding compression quality because of their different reference relationships. An I frame is a key frame, a fully retained picture: it can be decoded from its own data alone, without reference to other frame images. A P frame records the difference between the current frame and a preceding key frame or P frame; during decoding, the difference defined by the frame is superimposed on the previously buffered picture to generate the final picture. A B frame is a bidirectional difference frame: it records the differences between the current frame and both the preceding and following frames, so decoding a B frame requires both the previously buffered picture and the following decoded picture, and the final picture is obtained by combining the preceding and following pictures with the current frame data.
The scene information may, for example, be divided into motion scenes and still scenes, and can be determined from the coding features by an integrated scene discrimination module. The coding features record image characteristics related to motion and displacement changes, such as the motion vectors and motion compensation of each frame image, and the scene information of the image is determined by analyzing these data.
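A minimal sketch of such a scene discrimination step is shown below; the threshold value is an assumed placeholder, since the patent does not specify one.

    # Classify an image set as a motion scene or a still scene from its
    # average motion-vector magnitude (threshold is an assumed placeholder).
    MOTION_THRESHOLD = 2.0  # average motion-vector magnitude, in pixels (assumed)

    def classify_scene(avg_motion_vector_magnitude: float) -> str:
        """Return 'motion' for dynamic scenes and 'still' for largely static ones."""
        return "motion" if avg_motion_vector_magnitude >= MOTION_THRESHOLD else "still"

    print(classify_scene(0.4))  # still
    print(classify_scene(6.8))  # motion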
Step S1032, carrying out predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain a coding parameter.
The coding parameter is, for example, the quantization parameter (QP) used in High Efficiency Video Coding (HEVC). The quantization parameter QP is the index of the quantization step Qstep: for luminance coding, Qstep takes 52 values and QP takes values from 0 to 51; for chrominance coding, QP takes values from 0 to 39.
Taking the quantization parameter QP as an example, the coding parameter reflects the compression of spatial detail. The smaller the coding parameter value, the finer the quantization, the higher the image quality and the longer the generated code stream. If the QP value is small, most of the detail in the image is retained; as the QP value increases, some detail is lost and the code rate decreases accordingly. With QP taking values from 0 to 51, the minimum value 0 corresponds to the finest quantization and the maximum value 51 to the coarsest. The purpose of quantization is to shorten the coded representation of the image without reducing the visual effect, by removing information that is unnecessary for visual reconstruction.
Specifically, the process of obtaining the coding parameter by predictive analysis based on the frame type information, the scene information and the code rate control parameter is implemented, for example, by an integrated HEVC encoder module. That is, the frame type information (I frame, B frame, P frame), the scene information (still scene, dynamic scene) and the code rate control parameter (CRF) jointly determine the final coding parameter (frame-level QP). Illustratively, when the frame type is a key frame, the scene information indicates a dynamic scene and the value of the code rate control parameter is higher, the determined frame-level QP value is lower.
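The joint decision can be roughly illustrated as follows; the offsets and the direct use of the CRF as a base value are assumptions made for this sketch, not values from the patent, since an actual HEVC encoder derives the frame-level QP internally as part of its rate control.

    # Rough illustration: derive a frame-level QP from the CRF, frame type and scene.
    # All offsets are assumed placeholders.
    FRAME_TYPE_OFFSET = {"I": -3, "P": 0, "B": 2}   # key frames get finer quantization
    SCENE_OFFSET = {"still": 1, "motion": -1}       # spend more bits on dynamic scenes

    def frame_level_qp(crf: float, frame_type: str, scene: str) -> int:
        """Map (CRF, frame type, scene) to a frame-level QP clamped to [0, 51]."""
        qp = crf + FRAME_TYPE_OFFSET[frame_type] + SCENE_OFFSET[scene]
        return max(0, min(51, round(qp)))

    print(frame_level_qp(26.0, "I", "motion"))  # 22
    print(frame_level_qp(26.0, "B", "still"))   # 29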
Step S1033, encoding the set of images based on the encoding parameter.
In one embodiment, after the coding parameter is obtained, HEVC high efficiency video coding is performed, taking the frame-level QP parameter in HEVC as an example, so as to output the code stream.
In another embodiment, in order to improve the accuracy of the secondary encoding, performing the predictive analysis to obtain the coding parameter and encoding the image set based on the coding parameter includes: performing predictive analysis to obtain a first coding parameter; determining a second coding parameter based on the first coding parameter, the coding feedback information, the cache information, the frame type information and the scene information; adjusting a quantization offset parameter according to the first coding parameter; and coding the image set according to the second coding parameter and the adjusted quantization offset parameter so as to output code stream data. Taking HEVC coding as an example, the first coding parameter can be understood as base QP information (base QP), from which the frame-level QP information is determined together with the coding feedback information, the cache information, the frame type information and the scene information. The cache information represents the state of the buffer memory during video coding: the larger the cache occupancy, the larger the corresponding QP value, so as to reduce the computation and storage load of video coding. The coding feedback information may be information obtained during pre-encoding, or feedback obtained after coding the previous image set or video segment, such as the degree of distortion; if the distortion is high, the QP value is reduced accordingly to improve the coding quality. While the second coding parameter is determined from the first coding parameter, the quantization offset parameter is further adjusted according to the first coding parameter. Taking HEVC video coding as an example, the quantization offset parameter can be characterized by a cutree strength, which adjusts the quantization offset according to the degree to which the current block is referenced. Specifically, if the current block is referenced, it is further determined how many of a certain number of subsequent blocks reference the current block; if the current block is referenced by many subsequent image blocks, it belongs to a slowly changing scene, and its QP value is correspondingly lowered to improve the image quality. Finally, the determined second coding parameter and quantization offset parameter are used together to encode the image set and output code stream data, thereby ensuring an optimal balance of the coding effect between image quality and compression ratio.
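A highly simplified sketch of this second-pass adjustment is given below; the thresholds and step sizes are assumptions, and a real encoder folds these adjustments into its internal rate control rather than exposing them as a separate function.

    # Simplified sketch: refine the base QP using buffer occupancy and coding feedback.
    def refine_qp(base_qp: int,
                  buffer_occupancy: float,   # 0.0 (empty) .. 1.0 (full)
                  last_distortion: float,    # normalized distortion of the previous image set
                  distortion_target: float = 0.1,
                  high_watermark: float = 0.8) -> int:
        qp = base_qp
        if buffer_occupancy > high_watermark:
            qp += 2                          # buffer nearly full: quantize more coarsely
        if last_distortion > distortion_target:
            qp -= 1                          # previous output too distorted: spend more bits
        return max(0, min(51, qp))

    print(refine_qp(30, buffer_occupancy=0.9, last_distortion=0.05))  # 32
    print(refine_qp(30, buffer_occupancy=0.3, last_distortion=0.20))  # 29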
In the above scheme, when video coding is performed, the video is first divided into image sets, and each image set is encoded once to obtain its coding features; an accurate code rate control parameter is then output by the trained machine learning model, and the image set is encoded a second time based on the code rate control parameter and the coding features obtained during the first encoding, finally yielding the video coding result.
Fig. 3 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, which provides a method for determining the coding features of an image set. As shown in fig. 3, the method specifically includes:
step S201, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S202, obtaining a preset number of frame images in the image set, coding the preset number of frame images to obtain coding features, and determining the coding features as the coding features of the image set.
In one embodiment, taking a GOP as the image set, the preset number of frame images may be a miniGOP within the GOP; for example, for an image set that is a GOP of 15 frames, the preset number of frame images may be 5 of those frames. The preset number of frame images may be pre-encoded by an encoder to obtain the coding features, and the coding features of the preset number of frame images are then determined as the coding features of the image set.
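A minimal sketch of this miniGOP pre-analysis follows; pre_encode_stats is a hypothetical helper standing in for a fast pre-encoding pass (for example, an encoder run with a very fast preset) that returns motion, distortion and complexity statistics.

    # Sketch of the miniGOP pre-analysis: pre-encode only the first few frames
    # of a GOP and reuse the resulting statistics as the features of the whole GOP.
    MINI_GOP_SIZE = 5  # frames used to represent a 15-frame GOP, per the example above

    def gop_features(gop_frames: list, pre_encode_stats) -> dict:
        """pre_encode_stats is a hypothetical callable wrapping a fast pre-encode."""
        mini_gop = gop_frames[:MINI_GOP_SIZE]
        return pre_encode_stats(mini_gop)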
Step S203, inputting the coding characteristics and the set video picture evaluation parameters into a machine learning model trained in advance to output code rate control parameters.
Step S204, coding the image set according to the coding features and the code rate control parameters.
In the above scheme, a content adaptive coding technique based on two-pass encoding and machine learning is applied to live video, and the coding configuration is dynamically adjusted according to the complexity of the video content. The coding features are obtained by taking a preset number of frame images from the image set and encoding only those frames, and these coding features are then determined as the coding features of the image set. This significantly increases the coding speed, reduces the amount of data computation, and meets the video coding requirements of real-time scenarios. Content adaptive coding is thus achieved with a better balance between video smoothness and clarity, and the scheme can be applied to real-time live video scenarios with a good video coding effect.
Fig. 4 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, which provides a specific method for outputting a rate control parameter through a machine learning model, where the machine learning model includes a joint model composed of a first training model and a second training model, and as shown in fig. 4, the method specifically includes:
step S301, obtaining video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images.
Step S302, determining the coding characteristics of the image set, and inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model respectively to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model.
In one embodiment, the first training model is an XGBoost model and the second training model is a LightGBM model, both of which are decision-tree-based machine learning algorithms. Illustratively, the first code rate control parameter output by the first training model is denoted CRF1, and the second code rate control parameter output by the second training model is denoted CRF2.
Step S303, performing weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain a code rate control parameter.
The finally calculated code rate control parameter is denoted CRF3 and may optionally be calculated by the formula CRF3 = λ1·CRF1 + λ2·CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1] and λ2 ∈ [0, 1].
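A minimal sketch of this joint model is shown below, using the scikit-learn style interfaces of the XGBoost and LightGBM packages; the toy training data, the feature layout and the equal weights λ1 = λ2 = 0.5 are assumptions made for illustration only.

    # Two decision-tree regressors each predict a CRF; the final CRF is their
    # weighted average (CRF3 = λ1·CRF1 + λ2·CRF2 with λ1 + λ2 = 1).
    import numpy as np
    from xgboost import XGBRegressor
    from lightgbm import LGBMRegressor

    # Toy training data: [motion, distortion, complexity, target_vmaf] -> CRF label.
    X = np.random.rand(200, 4)
    y = 18 + 20 * np.random.rand(200)

    model1 = XGBRegressor(n_estimators=100).fit(X, y)
    model2 = LGBMRegressor(n_estimators=100).fit(X, y)

    def predict_crf(features: np.ndarray, lam1: float = 0.5, lam2: float = 0.5) -> float:
        crf1 = float(model1.predict(features.reshape(1, -1))[0])
        crf2 = float(model2.predict(features.reshape(1, -1))[0])
        return lam1 * crf1 + lam2 * crf2

    print(predict_crf(np.array([0.3, 0.1, 0.6, 0.92])))  # normalized toy features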
Step S304, coding the image set according to the coding features and the code rate control parameters.
In this way, when the code rate control parameter is output by the machine learning model, the corresponding code rate control parameters are first output by two different decision-tree-based models and then combined by weighted averaging to obtain the final code rate control parameter, so that the obtained code rate control parameter is more accurate and the final video coding effect is better.
In one embodiment, before the coding features and the set video picture evaluation parameter are input into the first training model and the second training model respectively, the method further includes: acquiring video sample data of different scene types and different corresponding resolutions; and dividing the video sample data into training set samples, test set samples and verification set samples, and inputting them into the first training model and the second training model respectively for training. During model training, the scene types of the video pictures are first distinguished, for example into dynamic scenes and still scenes, and video pictures of different resolutions are used separately as sample data for training. The video sample data is divided into training set samples, test set samples and verification set samples during training, so as to obtain a final training model with a good prediction effect.
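The sample split can be sketched as follows; the 70/15/15 ratio is an assumption, since the patent only requires training, test and verification subsets.

    # Split video sample features X and CRF labels y into training, test and
    # verification sets (ratios are assumed).
    from sklearn.model_selection import train_test_split

    def split_samples(X, y, seed: int = 0):
        X_train, X_rest, y_train, y_rest = train_test_split(
            X, y, test_size=0.30, random_state=seed)
        X_test, X_verif, y_test, y_verif = train_test_split(
            X_rest, y_rest, test_size=0.50, random_state=seed)
        return (X_train, y_train), (X_test, y_test), (X_verif, y_verif)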
Fig. 5 is a block diagram of a content adaptive video coding apparatus according to an embodiment of the present invention, which is configured to execute the content adaptive video coding method according to the foregoing embodiment, and has corresponding functional modules and beneficial effects. As shown in fig. 5, the apparatus specifically includes: an image set determination module 101, a code rate parameter determination module 102 and an encoding module 103, wherein,
an image set determining module 101, configured to obtain video data to be encoded, and divide the video data into a plurality of image sets including consecutive frame images;
a code rate parameter determining module 102, configured to determine coding features of the image set, and input the coding features and the set video picture evaluation parameters to a pre-trained machine learning model to output code rate control parameters;
and the encoding module 103 is configured to encode the image set according to the encoding characteristics and the code rate control parameters.
In the above scheme, when video coding is performed, the video is first divided into image sets, and each image set is encoded once to obtain its coding features; an accurate code rate control parameter is then output by the trained machine learning model, and the image set is encoded a second time based on the code rate control parameter and the coding features obtained during the first encoding, finally yielding the video coding result.
In a possible embodiment, the code rate parameter determining module 102 is specifically configured to:
acquiring a preset number of frame images in the image set;
and coding the frame images of the preset number to obtain coding features, and determining the coding features as the coding features of the image set.
In a possible embodiment, the machine learning model includes a joint model formed by a first training model and a second training model, and the bitrate parameter determination module 102 is specifically configured to:
respectively inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
and carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain a code rate control parameter.
In one possible embodiment, the code rate parameter determining module 102 is further configured to:
before the coding features and the set video picture evaluation parameters are respectively input into the first training model and the second training model, video sample data of different scene types and corresponding different resolutions are obtained;
and dividing the video sample data into a training set sample, a test set sample and a verification set sample, and inputting the training set sample, the test set sample and the verification set sample into the first training model and the second training model respectively for training.
In a possible embodiment, the encoding module 103 is specifically configured to:
determining frame type information and scene information according to the coding features;
performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain a coding parameter;
encoding the set of images based on the encoding parameters.
In a possible embodiment, the encoding module 103 is specifically configured to:
performing predictive analysis to obtain a first coding parameter;
and determining a second encoding parameter based on the first encoding parameter, the encoding feedback information, the cache information, the frame type information and the scene information.
In a possible embodiment, the encoding module 103 is specifically configured to:
adjusting a quantization offset parameter according to the first encoding parameter;
and coding the image set according to the second coding parameter and the adjusted quantization offset parameter so as to output code stream data.
Fig. 6 is a schematic structural diagram of a content adaptive video coding apparatus according to an embodiment of the present invention, as shown in fig. 6, the apparatus includes a processor 201, a memory 202, an input device 203, and an output device 204; the number of the processors 201 in the device may be one or more, and one processor 201 is taken as an example in fig. 6; the processor 201, the memory 202, the input means 203 and the output means 204 in the device may be connected by a bus or other means, as exemplified by a bus connection in fig. 6. The memory 202 is used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the content adaptive video coding method in the embodiment of the present invention. The processor 201 executes various functional applications of the device and data processing by running software programs, instructions and modules stored in the memory 202, i.e., implements the content adaptive video coding method described above. The input device 203 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus. The output device 204 may include a display device such as a display screen.
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the content adaptive video coding method described in the foregoing embodiments, specifically including:
acquiring video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining the coding characteristics of the image set, and inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
and coding the image set according to the coding characteristics and the code rate control parameters.
It should be noted that, in the embodiment of the content adaptive video coding apparatus, the included units and modules are divided only according to functional logic, and the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from each other and are not intended to limit the protection scope of the embodiments of the invention.
It should be noted that the foregoing is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will appreciate that the embodiments of the present invention are not limited to the specific embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the concept of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for content adaptive video coding, comprising:
acquiring video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining the coding characteristics of the image set, and inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
and coding the image set according to the coding characteristics and the code rate control parameters.
2. The method of claim 1, wherein the determining the encoding characteristics of the set of images comprises:
acquiring a preset number of frame images in the image set;
and coding the frame images of the preset number to obtain coding features, and determining the coding features as the coding features of the image set.
3. The method according to claim 1, wherein the machine learning model comprises a joint model composed of a first training model and a second training model, and the inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output bitrate control parameters comprises:
respectively inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
and carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain a code rate control parameter.
4. The method according to claim 3, further comprising, before inputting the coding features and the set video picture evaluation parameters into the first training model and the second training model, respectively:
acquiring video sample data of different scene types and corresponding different resolutions;
and dividing the video sample data into a training set sample, a test set sample and a verification set sample, and inputting the training set sample, the test set sample and the verification set sample to the first training model and the second training model respectively for training.
5. The method of claim 1, wherein the encoding the set of images according to the encoding characteristics and the rate control parameters comprises:
determining frame type information and scene information according to the coding features;
performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain a coding parameter;
encoding the set of images based on the encoding parameters.
6. The method of claim 5, wherein the performing predictive analysis to obtain coding parameters comprises:
performing predictive analysis to obtain a first coding parameter;
and determining a second encoding parameter based on the first encoding parameter, the encoding feedback information, the cache information, the frame type information and the scene information.
7. The method according to claim 5, wherein said encoding the set of images based on the encoding parameters comprises:
adjusting a quantization offset parameter according to the first encoding parameter;
and coding the image set according to the second coding parameter and the adjusted quantization offset parameter so as to output code stream data.
8. A content adaptive video encoding apparatus, comprising:
an image set determining module, used for acquiring video data to be coded and dividing the video data into a plurality of image sets containing continuous frame images;
the code rate parameter determination module is used for determining the coding characteristics of the image set, inputting the coding characteristics and the set video picture evaluation parameters into a pre-trained machine learning model and outputting code rate control parameters;
and the coding module is used for coding the image set according to the coding characteristics and the code rate control parameters.
9. A content adaptive video encoding apparatus, the apparatus comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content adaptive video encoding method of any one of claims 1-7.
10. A storage medium storing computer executable instructions for performing the content adaptive video encoding method of any one of claims 1-7 when executed by a computer processor.
CN202210043241.9A 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium Pending CN114554211A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210043241.9A CN114554211A (en) 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium
PCT/CN2023/070555 WO2023134523A1 (en) 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210043241.9A CN114554211A (en) 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114554211A true CN114554211A (en) 2022-05-27

Family

ID=81671210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210043241.9A Pending CN114554211A (en) 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114554211A (en)
WO (1) WO2023134523A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116320429A (en) * 2023-04-12 2023-06-23 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium
WO2023134523A1 (en) * 2022-01-14 2023-07-20 百果园技术(新加坡)有限公司 Content adaptive video coding method and apparatus, device and storage medium
CN117750080A (en) * 2023-12-28 2024-03-22 广州速启科技有限责任公司 Coding parameter prediction method and server for audio and video streaming
WO2024124911A1 (en) * 2022-12-16 2024-06-20 书行科技(北京)有限公司 Video encoding method and apparatus, electronic device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3808086A1 (en) * 2018-08-14 2021-04-21 Huawei Technologies Co., Ltd. Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection
CN111083473B (en) * 2019-12-28 2022-03-08 杭州当虹科技股份有限公司 Content self-adaptive video coding method based on machine learning
CN112383777B (en) * 2020-09-28 2023-09-05 北京达佳互联信息技术有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023134523A1 (en) * 2022-01-14 2023-07-20 百果园技术(新加坡)有限公司 Content adaptive video coding method and apparatus, device and storage medium
WO2024124911A1 (en) * 2022-12-16 2024-06-20 书行科技(北京)有限公司 Video encoding method and apparatus, electronic device and storage medium
CN116320429A (en) * 2023-04-12 2023-06-23 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium
CN116320429B (en) * 2023-04-12 2024-02-02 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium
CN117750080A (en) * 2023-12-28 2024-03-22 广州速启科技有限责任公司 Coding parameter prediction method and server for audio and video streaming

Also Published As

Publication number Publication date
WO2023134523A1 (en) 2023-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination