WO2021147448A1 - Video data processing method, apparatus, and storage medium - Google Patents

Video data processing method, apparatus, and storage medium

Info

Publication number
WO2021147448A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video sequence
sequence
encoded
frame
Prior art date
Application number
PCT/CN2020/126067
Other languages
English (en)
French (fr)
Inventor
吴景然
许思焱
赵俊
李�浩
李雅卿
涂承杰
朱子荣
汪亮
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021147448A1
Priority to US17/713,205 (published as US20220232222A1)

Classifications

    • H — ELECTRICITY; H04 — ELECTRIC COMMUNICATION TECHNIQUE; H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/115 Selection of the code volume for a coding unit prior to coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/142 Detection of scene cut or scene change
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/172 The coding unit being an image region, the region being a picture, frame or field
    • H04N19/177 The coding unit being a group of pictures [GOP]
    • H04N19/179 The coding unit being a scene or a shot
    • H04N19/184 The coding unit being bits, e.g. of the compressed video stream
    • H04N19/34 Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream

Definitions

  • This application relates to the field of Internet technology, and in particular to a video data processing method, device and storage medium.
  • In the related art, video data with different content can be encoded by the same encoder.
  • The default encoding bitrate is usually used to encode such video data regardless of its content, so the video quality obtained after encoding cannot be guaranteed: content that needs fewer bits is over-encoded, wasting encoding resources, while more complex content may not reach the desired quality.
  • the embodiments of the present application provide a video data processing method, device, and storage medium, which can improve the accuracy of video coding and can reduce the waste of coding resources.
  • One aspect of the embodiments of the present application provides a video data processing method; the steps of the method correspond to the functions of the device modules described below.
  • One aspect of the embodiments of the present application provides a video data processing device, and the device includes:
  • the quality parameter acquisition module is used to acquire the video sequence to be encoded associated with the video source, and acquire the video quality standard parameters associated with the video sequence to be encoded;
  • a pre-encoding module configured to perform pre-encoding processing on the video sequence to be encoded according to the video quality standard parameters to obtain the pre-encoded video sequence, and determine the video feature corresponding to the video sequence to be encoded according to the pre-encoded video sequence;
  • the bit rate prediction module is used to predict the coding bit rate associated with the video sequence to be encoded according to the video quality standard parameters and video characteristics;
  • the video encoding module is used to perform encoding processing on the to-be-encoded video sequence according to the encoding bit rate to obtain an encoded video sequence associated with the video source.
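  • The four modules above form a linear pipeline. Below is a minimal, self-contained Python sketch of how such a pipeline could be wired together; every function name, signature, and dummy value is an illustrative assumption, not the patent's actual implementation.

        # Minimal sketch of the four-module pipeline described above.
        # Every name here is an illustrative assumption, not the patent's API.
        from dataclasses import dataclass

        @dataclass
        class VideoFeatures:
            resolution: tuple           # resolution of the pre-encoded sequence
            bitrate_kbps: float         # bitrate observed during pre-encoding
            spatial_complexity: float   # see formula (1) later in the text
            temporal_complexity: float  # see formula (2) later in the text

        def preencode(sequence, quality_target):
            """Pre-encoding module: one fast pass that records statistics."""
            return VideoFeatures((1280, 720), 2400.0, 0.62, 0.18)  # dummy values

        def predict_bitrate(features, quality_target):
            """Bitrate prediction module: stand-in for the trained model."""
            return features.bitrate_kbps * (0.5 + features.temporal_complexity)

        def encode(sequence, bitrate_kbps):
            """Video encoding module: final encode at the predicted bitrate."""
            return {"sequence": sequence, "bitrate_kbps": bitrate_kbps}

        def process(sequence, quality_target=("VMAF", 90)):
            features = preencode(sequence, quality_target)       # pre-encode
            bitrate = predict_bitrate(features, quality_target)  # predict
            return encode(sequence, bitrate)                     # final encode

        print(process("video_sequence_1a"))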
  • the computer device includes: one or more processors and one or more memories;
  • the one or more memories are used to store program codes, and the one or more processors are used to call and execute the program codes, so that the computer device executes the video data processing method in the embodiments of the present application.
  • the computer storage medium stores a computer program.
  • the computer program includes program instructions.
  • When the program instructions are executed by a processor of a computer device, the computer device executes the video data processing method as in the embodiments of the present application.
  • In the embodiments of the present application, the video quality standard parameter associated with the to-be-encoded video sequence can be obtained; here, the video sequences of each video segment of the video source under the corresponding scaling parameter information are collectively referred to as the video sequences to be encoded.
  • Further, the video sequence to be encoded is pre-encoded according to the video quality standard parameters to obtain a pre-encoded video sequence, and the video features corresponding to the video sequence to be encoded are determined from the pre-encoded video sequence; then, the encoding bitrate associated with the video sequence to be encoded is predicted according to the video quality standard parameters and the video features; finally, the video sequence to be encoded is encoded at that bitrate to obtain an encoded video sequence associated with the video source.
  • In other words, by analyzing the video content of each video segment of the video source, the video features of each segment (that is, of each video sequence to be encoded) can be extracted quickly, so that under a set quality index (that is, the set video quality standard parameter) a prediction model can accurately predict the encoding bitrate to be used for each segment. Encoding each video segment at its predicted bitrate improves the accuracy of video encoding at a specific video quality and reduces the waste of encoding resources.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a scenario for distributed transcoding provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a video data processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scene for acquiring a video sequence to be encoded in a video-on-demand scene provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scene of encoding bitrates used for encoding different video clips provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a scene of video quality obtained by encoding different video clips provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a video data processing method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a scene for acquiring a video sequence to be encoded in a live video scene provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an overall flow of obtaining a coded stream provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a video data processing device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the network architecture may include a server cluster and a user terminal cluster.
  • The user terminal cluster may include multiple user terminals; as shown in FIG. 1, it may specifically include a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, ..., and a user terminal 3000n. As shown in FIG. 1, each of these user terminals can connect to any server in the server cluster, so that each user terminal can exchange data with the corresponding server through the network connection.
  • The embodiment of the present application may select one user terminal from the multiple user terminals shown in FIG. 1 as the target user terminal; the target user terminal may include smart phones, tablet computers, notebook computers, smart TVs, smart watches, desktop computers, and other smart terminals that carry video data collection functions (for example, video data recording functions).
  • the user terminal 3000a shown in FIG. 1 may be referred to as a target user terminal, and a target client with a video data recording function may be integrated in the target user terminal.
  • The target client integrated in the target user terminal may include an instant messaging client (for example, a WeChat client or QQ client), a multimedia client (for example, a video player client), an entertainment client (for example, a game client), a virtual room client (for example, a live broadcast client), and other clients that have the functions of loading and recording frame sequences (for example, video data).
  • The video data collected by the target user terminal through the browser page or the target client may be collectively referred to as initial video data (for example, initial video data A), which may then be uploaded through the above-mentioned network connection to any server in the server cluster.
  • any server in the server cluster may be collectively referred to as a business server.
  • The embodiment of the present application takes the service server connected to the target user terminal being the server 20d shown in FIG. 1 as an example, to illustrate the specific process by which the server 20d performs multi-channel transcoding on video sources in different service scenarios.
  • the initial video data uploaded by the target user terminal to the service server may be on-demand video data (that is, the initial video data may be a complete piece of video data).
  • the initial video data uploaded by the target user terminal to the service server may be continuously recorded live video data, and there is no restriction on the business scenario for obtaining the initial video data.
  • When the server 20d receives the foregoing initial video data A, it may store the initial video data A in the service database.
  • When obtaining complete video data (for example, initial video data B) uploaded by other user terminals in the above-mentioned user terminal cluster, the server 20d in the embodiment of the present application may likewise store the initial video data B, together with the initial video data A, in the first transcoding database, so that each piece of initial video data can subsequently be sliced into its video segments.
  • each initial video data stored in the first transcoding database may be referred to as a video source.
  • The embodiments of the present application may collectively refer to the several video fragments (for example, video fragment 1, video fragment 2, ..., video fragment n) obtained after slicing any video source as first video sequences.
  • In the embodiment of the present application, each first video sequence obtained after slicing in the above-mentioned server 20d may be synchronously distributed to other servers in the same distributed network, so that each of those servers, through the video data processing device 2000 (not shown in FIG. 1) that carries the video data processing function, further performs multi-channel transcoding on the received video segment (that is, the first video sequence) to quickly obtain the multi-channel transcoded streams associated with the first video sequence.
  • The embodiment of the present application may collectively refer to the service server used to obtain the initial video data (that is, the video source) as the second server; the second server may be a distributed server in a distributed cluster (for example, the server 20d in the server cluster of FIG. 1 described above). The other servers (that is, the other service servers) may be referred to as first servers, and each first server can receive from the second server the first video sequence obtained by segmenting the video source.
  • When the second server obtains the video source, it can quickly identify whether the business scenario to which the video source belongs is the live video scenario corresponding to the live broadcast service or the video-on-demand scenario corresponding to the on-demand service; for video sources in different business scenarios, the service server chooses different transcoding methods.
  • If the second server recognizes that the acquired initial video data is on-demand video data, it can determine that the business scenario to which the on-demand video data belongs is a video-on-demand scenario.
  • In that case the second server can directly slice the acquired video source according to a slice segmentation rule (for example, the first slice segmentation rule).
  • For example, the second server may divide the video source into multiple video segments according to a first slice segmentation rule such as duration or shot content (a minimal sketch of duration-based slicing follows below), and distribute these video segments to the first servers in the distributed cluster.
  • The video clips received by each first server and distributed by the second server may be collectively referred to as the first video sequence.
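  • For concreteness, here is a hedged sketch of duration-based slicing under the first slice segmentation rule; the function name, the frame-based representation, and the 60-second segment length are assumptions for illustration, since the text does not specify them.

        # Hedged sketch of the "first slice segmentation rule": splitting a
        # video source into segments of roughly equal duration. The text also
        # mentions slicing by shot content; only duration-based slicing is shown.

        def slice_by_duration(total_frames, fps, segment_seconds):
            """Return (start_frame, end_frame) pairs covering the whole source."""
            frames_per_segment = int(fps * segment_seconds)
            segments = []
            for start in range(0, total_frames, frames_per_segment):
                end = min(start + frames_per_segment, total_frames)
                segments.append((start, end))
            return segments

        # e.g. a 10-minute 30 fps source cut into ~60 s segments:
        print(slice_by_duration(total_frames=18000, fps=30, segment_seconds=60))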
  • After receiving the first video sequence, the first server may perform scaling processing on the resolution of the first video sequence according to the scaling parameter information associated with the video source, and refer to the scaled first video sequence as the second video sequence.
  • the scaling parameter information in the embodiment of the present application may include one resolution or multiple resolutions, and the specific amount of the scaling parameter information is not limited here. Among them, the multiple resolutions may specifically include 1080p, 720p, 540p, 270p and other resolutions.
  • Transcoding the first video sequence at one resolution may be referred to as one-way transcoding, and transcoding the first video sequence at the above-mentioned multiple resolutions may be referred to as multi-way (multi-channel) transcoding. In the distributed transcoding system, each first server in the distributed cluster can multi-way transcode the acquired first video sequence according to the multiple pieces of scaling parameter information associated with the video source, to obtain the transcoded stream associated with each piece of scaling parameter information.
  • one scaling parameter information is one resolution.
  • the transcoded stream associated with each resolution may be collectively referred to as an encoded stream (ie, an encoded video sequence).
  • an encoding code stream is obtained by encoding a video sequence to be encoded with an encoding rate predicted by a prediction model associated with a resolution (ie, scaling parameter information).
  • A video sequence to be encoded is obtained by the first server after scaling the resolution of the first video sequence according to one piece of scaling parameter information.
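  • The per-resolution loop implied above can be sketched as follows; `SCALING_PARAMS` and the `process` callback (the pipeline sketched earlier) are illustrative assumptions, not names from the text.

        # Sketch of multi-way transcoding: one encode per piece of scaling
        # parameter information (i.e. per resolution). Helper names are assumptions.

        SCALING_PARAMS = ["1080p", "720p", "540p", "270p"]  # one entry = one way

        def multi_way_transcode(first_video_sequence, quality_target, process):
            """`process` is the per-sequence pipeline sketched earlier."""
            coded_streams = {}
            for resolution in SCALING_PARAMS:
                # Scaling the first video sequence yields one "second video
                # sequence" (the video sequence to be encoded) per resolution.
                to_encode = (first_video_sequence, resolution)
                coded_streams[resolution] = process(to_encode, quality_target)
            return coded_streams

        # usage: streams = multi_way_transcode(segment_100a, ("VMAF", 90), process)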
  • other servers in the same distributed network as the server 20d may specifically include the aforementioned server 20a, server 20b, ..., server 20c.
  • the embodiment of the present application may collectively refer to other servers (server 20a, server 20b, ..., server 20c) in the same distributed network as the server 20d (ie, the second server) as the first server.
  • The above-mentioned video data processing device 2000 may run in these first servers. In this way, the first servers that have received the first video sequence can each multi-channel transcode the received first video sequence through the video data processing device 2000, so as to quickly obtain the multi-channel transcoded streams associated with the first video sequence.
  • When another user terminal associated with the second server (for example, the user terminal 3000n shown in FIG. 1 above) requests a coded stream (for example, coded stream 1), the coded stream 1 that is found can be sent to the user terminal 3000n, and the decoded coded stream 1 can be played in the user terminal 3000n, which can effectively ensure the quality of the video data to be played and thereby improve the effect of video playback.
  • When the acquired initial video data is continuously recorded live video data, the server 20d may determine that the service scenario to which the initial video data belongs is the live video scenario.
  • In the live video scenario, it is difficult for the aforementioned server 20d to directly slice the continuously received initial video data (that is, the video source). Therefore, in order to improve the efficiency of multi-channel transcoding, the embodiment of the present application can buffer some of the received initial video data (that is, the video source) through a buffer in the live video scenario, where the video sequence of a specific sequence length obtained by buffering is collectively called the buffered video sequence.
  • In this way, the above-mentioned server 20d can perform scene cut detection on a buffered video sequence of a specific sequence length while acquiring the video source, so as to accurately locate, from the buffered video sequence, the to-be-encoded video sequence that needs to be video-encoded, thereby ensuring the accuracy of video encoding in live video scenarios.
  • the embodiment of the present application may collectively refer to the first video frame of the buffered video sequence dynamically updated in the buffer as a key video frame.
  • the specific sequence length of the buffered video sequence dynamically updated in the buffer may be collectively referred to as the buffer sequence length (for example, 50 frames).
  • In the server 20d (that is, the service server), the video sequence between two key video frames can be referred to as the video sequence to be encoded, so that the video data processing device 2000 running in the server 20d can execute the above-mentioned video data processing function on the video sequence to be encoded in the live video scenario and continuously output multiple transcoded streams in the server 20d.
  • For ease of understanding, the embodiment of the present application takes the acquired initial video data being on-demand video data as an example, and illustrates the specific process of transcoding multiple video segments of the on-demand video data (for example, video A) in a distributed transcoding system. In the video-on-demand scenario, the initial video data obtained by the second server may be complete video data uploaded by a user terminal, or may be on-demand video data obtained from the service database (for example, the first transcoding database); there is no limitation on the specific method by which the service server (that is, the second server) obtains the on-demand video data.
  • FIG. 2 is a schematic diagram of a distributed transcoding scenario provided by an embodiment of the present application.
  • the server 10a shown in FIG. 2 may be the above-mentioned service server.
  • the service server may be the second server in the above-mentioned video-on-demand scenario.
  • When the second server (that is, the server 10a shown in FIG. 2) obtains the video A from the service database shown in FIG. 2, the video A can be used as the video source.
  • Further, the server 10a can directly slice the video source according to the above-mentioned first slice segmentation rule, to obtain the multiple video clips (that is, first video sequences) shown in FIG. 2.
  • the multiple first video sequences may specifically include a video segment 100a, a video segment 200a, and a video segment 300a.
  • the server 10a may also configure a corresponding quality type and video quality parameters corresponding to the quality type for each video clip.
  • the quality type in the embodiment of the present application may include at least one of the following: a first quality type, a second quality type, a third quality type, and a fourth quality type.
  • the four quality types can all be used to evaluate the video quality of the video image in the corresponding video segment, and in the embodiment of the present application, the score obtained by the evaluation may be collectively referred to as the video quality parameter under the corresponding quality type.
  • The first quality type may be the VMAF (Video Multi-Method Assessment Fusion) type. The video quality standard parameter under the VMAF type configured by the second server (that is, the service server) for the first video sequence (here, the quality evaluation value set under the VMAF type) can be any value in the range 0-100, for example, VMAF 90.
  • The second quality type may be the SSIM (Structural Similarity index) type. The video quality standard parameter under the SSIM type configured by the second server (that is, the service server) for the first video sequence (here, the quality evaluation value set under the SSIM type) can be any value in the range 0 to 1, for example, SSIM 0.987. The larger the quality evaluation value set under the SSIM type, the better the video quality of the finally output coded stream.
  • The third quality type may be the PSNR (Peak Signal-to-Noise Ratio) type. The video quality standard parameter under the PSNR type configured by the second server (that is, the service server) for the first video sequence (here, the quality evaluation value set under the PSNR type) can be any value in the range 0-100, for example, PSNR 40.
  • The fourth quality type may be the MOS (Mean Opinion Score) type. The video quality standard parameter under the MOS type configured by the second server (that is, the above-mentioned service server) for the first video sequence can be any value in the range 1 to 5, for example, MOS 4.
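  • The four quality types and their value ranges described above can be summarized in a small validation helper; this is purely illustrative and the dictionary name is an assumption.

        # The four quality types and their value ranges as described above.
        QUALITY_RANGES = {
            "VMAF": (0.0, 100.0),  # e.g. VMAF 90
            "SSIM": (0.0, 1.0),    # e.g. SSIM 0.987; larger means better quality
            "PSNR": (0.0, 100.0),  # e.g. PSNR 40
            "MOS":  (1.0, 5.0),    # e.g. MOS 4
        }

        def validate_quality_target(quality_type, value):
            low, high = QUALITY_RANGES[quality_type]
            if not (low <= value <= high):
                raise ValueError(f"{quality_type} target {value} outside [{low}, {high}]")
            return quality_type, value

        print(validate_quality_target("VMAF", 90))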
  • A quality type may be configured for each video clip; in the embodiment of the present application, the quality type configured for a video segment may be referred to as its target quality type.
  • For example, the target quality type configured by the server 10a for the video clip 100a may be the VMAF type, the target quality type configured by the server 10a for the video clip 200a may be the SSIM type, and the target quality type configured by the server 10a for the video clip 300a may be yet another quality type (for example, the PSNR type).
  • Optionally, the server 10a may also configure the same quality type for each video clip obtained after the slicing process; for example, the video clip 100a, the video clip 200a, and the video clip 300a shown in FIG. 2 may all be configured with the same quality type, which may be any one of the multiple quality types.
  • a certain video quality parameter under a certain quality type configured by the server 10a for a certain video segment may be used as the video quality standard parameter of the video sequence to be encoded associated with this video segment, and then Under the video quality standard parameters of the target quality type, the encoding bit rate used for encoding the video sequence to be encoded is predicted by the prediction model corresponding to the target quality type.
  • For ease of understanding, the following takes as an example that the video quality parameter under the target quality type configured by the server 10a for the multiple video segments shown in FIG. 2 is VMAF 90.
  • Further, the server 10a can select three servers (for example, the server 10b, the server 10c, and the server 10d shown in FIG. 2) from the distributed cluster where the server 10a is located as first servers, and distribute the three video clips shown in FIG. 2 to these three first servers for multi-way transcoding, so as to improve the efficiency of multi-channel transcoding of each video segment in the video source in the distributed transcoding system.
  • For ease of understanding, the embodiment of the present application takes the server 10a distributing the sliced video clip 100a to the server 10b shown in FIG. 2 as an example, to illustrate the specific process of multi-channel transcoding in the server 10b.
  • When the server 10b (that is, the first server) obtains the first video sequence (for example, the video segment 100a) distributed by the server 10a (that is, the second server), it can synchronously obtain the VMAF-type video quality parameter (that is, the aforementioned VMAF 90) that the server 10a configured for the first video sequence, and the VMAF 90 can be used as the video quality standard parameter of the video sequence to be encoded.
  • After the server 10b obtains the first video sequence (that is, the video segment 100a), it scales the resolution of the video segment 100a (for example, 540p) according to the scaling parameter information associated with the video source (for example, 1080p, 720p, 540p, 270p); the first video sequence after scaling processing can be called the second video sequence. The number of second video sequences is the same as the number of pieces of scaling parameter information, that is, the resolutions of the second video sequences may specifically include 1080p, 720p, 540p, and 270p.
  • For ease of understanding, the embodiment of the present application takes the second video sequence whose resolution is 720p as an example, uses the second video sequence with the target resolution (that is, 720p) as the video sequence to be encoded, and further describes the specific process, in the server 10b, of transcoding the video sequence to be encoded associated with the target resolution (that is, 720p).
  • The second video sequence associated with the target resolution may be collectively referred to as the video sequence to be encoded; a complete pre-encoding pass is performed on the video sequence to be encoded, and the encoding information saved during the pre-encoding process is called the video feature corresponding to the video sequence to be encoded.
  • At this time, the server 10b can also find a prediction model (for example, prediction model 1) that matches the VMAF type in the prediction model library, use the prediction model 1 to predict the encoding bitrate of the video sequence to be encoded at the specified video quality, and then encode the video sequence to be encoded at the predicted bitrate to obtain the encoded bitstream 100b shown in FIG. 2.
  • Each quality type in the embodiments of this application can correspond to a trained prediction model; the prediction model can predict the encoding bitrate of the video sequence to be encoded at a specific resolution and a specific video quality, and the video sequence to be encoded can then be encoded at the predicted bitrate to obtain the coded bitstream associated with the corresponding resolution.
  • For specific implementations of how the first server obtains the video sequence to be encoded, obtains the video features of the video sequence to be encoded through pre-encoding, and predicts the encoding bitrate based on those video features, see the following embodiments corresponding to FIG. 3 to FIG. 9.
  • FIG. 3 is a schematic flowchart of a video data processing method provided by an embodiment of the present application.
  • the method can be executed by a video data processing device with a video data processing function, and the method can at least include step S101-step S104:
  • Step S101 Obtain a video sequence to be encoded associated with a video source, and acquire a video quality standard parameter associated with the video sequence to be encoded.
  • Specifically, in the video-on-demand scenario, the video data processing device may receive the first video sequence of the video source distributed by the service server, where the first video sequence is determined after the service server slices the video source; further, the video data processing device may determine the to-be-encoded video sequence associated with the video source according to the scaling parameter information of the video source and the first video sequence; further, the video data processing device may use the video quality parameter configured by the service server for the first video sequence as the video quality standard parameter associated with the video sequence to be encoded.
  • the video data processing apparatus in the embodiment of the present application may run in the above-mentioned first server, and the first server may be the server 10b in the embodiment corresponding to FIG. 2 above.
  • the first server may be a distributed server in a distributed transcoding system.
  • the service server may be another distributed server (that is, the second server) in the same distributed network as the first server.
  • the second server in the embodiment of the present application may accurately slice the acquired video source according to the slice segmentation rule, so as to divide the acquired video source into multiple video segments.
  • each video segment may be collectively referred to as a first video sequence.
  • The embodiment of the present application may further distribute these first video sequences to the first servers associated with the second server, so that the video data processing device (for example, the above-mentioned video data processing device 2000) running in each first server multi-channel transcodes the acquired first video sequence, which ensures the accuracy of video encoding for each video segment and also improves the efficiency of multi-channel transcoding of the video source.
  • It is understandable that the service server (that is, the second server) can receive a large number of videos uploaded by user terminals through browser web pages or target clients every day; these videos can include video data 1 in the video-on-demand scenario (that is, the above-mentioned on-demand video data) and video data 2 in the live video scenario (that is, the above-mentioned live video data).
  • the video data 1 and the video data 2 received by the service server may be collectively referred to as the aforementioned initial video data, that is, one initial video data may be one video source.
  • When the service server determines that the acquired initial video data is on-demand video data, it can directly use the acquired on-demand video data as the video source for slicing processing, and distribute the resulting video clips to the other service servers (that is, the first servers) in the same distributed network as the service server. After each first server obtains the first video sequence distributed by the second server, it can perform scaling processing on the first video sequence according to the scaling parameter information of the above-mentioned video source (that is, of the on-demand video data), and determine the scaled first video sequence as the second video sequence.
  • The number of second video sequences is the same as the number of pieces of scaling parameter information of the video source; therefore, in a video-on-demand scenario, the number of video sequences to be encoded acquired by the first server determines how many transcoding passes subsequently need to be performed.
  • For ease of understanding, the embodiment of the present application takes one service server in the distributed transcoding system as an example, and illustrates the multi-channel transcoding of the acquired video clips in the first server running the above-mentioned video data processing device; the first server running the video data processing device may be the server 10c in the embodiment corresponding to FIG. 2 above, and the video clip obtained by this first server (that is, the server 10c) is described below.
  • FIG. 4 is a schematic diagram of a scene for acquiring a video sequence to be encoded in a video-on-demand scene provided by an embodiment of the present application.
  • the video clip obtained by the first server may be the aforementioned video clip 300a.
  • this embodiment of the present application may collectively refer to the video segments 300a in the video source distributed by the above-mentioned second server as the first video sequence.
  • Specifically, the first server may perform scaling processing on the resolution of the video segment 300a (that is, the first video sequence) according to the scaling parameter information associated with the resolution of the video source. As shown in FIG. 4, the first server may scale the resolution of the first video sequence (for example, 540p) to the multiple resolutions shown in FIG. 4.
  • the multiple resolutions here may specifically be resolution 1, resolution 2, resolution 3, and resolution 4.
  • resolution 1 may be the foregoing 1080p
  • resolution 2 may be the foregoing 720p
  • resolution 3 may be the foregoing 540p
  • resolution 4 may be the foregoing 270p.
  • the scaling parameter information associated with the resolution of the video source may be the aforementioned multiple resolutions, that is, one scaling parameter information may correspond to one resolution.
  • It is understandable that the code streams corresponding to any two of the multiple resolutions can be switched between each other. For example, for a user terminal in a video-on-demand scenario, the above-mentioned second server can, according to the stream switching request of the on-demand user using that terminal, quickly find and deliver the coded streams of the same video clip with the same video content at different resolutions, which makes it possible to switch quickly between the corresponding code streams while ensuring the quality of video playback, so as to improve encoding efficiency and reduce playback delay.
  • To facilitate understanding, the embodiment of the present application may use a second video sequence obtained according to one piece of scaling parameter information (that is, one resolution) as a video sequence to be encoded associated with the video source, so that, under the video quality standard parameters of a specific quality type, the following steps S102 to S104 can be executed for each video sequence to be encoded.
  • a video sequence of the same video segment at different resolutions may be referred to as a second video sequence.
  • The second video sequences may include the video sequence 1a obtained when the video clip 300a is scaled to resolution 1, the video sequence 2a obtained when the video clip 300a is scaled to resolution 2, the video sequence 3a obtained when the video clip 300a is scaled to resolution 3, and the video sequence 4a obtained when the video clip 300a is scaled to resolution 4.
  • The video sequence 1a, video sequence 2a, video sequence 3a, and video sequence 4a among these second video sequences may be collectively referred to as the video sequences to be encoded, and the first server (that is, the above-mentioned server 10d) multi-channel transcodes these video sequences to be encoded.
  • the multi-channel transcoding here specifically refers to the 4-channel transcoding associated with the above-mentioned 4 resolutions, so that the transcoding code streams of the same video segment at different resolutions can be obtained subsequently through the following steps S102 to S104.
  • These transcoded code streams may specifically include the encoding sequence 1d associated with resolution 1, the encoding sequence 2d associated with resolution 2, the encoding sequence 3d associated with resolution 3, and the encoding sequence 4d associated with resolution 4, as shown in FIG. 4.
  • It is understandable that the service server can judge the business scenario to which the acquired initial video data (that is, the video source) belongs, and then judge, according to that business scenario, whether the acquired video source can be sliced directly.
  • the business scenario here may include the above-mentioned video-on-demand scenario, and may also include a live video scene.
  • the service server (that is, the above-mentioned second server) can obtain the initial video data periodically collected and sent by the user terminal (here refers to the terminal device that can perform image collection, such as the anchor terminal).
  • the initial video data obtained by the service server may be live video data.
  • If the service server determines that the acquired initial video data is live video data, it can continuously treat the received initial video data (that is, the live video data) as the video source, and then adopt the second slice segmentation rule (for example, a scene detection rule) to perform scene cut detection on the continuously updated buffered video sequence in the buffer, so as to find, from the buffered video sequence currently in the buffer, the scene cut frame at which the video transitions from one scene (for example, scene 1) to another scene (for example, scene 2). The sequence number of the scene cut frame in the current buffered video sequence may be referred to as a scene cut point, and the current buffered video sequence can be divided into multiple scenes according to the scene cut points that are found; a minimal sketch of such detection follows below.
  • Each scene in the embodiment of the present application may correspond to a key video frame.
  • the video sequence between any two scenes may be referred to as the video sequence to be encoded to be transmitted to the encoder.
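  • A hedged sketch of the scene cut detection mentioned above: the text does not specify the detector, so thresholding the mean absolute pixel difference between consecutive frames is an assumption for illustration, as are the function name and the threshold value.

        def find_scene_cut_points(frames, threshold=30.0):
            """frames: list of equal-length grayscale pixel lists; returns cut indices."""
            cuts = []
            for i in range(1, len(frames)):
                prev, cur = frames[i - 1], frames[i]
                mad = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
                if mad > threshold:
                    cuts.append(i)  # frame i starts a new scene
            return cuts

        # Two similar "frames" followed by a very different one -> cut at index 2.
        print(find_scene_cut_points([[10] * 16, [12] * 16, [200] * 16]))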
  • the to-be-encoded video sequence associated with the video source determined by the service server (ie, the aforementioned second server) through the second slice segmentation rule may include a key video frame.
  • In the live video scenario, the second server running the video data processing device can directly transcode these continuously acquired video sequences to be encoded according to the above-mentioned second slice segmentation rule, so as to obtain the encoded video sequence (that is, a coded stream) associated with each video sequence to be encoded.
  • The coded stream in the live video scenario can be continuously distributed by the service server (that is, the second server) to other user terminals (for example, audience terminals) in the same virtual live room as the host terminal, to ensure that those user terminals can decode the continuously acquired coded stream through the corresponding decoder and synchronously play the live video data collected by the host terminal.
  • It should be understood that the service server used for multi-channel transcoding of the video source in the live video scenario may be any distributed server in the above-mentioned distributed transcoding system; the choice of distributed server used to obtain the live video data is not limited here.
  • For ease of understanding, the embodiment of the present application takes the video source obtained by the service server being on-demand video data in a video-on-demand scenario as an example, and illustrates the specific process by which the first server running the above-mentioned video data processing device multi-channel transcodes the video sequence to be encoded associated with the video source.
  • Step S102 Perform pre-encoding processing on the video sequence to be encoded according to the video quality standard parameters to obtain the pre-encoded video sequence, and determine the video feature corresponding to the video sequence to be encoded according to the pre-encoded video sequence.
  • the video data processing device obtains an initial encoder used to pre-encode the video sequence to be coded according to the video quality standard parameters; further, the video data processing device may perform pre-encoding processing on the video sequence to be coded according to the initial encoder to obtain A pre-encoded video sequence; wherein, the pre-encoded video sequence can include key video frames and predicted video frames; further, the video data processing device can be based on the key video frame, the predicted video frame, the resolution of the pre-encoded video sequence, and the pre-encoded video The bit rate of the sequence determines the encoding information of the pre-encoded video sequence; further, the video data processing device may determine the encoding information as the video feature corresponding to the video sequence to be encoded.
  • For ease of understanding, the embodiment of the present application takes scaling the resolution of the video segment shown in FIG. 4 (for example, the above-mentioned video segment 300a) to resolution 1 as an example, and explains the specific process of one-way transcoding of the video sequence 1a (that is, the video sequence to be encoded) shown in FIG. 4.
  • When obtaining the video sequence to be encoded (that is, the video sequence 1a shown in FIG. 4), the first server running the video data processing device can synchronously obtain the video quality parameter that the above-mentioned second server configured for the video segment 300a, and the video quality parameter configured by the second server for the video segment 300a may be used as the video quality standard parameter of the video sequence to be encoded (for example, the aforementioned VMAF 90).
  • Further, the first server may obtain, according to the video quality standard parameter (for example, the aforementioned VMAF 90), the initial encoder used to pre-encode the video sequence 1a, and may perform pre-encoding processing on the to-be-encoded video sequence with the initial encoder to obtain the pre-encoded video sequence; a pre-encoded video sequence may include a key video frame and at least one predicted video frame.
  • Further, the first server can quickly determine the encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence, and can determine this encoding information as the video feature corresponding to the video sequence to be encoded. As shown in FIG. 4 above, the first server saves the encoding information obtained during the pre-encoding of the video sequence 1a (that is, the video sequence to be encoded), and can use the saved encoding information of the pre-encoded video sequence as the video feature of the video sequence 1a; the video feature of the video sequence 1a may be the video feature 1b shown in FIG. 4 above.
  • Among them, the specific process for the first server to obtain the encoding information of the pre-encoded video sequence may be as follows. The first server may determine the key video frame selected when inter-frame compressing each forward predicted frame as the reference video frame corresponding to that forward predicted frame. It may then determine the total selected number of reference video frames as the first number, the total number of key video frames as the second number, and the total number of forward predicted frames as the third number. Next, it may determine the first average data capacity of the key video frames from the data capacity corresponding to the key video frames and the second number, and determine the second average data capacity of the forward predicted frames from the data capacity corresponding to the forward predicted frames and the third number. Further, the first server may obtain the maximum data capacity among the data capacities corresponding to the key video frames, determine the ratio between the first average data capacity and the maximum data capacity as the spatial complexity of the pre-encoded video sequence, and determine the ratio between the second average data capacity and the first average data capacity as the time domain complexity of the pre-encoded video sequence.
  • Specifically, the first server running the above-mentioned video data processing device can perform one complete pre-encoding pass on the video sequence 1a (that is, the video sequence to be encoded) shown in FIG. 4, and save the encoding information of the pre-encoded video sequence associated with the video sequence 1a.
  • During pre-encoding, different types of coded video frames can be obtained: I frames (that is, intra-coded frames) are obtained through intra-frame coding, while P frames (predicted frames, that is, forward predicted frames) and B frames (bi-directional predicted frames) are obtained through inter-frame coding. The I frames obtained by intra-frame coding may be collectively referred to as the aforementioned key video frames, and the P frames or B frames may be collectively referred to as the aforementioned predicted video frames.
  • the embodiment of the present application may utilize spatial correlation encoding within a single video frame of the video sequence 1a to output I frames. That is, in the process of intra-frame compression, there is no need to consider time correlation, and no need to consider motion compensation.
  • the I frame obtained by encoding can also be used as a reference frame for subsequent video decoding.
  • the I-frame image may periodically appear in the video sequence 1a, and the appearance frequency may be determined by the insertion period of the initial encoder. According to the insertion period, the frame group associated with the video sequence to be encoded (ie, the video sequence 1a) can be determined, and a frame group can be regarded as a scene.
  • P frames (that is, P-frame images) and B frames (that is, B-frame images) are obtained through temporal prediction. A P-frame image uses forward temporal prediction, which improves compression efficiency and image quality: each macroblock in the P-frame image may be obtained by forward prediction from the I frame closest to the P frame (here, that I frame serves as the reference video frame). A B-frame image is obtained through bidirectional temporal prediction; that is, the B-frame image can use the I-frame image or P-frame image closest to it as another reference video frame for bidirectional prediction.
  • It is understandable that a B-frame image may use a future frame (that is, an encoded P frame or I frame that comes after the B-frame image and is nearest to it) as a reference video frame. Therefore, in the process of pre-encoding the video frames in the to-be-encoded video sequence by the initial encoder, the transmission order and the display order of the encoded video frames in each frame group are different: for example, the display order of the encoded video frames may be I B B P, while the P frame must be transmitted (and decoded) before the two B frames that reference it (a small illustration follows below).
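  • The following toy function illustrates this reordering only; it is not anything from the text, and real encoders handle far more GOP structures than this.

        # With a display order of I B B P, the P frame must be sent (and
        # decoded) before the B frames that reference it. Purely illustrative.

        display_order = ["I", "B", "B", "P"]

        def transmission_order(frames):
            # Move each I/P frame ahead of the B frames that precede it.
            out, pending_b = [], []
            for f in frames:
                if f == "B":
                    pending_b.append(f)
                else:
                    out.append(f)
                    out.extend(pending_b)
                    pending_b = []
            return out + pending_b

        print(transmission_order(display_order))  # ['I', 'P', 'B', 'B']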
  • Among them, the encoding information of the pre-encoded video sequence associated with the video sequence 1a may include the key encoding information of the pre-encoded video sequence, the spatial complexity of the pre-encoded video sequence, the time domain complexity of the pre-encoded video sequence, and so on. The key encoding information of the pre-encoded video sequence may specifically include the resolution, bitrate, number of key video frames, number of predicted video frames, number of reference frames, and so on, of the pre-encoded video sequence.
  • the resolution of the pre-encoded video sequence may be the aforementioned resolution 1.
  • the code rate of the pre-encoded video sequence may be the code rate directly counted in the pre-encoding process.
  • the video sequence 1a may include multiple scenes, and each scene may correspond to one key video frame and at least one predicted video frame.
  • the at least one predicted video frame here may be a P frame (that is, a forward predicted frame).
  • the embodiment of the present application may collectively refer to the key video frames used when inter-coding the forward predicted frame (ie, the P frame) as the reference video frame.
  • in the pre-encoding process of the embodiment of the present application, each time a key video frame is selected as a reference, the count of reference video frames can be increased by one, and the total number of selected reference video frames counted when the pre-encoding is completed is determined as the first number.
  • the embodiments of the present application may also collectively refer to the number of key video frames counted in the pre-encoding process (that is, the total number of key video frames) as the second number, and may collectively refer to the number of forward predicted frames counted in the pre-encoding process (that is, the total number of forward predicted frames) as the third number.
  • the first server may also calculate the spatial complexity of the pre-encoded video sequence through the following formula (1):
  • Spatial complexity = average size of I frame / maximum size of I frame    Formula (1).
  • the average size of the I frame is determined by the data capacity (for example, 100 kB, 90 kB, etc.) corresponding to each key video frame obtained by the first server and the total number of I frames obtained by statistics.
  • the embodiment of the present application can determine the first average data capacity of these key video frames based on the data capacity corresponding to each key video frame and the total number of key video frames counted by the first server (that is, the above-mentioned second number), and the first average data capacity may be collectively referred to as the aforementioned average I frame size.
  • the embodiment of the present application can also find, among the data capacities corresponding to these key video frames, the key video frame with the largest data capacity; its data capacity may be called the maximum I frame size, that is, the maximum data capacity among the data capacities corresponding to these key video frames. Therefore, according to the above formula (1), the ratio between the first average data capacity and the maximum data capacity can be used as the spatial complexity of the pre-encoded video sequence.
  • the first server may also calculate the time domain complexity of the pre-encoded video sequence through the following formula (2):
  • Time domain complexity = average size of P frame / average size of I frame    Formula (2).
  • the average size of the P frame is determined by the data capacity (for example, 20 kB, 15 kB, etc.) corresponding to each forward predicted frame obtained by the first server and the total number of forward predicted frames counted by the first server (that is, the third number); the resulting second average data capacity may be collectively referred to as the above-mentioned average size of P frames. As shown in the above formula (2), in this embodiment of the present application, the ratio between the second average data capacity and the first average data capacity may be used as the time domain complexity of the pre-encoded video sequence.
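  • To make formulas (1) and (2) concrete, the hedged sketch below computes both complexities from the per-frame data capacities collected during pre-encoding; the kB values are hypothetical examples, not data from the patent:

```python
# Sketch of the statistics described above: given per-frame data capacities
# gathered during pre-encoding, compute the spatial complexity (formula (1))
# and the time domain complexity (formula (2)).

def sequence_complexity(i_frame_sizes_kb, p_frame_sizes_kb):
    avg_i = sum(i_frame_sizes_kb) / len(i_frame_sizes_kb)  # first average data capacity
    avg_p = sum(p_frame_sizes_kb) / len(p_frame_sizes_kb)  # second average data capacity
    max_i = max(i_frame_sizes_kb)      # maximum data capacity among key video frames
    spatial = avg_i / max_i            # formula (1)
    temporal = avg_p / avg_i           # formula (2)
    return spatial, temporal

spatial, temporal = sequence_complexity([100.0, 90.0], [20.0, 15.0])
print(f"spatial={spatial:.3f}, time-domain={temporal:.3f}")
```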
  • Step S103 Predict the encoding bit rate associated with the video sequence to be encoded according to the video quality standard parameters and the video characteristics.
  • the video data processing device may obtain the target quality type (that is, the VMAF type) corresponding to the video quality standard parameter (for example, the aforementioned VMAF 90), and may, in the prediction model library associated with multiple quality types, use the prediction model that matches the target quality type as the target prediction model.
  • Further, the video data processing device may input the video features into the target prediction model and output the matching degrees between the video features and the multiple reference video features in the target prediction model; further, the video data processing device can use, among the matching degrees, the reference video feature with the highest matching degree with the video features as the target reference video feature, and can then use the sample bit rate information corresponding to the quality label information associated with the target reference video feature as the encoding bit rate associated with the video sequence to be encoded.
  • the first server running the above-mentioned video data processing device may, after acquiring the video feature 1b of the to-be-encoded video sequence (for example, the above-mentioned video sequence 1a), input the video feature 1b into the target prediction model that matches the VMAF type.
  • the target prediction model can predict the coding rate used for coding the video sequence to be coded according to the set specific quality index (that is, the above-mentioned video quality standard parameter), so as to further execute the following step S104.
  • the coding bit rate associated with each video sequence to be coded can be predicted through the same target prediction model.
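  • The excerpt describes the target prediction model as matching the extracted video feature against stored reference video features and returning the sample bit rate tied to the best match. The sketch below assumes plain feature vectors and uses negative Euclidean distance as the matching degree; both choices are illustrative, as the patent does not fix the metric here:

```python
import math

def predict_bitrate(video_feature, reference_features, sample_bitrates_kbps):
    """reference_features[i] is paired with sample_bitrates_kbps[i]."""
    def match_degree(a, b):
        return -math.dist(a, b)      # larger value means a closer match
    best = max(range(len(reference_features)),
               key=lambda i: match_degree(video_feature, reference_features[i]))
    return sample_bitrates_kbps[best]

# Hypothetical features: (spatial complexity, time-domain complexity, bit rate proxy)
refs = [(0.8, 0.20, 1.0), (0.5, 0.35, 2.0), (0.3, 0.50, 3.5)]
rates = [1200, 2400, 4000]           # sample bit rate info tied to each reference
print(predict_bitrate((0.52, 0.33, 2.1), refs, rates))   # -> 2400
```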
  • the encoding rate of the video sequence 1a may be the encoding rate 1c shown in FIG. 4, the encoding rate of the video sequence 2a may be the encoding rate 2c shown in FIG. 4, the encoding rate of the video sequence 3a may be the encoding rate 3c shown in FIG. 4, and the encoding rate of the video sequence 4a may be the encoding rate 4c shown in FIG. 4 above.
  • Step S104 Perform encoding processing on the to-be-encoded video sequence according to the encoding bit rate to obtain an encoded video sequence associated with the video source.
  • the same target prediction model can be used to predict the encoding bit rates for the same video segment with the same video content at different resolutions, and the multiple to-be-encoded video sequences shown in FIG. 4 can then be encoded according to the multiple predicted encoding bit rates to output the encoded video sequences associated with the corresponding resolutions.
  • the coded video sequence may specifically include the coded sequence 1d, the coded sequence 2d, the coded sequence 3d, and the coded sequence 4d shown in FIG. 4.
  • the coded video sequence associated with each resolution may be collectively referred to as a coded stream.
  • the target quality type in the embodiment of the present application can be any one of the above-mentioned multiple quality types, each quality type can correspond to one prediction model, and these prediction models can all be stored in the prediction model library of the distributed transcoding system. Therefore, when the first server running the above-mentioned video data processing device obtains the target evaluation value under the above-mentioned set quality index, the target evaluation value under the set quality index may be collectively referred to as the above-mentioned video quality standard parameter. The first server may subsequently directly adjust the output quality of the encoded sequence to be output according to the set video quality standard parameter of the target quality type.
  • For each first server in the distributed transcoding system, even when the video clips obtained by these first servers differ, a relatively consistent video quality can be ensured: each first server finds, as reasonably as possible, the encoding bit rate for encoding its corresponding video clip, which solves the problem of bandwidth waste caused by indiscriminately encoding these video clips at the same encoding bit rate, reduces wasted encoding bit rate, and achieves the purpose of bandwidth saving.
  • FIG. 5 is a schematic diagram of a scene of encoding bitrates for encoding different video clips provided by an embodiment of the present application.
  • the video source in the embodiment of the present application may include multiple video clips as shown in FIG. 5, and the multiple video clips here may specifically include video clip 1, video clip 2, video clip 3, ..., video clip 25 shown in FIG. 5.
  • the curve 11 shown in FIG. 5 can be used to represent a schematic diagram of performing indiscriminate video coding on these 25 video segments at a fixed coding rate (for example, 4M).
  • the curve 21 can be used to characterize the schematic diagram of the coding rate predicted by the above target prediction model for encoding different video fragments when the video quality standard parameters of the 25 video fragments are configured as VMAF 90 .
  • the trained target prediction model can accurately predict and obtain the coding rate of different video segments under the same video quality index.
  • These video clips can be multi-channel transcoded in the same service server (for example, one first server), or in different service servers (for example, multiple first servers); the number of first servers that perform the multi-channel transcoding processing is not specifically limited here.
  • the second server can divide the video source into multiple video clips according to the above-mentioned first slice segmentation rule; that is, the second server can divide the video source corresponding to the 25 video clips shown in FIG. 5 into the multiple video segments shown in FIG. 5 according to the video content characteristics of the video source (for example, scene information, image information, and encoding information).
  • the scene information may specifically include scene category information, near/distant scene information, camera operating information, prominent area information, etc. contained in the video source.
  • the image information can include the texture detail feature, noise type feature, color feature, color contrast feature, etc. of the video source.
  • the coding information may include the key coding information of the pre-encoded video sequence (for example, resolution information and number information of reference frames, etc.), the spatial complexity, the temporal complexity information, and the like.
  • this embodiment of the present application takes as an example that a sliced video segment (that is, the foregoing first video sequence) is distributed to a first server.
  • the above-mentioned second server can configure the same video quality standard parameter under the same quality type when configuring video quality parameters for the 25 video clips shown in FIG. 5, and can distribute these 25 video clips to 25 first servers located in the same distributed network as the second server, so as to implement distributed transcoding in these first servers, thereby improving the efficiency of multi-channel transcoding across different first servers.
  • One video segment corresponds to one transcoding server (that is, one first server), and the encoding bit rate of each video segment shown in FIG. 5 can be obtained after prediction through the target prediction model by the corresponding first server in the distributed server cluster.
  • the video content characteristics of these video clips are usually different, so that the encoding bitrates predicted by the target prediction models in these first servers may be different.
  • FIG. 6 is a schematic diagram of a scene of video quality obtained by encoding different video clips provided by an embodiment of the present application.
  • the video clips shown in FIG. 6 may be the coding sequence of 25 video clips in the embodiment corresponding to FIG. 5 above.
  • the curve 22 shown in FIG. 6 shows the video quality of the encoded sequences obtained after encoding the 25 video segments with the predicted, differing encoding bit rates; that is, after each of the 25 video segments is encoded with its own encoding bit rate, the fluctuation range of the video quality across the 25 video segments remains relatively stable.
  • Compared with the video quality obtained by indiscriminately encoding these 25 video clips at the previous fixed encoding bit rate (that is, the video quality represented by curve 12), the video quality obtained by encoding the corresponding video segments with the predicted, differing encoding bit rates (that is, the video quality represented by curve 22) changes stably, thereby improving the playback effect of the video clips subsequently output to the above-mentioned on-demand terminal; that is, there will be no drastic fluctuations in video quality.
  • the first server running the above-mentioned data processing device may, after acquiring multiple coded code streams associated with the above-mentioned video segment 300a (that is, the first video sequence), collectively refer to these coded code streams as the encoded video sequence, and may then return the encoded video sequence to the second server as the coded stream associated with the scaling parameter information, so that when the second server receives the coded streams returned by all the first servers in the distributed server cluster for the same scaling parameter information, it merges all the received coded streams according to the slice identification information associated with the video source after slice processing.
  • For example, after the second server obtains the multi-channel transcoded streams returned by the multiple first servers in the embodiment corresponding to FIG. 5, the combined code stream can be delivered to the on-demand terminal for playback processing when the on-demand terminal requests to play the combined code stream corresponding to resolution 2.
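  • As a hedged sketch of this merge step (field names are illustrative, and real merging would remux segments at the container level rather than concatenate raw bytes), the second server can order the returned streams by slice identification information before joining them:

```python
def merge_coded_streams(returned_streams):
    """returned_streams: list of (slice_id, coded_bytes) gathered from first servers."""
    ordered = sorted(returned_streams, key=lambda item: item[0])
    return b"".join(coded for _, coded in ordered)

streams = [(2, b"SEG2"), (0, b"SEG0"), (1, b"SEG1")]   # arrival order may differ
assert merge_coded_streams(streams) == b"SEG0SEG1SEG2"
```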
  • the service database corresponding to the service server can store video files that have completed multi-channel transcoding (each video file being an encoded stream associated with a video source), and the on-demand terminal can access the above-mentioned second server through the target client or a browser webpage, so as to obtain, from the service database associated with the second server, the encoded code stream matching the video data requested by the on-demand terminal.
  • the coded code stream can be decoded by the decoder supported by the on-demand terminal, so as to perform playback processing on the decoded video data in the on-demand terminal.
  • the quality of the video data output in the on-demand terminal can be ensured, and the playback effect of the video data can be improved.
  • When the video sequence to be encoded associated with the video source is obtained, the video quality standard parameter associated with the video sequence to be encoded can be obtained; here, the embodiment of the application may collectively refer to the video sequence of each video segment of the video source under the corresponding scaling parameter information as a video sequence to be encoded.
  • Further, the video sequence to be encoded is pre-encoded according to the video quality standard parameters to obtain the pre-encoded video sequence, and the video features corresponding to the video sequence to be encoded are determined according to the pre-encoded video sequence; further, the encoding bit rate associated with the video sequence to be encoded is predicted according to the video quality standard parameters and the video features; further, the video sequence to be encoded is encoded according to the encoding bit rate to obtain the encoded video sequence associated with the video source.
  • A single analysis of the video content in each video segment of the video source can quickly extract the video features related to each video segment (that is, each video sequence to be encoded), so that, once the target quality is set, the prediction model can accurately predict the encoding bit rate used to encode each video segment; predicting the encoding bit rates of different video segments under a set quality index (that is, the set video quality standard parameters) can improve the accuracy of video encoding under a specific video quality and can reduce the waste of encoding resources.
  • FIG. 7 is a schematic diagram of a video data processing method provided by an embodiment of the present application. As shown in FIG. 7, the method may be executed by a video data processing device with a video data processing function, and the method may include the following steps S201 to S208.
  • Step S201 Receive initial video data collected and uploaded by a user terminal, and determine the received initial video data as a video source.
  • When acquiring the initial video data, the video data processing device may determine whether the initial video data is live video data; if so, the server running the video data processing device (for example, the above-mentioned second server) may continuously use the acquired initial video data as the video source.
  • Step S202 Obtain a key video frame from the video source, determine the buffered video sequence used for scene detection in the video source according to the key video frame and the buffer sequence length associated with the key video frame, and determine the video sequence to be encoded for pre-encoding according to the buffered video sequence and the scaling parameter information of the video source.
  • the video source includes M video frames associated with the acquisition period, where M is a positive integer. The video data processing device may determine the first video frame among the M video frames of the video source as the first key video frame; further, the video data processing device may determine the buffered video sequence used for scene detection from the M video frames according to the first key video frame and the buffer sequence length associated with the first key video frame; further, the video data processing device may determine the video frames other than the first key video frame as the to-be-detected video frames b_i in the buffered video sequence, and perform scene cut detection on the to-be-detected video frames b_i in the buffered video sequence according to the first key video frame.
  • When the video data processing device detects that the degree of change in video content between the first key video frame and a to-be-detected video frame b_i is greater than the scene cut threshold, the to-be-detected video frame b_i is determined as the second key video frame.
  • The video data processing device may use the video sequence between the first key video frame and the second key video frame as the initial video sequence, scale the initial video sequence according to the scaling parameter information of the video source, and determine the scaled initial video sequence as the video sequence to be encoded for pre-encoding.
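  • The excerpt does not fix how the degree of change in video content is measured, so the sketch below uses mean absolute pixel difference against the first key video frame purely as an illustrative choice:

```python
import numpy as np

def find_second_key_frame(buffered_frames, scene_cut_threshold):
    """buffered_frames[0] is the first key video frame (grayscale arrays)."""
    key = buffered_frames[0].astype(np.float32)
    for i in range(1, len(buffered_frames)):           # to-be-detected frames b_i
        change = np.abs(buffered_frames[i].astype(np.float32) - key).mean()
        if change > scene_cut_threshold:
            return i                                   # index of the second key frame
    return None                                        # a single scene fills the buffer
```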
  • FIG. 8 is a schematic diagram of a scene for acquiring a video sequence to be encoded in a live video scene provided by an embodiment of the present application.
  • the user terminal 40 shown in FIG. 8 may be the anchor terminal corresponding to the anchor user (that is, user A shown in FIG. 8), and the initial video data collected by the anchor terminal during the collection period may be the live video data shown in FIG. 8.
  • the live video data shown in FIG. 8 may include M video frames (that is, may include video frame 1a, video frame 1b, ... video frame 1m), where M may be 60.
  • the embodiment of the present application may refer to the time period formed from the first moment to the m-th moment shown in FIG. 8 as a collection period.
  • the live video data collected by the anchor terminal within the period of time can be used as the video source and continuously uploaded to the service server 30.
  • the service server 30 may perform scene detection on the acquired buffered video sequence associated with the video source according to the second slice segmentation rule in the above-mentioned live video scene.
  • the service server 30 shown in FIG. 8 may be the above-mentioned second server.
  • When the second server determines that it is obtaining live video data, it may collectively refer to the continuously obtained live video data as the video source, and deliver each video frame contained in the video source to the buffer in turn.
  • the length of the buffered video sequence of the buffer may be 50 frames.
  • When the buffer in the service server 30 obtains the first video frame of the video source, that frame may be determined as the first key video frame, and the first key video frame may be the key video frame 10a in the buffered video sequence 2 shown in FIG. 8.
  • the key video frame 10a may be the video frame 1a shown in FIG. 8 described above.
  • When the service server 30 determines the key video frame 10a, it can start from the key video frame 10a and buffer video frames of a specific sequence length to form the buffered video sequence 2 shown in FIG. 8.
  • the specific sequence length of the buffer (for example, the frame length formed by L (for example, 50) video frames) may be collectively referred to as the buffer sequence length.
  • L can be a positive integer less than or equal to M.
  • the service server 30 may use each video frame remaining in the buffered video sequence 2 after removing the key video frame (that is, the key video frame 10a) as a video frame to be detected, and the video frame to be detected currently used for comparison with the video content of the first key video frame may be recorded as the to-be-detected video frame b_i.
  • i can be a positive integer greater than 1 and less than M.
  • the video frames in the buffered video sequence 2 may include: the key video frame 10a, the to-be-detected video frame b_2, the to-be-detected video frame b_3, ..., and the to-be-detected video frame b_L.
  • the service server may determine whether the current to-be-detected video frame b_i is a scene cut frame by determining whether the degree of change of the video content between the first key video frame and the to-be-detected video frame b_i is greater than the scene cut threshold.
  • If the service server detects that there are multiple scenes in the buffered video sequence 2, the video sequence between the first key video frame and the second key video frame may be referred to as the initial video sequence 400a of FIG. 8; the initial video sequence 400a may comprise the video sequence from the key video frame 10a to the to-be-detected video frame b_4.
  • the service server may scale the initial video sequence 400a according to the scaling parameter information of the video source (that is, the above-mentioned 1080p, 720p, etc.), and may determine the scaled initial video sequence 400a as the to-be-encoded video sequence 400b for pre-encoding. The encoding bit rate 1 associated with the to-be-encoded video sequence 400b can then be obtained according to the following steps S203 to S206, and the to-be-encoded video sequence 400b can be encoded according to the encoding bit rate 1 to obtain the encoded video sequence 400c shown in FIG. 8.
  • the first video frame of the transitional video sequence 3 may be the above-mentioned second key video frame (that is, the key video frame 20a shown in FIG. 8 may be the above-mentioned to-be-detected video frame b_5).
  • the video frames in the transitional video sequence 3 may be the to-be-detected video frames remaining after removing the initial video sequence 400a from the above-mentioned buffered video sequence 2 (that is, the to-be-detected video frames b_5, ..., b_L).
  • the service server will continuously obtain the live video data collected and uploaded by the anchor terminal during the collection period.
  • the embodiment of the present application may also determine the to-be-filled video sequence 4 shown in FIG. 8 from the video source according to the transitional video sequence 3 and the buffer sequence length.
  • the transitional video sequence 3 can be complemented by the to-be-filled video sequence 4, to further ensure that the buffered video sequence 3 in the buffer has the same buffer sequence length as the buffered video sequence 2.
  • the buffered video sequence 3 may be the transitional video sequence after the above-mentioned complement processing.
  • the video frames other than the second key video frame in the buffered video sequence 3 may be determined as the new to-be-detected video frames (that is, the to-be-detected video frames d_j), so as to continue to perform scene detection on these new to-be-detected video frames in the buffered video sequence 3 according to the second key video frame.
  • j can be a positive integer greater than 1 and less than or equal to the aforementioned L.
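  • A hedged sketch of this buffer update follows; `next_frames` is an assumed iterator standing in for the continuously received live frames:

```python
def refill_buffer(buffered, cut_index, buffer_len, next_frames):
    """Split off the initial video sequence and top the buffer back up to buffer_len."""
    initial_sequence = buffered[:cut_index]   # frames before the second key video frame
    transitional = buffered[cut_index:]       # second key video frame onwards
    to_be_filled = [next(next_frames) for _ in range(buffer_len - len(transitional))]
    return initial_sequence, transitional + to_be_filled
```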
  • For the specific process in which the service server 30 performs scene detection on the to-be-detected video frames d_j in the buffered video sequence dynamically updated in the buffer (that is, the buffered video sequence 3 shown in FIG. 8), refer to the above description of scene detection for the to-be-detected video frames b_i in the buffered video sequence 2, which will not be repeated here.
  • Step S203 Configure video quality standard parameters for the video sequence to be encoded based on the configuration information of the user terminal;
  • Step S204 Perform pre-encoding processing on the to-be-encoded video sequence according to the video quality standard parameters to obtain the pre-encoded video sequence, and determine the video features corresponding to the to-be-encoded video sequence according to the pre-encoded video sequence;
  • Step S205 Predict the encoding bit rate associated with the video sequence to be encoded according to the video quality standard parameters and the video characteristics;
  • Step S206 Perform encoding processing on the to-be-encoded video sequence according to the encoding bit rate to obtain an encoded video sequence associated with the video source;
  • Step S207 When the streaming request of the viewer terminal in the virtual live broadcast room is obtained, the playback resolution in the streaming request is obtained;
  • Step S208 Search the encoded video sequences for the target encoded video sequence corresponding to the scaling parameter information matching the playback resolution, and push the target encoded video sequence as the coded stream to the viewer terminal, so that the viewer terminal obtains the target encoded video sequence after decoding the coded stream.
  • Steps S201 to S208 described in the embodiment of the present application may be applied to the above-mentioned live video scenario, and may also be applied to the above-mentioned video-on-demand scenario.
  • the pull stream described in step S207 refers to a process in which the server already has live content, and the client uses a designated address to pull it. Therefore, the pull request of the viewer terminal is the request of the viewer terminal to pull the live video content.
  • FIG. 9 is a schematic diagram of an overall flow of obtaining a coded stream provided by an embodiment of the present application. Steps S1 to S5 shown in FIG. 9 can be applied to any service server in the above-mentioned distributed transcoding system.
  • Step S1 indicates that, after obtaining a video clip, the service server can scale the video clip to different resolutions using fixed coding parameters (for example, the above-mentioned target quality type and the video quality standard parameter set under that target quality type), and the pre-encoding in step S2 is performed on the to-be-encoded video sequences obtained after the scaling process. In the process of pre-encoding each to-be-encoded video sequence, the encoding information generated by the pre-encoding can be counted as the encoding information of that to-be-encoded video sequence, that is, its video features.
  • In step S4 shown in FIG. 9, the service server can predict the encoding bit rate used for encoding the to-be-encoded video sequence through the prediction model that matches the target quality type, and can then perform step S5 according to the predicted encoding bit rates, in order to obtain the multiple coded streams associated with the video segment shown in FIG. 9.
  • the process of training prediction models corresponding to multiple quality types may roughly include the selection of sample video sequences, the extraction of sample video features, the extraction of quality label information, and the training of prediction models.
  • When acquiring N sample video sequences associated with multiple business scenarios, the first server in the distributed transcoding system can use the sample video features of the N sample video sequences as reference video features, and can acquire the multiple quality types associated with the N sample video sequences, where N is a positive integer; one sample video feature is determined after pre-encoding one sample video sequence. Further, the first server can obtain the target quality type from the multiple quality types and obtain the sample bit rate parameter associated with the target quality type; the sample bit rate parameter contains K pieces of sample bit rate information, where K is a positive integer. Further, the first server can perform traversal encoding on each of the N sample video sequences according to the K sample bit rate parameters, and obtain the quality evaluation value of each sample video sequence under the K sample bit rate parameters; here, one quality evaluation value is determined by one sample video sequence under one sample bit rate parameter.
  • the selected sample can cover all business scenarios in the actual business type as much as possible to ensure the universality of the prediction model obtained by subsequent training.
  • the actual business types can include news, animation, variety shows, games, movies and TV, etc.
  • the business scenes can include scene information such as complex pictures, simple pictures, intense motion shots, and still shots.
  • the scale of the sample video sequence can generally be about 10,000 video clips.
  • A prediction model can be trained for each quality type. Since the purpose of the embodiment of this application is to use the finally trained prediction model to predict the bit rate parameter of a video clip under the target index of the target quality type (that is, the above-mentioned video quality standard parameter), the embodiment of this application uses quality label information when training the model.
  • Each sample video sequence needs to be traversal-encoded to obtain its quality evaluation values under specific bit rate parameters (for example, all bit rate points in [0-51]), and the quality evaluation values (here, the quality evaluation scores) of the same sample video sequence under all coding parameters of different quality types can then be established.
  • By setting a quantization step (that is, an interval), a bit rate value correspondence table can be established; that is, each bit rate value can correspond to one quality evaluation value, and all of these obtained quality evaluation values can be used as the quality label information of the sample video sequence under the corresponding sample bit rate parameter.
  • For example, a quantization step of 10 kbps can be set in the bit rate range of 10 kbps to 5 Mbps to generate the quality label information.
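  • A sketch of this label-generation loop is given below; `encode_at` and `quality_score` are assumed helpers standing in for a real encoder and quality metric (for example, VMAF):

```python
def build_quality_labels(sample_sequence, encode_at, quality_score,
                         lo_kbps=10, hi_kbps=5000, step_kbps=10):
    """Traversal-encode one sample sequence; return {bit rate -> quality value}."""
    labels = {}
    for rate in range(lo_kbps, hi_kbps + 1, step_kbps):
        encoded = encode_at(sample_sequence, rate)
        labels[rate] = quality_score(sample_sequence, encoded)
    return labels
```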
  • the initial model used is a multilayer neural network model.
  • During training, the extracted sample video features are input into the multilayer neural network model, and the model can output the bit rate value correspondence table of each sample video feature under the specified quality index.
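  • The patent does not specify the network architecture, so the sketch below uses scikit-learn's MLPRegressor, with hypothetical training rows, only to illustrate fitting a multilayer network that maps (sample video feature, bit rate) to a quality evaluation value:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical rows: [spatial complexity, time-domain complexity, rate_kbps] -> score
X = np.array([[0.8, 0.2, 1200], [0.8, 0.2, 2400],
              [0.3, 0.5, 1200], [0.3, 0.5, 2400]], dtype=float)
y = np.array([86.0, 92.0, 78.0, 88.0])

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict([[0.8, 0.2, 2000]]))  # predicted quality at 2000 kbps
```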
  • the slice segmentation rules adopted in the above two application scenarios are slightly different.
  • the service server when the service server obtains the on-demand video data, it may directly use the above-mentioned first slice segmentation rule to divide the obtained video source into several video segments.
  • When the service server obtains live video data, it cannot directly use the above-mentioned first slice segmentation rule to divide the video source into several video segments; instead, it needs to use the above-mentioned second slice segmentation rule to first obtain the buffered video sequence from the video source and perform scene detection on it, and a video sequence to be encoded for pre-encoding can then be determined from the buffered video sequence according to the scene detection result.
  • In this way, the video features related to each video segment can be quickly extracted, so that, once the target quality is set, the encoding bit rate used to encode each video segment can be accurately predicted through the prediction model; predicting the encoding bit rates of different video segments under a set quality index (that is, the set video quality standard parameters) can improve the accuracy of video encoding under a specific video quality and can reduce the waste of encoding resources.
  • FIG. 10 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present application.
  • the video data processing apparatus 1 can run on the server 10a (that is, the second server) in the embodiment corresponding to FIG. 2, so as to perform multi-channel transcoding on the acquired video sequence to be encoded through the server 10a in a live video scenario.
  • the video data processing apparatus 1 may also run on the server 10b (that is, the first server) in the embodiment corresponding to FIG. 2, so as to perform multi-channel transcoding on the acquired video sequence to be encoded in a video-on-demand scenario.
  • the video data processing apparatus 1 may include: a quality parameter acquisition module 10, a pre-encoding module 20, a bit rate prediction module 30, and a video encoding module 40; further, the video data processing apparatus 1 may also include: a code stream return module 50, a pull stream request acquisition module 60, and a code stream push module 70.
  • the quality parameter obtaining module 10 is configured to obtain a video sequence to be coded associated with a video source, and obtain a video quality standard parameter associated with the video sequence to be coded;
  • the quality parameter acquisition module 10 includes: a first sequence receiving unit 101, a sequence to be coded determining unit 102, and a quality parameter determining unit 103; the video data processing device 1 may also include: a video source determining unit 104, a buffer sequence determining unit 105, and Quality parameter configuration unit 106;
  • the first sequence receiving unit 101 is configured to receive a first video sequence in a video source distributed by a service server; the first video sequence is determined by the service server after slicing the video source;
  • the to-be-encoded sequence determining unit 102 is configured to determine the to-be-encoded video sequence associated with the video source according to the zoom parameter information of the video source and the first video sequence;
  • the to-be-encoded sequence determining unit 102 includes: a scaling parameter acquisition subunit 1021, a scaling processing subunit 1022, and a sequence determination subunit 1023;
  • the scaling parameter acquisition subunit 1021 is configured to obtain the scaling parameter information associated with the resolution of the video source;
  • the scaling processing subunit 1022 is configured to scale the resolution of the first video sequence according to the scaling parameter information, and determine the scaled first video sequence as the second video sequence; the resolution of the first video sequence is determined by the resolution of the video source;
  • the sequence determination subunit 1023 is configured to determine the video sequence to be encoded according to the second video sequence and the resolution of the second video sequence.
  • For the specific implementation of the scaling parameter acquisition subunit 1021, the scaling processing subunit 1022, and the sequence determination subunit 1023, refer to the specific process of obtaining the to-be-encoded video sequence in the video-on-demand scenario in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the quality parameter determining unit 103 is configured to use the video quality parameter configured by the service server for the first video sequence as the video quality standard parameter associated with the video sequence to be encoded.
  • the video source determining unit 104 is configured to receive initial video data collected and uploaded by the user terminal, and determine the received initial video data as the video source;
  • the buffer sequence determination unit 105 is configured to obtain a key video frame from the video source, determine the buffered video sequence used for scene detection in the video source according to the key video frame and the buffer sequence length associated with the key video frame, and determine the video sequence to be encoded for pre-encoding according to the buffered video sequence and the scaling parameter information of the video source;
  • the video source contains M video frames associated with the acquisition period; M is a positive integer;
  • the buffer sequence determination unit 105 includes: a first determining subunit 1051, a buffer sequence determining subunit 1052, a scene detection determining subunit 1053, a second determining subunit 1054, and a sequence determination subunit 1055; the buffer sequence determination unit 105 may further include: a sequence deletion subunit 1056, a sequence fill-in subunit 1057, and a cut detection subunit 1058;
  • the first determining subunit 1051 is configured to determine the first video frame as the first key video frame among the M video frames of the video source;
  • the buffer sequence determining subunit 1052 is configured to determine a buffered video sequence for scene detection from M video frames according to the first key video frame and the length of the buffer sequence associated with the first key video frame;
  • the scene detection and determination subunit 1053 is configured to determine the video frames other than the first key video frame as the to-be-detected video frames b_i in the buffered video sequence, and perform scene cut detection on the to-be-detected video frames b_i in the buffered video sequence according to the first key video frame; i is a positive integer greater than 1 and less than M;
  • the second determining subunit 1054 is configured to determine a to-be-detected video frame b_i as the second key video frame when it is detected that the degree of change in video content between the first key video frame and the to-be-detected video frame b_i is greater than the scene cut threshold;
  • the sequence determination subunit 1055 is configured to use the video sequence between the first key video frame and the second key video frame as the initial video sequence, scale the initial video sequence according to the scaling parameter information of the video source, and determine the scaled initial video sequence as the video sequence to be encoded for pre-encoding.
  • the sequence deletion subunit 1056 is used to delete the initial video sequence from the buffered video sequence to obtain a transitional video sequence; the first video frame of the transitional video sequence is the second key video frame;
  • the sequence fill-in subunit 1057 is configured to obtain the to-be-filled video sequence from the video source according to the transitional video sequence and the buffer sequence length, and complement the transitional video sequence according to the to-be-filled video sequence; the sequence length of the complemented transitional video sequence is the same as the buffer sequence length;
  • the cut detection subunit 1058 is further configured to determine the video frames other than the second key video frame in the complemented transitional video sequence as the to-be-detected video frames d_j, and perform scene cut detection on the to-be-detected video frames d_j in the complemented transitional video sequence according to the second key video frame; j is a positive integer greater than 1 and less than M.
  • For the specific implementation of the first determining subunit 1051, the buffer sequence determining subunit 1052, the scene detection determining subunit 1053, the second determining subunit 1054, the sequence determination subunit 1055, the sequence deletion subunit 1056, the sequence fill-in subunit 1057, and the cut detection subunit 1058, reference may be made to the description of step S202 in the embodiment corresponding to FIG. 7, which will not be repeated here.
  • the quality parameter configuration unit 106 is configured to configure video quality standard parameters for the video sequence to be encoded based on the configuration information of the user terminal.
  • the video data processing device 1 may determine the to-be-encoded video sequence associated with the on-demand video data through the first sequence receiving unit 101, the to-be-encoded sequence determination unit 102, and the quality parameter determination unit 103; In a scenario, the video data processing apparatus 1 may determine the video sequence to be encoded that is associated with the live video data through the video source determination unit 104, the buffer sequence determination unit 105, and the quality parameter configuration unit 106.
  • the pre-encoding module 20 is configured to perform pre-encoding processing on the to-be-encoded video sequence according to the video quality standard parameters to obtain the pre-encoded video sequence, and determine the video feature corresponding to the to-be-encoded video sequence according to the pre-encoded video sequence;
  • the pre-encoding module 20 includes: an encoder determining unit 201, a pre-encoding sequence determining unit 202, an encoding information determining unit 203, and a video feature determining unit 204;
  • the encoder determining unit 201 is configured to obtain an initial encoder used to pre-encode the video sequence to be encoded according to the video quality standard parameter;
  • the precoding sequence determining unit 202 is configured to perform precoding processing on the to-be-coded video sequence according to the initial encoder to obtain a precoding video sequence;
  • the precoding video sequence includes key video frames and predicted video frames;
  • the encoding information determining unit 203 is configured to determine the encoding information of the pre-encoded video sequence according to the resolution of the key video frame, the predicted video frame, the pre-encoded video sequence, and the bit rate of the pre-encoded video sequence;
  • the predicted video frame includes a forward predicted frame
  • the encoding information determining unit 203 includes: a reference frame determining subunit 2031, a quantity determining subunit 2032, a capacity determining subunit 2033, a complexity determining subunit 2034, and an information determining subunit 2035;
  • the reference frame determining subunit 2031 is configured to obtain the key video frame selected when performing inter-frame compression on the forward predicted frame, and determine the selected key video frame as the reference video frame corresponding to the forward predicted frame;
  • the quantity determining subunit 2032 is configured to determine the total selected number of reference video frames as the first number, the total number of key video frames as the second number, and the total number of forward prediction frames as the third number;
  • the capacity determination subunit 2033 is used to determine the first average data capacity of the key video frame according to the data capacity and the second quantity corresponding to the key video frame, and to determine the forward prediction according to the data capacity and the third quantity corresponding to the forward prediction frame The second average data capacity of the frame;
  • the complexity determining subunit 2034 is configured to obtain the maximum data capacity from the data capacities corresponding to the key video frames, use the ratio between the first average data capacity and the maximum data capacity as the spatial complexity of the pre-encoded video sequence, and determine the ratio between the second average data capacity and the first average data capacity as the time domain complexity of the pre-encoded video sequence;
  • the information determining subunit 2035 is used to determine the first quantity, the second quantity, the third quantity, the spatial complexity, the time domain complexity, the resolution of the pre-encoded video sequence and the bit rate of the pre-encoded video sequence as pre-encoding The encoding information of the video sequence.
  • For the specific implementation of the reference frame determining subunit 2031, the quantity determining subunit 2032, the capacity determining subunit 2033, the complexity determining subunit 2034, and the information determining subunit 2035, refer to the above description of the encoding information, which will not be repeated here.
  • the video feature determining unit 204 is configured to determine the encoding information as the video feature corresponding to the video sequence to be encoded.
  • For the specific implementation of the encoder determining unit 201, the precoding sequence determining unit 202, the encoding information determining unit 203, and the video feature determining unit 204, refer to the description of obtaining the video features of the video sequence to be encoded in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
  • the bit rate prediction module 30 is configured to predict the coding bit rate associated with the video sequence to be coded according to the video quality standard parameters and video characteristics;
  • the bit rate prediction module 30 includes: a target model determining unit 301, a matching degree determining unit 302, and an encoding rate determining unit 303; the bit rate prediction module 30 may also include: a sample acquisition unit 304, a bit rate parameter acquisition unit 305, a traversal encoding unit 306, and a model training unit 307;
  • the target model determining unit 301 is configured to obtain the target quality type corresponding to the video quality standard parameter, and use the prediction model matching the target quality type as the target prediction model in the prediction model library associated with multiple quality types;
  • the matching degree determining unit 302 is configured to input video features into a target prediction model, and output the matching degrees between the video features and multiple reference video features in the target prediction model;
  • the encoding rate determining unit 303 is configured to use, among the matching degrees, the reference video feature with the highest matching degree with the video features as the target reference video feature, and to use the sample bit rate information corresponding to the quality label information associated with the target reference video feature as the encoding bit rate associated with the video sequence to be encoded.
  • the sample acquisition unit 304 is configured to acquire N sample video sequences associated with multiple business scenarios, use sample video features of the N sample video sequences as reference video features, and acquire multiple qualities associated with the N sample video sequences Type; N is a positive integer; a sample video feature is determined after pre-encoding a sample video sequence;
  • the code rate parameter obtaining unit 305 is configured to obtain the target quality type from multiple quality types, and obtain the sample code rate parameter associated with the target quality type; the sample code rate parameter includes K sample code rate information; K is a positive integer;
  • the traversal encoding unit 306 is configured to traversely encode each sample video sequence in the N sample video sequences according to the K sample rate parameters to obtain the quality evaluation value of each sample video sequence under the K sample rate parameters; A quality evaluation value is determined by a sample video sequence under a sample bit rate parameter;
  • the model training unit 307 is configured to use all the obtained quality evaluation values as the quality label information of the initial model associated with the target quality type, train the initial model according to the quality label information and N reference video features, and determine according to the training result A prediction model that matches the target quality type.
  • For the specific implementation of the target model determining unit 301, the matching degree determining unit 302, and the encoding rate determining unit 303, refer to the description of the encoding bit rate in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • For the specific implementation of the sample acquisition unit 304, the bit rate parameter acquisition unit 305, the traversal encoding unit 306, and the model training unit 307, refer to the description of training the initial model in the embodiment corresponding to FIG. 9 above, which will not be repeated here.
  • the video encoding module 40 is configured to perform encoding processing on the to-be-encoded video sequence according to the encoding bit rate to obtain an encoded video sequence associated with the video source.
  • the video data processing device 1 may run on the first server in the distributed server cluster; the business server is the second server in the distributed server cluster;
  • the code stream return module 50 is used to return the coded video sequence as the code stream associated with the scaling parameter information to the second server, and the second server receives all the first servers in the distributed server cluster for the same scaling parameter When the coded code stream is returned by the information, all the received coded code streams are merged according to the slice identification information associated with the video source after the slice processing.
  • the user terminal is the anchor terminal in the virtual live broadcast room, and the initial video data is the live video data collected by the anchor terminal;
  • the pull-stream request acquisition module 60 is configured to acquire the playback resolution in the pull-stream request when the pull-stream request from the viewer terminal in the virtual live broadcast room is acquired;
  • the code stream pushing module 70 is used to search for the target coded video sequence corresponding to the scaling parameter information matching the playback resolution in the coded video sequence, and push the target coded video sequence as the code stream to the viewer terminal, so that the viewer terminal After decoding the coded stream, the target coded video sequence is obtained.
  • For the specific implementation of the quality parameter acquisition module 10, the pre-encoding module 20, the bit rate prediction module 30, and the video encoding module 40, refer to the description of steps S101 to S104 in the embodiment corresponding to FIG. 3, which will not be repeated here. Further, for the specific implementation of the code stream return module 50, the pull stream request acquisition module 60, and the code stream push module 70, refer to the description of the encoded video sequences obtained in different business scenarios in the embodiment corresponding to FIG. 7, which will not be repeated here.
  • the video data processing apparatus 1 in the embodiment of the present application can execute the video data processing method in the foregoing embodiment corresponding to FIG. 3 or FIG. 7, which will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1000 may be the server 10a in the embodiment corresponding to FIG. 2, or the server 10b in the embodiment corresponding to FIG. 2, which is not limited here.
  • the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005.
  • the computer device 1000 may further include a user interface 1003 and at least one communication bus 1002. Among them, the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display (Display) and a keyboard (Keyboard), and the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM, or a non-volatile memory (non-volatile memory), for example, at least one disk memory.
  • the memory 1005 may also be at least one storage device located far away from the aforementioned processor 1001. As shown in FIG. 11, the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the user interface 1003 in the computer device 1000 may also include a display (Display) and a keyboard (Keyboard).
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to invoke the device control application program stored in the memory 1005, so as to implement the video data processing method described in the foregoing embodiments.
  • the computer device 1000 described in the embodiment of the present application can execute the video data processing method in the embodiment corresponding to FIG. 3 or FIG. 7 above, and can also perform the functions of the video data processing apparatus 1 in the embodiment corresponding to FIG. 10 above, which will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
  • the embodiment of the present application also provides a computer storage medium, which stores the aforementioned computer program executed by the video data processing device 1, and the computer program includes program instructions.
  • the processor executes the program instructions
  • the video data processing method in the foregoing embodiment corresponding to FIG. 3 or FIG. 7 can be executed, and details are not described herein again.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the program can be stored in a computer-readable storage medium, and when executed, may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

The embodiments of this application disclose a video data processing method, apparatus, and storage medium. The method includes: obtaining a video sequence to be encoded associated with a video source, and obtaining a video quality standard parameter associated with the video sequence to be encoded; pre-encoding the video sequence to be encoded according to the video quality standard parameter to obtain a pre-encoded video sequence, and determining video features corresponding to the video sequence to be encoded according to the pre-encoded video sequence; predicting an encoding bit rate associated with the video sequence to be encoded according to the video quality standard parameter and the video features; and encoding the video sequence to be encoded according to the encoding bit rate to obtain an encoded video sequence associated with the video source.

Description

Video data processing method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 202010075680.9, filed with the China Patent Office on January 22, 2020 and entitled "Video data processing method, apparatus, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of Internet technologies, and in particular, to a video data processing method, apparatus, and storage medium.
Background
At present, for video data composed of frame sequences, the same encoder may be used to encode all of it. For example, for video data A and video data B with different video content, a default encoding bit rate is usually used to encode both, so that the video quality of the encoded video data fluctuates considerably, making it difficult to ensure the accuracy of video encoding.
In addition, performing indiscriminate video encoding on video data with different content at the same encoding bit rate causes varying degrees of waste of encoding resources. For example, for video data with relatively simple content (for example, video data A), encoding resources may be wasted because the video quality of the encoded video data A is unnecessarily high.
Summary
The embodiments of this application provide a video data processing method, apparatus, and storage medium, which can improve the accuracy of video encoding and reduce the waste of encoding resources.
In one aspect, an embodiment of this application provides a video data processing method, the method including:
obtaining a video sequence to be encoded associated with a video source, and obtaining a video quality standard parameter associated with the video sequence to be encoded;
pre-encoding the video sequence to be encoded according to the video quality standard parameter to obtain a pre-encoded video sequence, and determining video features corresponding to the video sequence to be encoded according to the pre-encoded video sequence;
predicting an encoding bit rate associated with the video sequence to be encoded according to the video quality standard parameter and the video features;
encoding the video sequence to be encoded according to the encoding bit rate to obtain an encoded video sequence associated with the video source.
In one aspect, an embodiment of this application provides a video data processing apparatus, the apparatus including:
a quality parameter acquisition module, configured to obtain a video sequence to be encoded associated with a video source, and obtain a video quality standard parameter associated with the video sequence to be encoded;
a pre-encoding module, configured to pre-encode the video sequence to be encoded according to the video quality standard parameter to obtain a pre-encoded video sequence, and determine video features corresponding to the video sequence to be encoded according to the pre-encoded video sequence;
a bit rate prediction module, configured to predict an encoding bit rate associated with the video sequence to be encoded according to the video quality standard parameter and the video features;
a video encoding module, configured to encode the video sequence to be encoded according to the encoding bit rate to obtain an encoded video sequence associated with the video source.
In one aspect, an embodiment of this application provides a computer device, the computer device including one or more processors and one or more memories;
the one or more memories are configured to store program code, and the one or more processors are configured to invoke and execute the program code, so that the computer device performs the video data processing method in the embodiments of this application.
In one aspect, an embodiment of this application provides a non-volatile computer storage medium storing a computer program, the computer program including program instructions that, when executed by a processor of a computer device, cause the computer device to perform the video data processing method in the embodiments of this application.
When obtaining the video sequence to be encoded associated with the video source, the embodiments of this application may obtain the video quality standard parameter associated with the video sequence to be encoded; here, the video sequence of each video segment of the video source under the corresponding scaling parameter information may be collectively referred to as a video sequence to be encoded. Further, the video sequence to be encoded is pre-encoded according to the video quality standard parameter to obtain a pre-encoded video sequence, and the video features corresponding to the video sequence to be encoded are determined according to the pre-encoded video sequence; further, the encoding bit rate associated with the video sequence to be encoded is predicted according to the video quality standard parameter and the video features; further, the video sequence to be encoded is encoded according to the encoding bit rate to obtain an encoded video sequence associated with the video source. It can thus be seen that a single analysis of the video content in each video segment of the video source can quickly extract the video features related to each video segment (that is, each video sequence to be encoded), so that, with the target quality set, the prediction model can accurately predict the encoding bit rate used to encode each video segment; predicting the encoding bit rates of different video segments under a set quality index (that is, the set video quality standard parameter) can improve the accuracy of video encoding under a specific video quality and reduce the waste of encoding resources.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of this application;
FIG. 2 is a schematic diagram of a distributed transcoding scenario according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a video data processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a scenario of obtaining a to-be-encoded video sequence in a video-on-demand scenario according to an embodiment of this application;
FIG. 5 is a schematic diagram of encoding bitrates used for encoding different video segments according to an embodiment of this application;
FIG. 6 is a schematic diagram of the video quality obtained by encoding different video segments according to an embodiment of this application;
FIG. 7 is a schematic diagram of a video data processing method according to an embodiment of this application;
FIG. 8 is a schematic diagram of a scenario of obtaining a to-be-encoded video sequence in a live-streaming scenario according to an embodiment of this application;
FIG. 9 is a schematic diagram of an overall procedure of obtaining an encoded bitstream according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a video data processing apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts fall within the protection scope of this application.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of this application. As shown in FIG. 1, the network architecture may include a server cluster and a user terminal cluster. The user terminal cluster may include a plurality of user terminals, which, as shown in FIG. 1, may specifically include a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, ..., and a user terminal 3000n. As shown in FIG. 1, the user terminals 3000a, 3000b, 3000c, ..., 3000n may each establish a network connection with any server in the server cluster, so that each user terminal can exchange data with the corresponding server through the network connection.
For ease of understanding, in this embodiment of this application, one user terminal may be selected from the plurality of user terminals shown in FIG. 1 as a target user terminal. The target user terminal may include a smart terminal carrying a video data collection function (for example, a video data recording function), such as a smartphone, a tablet computer, a notebook computer, a smart TV, a smart watch, or a desktop computer. For example, the user terminal 3000a shown in FIG. 1 may be used as the target user terminal, and a target client with a video data recording function may be integrated in the target user terminal. The target client integrated in the target user terminal may include an instant messaging client (for example, a WeChat client or a QQ client), a multimedia client (for example, a video playback client), an entertainment client (for example, a game client), a virtual room client (for example, a live-streaming client), or another client with frame sequence (for example, video data) loading and recording functions.
In this embodiment of this application, the video data collected by the target user terminal through a browser page or the target client may be collectively referred to as initial video data (for example, the aforementioned initial video data A), and the initial video data A may be further uploaded through the network connection to any server in the server cluster. Any server in the server cluster may be collectively referred to as a service server. For ease of understanding, this embodiment takes the server 20d shown in FIG. 1 as the service server connected to the target user terminal, to describe the specific process of performing multi-channel transcoding, in the server 20d, on video sources in different service scenarios.
In the video-on-demand scenario, the initial video data uploaded by the target user terminal to the service server may be on-demand video data (that is, the initial video data may be a complete piece of video data). In the live-streaming scenario, the initial video data uploaded by the target user terminal to the service server may be continuously recorded live video data; the service scenario in which the initial video data is obtained is not limited here.
In the video-on-demand scenario, when receiving the initial video data A, the server 20d may store the initial video data A in a service database. When obtaining complete video data (for example, initial video data B) uploaded by other user terminals in the user terminal cluster, the server 20d in this embodiment may also add the initial video data B to the first transcoding database in which the initial video data A is located, so that each piece of initial video data can subsequently be sliced to obtain the video segments of each piece of initial video data. In this embodiment, each piece of initial video data stored in the first transcoding database may be referred to as a video source. In addition, in the video-on-demand scenario, the several video segments (for example, video segment 1, video segment 2, ..., video segment n) obtained by slicing any video source may be collectively referred to as first video sequences.
Further, in the video-on-demand scenario, to improve the efficiency of multi-channel transcoding of each sliced video segment (that is, each first video sequence) at different resolutions, each first video sequence obtained by slicing in the server 20d may be synchronously distributed to the other servers in the same distributed network, so that each of the other servers can further perform multi-channel transcoding on the received video segment (that is, the first video sequence) through a video data processing apparatus 2000 (not shown in FIG. 1) having a video data processing function, to quickly obtain the multi-channel transcoded bitstreams associated with the first video sequence.
For ease of understanding, the service server used to obtain the initial video data (that is, the video source) may be collectively referred to as a second server. In a distributed transcoding system, the second server may be a distributed server in a distributed cluster (for example, the server 20d in the server cluster in FIG. 1). In addition, the other servers (that is, other service servers) in the same distributed network as the second server may be collectively referred to as first servers. In the video-on-demand scenario, a first server may receive a first video sequence obtained by the second server by slicing the video source.
When obtaining a video source, the second server can quickly identify whether the service scenario to which the video source belongs is a live-streaming scenario corresponding to a live-streaming service or a video-on-demand scenario corresponding to an on-demand service. For video sources in different service scenarios, the service server selects different transcoding manners.
For example, if the second server identifies that the obtained initial video data is on-demand video data, it may determine that the service scenario to which the on-demand video data belongs is the video-on-demand scenario. In this case, the second server may directly slice the obtained video source according to a slicing rule (for example, a first slicing rule). For example, the second server may split the video source into multiple video segments according to a first slicing rule such as time length or shot content, and distribute these video segments to the first servers in the distributed cluster. The video segments distributed by the second server and received by each first server may be collectively referred to as first video sequences.
After receiving a first video sequence, a first server may scale the resolution of the first video sequence according to the scaling parameter information associated with the video source, and the scaled first video sequence is referred to as a second video sequence. The scaling parameter information in this embodiment may include one resolution or multiple resolutions, and the specific number of pieces of scaling parameter information is not limited here. The multiple resolutions may specifically include resolutions such as 1080p, 720p, 540p, and 270p.
In this embodiment, the transcoding of a first video sequence at one resolution is referred to as one-channel transcoding, and the transcoding of the first video sequence at the foregoing multiple resolutions is referred to as multi-channel transcoding. In the distributed transcoding system, each first server in the distributed cluster can perform multi-channel transcoding on the obtained first video sequence according to the multiple pieces of scaling parameter information associated with the video source, to obtain the transcoded bitstream associated with each piece of scaling parameter information, where one piece of scaling parameter information is one resolution. Further, the transcoded bitstream associated with each resolution may be collectively referred to as an encoded bitstream (that is, an encoded video sequence). An encoded bitstream is obtained by encoding a to-be-encoded video sequence at the encoding bitrate predicted by the prediction model associated with one resolution (that is, one piece of scaling parameter information), and a to-be-encoded video sequence is obtained by the first server scaling the resolution of the first video sequence according to one piece of scaling parameter information.
The other servers in the same distributed network as the server 20d may specifically include the aforementioned server 20a, server 20b, ..., and server 20c. In this case, the other servers (the server 20a, the server 20b, ..., the server 20c) in the same distributed network as the server 20d (that is, the second server) may be collectively referred to as first servers. The video data processing apparatus 2000 may run in these first servers, so that each first server that receives a first video sequence can perform multi-channel transcoding on the received first video sequence through the video data processing apparatus 2000, to quickly obtain the multi-channel transcoded bitstreams associated with the first video sequence.
In the video-on-demand scenario, by providing multi-channel transcoded bitstreams associated with different resolutions, when an on-demand request of another user terminal associated with the second server (for example, the user terminal 3000n shown in FIG. 1) is obtained, the encoded bitstream matching the resolution supported by the user terminal 3000n (for example, encoded bitstream 1) can be quickly found according to that resolution, and the found encoded bitstream 1 can be delivered to the user terminal 3000n. In this way, when the decoded encoded bitstream 1 is played in the user terminal 3000n, the quality of the played video data is effectively ensured and the video playback effect is improved.
In this embodiment of this application, if the service server (for example, the server 20d) identifies that the obtained initial video data is live video data, the server 20d may determine that the service scenario to which the initial video data belongs is the live-streaming scenario. In this case, the server 20d can hardly slice the continuously received initial video data (that is, the video source) directly. Therefore, to improve the efficiency of multi-channel transcoding, in the live-streaming scenario, part of the live video data in the received initial video data (that is, the video source) may be buffered by a buffer, and the buffered video sequence of a specific sequence length may be collectively referred to as a buffered video sequence. Then, while obtaining the video source, the server 20d can perform scene change detection on the buffered video sequence of the specific sequence length, so as to accurately locate, in the buffered video sequence, the to-be-encoded video sequence that needs to be encoded, thereby ensuring the accuracy of video encoding in the live-streaming scenario.
In this embodiment, the first video frame of the buffered video sequence dynamically updated in the buffer may be collectively referred to as a key video frame, and the specific sequence length of the buffered video sequence dynamically updated in the buffer may be collectively referred to as the buffer sequence length (for example, 50 frames). Further, the server 20d (that is, the service server) may perform scene change detection on the buffered video sequence having the buffer sequence length, to find, in the buffered video sequence, another key video frame at which a scene change occurs. The video sequence between these two key video frames may be referred to as a to-be-encoded video sequence, and the video data processing apparatus 2000 running in the server 20d performs the aforementioned video data processing function on the to-be-encoded video sequence in the live-streaming scenario, so that multi-channel transcoded bitstreams are continuously output in the server 20d.
For ease of understanding, this embodiment takes the obtained initial video data being on-demand video data as an example, to describe the specific process of performing distributed transcoding on multiple video segments of the on-demand video data (for example, video A) in the distributed transcoding system. In the video-on-demand scenario, the initial video data obtained by the second server may be complete video data uploaded by a user terminal, or may be on-demand video data obtained from a service database (for example, the first transcoding database); the specific manner of obtaining the on-demand video data is not limited here. In the distributed transcoding system, the service server (that is, the second server) may distribute one of the sliced video segments to one transcoding server (that is, one first server) for multi-channel transcoding in that transcoding server.
Further, referring to FIG. 2, FIG. 2 is a schematic diagram of a distributed transcoding scenario according to an embodiment of this application. The server 10a shown in FIG. 2 may be the aforementioned service server; in the distributed transcoding system, the service server may be the second server in the video-on-demand scenario. As shown in FIG. 2, when obtaining video A from the service database shown in FIG. 2, the second server (that is, the server 10a shown in FIG. 2) may use video A as the video source. Since the video source is on-demand video data in the video-on-demand scenario, the server 10a may directly slice the video source according to the first slicing rule to obtain the multiple video segments (that is, multiple first video sequences) shown in FIG. 2. As shown in FIG. 2, the multiple first video sequences may specifically include a video segment 100a, a video segment 200a, and a video segment 300a.
Before multi-channel transcoding is performed on these first video sequences (that is, video segments), the server 10a may further configure, for each video segment, a corresponding quality type and a video quality parameter corresponding to the quality type. The quality type in this embodiment may include at least one of the following: a first quality type, a second quality type, a third quality type, and a fourth quality type. All four quality types can be used to evaluate the video quality of the video images in the corresponding video segment, and the scores obtained by the evaluation may be collectively referred to as the video quality parameters under the corresponding quality type.
The first quality type may be the VMAF (Video Multi-Method Assessment Fusion) type. The video quality standard parameter configured by the second server (that is, the service server) for a first video sequence under the VMAF type (that is, the quality evaluation value set under the VMAF type) may be any value in the range 0 to 100, for example, VMAF 90. A larger quality evaluation value set under the VMAF type indicates better video quality of the finally output encoded bitstream.
The second quality type may be the SSIM (Structural Similarity) type. The video quality standard parameter configured by the second server for a first video sequence under the SSIM type (that is, the quality evaluation value set under the SSIM type) may be any value in the range 0 to 1, for example, SSIM 0.987. A larger quality evaluation value set under the SSIM type indicates better video quality of the finally output encoded bitstream.
The third quality type may be the PSNR (Peak Signal-to-Noise Ratio) type. The video quality standard parameter configured by the second server for a first video sequence under the PSNR type (that is, the quality evaluation value set under the PSNR type) may be any value in the range 0 to 100, for example, PSNR 40.
The fourth quality type may be the MOS (Mean Opinion Score) type. The video quality standard parameter configured by the second server for a first video sequence under the MOS type (that is, the quality evaluation value set under the MOS type) may be any value in the range 1 to 5, for example, MOS 4.
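For illustration only, the four quality types and their target ranges described above can be represented as a small configuration table; the following Python sketch is a hypothetical helper for validating a configured target, not part of the patented method.

```python
# Illustrative sketch (not from the patent): the four quality types and their
# valid target ranges, plus a check that a configured target is in range.
QUALITY_RANGES = {
    "VMAF": (0.0, 100.0),   # e.g. VMAF 90
    "SSIM": (0.0, 1.0),     # e.g. SSIM 0.987
    "PSNR": (0.0, 100.0),   # e.g. PSNR 40
    "MOS":  (1.0, 5.0),     # e.g. MOS 4
}

def validate_quality_target(quality_type: str, target: float) -> float:
    """Check that a configured video quality standard parameter is in range."""
    lo, hi = QUALITY_RANGES[quality_type]
    if not lo <= target <= hi:
        raise ValueError(f"{quality_type} target {target} outside [{lo}, {hi}]")
    return target

validate_quality_target("VMAF", 90)  # the example target used in this embodiment
```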
When distributing each video segment shown in FIG. 2 to the corresponding first server, the server 10a (that is, the second server) in this embodiment may configure one quality type for each video segment, and the quality type configured for each video segment may be collectively referred to as the target quality type. For example, the target quality type configured by the server 10a for the video segment 100a may be the VMAF type, the target quality type configured by the server 10a for the video segment 200a may be the SSIM type, and the target quality type configured by the server 10a for the video segment 300a may be the PSNR type. In this embodiment, the server 10a may also configure the same quality type for each sliced video segment; for example, any one of the foregoing quality types may be configured for the video segment 100a, the video segment 200a, and the video segment 300a shown in FIG. 2.
In this embodiment, a video quality parameter under a quality type configured by the server 10a for a video segment may be used as the video quality standard parameter of the to-be-encoded video sequence associated with that video segment, so that, under the video quality standard parameter of the target quality type, the encoding bitrate used for encoding the to-be-encoded video sequence can be predicted by the prediction model corresponding to the target quality type.
For ease of understanding, this embodiment takes VMAF 90 as the video quality parameter under the target quality type configured by the server 10a for the multiple video segments shown in FIG. 2 (that is, the video segment 100a, the video segment 200a, and the video segment 300a). As shown in FIG. 2, when the second server (for example, the server 10a in FIG. 2) splits the video source into the three video segments shown in FIG. 2, the server 10a may select three servers (for example, the server 10b, the server 10c, and the server 10d in FIG. 2) from the distributed cluster in which the server 10a is located as first servers, and distribute the three video segments shown in FIG. 2 to these three first servers respectively for multi-channel transcoding, so as to improve, in the distributed transcoding system, the efficiency of multi-channel transcoding of each video segment of the video source.
For ease of understanding, this embodiment takes the server 10a distributing the sliced video segment 100a to the server 10b shown in FIG. 2 as an example, to describe the specific process of multi-channel transcoding. When the server 10b (that is, the first server) obtains the first video sequence (for example, the video segment 100a) distributed by the server 10a (that is, the second server), it may synchronously obtain the VMAF-type video quality parameter configured by the server 10a for the first video sequence (that is, the aforementioned VMAF 90), and may use VMAF 90 as the video quality standard parameter of the to-be-encoded video sequence.
When obtaining the first video sequence (that is, the video segment 100a), the server 10b may scale the resolution of the video segment 100a (for example, 540p) according to the scaling parameter information associated with the video source (for example, 1080p, 720p, 540p, 270p), and the scaled first video sequences are referred to as second video sequences. The number of second video sequences is the same as the number of pieces of scaling parameter information; that is, the resolutions of the second video sequences may specifically include 1080p, 720p, 540p, and 270p. For ease of understanding, this embodiment takes 720p as the resolution of a second video sequence, uses the second video sequence having the target resolution (that is, 720p) as the to-be-encoded video sequence, and then describes the specific process of one-channel transcoding, in the server 10b, of the to-be-encoded video sequence associated with the target resolution (that is, 720p).
In other words, in this embodiment, the second video sequence associated with the target resolution may be collectively referred to as the to-be-encoded video sequence; one complete pre-encoding pass is performed on the to-be-encoded video sequence, and the encoding information saved during the pre-encoding process is referred to as the video feature corresponding to the to-be-encoded video sequence. Further, the server 10b may also find, in a prediction model library, the prediction model matching the VMAF type (for example, prediction model 1), so as to predict, through prediction model 1, the encoding bitrate of the to-be-encoded video sequence at the given video quality, and may then encode the to-be-encoded video sequence according to the predicted encoding bitrate to obtain the encoded bitstream 100b shown in FIG. 2.
For the specific process of transcoding the video segment 100a at other resolutions (for example, 1080p), reference may be made to the specific process of one-channel transcoding of the video segment at the target resolution (that is, 720p), and details are not repeated here. In addition, for the specific process of multi-channel transcoding of the other video segments shown in FIG. 2 (for example, the video segment 200a and the video segment 300a), reference may be made to the specific process of multi-channel transcoding of the video segment 100a, and details are not repeated here.
Each quality type in this embodiment may correspond to a trained prediction model, through which the encoding bitrate of a to-be-encoded video sequence at a specific resolution and a specific video quality can be predicted, so that the to-be-encoded video sequence can be encoded based on the predicted encoding bitrate to obtain the encoded bitstream associated with the corresponding resolution.
For specific implementations in which the first server obtains the to-be-encoded video sequence, obtains the video feature of the to-be-encoded video sequence through pre-encoding, and predicts the encoding bitrate based on the video feature, reference may be made to the following embodiments corresponding to FIG. 3 to FIG. 9.
Further, referring to FIG. 3, FIG. 3 is a schematic flowchart of a video data processing method according to an embodiment of this application. As shown in FIG. 3, the method may be performed by a video data processing apparatus having a video data processing function, and may include at least the following steps S101 to S104:
Step S101: Obtain a to-be-encoded video sequence associated with a video source, and obtain a video quality standard parameter associated with the to-be-encoded video sequence.
Specifically, in the video-on-demand scenario, the video data processing apparatus may receive a first video sequence of the video source distributed by the service server in the video-on-demand scenario, where the first video sequence may be determined by the service server after slicing the obtained video source; further, the video data processing apparatus may determine, according to the scaling parameter information of the video source and the first video sequence, the to-be-encoded video sequence associated with the video source; and further, the video data processing apparatus may use the video quality parameter configured by the service server for the first video sequence as the video quality standard parameter associated with the to-be-encoded video sequence.
The video data processing apparatus in this embodiment may run in the aforementioned first server, which may be the server 10b in the embodiment corresponding to FIG. 2. The first server may be a distributed server in the distributed transcoding system. In the distributed transcoding system, the service server may be another distributed server (that is, the second server) in the same distributed network as the first server. The second server in this embodiment may accurately slice the obtained video source according to the slicing rule, to split the obtained video source into multiple video segments. Each video segment may be collectively referred to as a first video sequence. To improve the efficiency of multi-channel transcoding of these sliced video segments (that is, these first video sequences), these first video sequences may be further distributed to the first servers associated with the second server, so that the video data processing apparatus (for example, the video data processing apparatus 2000) running in a first server performs multi-channel transcoding on the obtained first video sequence, thereby ensuring the accuracy of video encoding of each video segment. In addition, by synchronously performing multi-channel transcoding on the to-be-encoded video sequences of the video segments in these first servers, the efficiency of multi-channel transcoding of the video source can also be improved. For the specific process in which the second server splits the video source into multiple video segments, reference may be made to the description of the first slicing rule in the video-on-demand scenario in the embodiment corresponding to FIG. 2, and details are not repeated here.
In the distributed transcoding system, the service server (that is, the second server) may receive, every day, a large number of videos uploaded by user terminals through browser pages or target clients. These videos may include video data 1 in the video-on-demand scenario (that is, the aforementioned on-demand video data) and video data 2 in the live-streaming scenario (that is, the aforementioned live video data). The video data 1 and video data 2 received by the service server may be collectively referred to as the aforementioned initial video data; that is, one piece of initial video data may be one video source.
When determining that the obtained initial video data is on-demand video data, the service server (that is, the second server among the distributed servers) may directly use the obtained on-demand video data as the video source for slicing, and distribute the sliced video segments to the other service servers (that is, the first servers) in the same distributed network as the service server. When obtaining the first video sequence distributed by the second server, each first server may scale the first video sequence according to the scaling parameter information of the video source (that is, the on-demand video data), and determine the scaled first video sequences as second video sequences. The number of second video sequences is the same as the number of pieces of scaling parameter information of the video source. Therefore, in the video-on-demand scenario, the number of to-be-encoded video sequences obtained by the first server determines the number of subsequent transcoding channels.
For ease of understanding, this embodiment takes one service server in the distributed transcoding system as an example, to describe the specific process of multi-channel transcoding of an obtained video segment in a first server running the video data processing apparatus. The first server running the video data processing apparatus may be the server 10d in the embodiment corresponding to FIG. 2, and the video segment obtained by this first server may be the video segment 300a distributed by the server 10a (that is, the second server) in the embodiment corresponding to FIG. 2. Further, referring to FIG. 4, FIG. 4 is a schematic diagram of a scenario of obtaining a to-be-encoded video sequence in a video-on-demand scenario according to an embodiment of this application. The video segment obtained by the first server (that is, the server 10d) may be the aforementioned video segment 300a. In the video-on-demand scenario, the video segment 300a of the video source distributed by the second server may be collectively referred to as the first video sequence. After obtaining the video segment 300a (that is, the first video sequence), the first server may scale the resolution of the video segment 300a (that is, the first video sequence) according to the scaling parameter information associated with the resolution of the video source. As shown in FIG. 4, the first server may scale the resolution of the first video sequence (for example, 540p) to the multiple resolutions shown in FIG. 4, which may specifically be resolution 1, resolution 2, resolution 3, and resolution 4, where resolution 1 may be 1080p, resolution 2 may be 720p, resolution 3 may be 540p, and resolution 4 may be 270p. The scaling parameter information associated with the resolution of the video source may be these multiple resolutions; that is, one piece of scaling parameter information may correspond to one resolution. The encoded bitstreams corresponding to any two of the multiple resolutions can be switched between each other. For example, for a user terminal in the video-on-demand scenario, the second server may, according to a bitstream switching request of the on-demand user using the user terminal, quickly find and deliver the encoded bitstreams of the same video segment (with the same video content) at different resolutions, so that switching between the corresponding encoded bitstreams can be implemented quickly while ensuring video playback quality, thereby improving encoding efficiency and reducing playback latency.
In the video-on-demand scenario, a second video sequence obtained according to one piece of scaling parameter information (that is, one resolution) may be used as one channel of to-be-encoded video sequence associated with the video source, so that the following steps S102 to S104 can be performed on each to-be-encoded video sequence under the video quality standard parameter of the given quality type.
In this embodiment, the video sequences of the same video segment at different resolutions may be referred to as second video sequences. The second video sequences may include a video sequence 1a obtained by scaling the video segment 300a to resolution 1, a video sequence 2a obtained by scaling the video segment 300a to resolution 2, a video sequence 3a obtained by scaling the video segment 300a to resolution 3, and a video sequence 4a obtained by scaling the video segment 300a to resolution 4. The video sequence 1a, the video sequence 2a, the video sequence 3a, and the video sequence 4a among the second video sequences may be collectively referred to as to-be-encoded video sequences, and multi-channel transcoding is performed on these to-be-encoded video sequences in the first server (that is, the server 10d). The multi-channel transcoding here specifically refers to the four-channel transcoding associated with the aforementioned four resolutions, so that the transcoded bitstreams of the same video segment at different resolutions can subsequently be obtained through the following steps S102 to S104. These transcoded bitstreams may specifically include the encoded sequence 1d associated with resolution 1, the encoded sequence 2d associated with resolution 2, the encoded sequence 3d associated with resolution 3, and the encoded sequence 4d associated with resolution 4 shown in FIG. 4. A sketch of building such a resolution ladder with a command-line encoder is given below.
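The following is a minimal sketch of the scaling step, assuming ffmpeg with libx264 is available; the file names, ladder heights, and helper function are illustrative assumptions and not part of the patented method.

```python
# Illustrative sketch (assumes ffmpeg with libx264 is installed; file names and
# the ladder are hypothetical, not taken from the patent).
import subprocess

LADDER = [1080, 720, 540, 270]  # resolutions 1-4 from the example above

def scale_segment(src: str, heights=LADDER):
    """Produce one scaled copy of a video segment per ladder resolution."""
    outputs = []
    for h in heights:
        dst = f"segment_{h}p.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-vf", f"scale=-2:{h}",   # keep aspect ratio, force even width
             "-c:v", "libx264", dst],
            check=True,
        )
        outputs.append(dst)
    return outputs

# scale_segment("video_segment_300a.mp4") would yield the four to-be-encoded
# sequences (1a-4a in FIG. 4), one per resolution.
```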
The service server may judge the service scenario to which the obtained initial video data (that is, the video source) belongs, and may then judge, according to the identified service scenario, whether the obtained video source can be sliced directly. The service scenario here may include the aforementioned video-on-demand scenario and the live-streaming scenario.
In the live-streaming scenario, the service server (that is, the second server) may obtain the initial video data periodically collected and sent by a user terminal (here, a terminal device capable of image collection, for example, a streamer terminal); in this case, the initial video data obtained by the service server may be live video data.
When determining that the obtained initial video data is live video data, the service server may continuously determine the received initial video data (that is, the live video data) as the video source, and may then use a second slicing rule (for example, a scene detection rule) to perform scene change detection on the buffered video sequence continuously updated in the buffer, so as to find, in the buffered video sequence currently held in the buffer, the scene change frame at which one scene (for example, scene 1) changes to another scene (for example, scene 2). The sequence number of the scene change frame in the current buffered video sequence may be referred to as a scene change point, and the current buffered video may be divided into multiple scenes according to the found scene change points. Each scene in this embodiment may correspond to one key video frame, and the video sequence between any two scenes may be referred to as the to-be-encoded video sequence to be delivered to the encoder. In this embodiment, the to-be-encoded video sequence associated with the video source determined by the service server (that is, the second server) through the second slicing rule may include one key video frame. In this case, the service server (that is, the second server) may perform the following steps S102 to S104 on the to-be-encoded video sequence through the video data processing apparatus.
It can be seen that, in the live-streaming scenario, the second server running the video data processing apparatus may directly transcode these continuously obtained to-be-encoded video sequences according to the second slicing rule, to obtain the encoded video sequences (that is, encoded bitstreams) associated with the to-be-encoded video sequences. The encoded bitstreams in the live-streaming scenario can be continuously distributed by the service server (that is, the second server) to the other user terminals (for example, viewer terminals) in the same virtual live room as the streamer terminal, so that the other user terminals can decode the continuously obtained encoded bitstreams through corresponding decoders, and the live video data collected by the streamer terminal can be played synchronously in the other user terminals.
The service server used for multi-channel transcoding of the video source in the live-streaming scenario may be any distributed server in the distributed transcoding system, and the distributed server used to obtain the live video data is not limited here.
For ease of understanding, this embodiment takes the video source obtained by the service server being on-demand video data in the video-on-demand scenario as an example, to describe the specific process of multi-channel transcoding, in the first server running the video data processing apparatus, of the to-be-encoded video sequence associated with the video source.
Step S102: Pre-encode the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determine, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence.
Specifically, the video data processing apparatus obtains, according to the video quality standard parameter, an initial encoder used for pre-encoding the to-be-encoded video sequence; further, the video data processing apparatus may pre-encode the to-be-encoded video sequence by the initial encoder to obtain a pre-encoded video sequence, where the pre-encoded video sequence may include key video frames and predicted video frames; further, the video data processing apparatus may determine the encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence; and further, the video data processing apparatus may determine the encoding information as the video feature corresponding to the to-be-encoded video sequence.
For ease of understanding, this embodiment takes scaling the resolution of the video segment shown in FIG. 4 (for example, the video segment 300a) to resolution 1 as an example, to describe the specific process of one-channel transcoding through the video sequence 1a shown in FIG. 4 (that is, the to-be-encoded video sequence). As shown in FIG. 4, when obtaining the to-be-encoded video sequence (that is, the video sequence 1a shown in FIG. 4), the first server running the video data processing apparatus may synchronously obtain the video quality parameter configured by the second server for the video segment 300a, and may use the video quality parameter configured by the second server for the video segment 300a as the video quality standard parameter of the to-be-encoded video sequence (for example, the aforementioned VMAF 90).
The first server may obtain, according to the video quality standard parameter (for example, VMAF 90), the initial encoder used for pre-encoding the video sequence 1a, and may pre-encode the to-be-encoded video sequence by the initial encoder to obtain a pre-encoded video sequence, where the pre-encoded video sequence may include key video frames and predicted video frames; one pre-encoded video sequence may include one key video frame and at least one predicted video frame. Further, the first server may quickly determine the encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence, and may determine the encoding information as the video feature corresponding to the to-be-encoded video sequence. As shown in FIG. 4, the first server saves the encoding information obtained during the pre-encoding of the video sequence 1a (that is, the to-be-encoded video sequence), and may use the saved encoding information of the pre-encoded video sequence as the video feature of the video sequence 1a, which may be the video feature 1b shown in FIG. 4.
When the pre-encoded video sequence includes forward predicted frames (that is, P frames), the specific process in which the first server obtains the encoding information of the pre-encoded video sequence may be as follows: the first server may obtain the key video frames selected when performing inter-frame compression on the forward predicted frames, and may determine the selected key video frames as the reference video frames corresponding to the forward predicted frames; further, the first server may determine the total selection count of the reference video frames as a first quantity, the total number of key video frames as a second quantity, and the total number of forward predicted frames as a third quantity; further, the first server may determine the first average data volume of the key video frames according to the data volumes corresponding to the key video frames and the second quantity, and determine the second average data volume of the forward predicted frames according to the data volumes corresponding to the forward predicted frames and the third quantity; further, the first server may obtain the maximum data volume among the data volumes corresponding to the key video frames, determine the ratio of the first average data volume to the maximum data volume as the spatial complexity of the pre-encoded video sequence, and determine the ratio of the second average data volume to the first average data volume as the temporal complexity of the pre-encoded video sequence; and further, the first server may determine the first quantity, the second quantity, the third quantity, the spatial complexity, the temporal complexity, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence as the encoding information of the pre-encoded video sequence.
In the video-on-demand scenario, the first server running the video data processing apparatus may perform one complete pre-encoding pass on the video sequence 1a shown in FIG. 4 (that is, the to-be-encoded video sequence), and, during the pre-encoding of the video sequence 1a, save the encoding information of the pre-encoded video sequence associated with the video sequence 1a. Using different compression manners during the pre-encoding of the video sequence 1a yields different categories of encoded video frames; for example, intra-frame encoding yields I frames (Intra-coded frames), while inter-frame encoding yields P frames (Predicted frames, forward predicted frames) and B frames (Bi-directional predicted frames). In this embodiment, the I frames obtained by intra-frame encoding may be collectively referred to as the aforementioned key video frames, and the P frames or B frames may be collectively referred to as the aforementioned predicted video frames.
In this embodiment, I frames may be encoded and output by using the spatial correlation within a single video frame of the video sequence 1a; that is, during intra-frame compression, temporal correlation need not be considered, nor motion compensation. In addition, the encoded I frame may also serve as a reference frame for subsequent video decoding. I-frame images may appear periodically in the video sequence 1a, and their frequency of appearance may be determined by the insertion period of the initial encoder. According to the insertion period, the frame groups associated with the to-be-encoded video sequence (that is, the video sequence 1a) may be determined, and one frame group may be regarded as one scene.
P frames (that is, P-frame images) and B frames (that is, B-frame images) may be inter-compressed by inter-frame encoding, that is, by using both spatial and temporal correlation. For example, a P-frame image may use forward temporal prediction to improve compression efficiency and image quality. Each macroblock in a P-frame image may be obtained by forward prediction from the I frame closest to the P frame (the I frame here may be regarded as a reference video frame). A B-frame image is obtained by bi-directional temporal prediction; that is, a B-frame image may use the I-frame image or P-frame image closest to the B frame as another kind of reference video frame for bi-directional prediction. For example, a B-frame image may use a future frame (that is, the encoded P frame or I frame that follows and is nearest to the B frame) as a reference video frame. Therefore, when the video frames of the to-be-encoded video sequence are pre-encoded by the initial encoder, the transmission order and the display order of the encoded video frames in each frame group are different. For example, in the pre-encoded video sequence corresponding to the video sequence 1a, the display order of the encoded video frames may be: I B B P. However, considering that a P frame depends on the I frame during decoding, and that a bi-directional predicted frame (that is, a B frame) needs the information of the P frame and the I frame when the B frame is decoded, the decoding order of these frames in the pre-encoded video sequence may be: I P B B. Therefore, by pre-encoding the video sequence 1a, the encoding information of the pre-encoded video sequence can be quickly collected. The encoding information of the pre-encoded video sequence associated with the video sequence 1a (the to-be-encoded video sequence) may include the key encoding information of the pre-encoded video sequence, the spatial complexity of the pre-encoded video sequence, the temporal complexity of the pre-encoded video sequence, and the like. The key encoding information of the pre-encoded video sequence may specifically include the resolution and bitrate of the pre-encoded video sequence, the number of key video frames, the number of predicted video frames, the number of reference frames, and the like.
The resolution of the pre-encoded video sequence may be the aforementioned resolution 1, and the bitrate of the pre-encoded video sequence may be the bitrate directly collected during pre-encoding. The video sequence 1a may include multiple scenes, and each scene may correspond to one key video frame and at least one predicted video frame, where the at least one predicted video frame may be a P frame (that is, a forward predicted frame). In this embodiment, the key video frames used when inter-frame encoding the forward predicted frames (that is, P frames) may be collectively referred to as reference video frames. During pre-encoding, each time a key video frame is used, the count of reference video frames may be incremented by one, so that the total selection count of reference video frames finally collected when the pre-encoding is completed may be determined as the first quantity. In addition, the number of key video frames collected during pre-encoding (that is, the total number of key video frames) may be collectively referred to as the second quantity, and the number of forward predicted frames collected during pre-encoding (that is, the total number of forward predicted frames) may be collectively referred to as the third quantity.
Further, the first server may calculate the spatial complexity of the pre-encoded video sequence by the following Formula (1):
spatial complexity = average I-frame size / maximum I-frame size    Formula (1)
where the average I-frame size is determined by the data volume corresponding to each key video frame obtained by the first server (for example, 100 kB, 90 kB) and the counted total number of I frames. In this embodiment, the first average data volume of the key video frames may be determined from the data volume corresponding to each key video frame and the total number of key video frames counted by the first server (that is, the second quantity), and this first average data volume may be collectively referred to as the average I-frame size. In addition, the key video frame with the maximum data volume may be found among the data volumes corresponding to these key video frames and referred to as the maximum I frame; the maximum I-frame size is the maximum data volume among the data volumes corresponding to these key video frames. Therefore, according to Formula (1), the ratio of the first average data volume to the maximum data volume may be used as the spatial complexity of the pre-encoded video sequence.
Further, the first server may calculate the temporal complexity of the pre-encoded video sequence by the following Formula (2):
temporal complexity = average P-frame size / average I-frame size    Formula (2)
where the average P-frame size is determined by the data volume corresponding to each forward predicted frame obtained by the first server (for example, 20 kB, 15 kB). In this embodiment, the second average data volume of the forward predicted frames may be determined from the data volume corresponding to each forward predicted frame and the total number of forward predicted frames counted by the first server (that is, the third quantity), and this second average data volume may be collectively referred to as the average P-frame size. As shown in Formula (2), the ratio of the second average data volume to the first average data volume may be used as the temporal complexity of the pre-encoded video sequence. A sketch of assembling this feature vector from per-frame statistics is given below.
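The following Python sketch shows how the encoding information (first to third quantities, the two complexities of Formulas (1) and (2), resolution, and bitrate) could be derived from per-frame statistics; the `Frame` record and helper are hypothetical, not part of the patent.

```python
# Illustrative sketch (hypothetical helper, not from the patent): deriving the
# encoding-information feature vector of a pre-encoded sequence from per-frame
# statistics. Each frame record holds its type and its compressed size in kB.
from dataclasses import dataclass

@dataclass
class Frame:
    ftype: str       # "I", "P", or "B"
    size_kb: float   # compressed frame size (data volume)
    refs: int = 0    # how often this key frame was selected as a reference

def extract_features(frames, resolution, bitrate_kbps):
    i_frames = [f for f in frames if f.ftype == "I"]
    p_frames = [f for f in frames if f.ftype == "P"]
    first_qty = sum(f.refs for f in i_frames)       # total reference selections
    second_qty = len(i_frames)                      # total key video frames
    third_qty = len(p_frames)                       # total forward predicted frames
    avg_i = sum(f.size_kb for f in i_frames) / second_qty  # first average volume
    avg_p = sum(f.size_kb for f in p_frames) / third_qty   # second average volume
    max_i = max(f.size_kb for f in i_frames)
    spatial = avg_i / max_i    # Formula (1)
    temporal = avg_p / avg_i   # Formula (2)
    return [first_qty, second_qty, third_qty, spatial, temporal,
            resolution, bitrate_kbps]
```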
For the specific process in which the first server pre-encodes the video sequence 2a, the video sequence 3a, and the video sequence 4a shown in FIG. 4, reference may be made to the specific process of pre-encoding the video sequence 1a, and details are not repeated here.
Step S103: Predict, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence.
Specifically, the video data processing apparatus may obtain the target quality type (that is, the VMAF type) corresponding to the video quality standard parameter (for example, the aforementioned VMAF 90), and may use, in a prediction model library associated with multiple quality types, the prediction model matching the target quality type as the target prediction model; further, the video data processing apparatus may input the video feature into the target prediction model and output the degrees of matching between the video feature and the multiple reference video features in the target prediction model; and further, the video data processing apparatus may use, among the degrees of matching, the reference video feature having the highest degree of matching with the video feature as the target reference video feature, and may then use the sample bitrate information corresponding to the quality label information associated with the target reference video feature as the encoding bitrate associated with the to-be-encoded video sequence.
In the distributed transcoding system, after obtaining the video feature 1b of the to-be-encoded video sequence (for example, the video sequence 1a), the first server running the video data processing apparatus may input the video feature 1b into the target prediction model matching the VMAF type. The target prediction model may then predict, according to the set specific quality indicator (that is, the video quality standard parameter), the encoding bitrate used for encoding the to-be-encoded video sequence, so as to further perform the following step S104.
For the to-be-encoded video sequences of the same video segment (with the same video content) at different resolutions shown in FIG. 4, the encoding bitrate associated with each to-be-encoded video sequence may be predicted by the same target prediction model. For example, the encoding bitrate of the video sequence 1a may be the encoding bitrate 1c shown in FIG. 4, the encoding bitrate of the video sequence 2a may be the encoding bitrate 2c shown in FIG. 4, the encoding bitrate of the video sequence 3a may be the encoding bitrate 3c shown in FIG. 4, and the encoding bitrate of the video sequence 4a may be the encoding bitrate 4c shown in FIG. 4. A sketch of this matching-based lookup is given below.
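The following is a minimal sketch of the matching step of step S103 under stated assumptions: the matching degree is modeled here as inverse Euclidean distance and the reference tables are invented for illustration, so this is not the patent's actual prediction model.

```python
# Illustrative sketch (hypothetical, not the patent's implementation): score a
# video feature against the reference video features held by the target
# prediction model, then return the sample bitrate whose quality label is
# closest to the set video quality standard parameter.
import numpy as np

def predict_bitrate(feature, references, quality_target):
    """references: list of (ref_feature, {quality_value: bitrate_kbps}) pairs."""
    feature = np.asarray(feature, dtype=float)
    # Matching degree modeled as inverse Euclidean distance (an assumption).
    scores = [1.0 / (1.0 + np.linalg.norm(feature - np.asarray(ref, dtype=float)))
              for ref, _ in references]
    best = int(np.argmax(scores))                 # highest matching degree
    _, label_table = references[best]
    # Pick the bitrate whose labeled quality is closest to the target (e.g. VMAF 90).
    q = min(label_table, key=lambda v: abs(v - quality_target))
    return label_table[q]

refs = [([3, 2, 40, 0.55, 0.21, 1080, 4200], {88.0: 3400, 90.1: 3900}),
        ([5, 4, 80, 0.32, 0.12, 1080, 2600], {89.8: 2100, 92.0: 2500})]
print(predict_bitrate([5, 4, 78, 0.33, 0.13, 1080, 2700], refs, 90))  # -> 2100
```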
Step S104: Encode the to-be-encoded video sequence according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
As shown in FIG. 4, during the transcoding of the to-be-encoded video sequences at different resolutions, the encoding bitrates of the same video segment (with the same video content) at different resolutions may be predicted by the same target prediction model, and the multiple to-be-encoded video sequences shown in FIG. 4 may then be encoded respectively according to the multiple predicted encoding bitrates, to output the encoded video sequences associated with the corresponding resolutions. The encoded video sequences may specifically include the encoded sequence 1d, the encoded sequence 2d, the encoded sequence 3d, and the encoded sequence 4d shown in FIG. 4. The encoded video sequence associated with each resolution may be collectively referred to as one channel of encoded bitstream.
The target quality type in this embodiment may be any one of the aforementioned quality types, each quality type may correspond to one prediction model, and these prediction models may all be stored in the prediction model library of the distributed transcoding system. Therefore, when obtaining the target evaluation value under the set quality indicator, the first server running the video data processing apparatus may collectively refer to the target evaluation value under the set quality indicator as the video quality standard parameter, so that the first server can subsequently adjust the output quality of the encoded sequences directly according to the video quality standard parameter of the set target quality type. In other words, for the first servers in the distributed transcoding system, when the video segments obtained by these first servers differ, an encoding bitrate for encoding the corresponding video segment can be found for each first server as reasonably as possible while keeping the video quality relatively consistent. This solves the bandwidth waste caused by indiscriminately encoding these video segments at the same encoding bitrate, reduces the waste of encoding bitrate, and thereby saves bandwidth.
Further, referring to FIG. 5, FIG. 5 is a schematic diagram of encoding bitrates used for encoding different video segments according to an embodiment of this application. The video source in this embodiment may include the multiple video segments shown in FIG. 5, which may specifically include video segment 1, video segment 2, video segment 3, ..., and video segment 25. The curve 11 shown in FIG. 5 may represent indiscriminate video encoding of these 25 video segments at a fixed encoding bitrate (for example, 4M). The curve 21 may represent the encoding bitrates, predicted by the target prediction model, for encoding the different video segments when the video quality standard parameter of these 25 video segments is configured as VMAF 90. Through the trained target prediction model, the encoding bitrates of different video segments under the same video quality indicator can be accurately predicted. These video segments may be multi-channel transcoded in the same service server (for example, one first server) or in different service servers (for example, multiple first servers); the number of first servers performing multi-channel transcoding is not specifically limited here.
In the video-on-demand scenario, the second server may divide the video source into multiple video segments according to the first slicing rule; that is, when slicing the video source corresponding to the 25 video segments shown in FIG. 5, the second server may divide the video source into the multiple video segments shown in FIG. 5 according to the video content characteristics of the video source (for example, scene information, image information, and encoding information). The scene information may specifically include the scene category information, near/far view information, camera motion information, and salient region information contained in the video source. The image information may include the texture detail features, noise type features, color features, and color contrast features of the video source. The encoding information may include the key encoding information of the pre-encoded video sequence (for example, resolution information and reference frame count information), the aforementioned spatial complexity, and the aforementioned temporal complexity. After the video content of the video source in the video-on-demand scenario is analyzed once by the first slicing rule, video segments with different video content can be accurately obtained.
For ease of understanding, this embodiment takes distributing one sliced video segment (that is, one first video sequence) to one first server as an example. When configuring the video quality parameters for the 25 video segments shown in FIG. 5, the second server may configure the same video quality standard parameter under the same quality type, and may distribute the 25 video segments to 25 first servers in the same distributed network as the second server, so as to implement distributed transcoding in these first servers and improve the efficiency of multi-channel transcoding across different first servers. When one video segment corresponds to one transcoding server (that is, one first server), the encoding bitrate of each video segment shown in FIG. 5 may be predicted by these first servers in the distributed server cluster through the target prediction model. Therefore, for video segments with different video content, since the video content characteristics of these segments usually differ, the encoding bitrates predicted by the target prediction models in these first servers may differ; see the fitted curve of the multiple video segments and the corresponding encoding bitrates shown in FIG. 5.
Further, referring to FIG. 6, FIG. 6 is a schematic diagram of the video quality obtained by encoding different video segments according to an embodiment of this application. The video segments shown in FIG. 6 may be the encoded sequences of the 25 video segments in the embodiment corresponding to FIG. 5. The curve 22 shown in FIG. 6 is the video quality of the encoded sequences obtained by encoding the 25 video segments respectively at the different predicted encoding bitrates; that is, encoding the 25 video segments at different encoding bitrates keeps the fluctuation of the video quality of the 25 video segments relatively stable. Comparing the curve 12 and the curve 22 shown in FIG. 6 shows that the video quality obtained by encoding the corresponding video segments at the different predicted encoding bitrates (that is, the video quality represented by curve 22), compared with the video quality obtained by previously encoding the 25 video segments indiscriminately at a fixed encoding bitrate (that is, the video quality represented by curve 12), effectively ensures a stable variation of video quality, and can thus improve the playback effect of these video segments subsequently output to the aforementioned on-demand terminal; that is, there is no drastic fluctuation of video quality.
After obtaining the multiple encoded bitstreams associated with the video segment 300a (that is, the first video sequence), the first server running the video data processing apparatus may collectively refer to these encoded bitstreams as encoded video sequences, and may return the encoded video sequences to the second server as the encoded bitstreams associated with the scaling parameter information, so that, when receiving the encoded bitstreams returned by all first servers in the distributed server cluster for the same scaling parameter information, the second server merges all received encoded bitstreams according to the slice identification information associated with the sliced video source. For example, after obtaining the multi-channel transcoded bitstreams returned by the multiple first servers in the embodiment corresponding to FIG. 2, the second server may, for a target resolution among the multiple resolutions shown in FIG. 4 (for example, resolution 2), merge the encoded bitstream 100b returned by the server 10b, the encoded bitstream 200b returned by the server 10c, and the encoded bitstream 300b returned by the server 10d, to obtain the merged bitstream associated with resolution 2, and may then deliver the merged bitstream to the on-demand terminal for playback when the on-demand terminal requests playback of the merged bitstream corresponding to resolution 2.
In the video-on-demand scenario, the service database corresponding to the service server may store video files for which multi-channel transcoding has been completed (each video file is an encoded bitstream associated with a video source). An on-demand terminal may access the second server through a target client or a browser page, to obtain, from the service database associated with the second server, the encoded bitstream matching the video data requested by the on-demand terminal. In this way, when subsequently obtaining the encoded bitstream, the on-demand terminal can decode the encoded bitstream through a decoder supported by the on-demand terminal, and play the decoded video data in the on-demand terminal. By setting the aforementioned video quality standard parameter, the quality of the video data output in the on-demand terminal can be ensured, thereby improving the playback effect of the video data.
In this embodiment of this application, when a to-be-encoded video sequence associated with a video source is obtained, a video quality standard parameter associated with the to-be-encoded video sequence can be obtained, where the video sequence of each video segment of the video source under the corresponding scaling parameter information is collectively referred to as a to-be-encoded video sequence. Further, the to-be-encoded video sequence is pre-encoded according to the video quality standard parameter to obtain a pre-encoded video sequence, and a video feature corresponding to the to-be-encoded video sequence is determined according to the pre-encoded video sequence; further, an encoding bitrate associated with the to-be-encoded video sequence is predicted according to the video quality standard parameter and the video feature; and further, the to-be-encoded video sequence is encoded according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
It can be seen that, by analyzing the video content of each video segment of the video source once, the video feature related to each video segment (that is, each to-be-encoded video sequence) can be quickly extracted, so that, with a target quality set, the prediction model can accurately predict the encoding bitrate used for encoding each video segment. Then, under a set quality indicator (that is, the set video quality standard parameter), the predicted encoding bitrates of the different video segments improve the accuracy of video encoding at a given video quality and reduce the waste of encoding resources.
Further, referring to FIG. 7, FIG. 7 is a schematic diagram of a video data processing method according to an embodiment of this application. As shown in FIG. 7, the method may be performed by a video data processing apparatus having a video data processing function, and may include the following steps S201 to S208.
Step S201: Receive initial video data collected and uploaded by a user terminal, and determine the received initial video data as a video source.
Specifically, when obtaining initial video data, the video data processing apparatus may judge whether the initial video data is live video data; if so, it may continuously use the obtained initial video data as the video source, and directly perform multi-channel transcoding on the obtained video source (that is, the live video data) in the service server (for example, the second server running the video data processing apparatus), so as to continuously output encoded bitstreams in the service server.
Step S202: Obtain a key video frame from the video source; determine, according to the key video frame and a buffer sequence length associated with the key video frame, a buffered video sequence for scene detection in the video source; and determine, according to the buffered video sequence and the scaling parameter information of the video source, a to-be-encoded video sequence for pre-encoding.
Specifically, the video source includes M video frames associated with a collection period, where M is a positive integer. Among the M video frames of the video source, the video data processing apparatus may determine the first video frame as a first key video frame; further, the video data processing apparatus may determine, according to the first key video frame and the buffer sequence length associated with the first key video frame, the buffered video sequence for scene detection from the M video frames; further, the video data processing apparatus may determine the video frames other than the first key video frame in the buffered video sequence as to-be-detected video frames b_i, and perform scene change detection on the to-be-detected video frames b_i in the buffered video sequence according to the first key video frame, where i is a positive integer greater than 1 and less than M; further, when detecting that the degree of video content change between the first key video frame and a to-be-detected video frame b_i is greater than a scene change threshold, the video data processing apparatus may determine the to-be-detected video frame b_i as a second key video frame; and further, the video data processing apparatus may use the video sequence between the first key video frame and the second key video frame as an initial video sequence, scale the initial video sequence according to the scaling parameter information of the video source, and determine the scaled initial video sequence as the to-be-encoded video sequence for pre-encoding.
Further, referring to FIG. 8, FIG. 8 is a schematic diagram of a scenario of obtaining a to-be-encoded video sequence in a live-streaming scenario according to an embodiment of this application. In the live-streaming scenario, the user terminal 40 shown in FIG. 8 may be the streamer terminal corresponding to the streamer user (that is, user A shown in FIG. 8), and the initial video data collected by the streamer terminal in the collection period may be the live video data shown in FIG. 8. The live video data shown in FIG. 8 may include M video frames (that is, video frame 1a, video frame 1b, ..., video frame 1m), where M may be 60. In this embodiment, the collection duration from the 1st moment to the m-th moment shown in FIG. 8 may be collectively referred to as the collection period. When there is a network connection between the user terminal 40 and the service server 30 shown in FIG. 8, the live video data collected by the streamer terminal within the period duration may be continuously uploaded to the service server 30 as the video source. The service server 30 may perform scene detection on the obtained buffered video sequence associated with the video source according to the second slicing rule in the live-streaming scenario.
The service server 30 shown in FIG. 8 may be the aforementioned second server. When determining that live video data is obtained, the second server may collectively refer to the continuously obtained live video data as the video source, and feed each video frame of the video source to the buffer in sequence, where the buffered video sequence length of the buffer may be 50 frames. As shown in FIG. 8, when obtaining the first video frame of the video source, the buffer in the service server 30 may determine it as the first key video frame, which may be the key video frame 10a in the buffered video sequence 2 shown in FIG. 8; the key video frame 10a may be the video frame 1a shown in FIG. 8. When determining the key video frame 10a, the service server 30 may, starting from the key video frame 10a, buffer video frames of a specific sequence length to form the buffered video sequence 2 shown in FIG. 8. In this embodiment, the specific sequence length of the buffer (for example, the frame length formed by L (for example, 50) video frames) may be collectively referred to as the buffer sequence length, where L may be a positive integer less than or equal to M.
Further, when determining the buffered video sequence 2 for scene detection from the video source, the service server 30 may use each remaining video frame other than the first key video frame (that is, the key video frame 10a) in the buffered video sequence 2 as a to-be-detected video frame, and may denote the to-be-detected video frame currently compared with the video content of the first key video frame as the to-be-detected video frame b_i, where i may be a positive integer greater than 1 and less than M. For example, the video frames in the buffered video sequence 2 may include: the key video frame 10a, the to-be-detected video frame b_2, the to-be-detected video frame b_3, ..., and the to-be-detected video frame b_L. The service server may judge whether the current to-be-detected video frame b_i is a scene change frame by judging whether the degree of video content change between the first key video frame and the to-be-detected video frame b_i is greater than the scene change threshold.
If the service server 30 determines that the degree of video content change between the key video frame 10a shown in FIG. 8 and the to-be-detected video frame b_5 (here, i = 5) is greater than the scene change threshold, it may determine the to-be-detected video frame b_5 as the second key video frame, which is the scene change frame representing the change from one scene to another. When detecting that the buffered video sequence 2 contains multiple scenes, the service server may refer to the video sequence between the first key video frame and the second key video frame as the initial video sequence 400a of FIG. 8, which may include the video sequence from the key video frame 10a to the to-be-detected video frame b_4. Further, the service server may scale the initial video sequence 400a according to the scaling parameter information of the video source (that is, 1080p, 720p, and the like), and may determine the scaled initial video sequence 400a as the to-be-encoded video sequence 400b for pre-encoding, so that the encoding bitrate 1 associated with the to-be-encoded video sequence 400b can be obtained according to the following steps S203 to S206, and the to-be-encoded video sequence 400b can then be encoded according to the encoding bitrate 1 to obtain the encoded video sequence 400c shown in FIG. 8.
After the buffer delivers the initial video sequence 400a to the aforementioned initial encoder for pre-encoding, the initial video sequence 400a may be deleted from the buffer, and the buffered video sequence after the deletion may be referred to as the transition video sequence 3 shown in FIG. 8. The first video frame of the transition video sequence 3 may be the second key video frame (that is, the key video frame 20a shown in FIG. 8 may be the to-be-detected video frame b_5). In other words, the video frames in the transition video sequence 3 may be the to-be-detected video frames remaining after the initial video sequence 400a is removed from the buffered video sequence 2 (that is, the to-be-detected video frames b_5, ..., b_L). Since, in the live-streaming scenario, the service server continuously obtains the live video data collected and uploaded by the streamer terminal in the collection period, to facilitate further scene detection on the buffered video sequence of the specific video length in the buffer, this embodiment may also determine, from the video source according to the transition video sequence 3 and the buffer sequence length, the to-be-padded video sequence 4 shown in FIG. 8, which has the same number of frames as the initial video sequence 400a. In this way, after the to-be-padded video sequence 4 is fed to the buffer, the transition video sequence 3 can be padded with the to-be-padded video sequence 4, further ensuring that the buffered video sequence 3 held in the buffer has the same buffer sequence length as the buffered video sequence 2, where the buffered video sequence 3 may be the padded transition video sequence. In this embodiment, when the key video frame 20a is used as the second key video frame, the video frames other than the second key video frame in the buffered video sequence 3 may be determined as new to-be-detected video frames (that is, to-be-detected video frames d_j), so that scene detection can continue to be performed on these new to-be-detected video frames in the buffered video sequence 3 according to the second key video frame, where j may be a positive integer greater than 1 and less than or equal to L.
For the specific implementation in which the service server 30 performs scene detection on the to-be-detected video frames d_j in the buffered video sequence dynamically updated in the buffer (that is, the buffered video sequence 3 shown in FIG. 8), reference may be made to the description of scene detection on the to-be-detected video frames b_i in the buffered video sequence 2, and details are not repeated here. A sketch of this sliding-buffer scene detection loop is given below.
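The following is a minimal sketch of the buffer-based scene change detection of step S202, under stated assumptions: the content-change measure (mean absolute luma difference) and the threshold value are illustrative choices, not the patent's specified metric.

```python
# Illustrative sketch (hypothetical, not the patent's implementation): detect
# scene change frames in a sliding buffer and yield the initial video
# sequences to be scaled and pre-encoded. Frames are numpy luma arrays.
import numpy as np
from collections import deque

BUFFER_LEN = 50          # buffer sequence length L (50 frames in the example)
SCENE_THRESHOLD = 30.0   # scene change threshold (assumed value and units)

def content_change(key_frame, candidate):
    """Degree of video content change; mean absolute luma difference here."""
    return float(np.mean(np.abs(key_frame.astype(float) - candidate.astype(float))))

def detect_segments(frame_source):
    """Yield initial video sequences delimited by consecutive key frames."""
    buf = deque()
    for frame in frame_source:
        buf.append(frame)
        if len(buf) < BUFFER_LEN:
            continue                      # wait until the buffer is padded/full
        key = buf[0]                      # first key video frame
        for i in range(1, len(buf)):      # to-be-detected frames b_i
            if content_change(key, buf[i]) > SCENE_THRESHOLD:
                yield [buf.popleft() for _ in range(i)]  # initial video sequence
                break                     # buf[0] is now the second key frame
        # A production system would force a cut if no change were ever found,
        # to keep the buffer bounded; omitted here for brevity.
```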
Step S203: Configure a video quality standard parameter for the to-be-encoded video sequence based on configuration information of the user terminal.
Step S204: Pre-encode the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determine, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence.
Step S205: Predict, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence.
Step S206: Encode the to-be-encoded video sequence according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
Step S207: When a stream-pull request of a viewer terminal in the virtual live room is obtained, obtain the playback resolution in the stream-pull request.
Step S208: Find, in the encoded video sequences, the target encoded video sequence corresponding to the scaling parameter information matching the playback resolution, and push the target encoded video sequence to the viewer terminal as an encoded bitstream, so that the viewer terminal decodes the encoded bitstream to obtain the target encoded video sequence.
The application scenarios of steps S201 to S208 described in this embodiment may include the aforementioned live-streaming scenario and the aforementioned video-on-demand scenario. The stream pulling in step S207 refers to the process in which the server already holds live content and a client pulls it using a specified address; therefore, the stream-pull request of the viewer terminal is the viewer terminal's request to pull the live video content.
Further, referring to FIG. 9, FIG. 9 is a schematic diagram of an overall procedure of obtaining an encoded bitstream according to an embodiment of this application. Steps S1 to S5 shown in FIG. 9 may be applied to any service server in the distributed transcoding system. Step S1 indicates that, after obtaining a video segment, the service server may, with fixed encoding parameters (for example, with the target quality type and the video quality standard parameter under the target quality type set), scale the video segment to different resolutions, and then perform the pre-encoding of step S2 on the scaled to-be-encoded video sequences, so that, during the pre-encoding of a to-be-encoded video sequence, the collected encoding information produced by the pre-encoding is used as the video feature of the to-be-encoded video sequence. Step S4 shown in FIG. 9 indicates that the service server may predict, through the prediction model matching the target quality type, the encoding bitrate used for encoding the to-be-encoded video sequence, and may then perform step S5 according to the predicted encoding bitrate, to obtain the multiple encoded bitstreams associated with the video segment shown in FIG. 9.
The training of the prediction models corresponding to the multiple quality types may roughly include the selection of sample video sequences, the extraction of sample video features, the extraction of quality label information, and the training of the prediction models. Specifically, when obtaining N sample video sequences associated with multiple service scenarios, the first server in the distributed transcoding system may use the sample video features of the N sample video sequences as reference video features, and may obtain the multiple quality types associated with the N sample video sequences, where N is a positive integer and one sample video feature is determined by pre-encoding one sample video sequence; further, the first server may obtain a target quality type from the multiple quality types and obtain the sample bitrate parameters associated with the target quality type, the sample bitrate parameters including K pieces of sample bitrate information, where K is a positive integer; further, the first server may perform traversal encoding on each of the N sample video sequences according to the K sample bitrate parameters, to obtain the quality evaluation value of each sample video sequence under each of the K sample bitrate parameters, where one quality evaluation value is determined for one sample video sequence under one sample bitrate parameter; and further, the first server may use all the obtained quality evaluation values as the quality label information of the initial model associated with the target quality type, train the initial model according to the quality label information and the N reference video features, and determine, according to the training result, the prediction model matching the target quality type.
In the selection of sample video sequences, it is necessary to ensure that the selected samples cover all service scenarios of the actual service types as much as possible, so as to guarantee the generality of the prediction model obtained by subsequent training. The actual service types may include news, animation, variety shows, games, film and television, and the like, and the service scenarios may include scene information such as complex pictures, simple pictures, shots with intense motion, and static shots. The scale of the sample video sequences may generally be about 10,000 video segments.
For the feature extraction of the sample video features, reference may be made to the specific process of extracting the video feature of a video segment in the embodiment corresponding to FIG. 9, and details are not repeated here.
In the extraction of the quality label information of the sample video sequences, one prediction model may be trained for each quality type. Since the purpose of the embodiments of this application is to predict, through the finally trained prediction model, the bitrate parameter of a video segment under the target indicator of the target quality type (that is, the aforementioned video quality standard parameter), in the process of extracting the quality label information, the quality evaluation values of a sample video sequence under specific bitrate parameters (for example, at all bitrate points in [0, 51]) need to be obtained by traversal-encoding the sample video sequence, so that the quality evaluation values (here, quality evaluation scores) of the same sample video sequence under all encoding parameters of the different quality types can be established. For example, when the sample bitrate parameter is a first-category encoding parameter (for example, the crf encoding parameter), the quantization step (that is, the interval) may be set to 1, so that a bitrate-value correspondence table of a sample video sequence under a specific quality type can be obtained; that is, one bitrate value may correspond to one quality evaluation value, and all these quality evaluation values may serve as the quality label information of the sample video sequence under the corresponding sample bitrate parameter. In this embodiment, when the sample bitrate parameter is a second-category encoding parameter (for example, the bitrate encoding parameter), a quantization step of 10 kbps may be set within the bitrate range of 10 kbps to 5 Mbps to produce the quality label information. A sketch of this label-generation sweep is given below.
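The following is a minimal sketch of the traversal-encoding sweep, assuming hypothetical helpers `encode_at()` and `measure_quality()` stand in for a real encoder and a VMAF/SSIM/PSNR measurement; neither helper is from the patent.

```python
# Illustrative sketch (hypothetical helpers, not the patent's implementation):
# generating quality label information by traversal encoding a sample sequence.
def build_label_table(sample, param_kind="crf"):
    """Return {encoding parameter value: quality score} for one sample sequence."""
    if param_kind == "crf":
        points = range(0, 52)              # all crf points in [0, 51], step 1
    else:  # "bitrate"
        points = range(10, 5_000 + 1, 10)  # 10 kbps .. 5 Mbps, step 10 kbps
    table = {}
    for p in points:
        encoded = encode_at(sample, param_kind, p)   # hypothetical encoder call
        table[p] = measure_quality(sample, encoded)  # hypothetical metric call
    return table
```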
In the model training of the initial model associated with the target quality type using the sample video features, the initial model used is a multi-layer neural network model; after the extracted sample video features are input into the multi-layer neural network model, the bitrate-value correspondence table of each sample video feature under the specified quality indicator can be output. A sketch of such a model is given after this paragraph.
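The following sketch assumes a small multi-layer perceptron that conditions on the quality target; the network shape, the conditioning scheme, and the `training_samples` variable are all assumptions for illustration, not the patent's actual model.

```python
# Illustrative sketch (an assumption about model shape, not the patent's
# network): an MLP mapping a video feature plus a quality target to an
# encoding parameter, trained on the label tables built above.
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_training_set(samples):
    """samples: list of (feature_vector, {encoding parameter: quality score})."""
    X, y = [], []
    for feature, table in samples:
        for param, quality in table.items():
            X.append(list(feature) + [quality])  # condition on the quality value
            y.append(param)                      # e.g. bitrate in kbps, or crf
    return np.array(X, dtype=float), np.array(y, dtype=float)

X, y = make_training_set(training_samples)  # training_samples assumed to exist
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
model.fit(X, y)
# For a new segment feature at a VMAF 90 target:
# predicted_param = model.predict([list(new_feature) + [90.0]])
```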
The slicing rules used in the two application scenarios differ slightly. For example, when obtaining on-demand video data, the service server may directly use the first slicing rule to split the obtained video source into several video segments. However, when obtaining live video data, the service server cannot directly use the first slicing rule to split the video source into several video segments, so it needs to use the second slicing rule to first obtain, from the video source, a buffered video sequence of a specific sequence length, then perform scene detection on the buffered video sequence, and then determine, from the buffered video sequence according to the scene detection result, the to-be-encoded video sequence for pre-encoding.
In the embodiments of this application, by analyzing the video content of each video segment of the video source once, the video feature related to each video segment (that is, each to-be-encoded video sequence) can be quickly extracted, so that, with a target quality set, the prediction model can accurately predict the encoding bitrate used for encoding each video segment. Then, under a set quality indicator (that is, the set video quality standard parameter), the encoding bitrates of different video segments are obtained by prediction, the accuracy of video encoding at a given video quality is improved, and the waste of encoding resources is reduced.
Further, referring to FIG. 10, FIG. 10 is a schematic structural diagram of a video data processing apparatus according to an embodiment of this application. The video data processing apparatus 1 may run in the server 10a (that is, the second server) in the embodiment corresponding to FIG. 2, so that, in the live-streaming scenario, the server 10a performs multi-channel transcoding on the obtained to-be-encoded video sequence. In this embodiment, the video data processing apparatus 1 may also run in the server 10b (that is, the first server) in the embodiment corresponding to FIG. 2, so that, in the video-on-demand scenario, the server 10b performs multi-channel transcoding on the obtained to-be-encoded video sequence. Further, the video data processing apparatus 1 may include: a quality parameter obtaining module 10, a pre-encoding module 20, a bitrate prediction module 30, and a video encoding module 40; further, the video data processing apparatus 1 may also include: an encoded bitstream returning module 50, a stream-pull request obtaining module 60, and a bitstream pushing module 70.
The quality parameter obtaining module 10 is configured to obtain a to-be-encoded video sequence associated with a video source and obtain a video quality standard parameter associated with the to-be-encoded video sequence.
The quality parameter obtaining module 10 includes: a first sequence receiving unit 101, a to-be-encoded sequence determining unit 102, and a quality parameter determining unit 103; the video data processing apparatus 1 may also include: a video source determining unit 104, a buffered sequence determining unit 105, and a quality parameter configuring unit 106.
The first sequence receiving unit 101 is configured to receive a first video sequence of the video source distributed by the service server, the first video sequence being determined by the service server after slicing the video source.
The to-be-encoded sequence determining unit 102 is configured to determine, according to the scaling parameter information of the video source and the first video sequence, the to-be-encoded video sequence associated with the video source.
The to-be-encoded sequence determining unit 102 includes: a scaling parameter obtaining subunit 1021, a scaling processing subunit 1022, and a sequence determining subunit 1023.
The scaling parameter obtaining subunit 1021 is configured to obtain the scaling parameter information associated with the resolution of the video source.
The scaling processing subunit 1022 is configured to scale the resolution of the first video sequence according to the scaling parameter information and determine the scaled first video sequence as a second video sequence, the resolution of the first video sequence being determined by the resolution of the video source.
The sequence determining subunit 1023 is configured to determine the to-be-encoded video sequence according to the second video sequence and the resolution of the second video sequence.
For specific implementations of the scaling parameter obtaining subunit 1021, the scaling processing subunit 1022, and the sequence determining subunit 1023, reference may be made to the specific process of obtaining the to-be-encoded video sequence in the video-on-demand scenario in the embodiment corresponding to FIG. 3, and details are not repeated here.
The quality parameter determining unit 103 is configured to use the video quality parameter configured by the service server for the first video sequence as the video quality standard parameter associated with the to-be-encoded video sequence.
The video source determining unit 104 is configured to receive initial video data collected and uploaded by a user terminal and determine the received initial video data as the video source.
The buffered sequence determining unit 105 is configured to obtain a key video frame from the video source, determine, according to the key video frame and the buffer sequence length associated with the key video frame, a buffered video sequence for scene detection in the video source, and determine, according to the buffered video sequence and the scaling parameter information of the video source, a to-be-encoded video sequence for pre-encoding.
The video source includes M video frames associated with a collection period, where M is a positive integer.
The buffered sequence determining unit 105 includes: a first determining subunit 1051, a buffered sequence determining subunit 1052, a scene detection determining subunit 1053, a second determining subunit 1054, and a sequence determining subunit 1055; the buffered sequence determining unit 105 may further include: a sequence deleting subunit 1056, a sequence padding subunit 1057, and a change detection subunit 1058.
The first determining subunit 1051 is configured to determine, among the M video frames of the video source, the first video frame as a first key video frame.
The buffered sequence determining subunit 1052 is configured to determine, according to the first key video frame and the buffer sequence length associated with the first key video frame, the buffered video sequence for scene detection from the M video frames.
The scene detection determining subunit 1053 is configured to determine the video frames other than the first key video frame in the buffered video sequence as to-be-detected video frames b_i, and perform scene change detection on the to-be-detected video frames b_i in the buffered video sequence according to the first key video frame, where i is a positive integer greater than 1 and less than M.
The second determining subunit 1054 is configured to determine the to-be-detected video frame b_i as a second key video frame when detecting that the degree of video content change between the first key video frame and the to-be-detected video frame b_i is greater than a scene change threshold.
The sequence determining subunit 1055 is configured to use the video sequence between the first key video frame and the second key video frame as an initial video sequence, scale the initial video sequence according to the scaling parameter information of the video source, and determine the scaled initial video sequence as the to-be-encoded video sequence for pre-encoding.
The sequence deleting subunit 1056 is configured to delete the initial video sequence from the buffered video sequence to obtain a transition video sequence, the first video frame of the transition video sequence being the second key video frame.
The sequence padding subunit 1057 is configured to obtain a to-be-padded video sequence from the video source according to the transition video sequence and the buffer sequence length, and pad the transition video sequence with the to-be-padded video sequence, the sequence length of the padded transition video sequence being the same as the buffer sequence length.
The change detection subunit 1058 is further configured to determine the video frames other than the second key video frame in the padded transition video sequence as to-be-detected video frames d_j, and perform scene change detection on the to-be-detected video frames d_j in the padded transition video sequence according to the second key video frame, where j is a positive integer greater than 1 and less than M.
For specific implementations of the first determining subunit 1051, the buffered sequence determining subunit 1052, the scene detection determining subunit 1053, the second determining subunit 1054, the sequence determining subunit 1055, the sequence deleting subunit 1056, the sequence padding subunit 1057, and the change detection subunit 1058, reference may be made to the description of step S202 in the embodiment corresponding to FIG. 7, and details are not repeated here.
The quality parameter configuring unit 106 is configured to configure the video quality standard parameter for the to-be-encoded video sequence based on the configuration information of the user terminal.
In the video-on-demand scenario, the video data processing apparatus 1 may determine the to-be-encoded video sequence associated with the on-demand video data through the first sequence receiving unit 101, the to-be-encoded sequence determining unit 102, and the quality parameter determining unit 103; in the live-streaming scenario, the video data processing apparatus 1 may determine the to-be-encoded video sequence associated with the live video data through the video source determining unit 104, the buffered sequence determining unit 105, and the quality parameter configuring unit 106.
The pre-encoding module 20 is configured to pre-encode the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determine, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence.
The pre-encoding module 20 includes: an encoder determining unit 201, a pre-encoded sequence determining unit 202, an encoding information determining unit 203, and a video feature determining unit 204.
The encoder determining unit 201 is configured to obtain, according to the video quality standard parameter, the initial encoder used for pre-encoding the to-be-encoded video sequence.
The pre-encoded sequence determining unit 202 is configured to pre-encode the to-be-encoded video sequence by the initial encoder to obtain the pre-encoded video sequence, the pre-encoded video sequence including key video frames and predicted video frames.
The encoding information determining unit 203 is configured to determine the encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence.
The predicted video frames include forward predicted frames.
The encoding information determining unit 203 includes: a reference frame determining subunit 2031, a quantity determining subunit 2032, a volume determining subunit 2033, a complexity determining subunit 2034, and an information determining subunit 2035.
The reference frame determining subunit 2031 is configured to obtain the key video frames selected when performing inter-frame compression on the forward predicted frames, and determine the selected key video frames as the reference video frames corresponding to the forward predicted frames.
The quantity determining subunit 2032 is configured to determine the total selection count of the reference video frames as a first quantity, the total number of key video frames as a second quantity, and the total number of forward predicted frames as a third quantity.
The volume determining subunit 2033 is configured to determine the first average data volume of the key video frames according to the data volumes corresponding to the key video frames and the second quantity, and determine the second average data volume of the forward predicted frames according to the data volumes corresponding to the forward predicted frames and the third quantity.
The complexity determining subunit 2034 is configured to obtain the maximum data volume among the data volumes corresponding to the key video frames, use the ratio of the first average data volume to the maximum data volume as the spatial complexity of the pre-encoded video sequence, and determine the ratio of the second average data volume to the first average data volume as the temporal complexity of the pre-encoded video sequence.
The information determining subunit 2035 is configured to determine the first quantity, the second quantity, the third quantity, the spatial complexity, the temporal complexity, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence as the encoding information of the pre-encoded video sequence.
For specific implementations of the reference frame determining subunit 2031, the quantity determining subunit 2032, the volume determining subunit 2033, the complexity determining subunit 2034, and the information determining subunit 2035, reference may be made to the foregoing description of the encoding information, and details are not repeated here.
The video feature determining unit 204 is configured to determine the encoding information as the video feature corresponding to the to-be-encoded video sequence.
For specific implementations of the encoder determining unit 201, the pre-encoded sequence determining unit 202, the encoding information determining unit 203, and the video feature determining unit 204, reference may be made to the description of obtaining the video feature of the to-be-encoded video sequence in the embodiment corresponding to FIG. 3, and details are not repeated here.
The bitrate prediction module 30 is configured to predict, according to the video quality standard parameter and the video feature, the encoding bitrate associated with the to-be-encoded video sequence.
The bitrate prediction module 30 includes: a target model determining unit 301, a matching degree determining unit 302, and an encoding bitrate determining unit 303; the bitrate prediction module 30 may also include: a sample obtaining unit 304, a bitrate parameter obtaining unit 305, a traversal encoding unit 306, and a model training unit 307.
The target model determining unit 301 is configured to obtain the target quality type corresponding to the video quality standard parameter, and use, in a prediction model library associated with multiple quality types, the prediction model matching the target quality type as the target prediction model.
The matching degree determining unit 302 is configured to input the video feature into the target prediction model and output the degrees of matching between the video feature and the multiple reference video features in the target prediction model.
The encoding bitrate determining unit 303 is configured to use, among the degrees of matching, the reference video feature having the highest degree of matching with the video feature as the target reference video feature, and use the sample bitrate information corresponding to the quality label information associated with the target reference video feature as the encoding bitrate associated with the to-be-encoded video sequence.
The sample obtaining unit 304 is configured to obtain N sample video sequences associated with multiple service scenarios, use the sample video features of the N sample video sequences as reference video features, and obtain the multiple quality types associated with the N sample video sequences, where N is a positive integer and one sample video feature is determined by pre-encoding one sample video sequence.
The bitrate parameter obtaining unit 305 is configured to obtain the target quality type from the multiple quality types and obtain the sample bitrate parameters associated with the target quality type, the sample bitrate parameters including K pieces of sample bitrate information, where K is a positive integer.
The traversal encoding unit 306 is configured to perform traversal encoding on each of the N sample video sequences according to the K sample bitrate parameters, to obtain the quality evaluation value of each sample video sequence under each of the K sample bitrate parameters, where one quality evaluation value is determined for one sample video sequence under one sample bitrate parameter.
The model training unit 307 is configured to use all the obtained quality evaluation values as the quality label information of the initial model associated with the target quality type, train the initial model according to the quality label information and the N reference video features, and determine, according to the training result, the prediction model matching the target quality type.
For specific implementations of the target model determining unit 301, the matching degree determining unit 302, and the encoding bitrate determining unit 303, reference may be made to the description of the encoding bitrate in the embodiment corresponding to FIG. 3, and details are not repeated here. For specific implementations of the sample obtaining unit 304, the bitrate parameter obtaining unit 305, the traversal encoding unit 306, and the model training unit 307, reference may be made to the description of the initial training model in the embodiment corresponding to FIG. 9, and details are not repeated here.
The video encoding module 40 is configured to encode the to-be-encoded video sequence according to the encoding bitrate to obtain the encoded video sequence associated with the video source.
The video data processing apparatus 1 may run in a first server in the distributed server cluster, and the service server is a second server in the distributed server cluster.
The encoded bitstream returning module 50 is configured to return the encoded video sequence to the second server as the encoded bitstream associated with the scaling parameter information, so that, when receiving the encoded bitstreams returned by all first servers in the distributed server cluster for the same scaling parameter information, the second server merges all received encoded bitstreams according to the slice identification information associated with the sliced video source.
The user terminal is a streamer terminal in a virtual live room, and the initial video data is the live video data collected by the streamer terminal.
The stream-pull request obtaining module 60 is configured to obtain, when a stream-pull request of a viewer terminal in the virtual live room is obtained, the playback resolution in the stream-pull request.
The bitstream pushing module 70 is configured to find, in the encoded video sequences, the target encoded video sequence corresponding to the scaling parameter information matching the playback resolution, and push the target encoded video sequence to the viewer terminal as an encoded bitstream, so that the viewer terminal decodes the encoded bitstream to obtain the target encoded video sequence.
For specific implementations of the quality parameter obtaining module 10, the pre-encoding module 20, the bitrate prediction module 30, and the video encoding module 40, reference may be made to the description of steps S101 to S104 in the embodiment corresponding to FIG. 3, and details are not repeated here. Further, for the encoded bitstream returning module 50, the stream-pull request obtaining module 60, and the bitstream pushing module 70, reference may be made to the description of obtaining encoded video sequences in different service scenarios in the embodiment corresponding to FIG. 7, and details are not repeated here.
The video data processing apparatus 1 in this embodiment of this application can perform the video data processing method in the embodiments corresponding to FIG. 3 or FIG. 7, and details are not repeated here, nor is the description of the beneficial effects of the same method.
Further, referring to FIG. 11, FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of this application. As shown in FIG. 11, the computer device 1000 may be the server 10a in the embodiment corresponding to FIG. 2, or may be the server 10b in the embodiment corresponding to FIG. 2, which is not limited here. The computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard), and may also include standard wired and wireless interfaces. The network interface 1004 may include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM, or may be a non-volatile memory (non-volatile memory), for example, at least one disk memory. The memory 1005 may also be at least one storage apparatus located remotely from the aforementioned processor 1001. As shown in FIG. 11, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in FIG. 11, the network interface 1004 can provide network communication functions, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to invoke the device control application program stored in the memory 1005 to implement:
obtaining a to-be-encoded video sequence associated with a video source, and obtaining a video quality standard parameter associated with the to-be-encoded video sequence;
pre-encoding the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determining, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence;
predicting, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence; and
encoding the to-be-encoded video sequence according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
The computer device 1000 described in this embodiment of this application can perform the video data processing method in the embodiments corresponding to FIG. 3 or FIG. 7, and can also perform the functions of the video data processing apparatus 1 in the embodiment corresponding to FIG. 10; details are not repeated here, nor is the description of the beneficial effects of the same method.
In addition, the embodiments of this application also provide a computer storage medium storing the computer program executed by the aforementioned video data processing apparatus 1, the computer program including program instructions. When a processor executes the program instructions, the video data processing method in the embodiments corresponding to FIG. 3 or FIG. 7 can be performed, and details are not repeated here; nor is the description of the beneficial effects of the same method. For technical details not disclosed in the computer storage medium embodiments of this application, refer to the descriptions of the method embodiments of this application.
A person of ordinary skill in the art may understand that all or part of the procedures of the above method embodiments can be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
What is disclosed above is merely embodiments of this application, and certainly is not intended to limit the scope of the claims of this application. Therefore, equivalent variations made in accordance with the claims of this application shall fall within the scope covered by this application.

Claims (15)

  1. A video data processing method, performed by a server, comprising:
    obtaining a to-be-encoded video sequence associated with a video source, and obtaining a video quality standard parameter associated with the to-be-encoded video sequence;
    pre-encoding the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determining, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence;
    predicting, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence; and
    encoding the to-be-encoded video sequence according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
  2. The method according to claim 1, wherein the obtaining a to-be-encoded video sequence associated with a video source, and obtaining a video quality standard parameter associated with the to-be-encoded video sequence comprises:
    receiving a first video sequence of the video source distributed by a service server, the first video sequence being determined by the service server after slicing the video source;
    determining, according to scaling parameter information of the video source and the first video sequence, the to-be-encoded video sequence associated with the video source; and
    using a video quality parameter configured by the service server for the first video sequence as the video quality standard parameter associated with the to-be-encoded video sequence.
  3. The method according to claim 2, wherein the determining, according to scaling parameter information of the video source and the first video sequence, the to-be-encoded video sequence associated with the video source comprises:
    obtaining the scaling parameter information associated with a resolution of the video source;
    scaling the resolution of the first video sequence according to the scaling parameter information, and determining the scaled first video sequence as a second video sequence, the resolution of the first video sequence being determined by the resolution of the video source; and
    determining the to-be-encoded video sequence according to the second video sequence and the resolution of the second video sequence.
  4. The method according to claim 2, wherein the pre-encoding the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determining, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence comprises:
    obtaining, according to the video quality standard parameter, an initial encoder used for pre-encoding the to-be-encoded video sequence;
    pre-encoding the to-be-encoded video sequence by the initial encoder to obtain the pre-encoded video sequence, the pre-encoded video sequence comprising key video frames and predicted video frames;
    determining encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, a resolution of the pre-encoded video sequence, and a bitrate of the pre-encoded video sequence; and
    determining the encoding information as the video feature corresponding to the to-be-encoded video sequence.
  5. The method according to claim 4, wherein the predicted video frames comprise forward predicted frames; and
    the determining encoding information of the pre-encoded video sequence according to the key video frames, the predicted video frames, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence comprises:
    obtaining the key video frames selected when performing inter-frame compression on the forward predicted frames, and determining the selected key video frames as reference video frames corresponding to the forward predicted frames;
    determining a total selection count of the reference video frames as a first quantity, a total number of the key video frames as a second quantity, and a total number of the forward predicted frames as a third quantity;
    determining a first average data volume of the key video frames according to data volumes corresponding to the key video frames and the second quantity, and determining a second average data volume of the forward predicted frames according to data volumes corresponding to the forward predicted frames and the third quantity;
    obtaining a maximum data volume among the data volumes corresponding to the key video frames, using a ratio of the first average data volume to the maximum data volume as a spatial complexity of the pre-encoded video sequence, and determining a ratio of the second average data volume to the first average data volume as a temporal complexity of the pre-encoded video sequence; and
    determining the first quantity, the second quantity, the third quantity, the spatial complexity, the temporal complexity, the resolution of the pre-encoded video sequence, and the bitrate of the pre-encoded video sequence as the encoding information of the pre-encoded video sequence.
  6. The method according to claim 2, wherein the predicting, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence comprises:
    obtaining a target quality type corresponding to the video quality standard parameter, and using, in a prediction model library associated with a plurality of quality types, a prediction model matching the target quality type as a target prediction model;
    inputting the video feature into the target prediction model, and outputting degrees of matching between the video feature and a plurality of reference video features in the target prediction model; and
    using, among the degrees of matching, the reference video feature having the highest degree of matching with the video feature as a target reference video feature, and using sample bitrate information corresponding to quality label information associated with the target reference video feature as the encoding bitrate associated with the to-be-encoded video sequence.
  7. The method according to claim 6, further comprising:
    obtaining N sample video sequences associated with a plurality of service scenarios, using sample video features of the N sample video sequences as reference video features, and obtaining a plurality of quality types associated with the N sample video sequences, N being a positive integer, one sample video feature being determined by pre-encoding one sample video sequence;
    obtaining a target quality type from the plurality of quality types, and obtaining sample bitrate parameters associated with the target quality type, the sample bitrate parameters comprising K pieces of sample bitrate information, K being a positive integer;
    performing traversal encoding on each of the N sample video sequences according to the K sample bitrate parameters, to obtain a quality evaluation value of each sample video sequence under each of the K sample bitrate parameters, one quality evaluation value being determined for one sample video sequence under one sample bitrate parameter; and
    using all obtained quality evaluation values as quality label information of an initial model associated with the target quality type, training the initial model according to the quality label information and the N reference video features, and determining, according to a training result, the prediction model matching the target quality type.
  8. The method according to claim 2, wherein the server is a first server in a distributed server cluster, and the service server is a second server in the distributed server cluster; and
    the method further comprises:
    returning the encoded video sequence to the second server as an encoded bitstream associated with the scaling parameter information, so that, when receiving encoded bitstreams returned by all first servers in the distributed server cluster for the same scaling parameter information, the second server merges all received encoded bitstreams according to slice identification information associated with the sliced video source.
  9. The method according to claim 1, wherein the obtaining a to-be-encoded video sequence associated with a video source, and obtaining a video quality standard parameter associated with the to-be-encoded video sequence comprises:
    receiving initial video data collected and uploaded by a user terminal, and determining the received initial video data as the video source;
    obtaining a key video frame from the video source; determining, according to the key video frame and a buffer sequence length associated with the key video frame, a buffered video sequence for scene detection in the video source; and determining, according to the buffered video sequence and scaling parameter information of the video source, the to-be-encoded video sequence for pre-encoding; and
    configuring the video quality standard parameter for the to-be-encoded video sequence based on configuration information of the user terminal.
  10. The method according to claim 9, wherein the video source comprises M video frames associated with a collection period, M being a positive integer; and
    the obtaining a key video frame from the video source, determining, according to the key video frame and the buffer sequence length associated with the key video frame, the buffered video sequence for scene detection in the video source, and determining, according to the buffered video sequence and the scaling parameter information of the video source, the to-be-encoded video sequence for pre-encoding comprises:
    determining, among the M video frames of the video source, the first video frame as a first key video frame;
    determining, according to the first key video frame and the buffer sequence length associated with the first key video frame, the buffered video sequence for scene detection from the M video frames;
    determining video frames other than the first key video frame in the buffered video sequence as to-be-detected video frames b_i, and performing scene change detection on the to-be-detected video frames b_i in the buffered video sequence according to the first key video frame, i being a positive integer greater than 1 and less than M;
    determining the to-be-detected video frame b_i as a second key video frame upon detecting that a degree of video content change between the first key video frame and the to-be-detected video frame b_i is greater than a scene change threshold; and
    using a video sequence between the first key video frame and the second key video frame as an initial video sequence, scaling the initial video sequence according to the scaling parameter information of the video source, and determining the scaled initial video sequence as the to-be-encoded video sequence for pre-encoding.
  11. The method according to claim 10, further comprising:
    deleting the initial video sequence from the buffered video sequence to obtain a transition video sequence, the first video frame of the transition video sequence being the second key video frame;
    obtaining a to-be-padded video sequence from the video source according to the transition video sequence and the buffer sequence length, and padding the transition video sequence with the to-be-padded video sequence, a sequence length of the padded transition video sequence being the same as the buffer sequence length; and
    determining video frames other than the second key video frame in the padded transition video sequence as to-be-detected video frames d_j, and performing scene change detection on the to-be-detected video frames d_j in the padded transition video sequence according to the second key video frame, j being a positive integer greater than 1 and less than M.
  12. The method according to claim 9, wherein the user terminal is a streamer terminal in a virtual live room, and the initial video data is live video data collected by the streamer terminal; and
    the method further comprises:
    obtaining, when a stream-pull request of a viewer terminal in the virtual live room is obtained, a playback resolution in the stream-pull request; and
    finding, in the encoded video sequence, a target encoded video sequence corresponding to scaling parameter information matching the playback resolution, and pushing the target encoded video sequence to the viewer terminal as an encoded bitstream, so that the viewer terminal decodes the encoded bitstream to obtain the target encoded video sequence.
  13. A video data processing apparatus, comprising:
    a quality parameter obtaining module, configured to obtain a to-be-encoded video sequence associated with a video source and obtain a video quality standard parameter associated with the to-be-encoded video sequence;
    a pre-encoding module, configured to pre-encode the to-be-encoded video sequence according to the video quality standard parameter to obtain a pre-encoded video sequence, and determine, according to the pre-encoded video sequence, a video feature corresponding to the to-be-encoded video sequence;
    a bitrate prediction module, configured to predict, according to the video quality standard parameter and the video feature, an encoding bitrate associated with the to-be-encoded video sequence; and
    a video encoding module, configured to encode the to-be-encoded video sequence according to the encoding bitrate to obtain an encoded video sequence associated with the video source.
  14. A computer device, comprising one or more processors and one or more memories;
    the one or more memories being configured to store program code, and the one or more processors being configured to invoke and execute the program code, so that the computer device performs the method according to any one of claims 1 to 12.
  15. A non-volatile computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor of a computer device, cause the computer device to perform the method according to any one of claims 1 to 12.
PCT/CN2020/126067 2020-01-22 2020-11-03 Video data processing method and apparatus, and storage medium WO2021147448A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/713,205 US20220232222A1 (en) 2020-01-22 2022-04-04 Video data processing method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010075680.9 2020-01-22
CN202010075680.9A CN111263154B (zh) 2020-01-22 Video data processing method and apparatus, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/713,205 Continuation US20220232222A1 (en) 2020-01-22 2022-04-04 Video data processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2021147448A1 true WO2021147448A1 (zh) 2021-07-29

Family

ID=70951004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126067 WO2021147448A1 (zh) Video data processing method and apparatus, and storage medium 2020-01-22 2020-11-03

Country Status (3)

Country Link
US (1) US20220232222A1 (zh)
CN (1) CN111263154B (zh)
WO (1) WO2021147448A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207205A1 (zh) * 2022-04-29 2023-11-02 Shanghai Bilibili Technology Co., Ltd. Video encoding method and apparatus

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263154B (zh) * 2020-01-22 2022-02-11 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus, and storage medium
CN113518250B (zh) * 2020-08-07 2022-08-02 Tencent Technology (Shenzhen) Co., Ltd. Multimedia data processing method, apparatus, and device, and readable storage medium
CN113301340A (zh) * 2020-09-29 2021-08-24 Alibaba Group Holding Limited Encoding parameter determination method, and video transmission method and apparatus
CN112702605A (zh) * 2020-12-24 2021-04-23 Bigo Technology (Singapore) Pte. Ltd. Video transcoding system, video transcoding method, electronic device, and storage medium
CN116567228A (zh) 2022-01-27 2023-08-08 Tencent Technology (Shenzhen) Co., Ltd. Encoding method, real-time communication method, apparatus, device, and storage medium
CN114666634B (zh) 2022-03-21 2024-03-19 Beijing Dajia Internet Information Technology Co., Ltd. Image quality detection result display method, apparatus, device, and storage medium
CN114866840A (zh) 2022-03-31 2022-08-05 广州方硅信息技术有限公司 VMAF image quality evaluation method, terminal, host, system, and storage medium
CN117354524B (zh) 2023-12-04 2024-04-09 Tencent Technology (Shenzhen) Co., Ltd. Encoder encoding performance test method, apparatus, device, and computer medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016007215A1 (en) * 2014-07-10 2016-01-14 Intel Corporation Adaptive bitrate streaming for wireless video
CN109286825A (zh) * 2018-12-14 2019-01-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for processing video
CN110324621A (zh) * 2019-07-04 2019-10-11 Beijing Dajia Internet Information Technology Co., Ltd. Video encoding method and apparatus, electronic device, and storage medium
CN110324721A (zh) * 2019-08-05 2019-10-11 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus, and storage medium
CN110719457A (zh) * 2019-09-17 2020-01-21 Beijing Dajia Internet Information Technology Co., Ltd. Video encoding method and apparatus, electronic device, and storage medium
CN111263154A (zh) * 2020-01-22 2020-06-09 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007028175A1 * 2007-06-20 2009-01-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Automated method for the temporal segmentation of a video into scenes, taking into account different types of transitions between image sequences

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016007215A1 (en) * 2014-07-10 2016-01-14 Intel Corporation Adaptive bitrate streaming for wireless video
CN109286825A (zh) * 2018-12-14 2019-01-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for processing video
CN110324621A (zh) * 2019-07-04 2019-10-11 Beijing Dajia Internet Information Technology Co., Ltd. Video encoding method and apparatus, electronic device, and storage medium
CN110324721A (zh) * 2019-08-05 2019-10-11 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus, and storage medium
CN110719457A (zh) * 2019-09-17 2020-01-21 Beijing Dajia Internet Information Technology Co., Ltd. Video encoding method and apparatus, electronic device, and storage medium
CN111263154A (zh) * 2020-01-22 2020-06-09 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus, and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207205A1 (zh) * 2022-04-29 2023-11-02 Shanghai Bilibili Technology Co., Ltd. Video encoding method and apparatus

Also Published As

Publication number Publication date
CN111263154B (zh) 2022-02-11
US20220232222A1 (en) 2022-07-21
CN111263154A (zh) 2020-06-09

Similar Documents

Publication Publication Date Title
WO2021147448A1 (zh) Video data processing method and apparatus, and storage medium
CN111277826B (zh) Video data processing method and apparatus, and storage medium
US9800883B2 (en) Parallel video transcoding
CN111294612B (zh) Multimedia data processing method and system, and storage medium
US9571827B2 (en) Techniques for adaptive video streaming
US9071841B2 (en) Video transcoding with dynamically modifiable spatial resolution
US9532080B2 (en) Systems and methods for the reuse of encoding information in encoding alternative streams of video data
CN101917613B (zh) Streaming media collection and encoding service system
Barman et al. H. 264/MPEG-AVC, H. 265/MPEG-HEVC and VP9 codec comparison for live gaming video streaming
CN108810545B (zh) Method and apparatus for video encoding, computer-readable medium, and electronic device
CN107634930B (zh) Media data obtaining method and apparatus
EP3493547B1 (en) Video streaming delivery
KR20200109359A (ko) Video streaming
US11356739B2 (en) Video playback method, terminal apparatus, and storage medium
WO2023142716A1 (zh) Encoding method, real-time communication method, apparatus, device, and storage medium
CN111970565A (zh) Video data processing method and apparatus, electronic device, and storage medium
US10674111B2 (en) Systems and methods for profile based media segment rendering
GB2610397A (en) Encoding and decoding video data
JP2022553964A (ja) Video encoding method, apparatus, and computer program
JP7434561B2 (ja) MPD expiration processing model
US20240244229A1 (en) Systems and methods for predictive coding
Chen et al. AGiLE: Enhancing Adaptive GOP in Live Video Streaming
KR20230053229A (ko) Distributed parallel transcoding method and apparatus
KR20230053210A (ko) Distributed parallel transcoding method and apparatus
CN117714700A (zh) Video encoding method and apparatus, device, readable storage medium, and product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915215

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915215

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/11/2022)