CN116248884B - Multi-channel video decoding method based on session multiplexing - Google Patents

Multi-channel video decoding method based on session multiplexing

Info

Publication number
CN116248884B
CN116248884B (application CN202310536849.XA)
Authority
CN
China
Prior art keywords
decoding
video stream
decoded
session
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310536849.XA
Other languages
Chinese (zh)
Other versions
CN116248884A (en)
Inventor
温研
晏华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Linzhuo Information Technology Co Ltd
Original Assignee
Beijing Linzhuo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Linzhuo Information Technology Co Ltd
Priority to CN202310536849.XA
Publication of CN116248884A
Application granted
Publication of CN116248884B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/156 - Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Abstract

The invention discloses a multi-channel video decoding method based on session multiplexing. The method determines how a decoding session is allocated to a video stream to be decoded according to the relationship between the number of decoding sessions currently established by the GPU and the constraint on the number of decoding sessions: when the number of currently established decoding sessions is smaller than the constraint, a new decoding session is created; otherwise, an established decoding session is multiplexed. More video decoding tasks are thus executed without adding GPU hardware, and the video application scale of Android cloud, graphics cloud, and multi-channel video servers is significantly improved.

Description

Multi-channel video decoding method based on session multiplexing
Technical Field
The invention belongs to the technical field of computer software development, and particularly relates to a multichannel video decoding method based on session multiplexing.
Background
The current widespread use of Android cloud, graphics cloud, and multi-channel video servers places higher demands on the decoding performance of GPUs. When such servers are used, the same GPU is often required to perform multiple video decoding processes in parallel. However, most GPUs limit the number of concurrent decoding sessions they support, and since decoding sessions in the GPU normally correspond one-to-one with the video streams to be decoded, the session limit also indirectly limits the number of decoding tasks the GPU can process. This limit makes it difficult to meet the growing demands of Android cloud, graphics cloud, and multi-channel video servers on GPU decoding capability.
Disclosure of Invention
In view of this, the present invention provides a multi-channel video decoding method based on session multiplexing, which decodes more video streams by multiplexing decoding sessions and thereby breaks through the GPU's limit on the number of decoding sessions.
The invention provides a multi-channel video decoding method based on session multiplexing, which comprises the following steps:
step 1, the codec engine opens a video stream to be decoded, obtains the coding format and resolution of the video stream to be decoded, and adds a stream identifier to the video stream to be decoded;
step 2, the codec engine obtains the total number of decoding sessions currently established by the GPU; if the total number of decoding sessions is smaller than the upper limit, step 3 is executed, otherwise step 4 is executed;
step 3, a new decoding session and decoding history information are created for the video stream to be decoded, a decoder and a context are bound to the decoding session, the standard decoding operation is executed, the numbers and stream identifiers of the frames to be decoded during decoding are stored in the decoding history information, and the flow ends;
step 4, a decoding session that has the same coding format as the video stream to be decoded and a resolution not smaller than that of the video stream to be decoded is searched for among the currently established decoding sessions; if such a decoding session exists, it is taken as the baseline video stream, the video stream to be decoded is taken as the combined video stream, and step 6 is executed; if it does not exist, step 5 is executed;
step 5, whether decoding sessions with the same coding format exist among the currently established decoding sessions is judged; if so, two decoding sessions are selected as the baseline video stream and the combined video stream respectively, the frames to be decoded of the combined video stream are added to the baseline video stream to form a new video stream to be decoded, and step 3 is executed; if not, an error is reported and the flow ends;
step 6, the frames to be decoded of the combined video stream are added to the baseline video stream to form a new video stream to be decoded, the decoding operation is executed on the video stream to be decoded, the numbers and stream identifiers of the frames to be decoded during decoding are stored in the decoding history information, and the flow ends.
Further, the codec engine is FFmpeg or GStreamer.
Further, the upper limit in step 2 is the upper limit on the number of decoding sessions of the GPU.
Further, step 4 further includes:
step 4.1, searching the currently established decoding sessions for a decoding session with the same coding format as the video stream to be decoded; if one exists, executing step 4.2, otherwise executing step 5;
step 4.2, judging whether a decoding session with the same resolution as the video stream to be decoded exists among the decoding sessions with the same coding format; if so, taking the found decoding session as an alternative session and executing step 4.3; if not, judging whether a decoding session with a resolution not smaller than that of the video stream to be decoded exists; if so, taking the decoding sessions whose resolution differs from that of the video stream to be decoded by less than a first threshold as alternative sessions and executing step 4.3, and if not, executing step 5;
step 4.3, if there are multiple alternative sessions, determining a reusable decoding session according to a first priority order, taking it as the baseline video stream and the video stream to be decoded as the combined video stream, and executing step 6; if no reusable decoding session is obtained according to the first priority order, randomly selecting one decoding session from the alternative sessions as the baseline video stream, taking the video stream to be decoded as the combined video stream, and executing step 6.
Further, the first priority order in step 4.3 selects, in sequence, a decoding session whose coding rate differs from that of the video stream to be decoded by less than the second threshold, a decoding session whose source is a network video stream, a decoding session that has decoding history information and the fewest B frames, a decoding session that has decoding history information and the fewest P frames, and a decoding session with the fewest remaining frames to decode.
Further, step 5 further includes: judging whether decoding sessions with the same coding format exist among the currently established decoding sessions, and if so, selecting two decoding sessions as the baseline video stream and the combined video stream according to a second priority order; if the two decoding sessions cannot be selected according to the second priority order, randomly selecting two decoding sessions from the currently established decoding sessions as the baseline video stream and the combined video stream respectively.
Further, the second priority order selects, in sequence, decoding sessions with the same resolution, decoding sessions whose resolutions differ by less than the first threshold, decoding sessions whose coding rates differ by less than the second threshold, decoding sessions that have decoding history information and the fewest B frames, decoding sessions that have decoding history information and the fewest P frames, and decoding sessions with the smallest sum of the remaining decoding frames.
Further, adding the frames to be decoded of the combined video stream to the baseline video stream to form a new video stream to be decoded further includes:
if the current frame to be decoded is an I frame, recording the original number and the stream identifier of the current frame to be decoded, and assigning it a consecutive number in sequence; if the current frame to be decoded is a P frame, recording the original number and the stream identifier of the current frame to be decoded, determining the consecutive number of the I frame it depends on from that I frame's original number and stream identifier, assigning the current frame a consecutive number in sequence, and modifying the number of the depended-on I frame to its consecutive number; if the current frame to be decoded is a B frame, recording the original number and the stream identifier of the current frame to be decoded, determining the consecutive numbers of the forward I frame and backward I frame it depends on from their original numbers and stream identifiers, assigning the current frame a consecutive number in sequence, and modifying the numbers of the depended-on forward and backward I frames to their consecutive numbers.
Further, adding the frames to be decoded of the combined video stream to the baseline video stream to form a new video stream to be decoded further includes:
when the resolution of the combined video stream differs from the resolution of the baseline video stream, reallocating decoding result space A according to the resolution of the baseline video stream and modifying the resolution of the combined video stream to the resolution of the baseline video stream; otherwise, using the original decoding result space.
Further, executing the decoding operation on the video stream to be decoded further includes: after the decoding operation is executed, copying the decoding result in decoding result space A into the original decoding result space of the combined video stream.
Advantageous effects
According to the method, the way a decoding session is allocated to a video stream to be decoded is determined by the relationship between the number of decoding sessions currently established by the GPU and the constraint on the number of decoding sessions: when the number of currently established decoding sessions is smaller than the constraint, a new decoding session is created; otherwise, an established decoding session is multiplexed. More video decoding tasks can therefore run without adding GPU hardware, and the video application scale of Android cloud, graphics cloud, and multi-channel video servers is significantly improved.
Detailed Description
The present invention will be described in detail with reference to the following examples.
In the video decoding process, various video compression coding algorithms are used to reduce the amount of data, and IPB-frame coding is one of the most common. An I frame is an intra-coded frame (Intra Picture), also called a full-frame compression coded frame; it is usually the first frame of each Group of Pictures (GOP), is moderately compressed, serves as a reference point for random access, and can be used as a still image. A P frame is a forward predictive-coded frame (Predictive frame), also called a predicted frame; it reduces the amount of transmitted data by removing temporal redundancy relative to previously coded frames in the picture sequence. A B frame is a bidirectionally interpolated prediction frame (Bi-directional Interpolated Prediction Frame), also called a bi-predictive frame; it reduces the amount of transmitted data by removing temporal redundancy relative to both the preceding and the following coded frames in the picture sequence.
From the decoding perspective, an I frame can be decompressed into a complete video picture by the video decompression algorithm on its own; a P frame must reference the preceding I frame or P frame to be decoded into a complete picture; and a B frame must reference both the preceding I frame or P frame and the following I frame or P frame to be decoded into a complete picture. In short, I frames remove spatial redundancy within a video frame, while P frames and B frames remove temporal redundancy across frames.
The core idea of the multi-channel video decoding method based on session multiplexing provided by the invention is as follows: by modifying the decoding process of the codec engine, the way a decoding session is allocated to a video stream to be decoded is determined according to the relationship between the number of decoding sessions currently established by the GPU and the constraint on the number of decoding sessions; when the number of currently established decoding sessions is smaller than the constraint, a new decoding session is created, otherwise an established decoding session is multiplexed, i.e., the frames to be decoded of multiple decoding tasks are fed into the same decoding session, achieving multiplexing of the decoding session.
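The create-or-multiplex decision can be summarized in a minimal self-contained sketch, assuming illustrative types and an assumed GPU limit of four sessions; none of these names come from FFmpeg, VAAPI, or the patented implementation, and the tie-breaking rules described later are omitted here.

```c
#include <stddef.h>

typedef struct { int codec; int width, height; } StreamInfo;
typedef struct { int codec; int width, height; int merged_streams; } Session;

enum { MAX_SESSIONS = 4 };            /* assumed GPU decode-session limit */
static Session sessions[MAX_SESSIONS];
static int session_count = 0;

/* At the limit: look for an established session with the same coding format
 * and a resolution no smaller than the new stream's. */
static Session *find_baseline(const StreamInfo *s)
{
    for (int i = 0; i < session_count; i++)
        if (sessions[i].codec == s->codec &&
            sessions[i].width >= s->width && sessions[i].height >= s->height)
            return &sessions[i];
    return NULL;
}

/* Under the limit: create a new session; otherwise multiplex onto an
 * existing one. Returns the session index, or -1 if none is usable. */
static int assign_session(const StreamInfo *s)
{
    if (session_count < MAX_SESSIONS) {
        sessions[session_count] = (Session){ s->codec, s->width, s->height, 1 };
        return session_count++;
    }
    Session *base = find_baseline(s);
    if (base) {
        base->merged_streams++;        /* reuse: feed this stream's frames in */
        return (int)(base - sessions);
    }
    return -1;  /* the method would instead fuse two existing same-format sessions */
}
```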
The invention provides a multi-channel video decoding method based on session multiplexing, which specifically comprises the following steps:
Step 1, the codec engine opens a video stream to be decoded, obtains video information such as its coding format and frame resolution, and adds a stream identifier to the video stream to be decoded.
The codec engine may be FFmpeg, GStreamer, or similar. In an existing codec engine the decoding session and the video stream are in one-to-one correspondence, but in the invention one decoding session may serve multiple video streams, so an identifier needs to be added to each video stream. Specifically, taking FFmpeg as an example, the av_read_frame method provided by FFmpeg reads the audio, video, and subtitle streams to obtain AVPacket data packets; by modifying FFmpeg's frame-reading function av_read_frame and the frame data structure AVPacket, the stream identifier ID of the video stream to be decoded can be added.
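As a hedged illustration of how such tagging might be carried alongside the packets (TaggedPacket and read_tagged_frame are invented names for this sketch, not FFmpeg API; the patent instead modifies AVPacket itself), a thin wrapper around av_read_frame could look like this:

```c
#include <libavformat/avformat.h>

/* Illustrative wrapper: carry the source-stream identifier next to the
 * packet instead of patching AVPacket. */
typedef struct TaggedPacket {
    AVPacket *pkt;       /* packet filled by av_read_frame */
    int       stream_id; /* identifier of the video stream being decoded */
} TaggedPacket;

static int read_tagged_frame(AVFormatContext *fmt, int stream_id,
                             TaggedPacket *out)
{
    out->pkt = av_packet_alloc();
    if (!out->pkt)
        return AVERROR(ENOMEM);

    int ret = av_read_frame(fmt, out->pkt);  /* standard FFmpeg demuxing call */
    if (ret < 0) {
        av_packet_free(&out->pkt);
        return ret;
    }
    out->stream_id = stream_id;              /* attach the stream identifier */
    return 0;
}
```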
Step 2, the codec engine obtains the total number of decoding sessions currently established by the GPU; if this total is smaller than the GPU's upper limit on the number of decoding sessions, step 3 is executed, otherwise step 4 is executed.
Step 3, a new decoding session and decoding history information are created for the video stream to be decoded, a decoder and a context are bound to the decoding session, the standard decoding operation is executed, and information such as the numbers and stream identifiers of the frames decoded in the process is stored in the decoding history information.
Step 4, search the currently established decoding sessions for a decoding session with the same coding format as the video stream to be decoded; if one exists, execute step 5, otherwise execute step 7.
Step 5, judge whether a decoding session with the same resolution as the video stream to be decoded exists among the decoding sessions with the same coding format; if so, take the found decoding session as an alternative session and execute step 6; if not, judge whether a decoding session with a resolution larger than that of the video stream to be decoded exists; if so, take the decoding sessions whose resolution differs from that of the video stream to be decoded by less than the first threshold as alternative sessions and execute step 6, and if not, execute step 7.
Step 6, if there are multiple alternative sessions, determine a reusable decoding session in the following priority order: a decoding session whose coding rate differs from that of the video stream to be decoded by less than the second threshold, a decoding session whose source is a network video stream, a decoding session that has decoding history information and the fewest B frames, a decoding session that has decoding history information and the fewest P frames, and a decoding session with the fewest remaining frames to decode; take it as the baseline video stream, take the video stream to be decoded as the combined video stream, and execute step 9. If no reusable decoding session is obtained in this priority order, randomly select one decoding session from the alternative sessions as the baseline video stream, take the video stream to be decoded as the combined video stream, and execute step 9.
Step 7, judge whether decoding sessions with the same coding format exist among the currently established decoding sessions. If so, select two of the same-format decoding sessions as the sessions to be combined in the following priority order: the same resolution, a resolution difference smaller than the first threshold, a coding-rate difference smaller than the second threshold, decoding history information present with the fewest B frames, decoding history information present with the fewest P frames, and the smallest sum of remaining frames to decode; if two sessions cannot be selected in this way, randomly select two decoding sessions from the currently established decoding sessions as the sessions to be combined, and execute step 8. If no such sessions exist, no decoding session is available, so report an error and exit the flow.
Step 8, take either of the two decoding sessions to be combined as the baseline video stream and the other as the combined video stream, execute step 9, and simultaneously execute step 3.
Step 9, when the resolution of the combined video stream differs from the resolution of the baseline video stream, reallocate decoding result space A according to the resolution of the baseline video stream to store the decoding result, and modify the resolution of the combined video stream to the resolution of the baseline video stream; otherwise, use the original decoding result space.
If the current frame to be decoded is an I frame, record its original number and stream identifier and assign it a consecutive number in sequence, the consecutive number being the number of the current frame to be decoded in the combined video stream. If the current frame to be decoded is a P frame, record its original number and stream identifier, determine the consecutive number of the I frame it depends on from that I frame's original number and stream identifier, assign the current frame a consecutive number in sequence, and modify the number of the depended-on I frame to its consecutive number. If the current frame to be decoded is a B frame, record its original number and stream identifier, determine the consecutive numbers of the forward and backward I frames it depends on from their original numbers and stream identifiers, assign the current frame a consecutive number in sequence, and modify the numbers of the depended-on forward and backward I frames to their consecutive numbers. Store information such as the original number, stream identifier, and consecutive number of each modified frame to be decoded in the decoding history information.
Add the modified frame to be decoded to the baseline video stream and execute the decoding operation; after the decoding operation is executed, copy the decoding result in decoding result space A into the original decoding result space of the combined video stream.
Examples
In this embodiment, the multi-channel video decoding method based on session multiplexing provided by the invention enables the GPU to support more video decoding tasks, and specifically comprises the following steps:
s1, reading decoding history information.
For each video file that has been decoded, decoding history information is established; it can be stored as a file, i.e., a history information file. The history information file is a binary file in which every two bits represent the type of one frame: the initial value is 00, 01 indicates an I frame, 10 indicates a P frame, and 11 indicates a B frame.
The file contents are read into memory as binary arrays, with each decoded video file corresponding to one array. The decoding history information is denoted decodedVideoHistory, an array of history-record structures, each of which contains the full path of the video file and the corresponding P/B frame binary array.
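A minimal sketch of that two-bit packing follows; the bit order within a byte is an assumption, since the text does not fix it.

```c
#include <stdint.h>
#include <stddef.h>

enum FrameType { FT_UNKNOWN = 0, FT_I = 1, FT_P = 2, FT_B = 3 };  /* 00, 01, 10, 11 */

/* Record the type of frame `index` in the packed history buffer. */
static void history_set(uint8_t *hist, size_t index, enum FrameType t)
{
    size_t   byte  = index / 4;                 /* four 2-bit entries per byte */
    unsigned shift = (unsigned)(index % 4) * 2;
    hist[byte] = (uint8_t)((hist[byte] & ~(0x3u << shift)) | ((unsigned)t << shift));
}

/* Read back the recorded type of frame `index`. */
static enum FrameType history_get(const uint8_t *hist, size_t index)
{
    return (enum FrameType)((hist[index / 4] >> ((index % 4) * 2)) & 0x3u);
}
```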
S2, modify the decoding flow of FFmpeg (the codec engine): when a decoding session is to be created through the GPU interface, if the total number of currently established decoding sessions has reached the GPU limit, merge the new session into a currently established decoding session; a decoding session can only be merged with one that has the same coding format as the video stream to be decoded.
Wherein the GPU interface is VAAPI or VDPAU.
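For reference, the unmodified FFmpeg path for opening one VAAPI decoding session looks roughly like the sketch below (standard FFmpeg hardware-decoding usage with error handling trimmed; it is shown only to make clear what each extra session costs, and is not the patented multiplexing change).

```c
#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>

/* Open one H.264 VAAPI decoding session; each call like this normally
 * consumes one of the GPU's limited decode sessions. */
static AVCodecContext *open_vaapi_decoder(void)
{
    AVBufferRef *hw_device = NULL;
    if (av_hwdevice_ctx_create(&hw_device, AV_HWDEVICE_TYPE_VAAPI,
                               NULL, NULL, 0) < 0)
        return NULL;

    const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx  = codec ? avcodec_alloc_context3(codec) : NULL;
    if (!ctx) {
        av_buffer_unref(&hw_device);
        return NULL;
    }
    ctx->hw_device_ctx = av_buffer_ref(hw_device);  /* bind the VAAPI device */
    av_buffer_unref(&hw_device);

    if (avcodec_open2(ctx, codec, NULL) < 0) {      /* bind decoder and context */
        avcodec_free_context(&ctx);
        return NULL;
    }
    return ctx;
}
```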
S2.1, open the video stream, which may be a network video stream or a video file, using the relevant FFmpeg API such as avformat_open_input, and obtain video information of the stream such as the encoding format and frame resolution using the avformat_find_stream_info method.
Create a decodedVideoHistory entry for the current video stream, with all bits initialized to 0; then modify FFmpeg's frame-reading function av_read_frame and the frame data structure AVPacket to add the stream identifier of the video stream the frame belongs to, recorded as streamId.
S2.2, modify FFmpeg's initialization functions, such as avcodec_find_decoder and avcodec_alloc_context3, to implement multiplexing of existing decoding sessions.
If the current number of decoding sessions does not exceed the GPU limit, execute the standard decoding flow, i.e., create a new GPU decoding session and bind it to a decoder and context. If the current number of decoding sessions has reached or exceeded the GPU limit, look for decoding sessions with the same encoding format among the existing sessions.
If decoding sessions with the same coding format exist, select one decoding session for multiplexing as follows:
first, search for decoding sessions with the same resolution; if none exist, search for the decoding sessions whose resolution is larger than and closest to that of the video stream to be decoded; if none exist, report an error and end the flow;
if multiple alternative sessions remain, keep the sessions whose coding rate is closest as the new alternatives; if multiple remain, prefer sessions whose source is a network video stream, because network video streams generally contain no B frames; if multiple remain, keep the sessions that have a decodedVideoHistory and the fewest B frames; if multiple remain, keep the sessions with the fewest P frames; if multiple remain, keep the sessions with the fewest remaining frames to decode, where the number of remaining frames is the product of the remaining duration and the code rate; if multiple alternatives still remain, randomly select one decoding session, take its video stream as the baseline stream for the current video stream to be decoded, and record the baseline stream.
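The cascade of tie-breakers above can be read as successive filters over the candidate set, as in the following sketch (the Candidate fields and helper functions are assumptions made for illustration; only the filter order follows the text):

```c
#include <limits.h>

typedef struct {
    long rate_diff;        /* |coding rate - rate of the stream to be decoded| */
    int  is_network;       /* 1 if the source is a network video stream */
    int  has_history;      /* decodedVideoHistory available */
    long b_frames, p_frames, remaining_frames;
} Candidate;

/* Keep only the candidates minimising `key`; assumes n >= 1, returns new count. */
static int keep_min(Candidate **c, int n, long (*key)(const Candidate *))
{
    long best = key(c[0]);
    for (int i = 1; i < n; i++) if (key(c[i]) < best) best = key(c[i]);
    int m = 0;
    for (int i = 0; i < n; i++) if (key(c[i]) == best) c[m++] = c[i];
    return m;
}

static long k_rate(const Candidate *s)      { return s->rate_diff; }
static long k_network(const Candidate *s)   { return s->is_network ? 0 : 1; }
static long k_b(const Candidate *s)         { return s->has_history ? s->b_frames : LONG_MAX; }
static long k_p(const Candidate *s)         { return s->has_history ? s->p_frames : LONG_MAX; }
static long k_remaining(const Candidate *s) { return s->remaining_frames; }

/* Apply the tie-breakers in order; if several remain, the caller picks one at random. */
static int narrow_candidates(Candidate **c, int n)
{
    n = keep_min(c, n, k_rate);       /* closest coding rate */
    n = keep_min(c, n, k_network);    /* prefer network video streams (few B frames) */
    n = keep_min(c, n, k_b);          /* history present, fewest B frames */
    n = keep_min(c, n, k_p);          /* fewest P frames */
    n = keep_min(c, n, k_remaining);  /* fewest remaining frames to decode */
    return n;
}
```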
If no decoding session with the same coding format as the video stream to be decoded exists, search all established decoding sessions for decoding sessions that share a coding format with each other, for example two existing decoding sessions that both use the H264 format. If such sessions exist, select two independent sessions with the same coding format for fusion, an independent session being a decoding session into which no other video stream has been merged. The fusion proceeds as follows:
first, find two decoding sessions with the same resolution; if none are found, select the two decoding sessions with the smallest resolution difference; if several alternatives exist, find the two decoding sessions with the closest coding rates, without giving priority to network video streams, because two network video streams generally come from different sources and decoding them together fluctuates more; if several alternatives exist and some sessions have a decodedVideoHistory, find the two sessions with the smallest sum of B frames; if several alternatives exist, find the two sessions with the smallest sum of P frames; if several alternatives exist, find the two sessions with the smallest sum of remaining frames to decode; if several alternatives still exist, randomly select two sessions.
After the two decoding sessions are combined, the standard subsequent decoding flow is executed, i.e., a new GPU decoding session is created and bound to a decoder and context.
If no two established decoding sessions share the same coding format, the decoder initialization functions report an error and the current decoding flow exits.
S3, determine from its streamId that the combined video stream is not the baseline video stream. When the resolution of the combined video stream differs from that of the baseline video stream, reallocate decoding result space A according to the resolution of the baseline video stream to store the decoding result, modify the resolution of the combined video stream to the resolution of the baseline video stream, and align the content of read frames to the top-left corner (Top/Left); otherwise, use the original decoding result space.
This embodiment makes two video streams reuse one session by modifying FFmpeg's avcodec_send_packet function.
Specifically: if the resolution of the current frame differs from that of the baseline video stream, reallocate the decoding result space according to the resolution of the baseline video stream; then check whether the streamId of the current input frame to be decoded is the same as that of the previous frame to be decoded; if not, search back through the current FFmpeg session for the nearest preceding I frame belonging to the other video stream and add 1 to the other-stream frame count in its structure, i.e., record how many frames of the other video stream have been inserted after that I frame.
decoding is performed based on the frame type:
If the frame is an I frame, execute the standard subsequent decoding code of avcodec_send_packet and record the current frame number. If the frame is a P frame, compute the sum of the other-stream frame counts of all I frames of the current video stream that precede the current P frame in the current session (excluding the latest I frame), and add this sum to the number of the I frame referenced by the current P frame. If the frame is a B frame, first compute the new number of the forward I frame of the current B frame in the same way, then compute the sum of the other-stream frame counts of all I frames of the current video stream that precede the current B frame in the current session (including the latest I frame), and add this sum to the number of the backward I frame referenced by the current B frame.
Then execute the subsequent standard code to finish decoding.
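Assuming each I frame of the current stream carries a counter of how many frames of the other stream were inserted after it, as described in the previous step, the two offset calculations could look like this sketch (the data layout is illustrative, not taken from the patent):

```c
/* other_count: frames of the other video stream inserted after this I frame. */
typedef struct { long other_count; } GopRecord;

/* P frame: offset the referenced I frame's number by the other-stream frames
 * inserted after every earlier I frame of this stream, excluding the latest. */
static long renumber_p_reference(long i_frame_number,
                                 const GopRecord *gops, int gop_count)
{
    long offset = 0;
    for (int g = 0; g + 1 < gop_count; g++)
        offset += gops[g].other_count;
    return i_frame_number + offset;
}

/* B frame (backward reference): the same sum, but including the latest I frame. */
static long renumber_b_backward_reference(long i_frame_number,
                                          const GopRecord *gops, int gop_count)
{
    long offset = 0;
    for (int g = 0; g < gop_count; g++)
        offset += gops[g].other_count;
    return i_frame_number + offset;
}
```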
For frames to be decoded whose resolution differs from that of the baseline video stream, the decoding result in the reallocated space is copied back to the original decoding result space according to the true resolution of the frame. After decoding of the current video stream is finished, decodedVideoHistory is written to the history file.
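The copy-back at the frame's true resolution amounts to copying the top-left region out of the baseline-sized buffer row by row, as in this sketch (assuming a single 8-bit plane; real pixel formats add more planes and strides):

```c
#include <stdint.h>
#include <string.h>

/* Copy the top-left true_width x true_height region of one plane from the
 * baseline-sized decode buffer back into the combined stream's own buffer. */
static void copy_back_plane(uint8_t *dst, int dst_stride,
                            const uint8_t *src, int src_stride,
                            int true_width, int true_height)
{
    for (int y = 0; y < true_height; y++)
        memcpy(dst + (size_t)y * (size_t)dst_stride,
               src + (size_t)y * (size_t)src_stride,
               (size_t)true_width);   /* one row of bytes at the true width */
}
```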
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A multi-channel video decoding method based on session multiplexing, characterized by comprising the following steps:
step 1, the codec engine opens a video stream to be decoded, obtains the coding format and resolution of the video stream to be decoded, and adds a stream identifier to the video stream to be decoded;
step 2, the codec engine obtains the total number of decoding sessions currently established by the GPU; if the total number of decoding sessions is smaller than the upper limit, step 3 is executed, otherwise step 4 is executed;
step 3, a new decoding session and decoding history information are created for the video stream to be decoded, a decoder and a context are bound to the decoding session, the standard decoding operation is executed, the numbers and stream identifiers of the frames to be decoded during decoding are stored in the decoding history information, and the flow ends;
step 4, a decoding session that has the same coding format as the video stream to be decoded and a resolution not smaller than that of the video stream to be decoded is searched for among the currently established decoding sessions; if such a decoding session exists, it is taken as the baseline video stream, the video stream to be decoded is taken as the combined video stream, and step 6 is executed; if it does not exist, step 5 is executed;
step 5, whether decoding sessions with the same coding format exist among the currently established decoding sessions is judged; if so, two decoding sessions are selected as the baseline video stream and the combined video stream respectively, the frames to be decoded of the combined video stream are added to the baseline video stream to form a new video stream to be decoded, and step 3 is executed; if not, an error is reported and the flow ends;
step 6, the frames to be decoded of the combined video stream are added to the baseline video stream to form a new video stream to be decoded, the decoding operation is executed on the video stream to be decoded, the numbers and stream identifiers of the frames to be decoded during decoding are stored in the decoding history information, and the flow ends.
2. The multi-channel video decoding method of claim 1 wherein the codec engine is FFmpeg or GStreamer.
3. The multi-channel video decoding method according to claim 1, wherein the upper limit in step 2 is the upper limit on the number of decoding sessions of the GPU.
4. The multi-channel video decoding method according to claim 1, wherein step 4 further comprises:
step 4.1, searching the currently established decoding sessions for a decoding session with the same coding format as the video stream to be decoded; if one exists, executing step 4.2, otherwise executing step 5;
step 4.2, judging whether a decoding session with the same resolution as the video stream to be decoded exists among the decoding sessions with the same coding format; if so, taking the found decoding session as an alternative session and executing step 4.3; if not, judging whether a decoding session with a resolution not smaller than that of the video stream to be decoded exists; if so, taking the decoding sessions whose resolution differs from that of the video stream to be decoded by less than a first threshold as alternative sessions and executing step 4.3, and if not, executing step 5;
step 4.3, if there are multiple alternative sessions, determining a reusable decoding session according to a first priority order, taking it as the baseline video stream and the video stream to be decoded as the combined video stream, and executing step 6; if no reusable decoding session is obtained according to the first priority order, randomly selecting one decoding session from the alternative sessions as the baseline video stream, taking the video stream to be decoded as the combined video stream, and executing step 6.
5. The multi-channel video decoding method according to claim 4, wherein the first priority order in step 4.3 selects, in sequence, a decoding session whose coding rate differs from that of the video stream to be decoded by less than a second threshold, a decoding session whose source is a network video stream, a decoding session that has decoding history information and the fewest B frames, a decoding session that has decoding history information and the fewest P frames, and a decoding session with the fewest remaining frames to decode.
6. The multi-channel video decoding method according to claim 1, wherein step 5 further comprises: judging whether decoding sessions with the same coding format exist among the currently established decoding sessions, and if so, selecting two decoding sessions as the baseline video stream and the combined video stream according to a second priority order; if the two decoding sessions cannot be selected according to the second priority order, randomly selecting two decoding sessions from the currently established decoding sessions as the baseline video stream and the combined video stream respectively.
7. The multi-channel video decoding method of claim 6, wherein the second priority order selects, in sequence, decoding sessions with the same resolution, decoding sessions whose resolutions differ by less than a first threshold, decoding sessions whose coding rates differ by less than a second threshold, decoding sessions that have decoding history information and the fewest B frames, decoding sessions that have decoding history information and the fewest P frames, and decoding sessions with the smallest sum of the remaining decoding frames.
8. The multi-channel video decoding method of claim 1, wherein adding the frames to be decoded of the combined video stream to the baseline video stream to form a new video stream to be decoded further comprises:
if the current frame to be decoded is an I frame, recording the original number and the stream identifier of the current frame to be decoded, and assigning it a consecutive number in sequence; if the current frame to be decoded is a P frame, recording the original number and the stream identifier of the current frame to be decoded, determining the consecutive number of the I frame it depends on from that I frame's original number and stream identifier, assigning the current frame a consecutive number in sequence, and modifying the number of the depended-on I frame to its consecutive number; if the current frame to be decoded is a B frame, recording the original number and the stream identifier of the current frame to be decoded, determining the consecutive numbers of the forward I frame and backward I frame it depends on from their original numbers and stream identifiers, assigning the current frame a consecutive number in sequence, and modifying the numbers of the depended-on forward and backward I frames to their consecutive numbers.
9. The multi-channel video decoding method of claim 1, wherein adding the frames to be decoded of the combined video stream to the baseline video stream to form a new video stream to be decoded further comprises:
when the resolution of the combined video stream differs from the resolution of the baseline video stream, reallocating decoding result space A according to the resolution of the baseline video stream and modifying the resolution of the combined video stream to the resolution of the baseline video stream; otherwise, using the original decoding result space.
10. The multi-channel video decoding method of claim 9, wherein executing the decoding operation on the video stream to be decoded further comprises: after the decoding operation is executed, copying the decoding result in decoding result space A into the original decoding result space of the combined video stream.
CN202310536849.XA 2023-05-12 2023-05-12 Multi-channel video decoding method based on session multiplexing Active CN116248884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310536849.XA CN116248884B (en) 2023-05-12 2023-05-12 Multi-channel video decoding method based on session multiplexing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310536849.XA CN116248884B (en) 2023-05-12 2023-05-12 Multi-channel video decoding method based on session multiplexing

Publications (2)

Publication Number Publication Date
CN116248884A CN116248884A (en) 2023-06-09
CN116248884B true CN116248884B (en) 2023-06-30

Family

ID=86633538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310536849.XA Active CN116248884B (en) 2023-05-12 2023-05-12 Multi-channel video decoding method based on session multiplexing

Country Status (1)

Country Link
CN (1) CN116248884B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6246720B1 (en) * 1999-10-21 2001-06-12 Sony Corporation Of Japan Flexible software-based decoding system with decoupled decoding timing and output timing
CN103888771A (en) * 2013-12-30 2014-06-25 中山大学深圳研究院 Parallel video image processing method based on GPGPU technology
EP2908516A1 (en) * 2014-02-14 2015-08-19 Alcatel Lucent Process for transmitting an ongoing video stream from a publisher to a receiver through a MCU unit during a live session
CN109462761A (en) * 2018-10-30 2019-03-12 视联动力信息技术股份有限公司 A kind of video encoding/decoding method and device
CN111417441A (en) * 2018-03-22 2020-07-14 谷歌有限责任公司 Method and system for rendering and encoding content of an online interactive gaming session
CN114741044A (en) * 2022-06-13 2022-07-12 北京麟卓信息科技有限公司 Cross-operating environment display output sharing method based on heterogeneous rendering

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090322784A1 (en) * 2008-02-27 2009-12-31 Gabriele Sartori System and method for virtual 3d graphics acceleration and streaming multiple different video streams
US9058223B2 (en) * 2011-04-22 2015-06-16 Microsoft Technology Licensing Llc Parallel entropy encoding on GPU
US10819951B2 (en) * 2016-11-30 2020-10-27 Microsoft Technology Licensing, Llc Recording video from a bitstream

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6246720B1 (en) * 1999-10-21 2001-06-12 Sony Corporation Of Japan Flexible software-based decoding system with decoupled decoding timing and output timing
CN103888771A (en) * 2013-12-30 2014-06-25 中山大学深圳研究院 Parallel video image processing method based on GPGPU technology
EP2908516A1 (en) * 2014-02-14 2015-08-19 Alcatel Lucent Process for transmitting an ongoing video stream from a publisher to a receiver through a MCU unit during a live session
CN111417441A (en) * 2018-03-22 2020-07-14 谷歌有限责任公司 Method and system for rendering and encoding content of an online interactive gaming session
CN109462761A (en) * 2018-10-30 2019-03-12 视联动力信息技术股份有限公司 A kind of video encoding/decoding method and device
CN114741044A (en) * 2022-06-13 2022-07-12 北京麟卓信息科技有限公司 Cross-operating environment display output sharing method based on heterogeneous rendering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A new multiplexing scheduling algorithm for multi-path media streams; 宋毅军; Communication Technology (通信技术), No. 11; full text *

Also Published As

Publication number Publication date
CN116248884A (en) 2023-06-09

Similar Documents

Publication Publication Date Title
US7212576B2 (en) Picture encoding method and apparatus and picture decoding method and apparatus
CN105451031B (en) Video transcoding method and system
US6393057B1 (en) MPEG stream switching process
KR101266170B1 (en) Method and apparatus for encoding and decoding multi-view image
CN112040233B (en) Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium
JP2010515400A (en) Multi-view video encoding and decoding method and apparatus using global difference vector
JPH06261303A (en) Picture signal encoding method/decoding method and picture signal recording medium
JP2010506530A (en) Multi-view video encoding and decoding method and apparatus
JP6686541B2 (en) Information processing system
US9326011B2 (en) Method and apparatus for generating bitstream based on syntax element
CN116320448B (en) Video decoding session multiplexing optimization method based on dynamic self-adaptive resolution
CN116248884B (en) Multi-channel video decoding method based on session multiplexing
KR101366250B1 (en) Method and apparatus for encoding and decoding image using image partitioning
CN111212288B (en) Video data encoding and decoding method and device, computer equipment and storage medium
EP3989566A1 (en) Motion information list construction method in video encoding and decoding, device, and apparatus
JP7304096B2 (en) Video distribution device
KR101606121B1 (en) Method and apparatus for segmenting video files
US20110026843A1 (en) Image encoding and decoding apparatus and method for effectively transmitting large capacity image
CN111010575B (en) Code stream fault tolerance method and device and readable storage medium
KR101852859B1 (en) Method of providing random access for moving picture based on random accessable P-frame
KR101847087B1 (en) Apparatus and methdo for playing video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant