CN113055680B

CN113055680B - Distributed transcoding method

Info

Publication number: CN113055680B
Application number: CN202110282203.4A
Authority: CN
Inventors: 陈典; 谭顺华
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2021-12-21
Anticipated expiration: 2041-03-16
Also published as: CN113055680A

Abstract

The invention discloses a distributed transcoding method, comprising the following steps: S1, uploading the target video file to a server; S2, verifying the uploaded file; S3, separating the data streams; S4, slicing the video stream data; S5, generating a transcoding task queue; S6, handing task information to a transcoding unit in the idle state for transcoding; S7, judging whether all tasks to be transcoded are completed, and if so, proceeding to step S8, otherwise returning to step S6; and S8, combining the transcoded sub-videos into a complete video stream, and encapsulating it with the separated other stream data to generate a complete transcoded data file. The invention completes data fragmentation rapidly by splitting on data-block size, while adding GOP detection to preserve image quality; by giving each subtask a larger amount of computation, i.e. using a coarse task granularity, it reduces the system overhead caused by frequent task dispatch.

Description

Distributed transcoding method

Technical Field

The invention relates to the field of video data processing, in particular to a distributed transcoding method.

Background

In today's mobile internet environment, video data in many different format specifications is produced in large quantities by different devices. Compared with the past, far more video standards, terminal devices and transmission networks are now in use. Video transcoding converts the encoding mode or display mode of an original video into a different format to meet the requirements of different usage scenarios. Transcoding is generally computationally intensive, and the computation time grows substantially with the frame rate and resolution of the video.

With the development of science and technology, the spread of cloud computing services and improvements in computer hardware, conventional single-machine, single-process video transcoding can no longer easily meet usage requirements. By using a distributed parallel computing framework and a distributed file system, a single video can be split into data fragments that are transcoded on multiple computers simultaneously, which effectively reduces transcoding time.

An important step before distributed transcoding is splitting the video file. Commonly used splitting methods divide the video by duration, by code-stream data length, or by GOP (group of pictures). Splitting by time is simple to implement, fast overall, and preserves image information well, but the resulting segments are hard to keep equal in data length, so the workload across machines in the transcoding cluster becomes unbalanced. Splitting by code-stream data length is fast and yields balanced subtasks, but splitting on raw data can break the continuity of GOPs and thereby degrade the quality of the transcoded video. Splitting in GOP units loses no information, but the splitting component must pre-decode the whole video, and GOP-level splitting produces a large number of small data blocks, causing many I/O operations and reducing overall decoding speed.

Disclosure of Invention

To address these shortcomings of the prior art, the distributed transcoding method provided by the invention solves the problems that conventional video splitting methods reduce decoding speed and degrade the quality of the transcoded video.

To achieve this object, the invention adopts the following technical scheme:

A distributed transcoding method is provided, comprising the following steps:

S1, generating an upload task according to the basic information of the target video file, and uploading the target video file to a server;

S2, verifying the uploaded file according to the upload task; if the uploaded file passes verification, proceeding to step S3, and if it fails verification, terminating all subsequent operations;

S3, adding a transcoding task, and separating the video stream in the target video file from the other data streams to obtain other stream data and video stream data;

S4, slicing the video stream data;

S5, generating a transcoding task queue: copying the corresponding data segments of the target video file according to the slicing result to obtain a plurality of source data blocks, and generating, for each source data block, a task to be transcoded and its corresponding task information;

S6, taking tasks to be transcoded out of the transcoding task queue in FIFO order through the master node, and handing the task information to a transcoding unit in the idle state for transcoding;

S7, judging whether all tasks to be transcoded are completed; if so, proceeding to step S8, otherwise returning to step S6;

and S8, combining the transcoded sub-videos into a complete video stream, and encapsulating the complete video stream with the separated other stream data to generate a complete transcoded data file.

Further, in step S1, the target video file basic information includes a video type, a video duration, an encoding mode of a video stream, a video encoding parameter, a video encoding level, an image resolution, an image sampling mode, an image frame rate, an image bit rate, an encoding mode of an audio stream, a sampling rate of an audio stream, a video file byte length, a video transcoding parameter, a transcoded video resolution, a transcoded video frame rate, and a transcoded video encoding mode.

Further, the specific method for verifying the uploaded file according to the upload task in step S2 is as follows:

checking the length and size of the uploaded file; if the uploaded file is in an encapsulated (container) format, reading the key metadata and verifying that the MIME file type is correct, and verifying that the data length in the mdat box is consistent with the file size; if both checks pass, verification is passed.
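The container checks in step S2 can be sketched as a scan over the top-level ISO BMFF boxes. This is a minimal sketch under stated assumptions: the "MIME file type" check is read as a check of the `ftyp` major and compatible brands against the brands named in the embodiment, only top-level boxes are inspected, and the contents of `moov` are not parsed. The function names `iter_top_level_boxes` and `verify_mp4` are hypothetical.

```python
import struct

# Assumption: acceptable brands, taken from the embodiment's list (avc1, iso2, isom, mp41).
ACCEPTED_BRANDS = {b"avc1", b"iso2", b"isom", b"mp41"}


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level ISO BMFF box."""
    offset, total = 0, len(data)
    while offset + 8 <= total:
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > total:
                raise ValueError("truncated largesize header")
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
        elif size == 0:  # box extends to the end of the file
            size = total - offset
        if size < 8:
            raise ValueError(f"invalid box size at offset {offset}")
        yield box_type, offset, size
        offset += size


def verify_mp4(data: bytes, expected_size: int) -> bool:
    """Return True if the upload looks complete and undamaged."""
    if len(data) != expected_size:  # length/size must match the upload task
        return False
    brand_ok = has_mdat = has_moov = False
    try:
        for box_type, offset, size in iter_top_level_boxes(data):
            if offset + size > len(data):  # e.g. mdat length + offset > file length
                return False
            if box_type == b"ftyp":
                payload = data[offset + 8:offset + size]
                # major_brand at 0..4, minor_version at 4..8, compatible brands after
                brands = {payload[0:4]} | {payload[k:k + 4] for k in range(8, len(payload), 4)}
                brand_ok = bool(brands & ACCEPTED_BRANDS)
            elif box_type == b"mdat":
                has_mdat = True
            elif box_type == b"moov":
                has_moov = True
    except ValueError:
        return False
    return brand_ok and has_mdat and has_moov
```

Checking `offset + size` against the file length for every box covers the rule that an mdat whose length plus current offset exceeds the file length marks the file as damaged.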

Further, the specific method of step S4 includes the following sub-steps:

S4-1, obtaining the basic block size, denoted sgt_size, according to the resolution and frame rate of the video stream data;

S4-2, judging whether the video stream data is encapsulated; if so, proceeding to step S4-3, otherwise proceeding to step S4-9;

S4-3, decapsulating the video stream data, and reading the stss key frame index list, the stsz sample size list and the stco chunk offset list;

S4-4, accumulating the stsz sample sizes up to each key frame listed in the stss key frame index list to generate a new, monotonically increasing index list stgs of cumulative GOP block sizes;

S4-5, starting the accumulation from offset position i_offset in stgs, judging whether, from the current offset position, there is an stgs element at which the accumulated data size is 0.9 to 1.1 times the basic block size; if so, marking the position of that stgs element as i; otherwise, marking as i the position of the first stgs element at which the accumulated size exceeds 1.1 times the basic block size; the initial value of i_offset is 0;

S4-6, looking up the sample sequence number j corresponding to position i in the stss key frame index list, copying the samples from j_offset to j out of the video stream data according to the stsz sample size list and the stco chunk offset list, and writing this data block to disk to complete one video stream data fragment; the initial value of j_offset is 0;

S4-7, taking the current position i as the new offset position i_offset, and the current value of j as the new j_offset;

S4-8, repeating steps S4-5 to S4-7 until the video stream data has been completely fragmented;

S4-9, positioning an offset data point at 0.95 times the basic block size past the current fragmentation position (0.95 being the basic offset coefficient), reading and decoding the data following that offset data point, and, when an IDR frame is decoded, taking the current offset as the fragmentation position to complete one video stream data fragment;

S4-10, repeating step S4-9 until the video stream data has been completely fragmented.

Further, the specific method of step S4-1 is:

for video stream data with a resolution of 360P, the basic block size is set to 6 MB if the frame rate is 30 fps, and to 12 MB if the frame rate is 60 fps;

for video stream data with a resolution of 480P, the basic block size is set to 12 MB if the frame rate is 30 fps, and to 25 MB if the frame rate is 60 fps;

for video stream data with a resolution of 720P, the basic block size is set to 20 MB if the frame rate is 30 fps, and to 40 MB if the frame rate is 60 fps;

for video stream data with a resolution of 1080P, the basic block size is set to 35 MB if the frame rate is 30 fps, and to 64 MB if the frame rate is 60 fps.
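Step S4-1 amounts to a table lookup. The sketch below encodes the table above, assuming that "M" means 2**20 bytes and that the third row refers to 720P (consistent with the later 720P, 30 fps example using a 20 MB block). The source does not say how unlisted resolutions or frame rates are handled; the rounding rule here is a hypothetical choice.

```python
MB = 1024 * 1024  # assumption: "M"/"MB" in the table means 2**20 bytes

# (vertical resolution, frame rate) -> basic block size in MB
BASE_BLOCK_MB = {
    (360, 30): 6, (360, 60): 12,
    (480, 30): 12, (480, 60): 25,
    (720, 30): 20, (720, 60): 40,
    (1080, 30): 35, (1080, 60): 64,
}
HEIGHTS = sorted({h for h, _ in BASE_BLOCK_MB})


def base_block_size(height: int, fps: float) -> int:
    """Return sgt_size in bytes for the given video parameters.

    Assumption (not in the source): resolutions not listed in the table are
    rounded up to the next listed resolution (capped at 1080P), and frame
    rates above 30 fps use the 60 fps column.
    """
    h = next((x for x in HEIGHTS if x >= height), HEIGHTS[-1])
    rate = 30 if fps <= 30 else 60
    return BASE_BLOCK_MB[(h, rate)] * MB
```

For example, `base_block_size(720, 30)` yields the 20 MB block used in the worked examples of the detailed description.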

Further, the specific method for transcoding by the transcoding unit in step S6 includes the following sub-steps:

S6-1, the transcoding unit that receives the task to be transcoded updates its node state from idle to busy and parses the task information, which comprises the original data block path, the original video frame rate, the original video image resolution, the original video encoding mode, the transcoded video frame rate, the transcoded video image resolution, the transcoded video encoding mode and the transcoded output sequence number;

S6-2, judging whether the video frame rate, image resolution and encoding mode of the task are the same before and after transcoding; if so, proceeding to step S6-3, otherwise proceeding to step S6-4;

S6-3, directly extracting and assembling the video stream data to complete transcoding, and proceeding to step S6-5;

S6-4, initializing a decoder with the original video parameters (original video frame rate, original video image resolution and original video encoding mode) and an encoder with the target parameters (transcoded video frame rate, transcoded video image resolution and transcoded video encoding mode), and transcoding; proceeding to step S6-5;

and S6-5, writing the transcoded video stream data to the file system, exiting the transcoding process, releasing the decoder and encoder, closing the opened source video file, cleaning up memory and related resources, and updating the node state of the transcoding unit from busy to idle.
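The worker logic of steps S6-1 to S6-5 can be sketched as a small state machine. This is a minimal sketch, assuming that identical source and target parameters mean the block can be passed through without re-encoding; `remux`, `transcode` and `write_output` are hypothetical hooks standing in for the real codec and file-system layer, and all class names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class NodeState(Enum):
    IDLE = "idle"
    BUSY = "busy"


@dataclass(frozen=True)
class VideoParams:
    frame_rate: float
    resolution: Tuple[int, int]  # (width, height)
    codec: str


@dataclass
class TaskInfo:
    block_path: str
    source: VideoParams
    target: VideoParams
    output_seq: int


class TranscodingUnit:
    """Worker following S6-1..S6-5; the three callables are hypothetical hooks."""

    def __init__(self, remux: Callable, transcode: Callable, write_output: Callable):
        self.state = NodeState.IDLE
        self._remux = remux
        self._transcode = transcode
        self._write_output = write_output

    def run(self, task: TaskInfo) -> str:
        self.state = NodeState.BUSY  # S6-1: mark the node busy
        try:
            if task.source == task.target:  # S6-2: frame rate, resolution, codec unchanged
                data, path = self._remux(task.block_path), "remux"  # S6-3
            else:
                data = self._transcode(task.block_path, task.source, task.target)  # S6-4
                path = "transcode"
            self._write_output(task.output_seq, data)  # S6-5: write the result
            return path
        finally:
            self.state = NodeState.IDLE  # S6-5: release and return to idle
```

Resetting the state in `finally` keeps a failed block from leaving the node stuck in the busy state, which the scheduler relies on to hand out the next task.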

Further, the specific method in step S6-4 for transcoding with a decoder initialized from the original video parameters and an encoder initialized from the target parameters is as follows:

the decoder initialized with the original video parameters decompresses the video data stream to be transcoded according to its original format, restoring it to a sequence of consecutive frames in YUV 4:2:0 format; judging whether the resolution is to be changed; if so, stretching each YUV 4:2:0 frame to the target resolution to obtain the image to be encoded; otherwise, using the frame directly as the image to be encoded;

the encoder initialized with the target parameters encodes the image to be encoded in the transcoded video encoding mode, generating new video stream data and completing transcoding.
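The "stretch to target resolution" step operates on planar YUV 4:2:0 frames, where the two chroma planes are half the luma width and height. The sketch below uses nearest-neighbour sampling as a simple stand-in; the source does not name a scaling filter, and a production system would normally use a filtered scaler. The I420 plane order (Y, then U, then V) and even dimensions are assumptions.

```python
def _scale_plane(plane: bytes, w: int, h: int, tw: int, th: int) -> bytes:
    """Nearest-neighbour resize of one 8-bit plane from w x h to tw x th."""
    out = bytearray(tw * th)
    for y in range(th):
        src_row = (y * h // th) * w
        for x in range(tw):
            out[y * tw + x] = plane[src_row + x * w // tw]
    return bytes(out)


def scale_i420(frame: bytes, w: int, h: int, tw: int, th: int) -> bytes:
    """Stretch one planar YUV 4:2:0 (I420) frame to tw x th."""
    if (w, h) == (tw, th):
        return frame  # resolution unchanged: encode the frame as-is
    if any(v % 2 for v in (w, h, tw, th)):
        raise ValueError("I420 dimensions must be even")
    y_size, c_size = w * h, (w // 2) * (h // 2)
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match dimensions")
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    # Chroma planes are scaled at half resolution to keep the 4:2:0 layout.
    return (_scale_plane(y, w, h, tw, th)
            + _scale_plane(u, w // 2, h // 2, tw // 2, th // 2)
            + _scale_plane(v, w // 2, h // 2, tw // 2, th // 2))
```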

The invention has the beneficial effects that:

1. The method completes data fragmentation rapidly by splitting on data-block size, while adding GOP detection to preserve image quality. Fragmenting entirely by GOP produces a large number of small data tasks, and loading tasks this frequently increases overall performance overhead. By giving each subtask a larger amount of computation, i.e. using a coarse task granularity, the method reduces the system overhead caused by frequent tasks.

2. The method supports not only transcoding of conventional playback container formats (MP4, MKV), but also transcoding and encapsulation of bare (raw elementary) video streams (H.264, H.265).

3. Based on a distributed parallel computing framework, the method decomposes a time-consuming transcoding task into subtasks that are completed in parallel, which can greatly accelerate the whole transcoding process.

Drawings

FIG. 1 is a schematic flow diagram of the process;

FIG. 2 is a schematic diagram of a system architecture in which the method may be implemented.

Detailed Description

The following description of embodiments is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those of ordinary skill in the art, various changes are apparent without departing from the spirit and scope of the invention as defined in the appended claims, and all inventions and creations that make use of the inventive concept are protected.

As shown in FIG. 1, the distributed transcoding method includes steps S1 to S8 as set out above, including step S4, slicing the video stream data.

In step S1, the target video file basic information includes a video type, a video duration, a video stream encoding mode, video encoding parameters, a video encoding level, an image resolution, an image sampling mode, an image frame rate, an image bit rate, an audio stream encoding mode, an audio stream sampling rate, a video file byte length, video transcoding parameters, a transcoded video resolution, a transcoded video frame rate, and a transcoded video encoding mode.

The specific method for verifying the uploaded file according to the upload task in step S2 is as follows: checking the length and size of the uploaded file; if the uploaded file is in an encapsulated (container) format, reading the key metadata and verifying that the MIME file type is correct, and verifying that the data length in the mdat box is consistent with the file size; if both checks pass, verification is passed.

The specific method of step S4 includes the following substeps:

S4-5, starting the accumulation from offset position i_offset in stgs, judging whether, from the current offset position, there is an stgs element at which the accumulated data size is 0.9 to 1.1 times the basic block size; if so, marking the position of that stgs element as i; otherwise, marking as i the position of the first stgs element at which the accumulated size exceeds 1.1 times the basic block size; the initial value of i_offset is 0;

The specific method of step S4-1 is:

The specific method for transcoding by the transcoding unit in step S6 includes the following substeps:

In step S6-4, the specific method for transcoding by using the original video parameter information initialization decoder and the target parameter initialization encoder to be transcoded is as follows:

In a specific implementation, as shown in FIG. 2, the logical structure of a distributed transcoding system capable of implementing the method consists of a file input component, a data fragmentation component, a database, and a transcoder cluster.

The file input component comprises a group of file upload interfaces and, using multipoint uploading over the HTTP and TCP network protocols, uploads the file data to be transcoded at maximum bandwidth utilization.

The data fragmentation component implements fast file splitting. After it finishes, it generates an overall transcoding task record in the database and adds a series of transcoding subtask entries to the task queue.

The database layer uses the relational database PostgreSQL and the in-memory database Redis. PostgreSQL stores detailed transcoding task information, task records generated during transcoding, real-time state information for each node of the distributed system, and similar data. Compared with a traditional relational database, an in-memory database (often grouped under NoSQL) is faster for frequent read and modify operations, so Redis serves as the carrier of the asynchronous transcoding queue and stores video transcoding subtask information, key frame index positions, data offset positions within the videos, and similar data.

The transcoding cluster comprises a Master node and several Worker transcoding units. The Master node takes pending tasks out of the task queue in FIFO order and hands the task information to transcoding units in the idle state for transcoding. While the cluster executes the distributed transcoding process, whenever the task scheduler is notified that a transcoding unit has changed from busy to idle, it takes a pending task from the head of the task queue and assigns it to that transcoding unit for block transcoding. When transcoding of a block is finished, the transcoded data block is output, and information such as the current block index and video timestamp is recorded.
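The Master's dispatch rule can be sketched in-process with two FIFO queues: pending tasks and idle workers. This is a minimal sketch, assuming an in-memory `deque` in place of the Redis-backed queue described above; the `Master` class, its methods and the worker IDs are hypothetical. The `all_done` check corresponds to the S7 test that every task has been completed.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


class Master:
    """FIFO dispatcher: the head task goes to a worker as soon as one is idle."""

    def __init__(self, worker_ids: Iterable[Hashable]):
        self.queue = deque()              # pending tasks, in submission order
        self.idle = deque(worker_ids)     # idle workers, in the order they became free
        self.assignments: Dict[Hashable, object] = {}  # worker -> task in progress

    def submit(self, task) -> None:
        self.queue.append(task)
        self._dispatch()

    def on_worker_idle(self, worker_id) -> None:
        """Called when a worker reports busy -> idle after finishing its block."""
        self.assignments.pop(worker_id, None)
        self.idle.append(worker_id)
        self._dispatch()

    def _dispatch(self) -> None:
        while self.queue and self.idle:
            worker = self.idle.popleft()
            self.assignments[worker] = self.queue.popleft()  # always the queue head

    def all_done(self) -> bool:
        """S7: no pending tasks and no tasks still being transcoded."""
        return not self.queue and not self.assignments
```

Dispatching from both `submit` and `on_worker_idle` means a task never waits while a worker is idle, which is the event-driven behaviour the paragraph describes.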

Finally, the transcoding records are read, the transcoded output units are combined in order to produce complete video stream data, and the video stream is multiplexed with the other streams in the specified format to generate a complete transcoded file, completing transcoding.

In an embodiment of the invention, when the video is in an encapsulated format (e.g. MP4), the video information read by demultiplexing is compared with the upload task information, the key metadata is read, and the MIME file type is checked to be one of the common types such as avc1, iso2, isom and mp41. The method then verifies that the data length in mdat is consistent with the file size: if the mdat data length plus the current offset is greater than the total file length, the current file is considered damaged and no further checks are performed. Once mdat passes verification, the moov metadata is read to complete verification of the file. If verification fails, the file of the current task is damaged and subsequent transcoding operations are not executed.

Taking an MP4 file as an example, decapsulation reads the video track metadata in the moov box: the stss (Sync Sample Box) key frame index list, the stsz (Sample Size Box) sample size list, and the stco (Chunk Offset Box) chunk offset list.

The stss key frame list and the stsz sample sizes that were read are accumulated to generate a new list indexed by GOP and based on GOP block sizes, called stgs in this method. The stgs list has the same length as the stss list, and its elements are

$$stgs[i] = \sum_{k=0}^{stss[i]-2} stsz[k], \quad i \ge 0,$$

where stss holds 1-based sample numbers and stsz is indexed from 0, so that the i-th stgs element equals the cumulative size of all samples from the first sample up to sample stss[i] - 1, i.e. all data preceding the i-th key frame. For example, if the stss list is (1, 127, 196, 242, ...), the stgs list begins (0, $\sum_{k=0}^{125} stsz[k]$, $\sum_{k=0}^{194} stsz[k]$, $\sum_{k=0}^{240} stsz[k]$, ...). In general the stsz list is much longer than the stss list. The stgs values increase monotonically, and each value is the cumulative size of the data blocks before the current GOP.
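The stgs list can be built with one prefix-sum pass over stsz. This sketch assumes, as in the MP4 stss box, that stss holds 1-based sample numbers, and that stsz is available as a per-sample size list indexed from 0 (an stsz box may instead declare one uniform sample size, which would first need expanding). The name `build_stgs` is hypothetical.

```python
from typing import List, Sequence


def build_stgs(stss: Sequence[int], stsz: Sequence[int]) -> List[int]:
    """Return stgs, where stgs[i] is the total size of all samples before key frame stss[i].

    stss: 1-based sample numbers of the key frames (sync samples).
    stsz: per-sample sizes in bytes, indexed from 0.
    """
    prefix = [0]  # prefix[k] = stsz[0] + ... + stsz[k - 1]
    for size in stsz:
        prefix.append(prefix[-1] + size)
    # Samples before key frame n (1-based) are stsz[0 .. n-2], i.e. prefix[n - 1].
    return [prefix[n - 1] for n in stss]
```

Because the first key frame is normally sample 1, the first stgs entry is 0, and every later entry is the byte position at which a GOP begins relative to the start of the sample data.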

The generated stgs is then partitioned according to the determined block size, using 0.9 to 1.1 times the basic block size as the acceptable range. Assuming the current basic block size is 20 MB, the corresponding block length interval is 18 MB to 22 MB. If the accumulated stgs sequence is (0, 2M, 3.52M, 5.53M, ..., 16.62M, 19.28M, 22.54M, 23.92M, ...), the first block boundary is at 19.28 MB. Letting i be the position of the 19.28 MB entry, the sample sequence number j at the i-th position is looked up in the stss index table; combining the stsz sample size table and the stco chunk offset table, the data of samples 1 to j is extracted and written to disk, completing the first data block. The current block information is then recorded: the sequence number after blocking, the block size, and the task to which the block belongs.

In some extreme cases no GOP boundary falls within the predetermined range (for example, 18 MB to 22 MB for a basic block size of 20 MB), and a backward slicing strategy is used: if the cumulative data size at the i-th GOP is 17.6 MB and at the (i+1)-th GOP is 22.1 MB, the block is cut at 22.1 MB. With the basic block sizes given in the table above, GOP sizes in real video data rarely span such a range; this measure is provided only for the case where the range cannot be met, which occurs with very low probability in practice.
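Steps S4-5 to S4-8, including the backward slicing fallback, can be sketched over the stgs list. Because stgs increases monotonically, the first index whose accumulated size reaches 0.9 times the basic block size is either inside the 0.9 to 1.1 window or, if none is, the first boundary past it. `sample_ranges` then maps each cut to a sample range; ending each block just before the key frame that starts the next block is our reading of "copy from j_offset to j", and treating the remaining tail as a final block is an assumption. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def partition_stgs(stgs: Sequence[float], base: float,
                   low: float = 0.9, high: float = 1.1) -> List[int]:
    """Return the stgs indices at which each block ends (S4-5 to S4-8).

    A block starting at index i_offset ends at the first index i whose
    accumulated size stgs[i] - stgs[i_offset] reaches low * base. If that
    size is at most high * base, i is inside the target window; otherwise
    it is the first GOP boundary past the window (backward slicing).
    """
    if not stgs:
        return []
    cuts, i_offset = [], 0
    while True:
        start = stgs[i_offset]
        cut: Optional[int] = next((i for i in range(i_offset + 1, len(stgs))
                                   if stgs[i] - start >= low * base), None)
        if cut is None:  # remaining data is smaller than low * base
            return cuts
        cuts.append(cut)
        i_offset = cut  # S4-7: the cut becomes the new offset


def sample_ranges(stss: Sequence[int], cuts: Sequence[int],
                  sample_count: int) -> List[Tuple[int, int]]:
    """Map cuts to 1-based inclusive sample ranges, one per block."""
    bounds = [0] + list(cuts)
    ranges = []
    for a, b in zip(bounds, bounds[1:] + [None]):
        last = stss[b] - 1 if b is not None else sample_count
        ranges.append((stss[a], last))
    return ranges
```

With the values from the worked examples (in MB), `partition_stgs([0, 2, 3.52, 5.53, 16.62, 19.28, 22.54, 23.92], 20)` cuts at the 19.28 MB entry, and `partition_stgs([0, 17.6, 22.1], 20)` falls back to the 22.1 MB entry.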

An unencapsulated raw video stream such as H.264 or H.265 contains no container data (for example the stss, stco and stsz information of an MP4 file), so media description information cannot be read directly from a container. The overall blocking logic for a bare stream is similar: the basic block size is still determined from the video parameters using the table above, and 0.95 is used as the basic offset coefficient. The data following the offset data point is read and decoded; when an IDR frame is decoded, the current offset is recorded as the split point, that segment of data is extracted, and the current block information (sequence number after blocking, block size and owning task) is recorded. Taking a 720P, 30 fps video as an example, the basic block size is 20 MB, so the file read pointer is first moved to 20 × 0.95 = 19 MB, and the subsequent data is then read (for example, the next 2 MB). If, during decoding, the next IDR frame marker appears at 20.12 MB (19 MB + 1.12 MB), the first split point is 20.12 MB. The remaining data is divided in the same way. With this method each resulting data block is approximately the same size and contains only complete GOPs, which avoids image loss after decoding.
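For a bare H.264 stream, the "read until an IDR frame appears" step can be approximated without a full decoder by scanning Annex B start codes and reading each NAL unit header (nal_unit_type 5 is an IDR slice). This is a swapped-in technique, not the source's decode-based method, and it is H.264-only (H.265 NAL headers differ). As an extra assumption beyond the source, the cut is moved back to the SPS/PPS/SEI/AUD units that directly precede the IDR slice, so that each block starts with its parameter sets. All names are hypothetical.

```python
# H.264 nal_unit_type values (ITU-T H.264, Table 7-1)
IDR_SLICE = 5
PREFIX_NON_VCL = {6, 7, 8, 9}  # SEI, SPS, PPS, access unit delimiter


def iter_nal_units(data: bytes, pos: int):
    """Yield (offset, nal_unit_type) for each Annex B NAL unit found at or after pos."""
    while True:
        idx = data.find(b"\x00\x00\x01", pos)
        if idx < 0 or idx + 3 >= len(data):
            return
        start = idx - 1 if idx > 0 and data[idx - 1] == 0 else idx  # 4-byte start code
        yield start, data[idx + 3] & 0x1F
        pos = idx + 3


def next_cut(data: bytes, block_start: int, base: int, coeff: float = 0.95):
    """Offset where the next block should begin, or None if no later IDR exists."""
    run_start = None  # first byte of the SPS/PPS/SEI/AUD run just before a picture
    for start, nal_type in iter_nal_units(data, block_start + int(base * coeff)):
        if nal_type == IDR_SLICE:
            return run_start if run_start is not None else start
        if nal_type in PREFIX_NON_VCL:
            if run_start is None:
                run_start = start
        else:
            run_start = None
    return None


def slice_annexb(data: bytes, base: int, coeff: float = 0.95):
    """Split a bare H.264 stream into (start, end) byte ranges, S4-9/S4-10 style."""
    blocks, start = [], 0
    while start < len(data):
        cut = next_cut(data, start, base, coeff)
        if cut is None or cut <= start:
            blocks.append((start, len(data)))  # tail becomes the last block
            break
        blocks.append((start, cut))
        start = cut
    return blocks
```

Scanning for `00 00 01` is safe inside NAL payloads because H.264 emulation prevention bytes keep that pattern from occurring there.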

After division is finished, each source data block is output and a task to be transcoded is generated. The task information comprises the task generation time, the index of the original data block, the original video parameters (frame rate, image resolution and encoding mode), the video timestamp, the video duration (omitted when the code stream to be transcoded is a bare stream), the transcoded video parameters (frame rate, image resolution and encoding mode), and the output sequence number (corresponding to the original data block).
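Because subtask information is kept in an in-memory queue, each task record needs a compact serialized form. The sketch below models the fields listed above as a dataclass with a JSON round trip, assuming JSON as the wire format (the source does not specify one); the field names and `TranscodeTask` are hypothetical, and no Redis calls are shown.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TranscodeTask:
    source_block: str        # index or path of the original data block
    source_params: dict      # original frame rate, resolution, encoding mode
    target_params: dict      # transcoded frame rate, resolution, encoding mode
    video_timestamp: float
    output_seq: int          # matches the original block's position
    duration: Optional[float] = None  # left out for bare (raw) streams
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        d = asdict(self)
        if d["duration"] is None:
            del d["duration"]  # bare streams carry no duration field
        return json.dumps(d, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TranscodeTask":
        return cls(**json.loads(text))  # missing duration falls back to None
```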

When a Worker node receives a transcoding task, it updates its node state from idle to busy and parses the task information, which comprises the original data block path, the original video parameters (frame rate, image resolution and encoding mode), the transcoded video parameters (frame rate, image resolution and encoding mode) and the transcoded output sequence number.

After transcoding is finished, the transcoded data is written to the file system, the transcoding process exits, the initialized codec is released, the opened source video file is closed, and memory and related resources are cleaned up. The transcoding output file is written, the transcoding task information is updated, and the node state is updated from busy to idle, completing one block transcoding task. When all sub-blocks of a transcoding task have been transcoded, the audio and video merge operation is executed: the transcoded sub-videos are first combined into a complete video stream, which is then encapsulated with the previously separated other stream data to generate a complete transcoded data file.
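Before merging, the transcoded blocks must be put back in output-sequence order and checked for completeness. This sketch assumes output sequence numbers start at 0, and shows one possible merge route: an input list for FFmpeg's concat demuxer (used, for example, as `ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4`). Multiplexing the separated audio and other streams back in is not shown, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def merge_order(records: Iterable[Tuple[int, str]], expected_count: int) -> List[str]:
    """Return transcoded block paths ordered by output sequence number.

    records: (output_seq, path) pairs read from the transcoding records.
    Raises if any block is missing, i.e. not all subtasks are finished yet.
    """
    by_seq: Dict[int, str] = {}
    for seq, path in records:
        if seq in by_seq:
            raise ValueError(f"duplicate output sequence number {seq}")
        by_seq[seq] = path
    missing = [s for s in range(expected_count) if s not in by_seq]
    if missing:
        raise ValueError(f"blocks not yet transcoded: {missing}")
    return [by_seq[s] for s in range(expected_count)]


def ffmpeg_concat_list(paths: Iterable[str]) -> str:
    """Build an input list for FFmpeg's concat demuxer ("file '<path>'" lines)."""
    lines = []
    for p in paths:
        escaped = p.replace("'", "'\\''")  # close quote, escaped quote, reopen
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)
```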

In summary, the invention completes data fragmentation rapidly by splitting on data-block size, while adding GOP detection to preserve imaging quality. Fragmenting entirely by GOP produces a large number of small data tasks, and loading tasks this frequently increases overall performance overhead; by giving each subtask a larger amount of computation, i.e. using a coarse task granularity, the invention reduces the system overhead caused by frequent tasks.

Claims

1. A distributed transcoding method, comprising the steps of:

S4, slicing the video stream data;

S8, synthesizing the transcoded sub-videos into a complete video stream, and encapsulating the complete video stream and other separated stream data to generate a complete transcoded data file;

the specific method of step S4 includes the following substeps:

S4-1, obtaining the size of the basic block according to the resolution and frame rate of the video stream data;

2. The distributed transcoding method of claim 1, wherein the target video file basic information in step S1 includes a video type, a video duration, an encoding mode of a video stream, a video encoding parameter, a video encoding level, an image resolution, an image sampling mode, an image frame rate, an image bit rate, an encoding mode of an audio stream, a sampling rate of an audio stream, a video file byte length, a video transcoding parameter, a transcoded video resolution, a transcoded video frame rate, and a transcoded video encoding mode.

3. The distributed transcoding method of claim 2, wherein the specific method for verifying the uploaded file according to the upload task in step S2 is as follows:

4. The distributed transcoding method of claim 1, wherein the specific method in step S4-1 is:

5. The distributed transcoding method of claim 1, wherein the specific method for transcoding by the transcoding unit in step S6 comprises the following sub-steps:

S6-4, initializing a decoder with the original video parameters (original video frame rate, original video image resolution and original video encoding mode) and an encoder with the target parameters (transcoded video frame rate, transcoded video image resolution and transcoded video encoding mode), and transcoding; proceeding to step S6-5;

6. The distributed transcoding method of claim 5, wherein the specific method in step S6-4 for transcoding with a decoder initialized from the original video parameters and an encoder initialized from the target parameters is as follows: