CN108848384A - An efficient parallel transcoding method for multi-core platforms - Google Patents
An efficient parallel transcoding method for multi-core platforms
- Publication number
- CN108848384A (application CN201810628187.8A)
- Authority
- CN
- China
- Prior art keywords
- thread
- decoding
- coding
- stage
- gop
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/40—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/177—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234309—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440218—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention belongs to the field of computer technology and specifically concerns an efficient parallel transcoding method for multi-core platforms. In the invention, video transcoding comprises two stages, decoding and encoding. Function-level parallelism covers the two modules of decoding and encoding, and data-level parallelism covers GOP-level and frame-level parallelism. The system provides a buffer that stores images arranged in display order; an encoding thread takes a contiguous run of images (a coding unit) from it, encodes the run independently, and produces an intermediate temporary file. Finally, the temporary files are merged into the target video. When a video is input, threads are woken up to execute the transcoding task. During transcoding, a thread passes through four stages (slicing, decoding, encoding, and merging), and threads in different stages run in parallel as a pipeline. The result produced by one stage is consumed by the next stage and is managed by a dedicated data structure. The invention makes full use of the computing resources of the underlying multi-core hardware to improve transcoding efficiency while preserving video quality.
Description
Technical field
The invention belongs to the field of computer technology and specifically relates to an efficient parallel transcoding method for multi-core platforms, which makes full use of the computing resources of the underlying multi-core hardware to improve transcoding efficiency while preserving video quality.
Background
With the rapid development of the internet and multimedia, data volumes have grown explosively, and digital video accounts for the largest share of the data carried by the internet every day. According to the network traffic report that Cisco published in 2017, total network traffic in 2016 was 1.15 ZB (1 ZB = 1024^3 TB), of which video made up 72%; by 2021, total traffic is projected to reach 3.33 ZB, with the video share rising to 82%.
The spread of digital video has enriched people's lives: people can watch video online on phones, computers, and other devices. However, playback has to take compatibility issues such as resolution, bit rate, and coding format into account. For example, to play on devices with different screen sizes, the video must be scaled accordingly; to play over a network with poor bandwidth, the bit rate must be reduced; to play in a particular player, the coding format must be converted, for example to H.264 or MPEG-4. Video transcoding technology was developed to solve these problems.
To let users watch video in different environments, a service provider first converts its local video into videos of particular specifications and then delivers them to users over the network. Taking Netflix as an example, one video has to be transcoded into 120 target video files according to different parameters before being delivered to users. Transcoding generally has to produce multiple target files that differ in resolution, bit rate, and coding format. It also has to keep latency low: for example, a good user experience requires 25 frames per second or more. In addition, transcoding is inherently compute-intensive. All of this poses a huge challenge for video service providers, so accelerating video transcoding is essential.
Transcoding improves the compatibility of video across scenarios; depending on the parameters, the same video can be converted into target videos in several formats. For ordinary users, transcoding an input video into a single target video of some format is the common case, i.e. single-source, single-target transcoding. Video service providers need to convert a high-definition video into multiple target streams using different transcoding parameters (resolution, bit rate, coding format), i.e. single-source, multi-target transcoding. In either scenario, transcoding generally needs low latency to give a good user experience, for example at least 25 frames per second, and usually also has to keep the quality of the target video constant.
Transcoding is computationally expensive. The transcoding frame rate of a single-core processor is usually below ten frames per second, which cannot meet users' needs. Multi-core technology offers an opportunity to accelerate transcoding, and parallel techniques have already been applied to this end; they fall mainly into GOP (Group of Pictures) level and frame-level parallelism. GOP-level parallelism scales well but is still immature and suffers from a drop in target video quality. Frame-level parallelism remains the mainstream scheme and is used by mainstream codecs such as FFmpeg and x264, but it scales poorly and cannot make full use of the computing resources of a multi-core platform.
The present invention analyzes the parallelism available in GOP-based transcoding and designs an efficient parallel transcoding method that solves the problem of low computing resource utilization.
Summary of the invention
The purpose of the present invention is to provide an efficient parallel transcoding method for multi-core platforms with high computing resource utilization.
The efficient parallel transcoding method for multi-core platforms provided by the invention exploits the independence of encoding and decoding across video GOPs. While preserving picture quality, it exposes the parallelism of video transcoding and makes full use of the computing resources of the underlying multi-core hardware to speed up transcoding.
Video transcoding comprises two stages, decoding and encoding. Function-level parallelism mainly involves the coarse-grained decoding and encoding modules, while data-level parallelism mainly involves GOP-level and frame-level parallelism. GOP-level parallel transcoding requires the video to be split into closed GOPs in advance; decoding threads fetch different closed GOPs and decode them into sequences of raw images. The system keeps a buffer that stores images arranged in display order; an encoding thread takes a contiguous run of images (a coding unit) from it, encodes the run independently, and produces an intermediate temporary file. Finally, the temporary files are merged into the target video.
There is no data dependence between closed GOPs, so this form of parallelism scales well. The invention builds an efficient parallel transcoding system on closed GOPs.
The framework of the efficient parallel transcoding method for multi-core platforms provided by the invention is shown in Fig. 1. The invention manages computing resources with a thread pool: when there is no transcoding task, the threads sleep; when a video is input, the threads are woken up to execute the transcoding task. During transcoding, a thread passes through four stages (slicing, decoding, encoding, and merging), and threads in different stages run in parallel as a pipeline. Adjacent stages have a producer-consumer relationship: the result produced by one stage is consumed by the next and is managed by a dedicated data structure. For example, the closed-GOP queue stores the video section information produced by slicing.
After the transcoding threads are woken up, they first enter the slicing stage. The slicing thread cuts the video, in units of closed GOPs, into sections that can be decoded independently, and the other threads proceed directly to the decoding stage. The system uses a slicing-state flag to ensure that only one thread performs slicing; the implementation is described in detail in the discussion of parallelism between stages. The slicing thread puts the section information describing each closed GOP into the closed-GOP queue, and decoding threads fetch section information from that queue and decode it; the two can execute in parallel.
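The producer-consumer relationship between the slicing thread and the decoding threads can be sketched as follows. This is a minimal illustration, not the patented implementation: `run_pipeline`, the `(start, end)` section tuples, and the sentinel-based shutdown are all assumptions made for the example, and "decoding" is reduced to tagging the section.

```python
import queue
import threading


def run_pipeline(sections, n_decoders=3):
    """One slicer thread feeds a closed-GOP queue; several decoders drain it."""
    gop_queue = queue.Queue()      # stands in for the closed-GOP queue
    decoded = []
    decoded_lock = threading.Lock()
    done = object()                # hypothetical end-of-slicing sentinel

    def slicer():
        for section in sections:
            gop_queue.put(section)         # publish one closed-GOP section
        for _ in range(n_decoders):
            gop_queue.put(done)            # tell every decoder to stop

    def decoder():
        while True:
            section = gop_queue.get()
            if section is done:
                break
            with decoded_lock:
                decoded.append(("decoded", section))  # placeholder for real decoding

    threads = [threading.Thread(target=slicer)]
    threads += [threading.Thread(target=decoder) for _ in range(n_decoders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(decoded)
```

Decoders begin consuming as soon as the first section is queued, so slicing and decoding overlap rather than running one after the other.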
During decoding, a thread places the raw images it produces into a coding unit. As in existing parallel transcoding systems, a coding unit is a data structure that stores a contiguous run of raw images and is encoded as a whole once it has been filled. Because decoded intermediate data occupies a large amount of memory, the efficient parallel transcoding system manages coding units uniformly with a circular queue. Decoding threads and encoding threads are scheduled dynamically according to the state of the circular queue, so that computing resource utilization stays high throughout decoding and encoding.
After encoding the raw frames in a coding unit, the encoding thread writes that section out as a temporary file. If a contiguous run of temporary files has been produced, the encoding thread merges them ahead of time, so that the final integration does not take too long. The invention uses a reorder table to record which temporary files have been generated, which lets encoding threads perform this early merging. After all encoding tasks are complete, the final integration is performed. Once the target file has been generated, the transcoding task ends; the threads are reclaimed by the thread pool and go back to sleep, waiting for the next transcoding task.
In the present invention, parallel transcoding mainly means that the four transcoding stages run in parallel as a pipeline: a stage does not have to finish completely before the next one starts; instead, adjacent stages can execute at the same time. Pipeline parallelism eliminates the waste of computing resources caused by serial slicing, by decoding threads that sleep frequently, and by serial merging.
First, the slicing stage can execute in parallel with decoding. In the efficient parallel transcoding system, all threads execute the transcoding task through the same entry point. However, the slicing stage is not suited to parallel execution, so to ensure that only one thread slices the video, the system maintains a slicing-state flag. The flag has three states: slicing not started, slicing in progress, and slicing complete, denoted c0, c1, and c2 respectively, as shown in Fig. 5.
The flag is initialized to c0 when the transcoding task starts. If a thread reads the flag and finds it set to c0, the thread sets it to c1 and carries out the slicing task. When other threads read the flag, it has already been set to c1 or c2, so they enter the decoding stage directly, as shown in Fig. 4. Reads and modifications of the slicing-state flag must be protected by a lock so that the read-and-update is atomic. Once each thread has determined its task, the slicing thread scans the video and puts the section information of each closed GOP into the closed-GOP queue, while decoding threads fetch closed-GOP information from the queue and decode the video section it describes. The slicing thread can therefore execute in parallel with the decoding threads. When the slicing thread finishes, it sets the slicing state to c2 and enters the decoding stage itself.
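The lock-protected state flag described above can be sketched as follows. The class and function names (`ClipFlag`, `try_claim`, `run_workers`) are hypothetical and chosen for the example; only the three states and the rule that exactly one thread moves the flag from c0 to c1 come from the text.

```python
import threading
from enum import Enum


class ClipState(Enum):
    NOT_STARTED = 0   # c0
    IN_PROGRESS = 1   # c1
    DONE = 2          # c2


class ClipFlag:
    def __init__(self):
        self._lock = threading.Lock()
        self.state = ClipState.NOT_STARTED

    def try_claim(self):
        """Return True for exactly one caller, which becomes the slicing thread."""
        with self._lock:               # the read and the update happen atomically
            if self.state is ClipState.NOT_STARTED:
                self.state = ClipState.IN_PROGRESS
                return True
            return False

    def mark_done(self):
        with self._lock:
            self.state = ClipState.DONE


def run_workers(n_threads=8):
    """Start n_threads through the same entry point and record each one's role."""
    flag = ClipFlag()
    roles = []
    roles_lock = threading.Lock()

    def worker():
        if flag.try_claim():
            role = "slice"
            flag.mark_done()           # slicing itself is omitted in this sketch
        else:
            role = "decode"
        with roles_lock:
            roles.append(role)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return roles, flag.state
```

Without the lock, two threads could both observe c0 before either writes c1, and the video would be sliced twice.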
The decoding and encoding stages are scheduled dynamically according to the state of the circular coding-unit queue. Because the raw image data produced by decoding occupies a lot of memory, the raw images are stored uniformly in the circular coding-unit queue. A coding unit has two states, idle and saturated, as shown in Fig. 6. The idle state means that the coding unit holds no raw images; the saturated state means that the coding unit has been filled with raw images. The state of a coding unit changes dynamically during decoding and encoding: after a decoding thread fills a coding unit, the unit is set to saturated; after the encoding task for the unit finishes, it is set back to idle.
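A single-threaded sketch of a circular coding-unit queue with the two states may make the transitions concrete. The class `CodingUnitRing` and its methods are hypothetical; locking is omitted, and for brevity a unit is marked idle when the encoder takes it, whereas the text marks it idle only after encoding finishes.

```python
from enum import Enum


class UnitState(Enum):
    IDLE = 0
    SATURATED = 1


class CodingUnitRing:
    def __init__(self, n_units, frames_per_unit):
        self.frames_per_unit = frames_per_unit
        self.units = [[] for _ in range(n_units)]
        self.states = [UnitState.IDLE] * n_units
        self.fill_idx = 0      # unit the decoder is currently filling
        self.encode_idx = 0    # next unit the encoder should take

    def push_frame(self, frame):
        """Decoder side: return False when the next unit is still saturated."""
        if self.states[self.fill_idx] is UnitState.SATURATED:
            return False
        unit = self.units[self.fill_idx]
        unit.append(frame)
        if len(unit) == self.frames_per_unit:
            self.states[self.fill_idx] = UnitState.SATURATED
            self.fill_idx = (self.fill_idx + 1) % len(self.units)
        return True

    def pop_saturated(self):
        """Encoder side: return the frames of the next saturated unit, or None."""
        if self.states[self.encode_idx] is not UnitState.SATURATED:
            return None
        frames = self.units[self.encode_idx]
        self.units[self.encode_idx] = []
        self.states[self.encode_idx] = UnitState.IDLE
        self.encode_idx = (self.encode_idx + 1) % len(self.units)
        return frames
```

A `False` from `push_frame` corresponds to the case where decoding has run ahead of encoding, and a `None` from `pop_saturated` to the case where the encoder has caught up.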
In the present invention, parallel transcoding also includes parallelism within each transcoding stage. A video consists of a series of video frames and can be divided into multiple closed GOPs. Closed GOPs are decoded independently of one another and can therefore be decoded in parallel. The decoded images are divided by display order into separate image sequences, which can be encoded in parallel. Finally, adjacent bitstreams produced by encoding can be combined in parallel by successive merging.
Slicing stage: If the video has no frame index, slicing has to scan the video sequentially and divide it into closed GOPs. If the video file contains a frame index, the frame index data is read, the video is split into several segments of similar duration according to the requested number of cuts, and the cut positions are sent to the next stage, decoding.
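One way to pick cut positions from a frame index is sketched below. The text only says that the segments should have similar duration; choosing the keyframe nearest to each equal-duration boundary is an assumption of this example, as are the function name `choose_cut_points` and the use of plain numeric timestamps.

```python
import bisect


def choose_cut_points(keyframe_pts, duration, n_segments):
    """Pick keyframe timestamps close to equal-duration boundaries.

    keyframe_pts must be sorted in ascending order.
    """
    cuts = []
    for k in range(1, n_segments):
        target = duration * k / n_segments
        i = bisect.bisect_left(keyframe_pts, target)
        # Candidates are the keyframes just before and just after the target.
        candidates = keyframe_pts[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda p: abs(p - target))
        # Skip a cut at the very start and any cut that would repeat or go backwards.
        if best > 0 and (not cuts or best > cuts[-1]):
            cuts.append(best)
    return cuts
```

Cutting only at keyframes keeps each segment independently decodable, at the cost of segments that are similar rather than identical in length.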
Decoding stage: To prevent data dependence between decoding threads, the invention uses the closed GOP as the cutting unit, and adjacent closed GOPs overlap by one I frame: the last frame of one closed GOP and the first frame of the next closed GOP are the same frame, as shown in Fig. 2. After a closed GOP has been fully decoded, its final I frame is discarded, so that each overlapping I frame is kept only by the following decoded GOP. The I frame at the end of the last decoded GOP does not overlap with any other decoded GOP, so that frame must be kept.
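The overlap rule can be illustrated with a short sketch. The function name `merge_decoded_gops` and the string frame labels are hypothetical; the rule that every GOP except the last drops its trailing I frame is the one stated above.

```python
def merge_decoded_gops(decoded_gops):
    """Concatenate decoded closed GOPs whose boundary I frames overlap."""
    frames = []
    last = len(decoded_gops) - 1
    for i, gop in enumerate(decoded_gops):
        # Every GOP except the last drops its trailing I frame, because the
        # next GOP starts with that same frame.
        frames.extend(gop if i == last else gop[:-1])
    return frames
```

For three GOPs `["I0", "P1", "I2"]`, `["I2", "P3", "I4"]`, and `["I4", "P5", "I6"]`, each shared I frame appears once in the result and the final `"I6"` is kept.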
Merging stage: Parallel encoding produces many temporary files, which the invention combines by successive merging. To reduce the number of disk reads and writes, the temporary files are merged in two levels, as shown in Fig. 3. A level-1 temporary file is the temporary file produced from a coding unit; a level-2 temporary file is a file that has been merged once.
The invention also includes parallel bit-rate allocation: the bit-rate distribution of the input video is used to guide bit-rate allocation for parallel encoding. Specifically, it uses an algorithm that allocates bit rate according to SATD (Sum of Absolute Transformed Differences, the sum of the absolute values of the Hadamard-transformed 4x4 prediction residual).
Because SATD is computed from prediction residual data, computing it during encoding would require the complex prediction process to be completed first, which carries a large performance cost. Since decoding already produces the residual after inverse quantization and inverse transform, the invention moves the SATD calculation into the decoding stage, where only a simple Hadamard transform and an absolute-value sum are needed and the complex prediction process is avoided. The process of allocating bit rate according to SATD is shown in Fig. 7: the input video is decoded; the SATD value of each frame is computed and used as the measure of complexity; a corresponding bit rate is then allocated to each coding unit according to the encoding requirements; finally, the video frames are re-encoded at the allocated bit rate.
With the bit-rate allocation method proposed by the invention, SATD is computed during decoding. Because decoded images occupy a large amount of memory, the video cannot be fully decoded before re-encoding begins, so the SATD distribution of the whole video cannot be computed before encoding. Since encoding starts only after the circular queue has been filled, the invention initially sets the average bit rate over the section held in the circular queue to the target bit rate and then allocates bit rate within it according to the SATD distribution. As encoding proceeds, the number of encoded coding units grows and the SATD distribution curve becomes more and more complete; the allocation algorithm only needs to ensure that the average bit rate meets the target at each point during encoding.
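The text does not give an allocation formula, so the following sketch assumes the simplest reading: within the window of coding units currently held in the circular queue, each unit receives a share of the bit budget proportional to its SATD sum, and the window's average equals the target. The function name `allocate_bitrates` and the assumption that all units cover equal durations are illustrative only.

```python
def allocate_bitrates(unit_satd, target_avg_bitrate):
    """Split a window's bit budget across coding units in proportion to SATD.

    Assumes every coding unit covers the same duration, so the mean of the
    returned bit rates equals target_avg_bitrate.
    """
    n = len(unit_satd)
    total = sum(unit_satd)
    if total == 0:
        # No complexity information: fall back to a uniform allocation.
        return [target_avg_bitrate] * n
    budget = target_avg_bitrate * n
    return [budget * s / total for s in unit_satd]
```

For example, two units with SATD sums 100 and 300 and a 2000 kbit/s target receive 1000 and 3000 kbit/s, which still average to the target.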
Brief description of the drawings
Fig. 1 shows the efficient parallel transcoding framework.
Fig. 2 shows closed GOPs.
Fig. 3 shows video merging.
Fig. 4 illustrates slicing and decoding in parallel.
Fig. 5 is the table mapping slicing states to symbols.
Fig. 6 is the coding-unit state transition diagram.
Fig. 7 illustrates parallel bit-rate allocation using SATD.
Fig. 8 shows the parallel implementation of slicing and decoding.
Fig. 9 shows the dynamic scheduling of decoding and encoding.
Fig. 10 shows the process of merging encoded files.
Fig. 11 shows the process of computing the SATD distribution.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, preferred implementations of the invention are described in detail below with reference to the drawings and embodiments. It should first be noted that the terms and words used in this specification and the claims are not to be interpreted restrictively according to their ordinary or dictionary meanings; they should be interpreted with the meanings and concepts that match the technical idea of the invention, on the principle that the inventor may suitably define terms in order to describe the invention in the best way. Accordingly, the structures shown in the embodiments and drawings of this specification are only one preferred embodiment and do not represent the whole technical idea of the invention; it should be understood that various equivalents and variations could replace them.
FFmpeg is a powerful open-source multimedia processing framework that can convert and edit many kinds of audio and video files and is very widely used. This section describes how to build an efficient parallel transcoding system on the FFmpeg codec framework and the POSIX multi-threaded programming model.
1. Parallel transcoding
Within the FFmpeg framework, we implement four functions, clip_video, decode_closed_gop, scale_and_encode, and concatenate, which correspond to the four stages of parallel transcoding: slicing, decoding, encoding, and merging. We also implement a launch_transcoding function as the entry point that a thread executes after it is woken up; each of the functions can be invoked by setting the trans_ctx parameter.
To run slicing and decoding in parallel, trans_ctx defines a variable clip_status that records the slicing state. The variable has three states: CLIP_NOT_YET, CLIPPING, and CLIP_FIN. Because reads and writes of this variable must be atomic, it is protected with a mutex. The execution flow after a thread enters launch_transcoding is shown in Fig. 8. If the thread acquires the lock and reads clip_status as CLIP_NOT_YET, it calls clip_video to perform slicing and, once slicing is complete, calls decode_closed_gop to decode. Threads that do not need to perform slicing go straight to decoding. In the implementation of clip_video, we use FFmpeg's av_seek_frame function to locate I frames, decode each I frame to obtain its PTS (Presentation Time Stamp), and record it. When slicing is complete, clip_status is set to CLIP_FIN.
After a thread enters the decoding stage, it traverses the closed-GOP queue and decodes the sections, as shown in Fig. 9. If a closed GOP is obtained successfully and there are enough idle coding units, the thread walks through the closed GOP and calls avcodec_decode_video2 to decode it. If no closed GOP can be obtained and slicing has already completed, no new closed GOPs will be produced, so the thread enters the encoding stage; this is the first path by which decoding is rescheduled to encoding. If a closed GOP is obtained but there are not enough idle coding units, decoding is running ahead of encoding, so the thread enters the encoding stage; this is the second path by which decoding is rescheduled to encoding.
After a thread enters the encoding stage, it traverses the saturated coding units and calls avcodec_encode_video2 to encode them. In the multi-target case, an additional inner loop traverses the encoding contexts of the multiple outputs, so that each frame is encoded once per set of output parameters. If no saturated coding unit can be obtained, there are two possibilities. In the first, decoding has finished and no new saturated coding units will be produced, so the thread enters the merging stage. In the second, decoding has not finished and the circular queue is merely empty, so the encoding thread is rescheduled to the decoding stage.
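The scheduling rules for the decoding and encoding stages can be condensed into two small decision functions. The function names and the string return values are hypothetical; the `"retry"` branch covers a case the text does not discuss (no closed GOP available while slicing is still running) and is an assumption of this sketch.

```python
def next_action_when_decoding(got_gop, slicing_done, idle_units_available):
    """Decide what a thread in the decoding stage should do next."""
    if got_gop and idle_units_available:
        return "decode"
    if got_gop:
        return "encode"    # second path: decoding is ahead of encoding
    if slicing_done:
        return "encode"    # first path: no more closed GOPs will arrive
    return "retry"         # assumption: wait for the slicer to produce a GOP


def next_action_when_encoding(got_saturated_unit, decoding_done):
    """Decide what a thread in the encoding stage should do next."""
    if got_saturated_unit:
        return "encode"
    # No saturated unit: either everything is decoded, or the ring is just empty.
    return "merge" if decoding_done else "decode"
```

Because every thread applies the same rules, threads drift toward whichever stage is currently the bottleneck instead of sleeping.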
After a coding unit has produced a temporary file, the thread does not immediately fetch the next saturated coding unit. Instead, it checks whether a run of temporary files in the reorder table can be merged ahead of time. As shown in Fig. 10, after encoding produces a temporary file, the thread enters the merging stage and marks the corresponding entry in the reorder table as complete; if every entry in the sub-section of the reorder table that contains it is now set, those temporary files are merged ahead of time into a level-2 temporary file. Finally, the level-2 temporary files are merged into the target video.
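A reorder table with fixed-size sub-sections can be sketched as follows. The class name `ReorderTable`, the `group_size` parameter, and the choice of fixed-size groups are assumptions for illustration; the text only states that a sub-section is merged early once all of its entries are marked complete. File I/O and locking are omitted.

```python
class ReorderTable:
    def __init__(self, n_units, group_size):
        self.done = [False] * n_units      # one entry per level-1 temporary file
        self.group_size = group_size
        self.merged_groups = set()

    def mark_done(self, unit_idx):
        """Record a finished temp file; return the indices to pre-merge, if any."""
        self.done[unit_idx] = True
        group = unit_idx // self.group_size
        start = group * self.group_size
        end = min(start + self.group_size, len(self.done))
        if group not in self.merged_groups and all(self.done[start:end]):
            self.merged_groups.add(group)
            return list(range(start, end))  # these form one level-2 temp file
        return None
```

Coding units may finish out of order; the table ensures that a level-2 file is built only from a complete, contiguous run, so the final concatenation preserves display order.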
2. Implementing SATD
The SATD of each frame is computed in the decoding stage. The SATD values are then summed per coding unit, and each unit's share of the video's total SATD is analyzed in order to allocate bit rate. Although the total SATD of the input video cannot be known in advance, encoding starts only after the threads have filled the circular queue, so sufficient distribution information is still available. Moreover, as transcoding proceeds, more and more SATD information accumulates, so bit rate can be allocated to coding units sensibly.
The process of computing the SATD distribution is shown in Fig. 11. In H.264 decoding, SATD is computed by applying a Hadamard transform to the prediction residual and then summing the absolute values. FFmpeg calls the codec library libavcodec, whose inverse-transform function generates the macroblock residual and stores it in the residual array of the H264Context structure. Implementing SATD therefore only requires applying the Hadamard transform to the residual array of the H264Context structure.
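The SATD of a single 4x4 residual block can be sketched in plain Python as below. This illustrates the transform-then-sum step only: it does not read libavcodec's internal structures, it uses an unnormalized Hadamard matrix (some implementations divide the result by 2), and the names `H4`, `matmul4`, and `satd_4x4` are hypothetical.

```python
# 4x4 Hadamard matrix in sequency order; it is symmetric, so H4 equals its transpose.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(residual):
    """Sum of absolute values of H4 * residual * H4 (unnormalized)."""
    transformed = matmul4(matmul4(H4, residual), H4)
    return sum(abs(v) for row in transformed for v in row)
```

A frame-level value would then be the sum of `satd_4x4` over all 4x4 residual blocks of the frame.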
Claims (5)
1. An efficient parallel transcoding method for multi-core platforms, characterized in that video transcoding comprises two stages, decoding and encoding; function-level parallelism comprises the two modules of decoding and encoding, and data-level parallelism comprises GOP-level and frame-level parallelism; GOP-level parallel transcoding requires the video to be split into closed GOPs in advance, and decoding threads obtain different closed GOPs and decode them into sequences of raw images; the system provides a buffer to store images arranged in display order, and an encoding thread takes a contiguous run of images, i.e. a coding unit, from it, encodes the run independently, and generates an intermediate temporary file; finally, the temporary files are merged into the target video.
2. The efficient parallel transcoding method for multi-core platforms according to claim 1, characterized in that after a video is input, threads are woken up to execute the transcoding task. During transcoding, a thread passes through four stages: slicing, decoding, encoding, and merging. Threads in different stages run in parallel in a pipelined fashion; the results produced by one stage are supplied to the next stage and are managed by dedicated data structures;
After the transcoding threads are woken up, they first enter the slicing stage. The thread in the slicing stage cuts the video, in units of closed GOPs, into independently decodable segments, while the other threads immediately enter the decoding stage. The system uses a slicing-state flag to ensure that only one thread can perform slicing. The slicing thread puts the segment information describing each closed GOP into a closed-GOP queue, and decoding threads obtain segment information from that queue and decode it; the two run in parallel;
During decoding, a thread puts the original images produced by decoding into a coding unit. A coding unit is a data structure that stores a continuous run of original images and is encoded as a whole once it has been filled. The system manages the coding units uniformly with a circular queue. Decoding threads and encoding threads are dynamically scheduled according to the state of the circular queue, so that high utilization of computing resources is maintained throughout decoding and encoding;
After encoding the original image frames in a coding unit, the encoding thread outputs that segment as a temporary file. If a continuous run of temporary files has been generated, the encoding thread merges these temporary files in advance. A reorder table records which temporary files have been generated, helping the encoding threads merge in advance. After all encoding tasks are complete, a final merge is performed. Once the target file has been generated, the transcoding task ends; the threads are reclaimed by the thread pool and enter a dormant state, waiting for the next transcoding task.
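The role of the reorder table in claim 2 can be illustrated with a small sketch. It assumes that early merging only ever extends the already-merged prefix of the output (the claim says only "a continuous run"), and the class name `ReorderTable` and method `mark_done` are hypothetical.

```python
import threading


class ReorderTable:
    """Hypothetical reorder table: records which coding-unit temp files exist."""

    def __init__(self, total_units):
        self._done = [False] * total_units
        self._next = 0  # first unit index not yet handed over for merging
        self._lock = threading.Lock()

    def mark_done(self, unit_index):
        """Record a finished temp file.

        Returns the indices of the contiguous run, starting at the merge
        frontier, that can now be merged early (possibly empty).
        """
        with self._lock:
            self._done[unit_index] = True
            start = self._next
            while self._next < len(self._done) and self._done[self._next]:
                self._next += 1
            return list(range(start, self._next))
```

An encoding thread would call `mark_done` after writing its temporary file and merge whatever run is returned; files that finish out of order simply wait until the gap before them is filled.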
3. The efficient parallel transcoding method for multi-core platforms according to claim 2, characterized in that the four stages of transcoding run in parallel in a pipelined manner, i.e. adjacent stages execute concurrently;
First, the slicing stage can execute in parallel with decoding. In the system, all threads execute the transcoding task through the same entry point. However, because the slicing stage is not suited to parallel execution, the system maintains a slicing-state flag to ensure that only one thread slices the video. The flag has three states: not sliced, slicing in progress, and slicing complete, denoted c_0, c_1, and c_2 respectively;
The flag is initialized to c_0 when the transcoding task starts. If a thread reads the flag and finds it to be c_0, the thread sets it to c_1 and executes the slicing task. When other threads read the slicing flag, it has already been set to c_1 or c_2, so they enter the decoding stage directly. Threads protect reads and modifications of the slicing-state flag with a lock. Once each thread has determined its own task, the slicing thread scans the video and puts the segment information of each closed GOP into the closed-GOP queue, while decoding threads obtain closed-GOP information from the queue and decode the video segments it describes; that is, the slicing thread and the decoding threads execute in parallel. After the slicing thread finishes, it sets the slicing state to c_2 and enters the decoding stage;
The decoding stage and the encoding stage are dynamically scheduled according to the state of the circular coding-unit queue. Because the raw image data produced by decoding occupies a large amount of memory, a circular coding queue is used to store the original images uniformly. A coding unit has two states, idle and saturated. The idle state means the coding unit stores no original images; the saturated state means the coding unit has been filled with original images. The state of a coding unit switches dynamically during decoding and encoding: after a decoding thread fills a coding unit, it sets the unit to the saturated state; after the encoding task for a coding unit ends, the unit is set to the idle state.
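The lock-protected slicing-state flag of claim 3 can be sketched as below. This is an illustrative sketch only: the names `SliceState`, `SliceFlag`, `try_claim`, and `run_workers` are hypothetical, and the actual slicing and decoding work is replaced by comments.

```python
import threading
from enum import Enum


class SliceState(Enum):
    NOT_SLICED = 0   # c_0
    IN_PROGRESS = 1  # c_1
    DONE = 2         # c_2


class SliceFlag:
    """Hypothetical slicing-state flag; every read or write holds the lock."""

    def __init__(self):
        self._state = SliceState.NOT_SLICED
        self._lock = threading.Lock()

    def try_claim(self):
        """Return True for exactly one caller, which becomes the slicing thread."""
        with self._lock:
            if self._state is SliceState.NOT_SLICED:
                self._state = SliceState.IN_PROGRESS
                return True
            return False

    def finish(self):
        with self._lock:
            self._state = SliceState.DONE

    def state(self):
        with self._lock:
            return self._state


def run_workers(n):
    """Start n threads through the same entry point; return who sliced."""
    flag = SliceFlag()
    slicers = []

    def worker(i):
        if flag.try_claim():
            slicers.append(i)  # here the thread would scan the video and fill the closed-GOP queue
            flag.finish()
        # every thread, including the slicer, would now enter the decoding stage

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slicers, flag.state()
```

Performing the check and the transition from c_0 to c_1 inside one critical section is what guarantees a single slicing thread; checking and setting in separate lock acquisitions would allow two threads to both observe c_0.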
4. The efficient parallel transcoding method for multi-core platforms according to claim 3, characterized in that it further comprises parallelism within each stage of transcoding;
Slicing stage: if the video has no frame index, slicing must scan through the video and divide it into closed GOPs; if the video file contains a frame index, the frame index data of the video is read, the video is split according to the number of cuts into several segments of similar duration, and the cut positions are sent to the next stage, decoding;
Decoding stage: with closed GOPs as the cutting unit, adjacent closed GOPs overlap at an I-frame: the last frame of one closed GOP and the first frame of the next closed GOP are the same frame. After a closed GOP has been completely decoded, its final I-frame is discarded, so that the overlapping I-frame is retained only by the following decoded GOP. The I-frame at the end of the last decoded GOP does not overlap with any other decoded GOP, so that frame must be retained;
Merging stage: parallel encoding generates many temporary files, which are combined using a merge procedure. To reduce the number of disk reads and writes, the temporary files undergo a two-level merge. Level-one temporary files are the temporary files generated from coding units; level-two temporary files are files that have been merged once.
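The handling of the overlapping I-frame in the decoding stage of claim 4 can be illustrated as follows. The sketch assumes each decoded closed GOP is represented as a list of frame identifiers in display order; the function name `drop_overlapping_iframes` is hypothetical.

```python
def drop_overlapping_iframes(decoded_gops):
    """Concatenate decoded closed GOPs without duplicating boundary I-frames.

    decoded_gops: list of lists of frame ids in display order, where the last
    frame of each GOP is the same I-frame as the first frame of the next GOP.
    Every GOP except the last discards its final frame; the last GOP keeps it
    because nothing follows that could supply it.
    """
    frames = []
    last_index = len(decoded_gops) - 1
    for index, gop in enumerate(decoded_gops):
        frames.extend(gop if index == last_index else gop[:-1])
    return frames
```

For example, three GOPs covering frames 0 to 3, 3 to 6, and 6 to 8 yield frames 0 through 8 exactly once each.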
5. The efficient parallel transcoding method for multi-core platforms according to claim 4, characterized in that it further comprises parallel bit rate allocation, i.e. assisting bit rate allocation in parallel encoding according to the bit rate distribution of the input video, using an SATD-based bit rate allocation algorithm, with the following specific operations:
Before the data is encoded, the SATD value of each video frame is first computed; then, with SATD as the measure of complexity, a corresponding bit rate is allocated to each coding unit. The SATD computation is placed in the decoding stage. Because encoding starts only after the circular queue has been filled, in the initial state the average bit rate of the segment held in the circular queue can be set as the target bit rate, and bit rates are then allocated according to the SATD distribution. As encoding proceeds, the number of encoded coding units grows and the SATD distribution curve becomes increasingly complete; the bit rate allocation algorithm only needs to ensure that the average bit rate at encoding time meets the target.
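One way to read the allocation in claim 5 is as a proportional split of a bit budget among the coding units currently in the circular queue. The sketch below makes that assumption explicit and further assumes that all coding units have equal duration, so the target average is a plain mean; the function name `allocate_bitrates` and its parameters are hypothetical.

```python
def allocate_bitrates(unit_satd, target_avg_bitrate):
    """Split a bit budget across coding units in proportion to their SATD.

    unit_satd: summed SATD of each coding unit in the circular queue.
    target_avg_bitrate: average bit rate the units should meet overall.
    Assumes equal-duration units, so the mean of the result equals the target.
    """
    count = len(unit_satd)
    if count == 0:
        return []
    total = sum(unit_satd)
    if total == 0:
        # No complexity information: fall back to a uniform allocation.
        return [float(target_avg_bitrate)] * count
    budget = target_avg_bitrate * count
    return [budget * satd / total for satd in unit_satd]
```

As more coding units are decoded, the same function could be re-run over the updated SATD totals so that later allocations reflect the more complete distribution.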
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810628187.8A CN108848384A (en) | 2018-06-19 | 2018-06-19 | A kind of efficient parallel code-transferring method towards multi-core platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810628187.8A CN108848384A (en) | 2018-06-19 | 2018-06-19 | A kind of efficient parallel code-transferring method towards multi-core platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108848384A (en) | 2018-11-20 |
Family
ID=64202221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810628187.8A Pending CN108848384A (en) | 2018-06-19 | 2018-06-19 | A kind of efficient parallel code-transferring method towards multi-core platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108848384A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101098483A (en) * | 2007-07-19 | 2008-01-02 | 上海交通大学 | Video cluster transcoding system using image group structure as parallel processing element |
WO2013165088A1 (en) * | 2012-05-02 | 2013-11-07 | Samsung Electronics Co., Ltd. | Distributed transcoding apparatus and method using multiple servers |
CN104469370A (en) * | 2013-09-17 | 2015-03-25 | 中国普天信息产业股份有限公司 | Video transcode method and device |
CN105451031A (en) * | 2015-11-18 | 2016-03-30 | 腾讯科技(深圳)有限公司 | Video transcoding method and system thereof |
CN106254867A (en) * | 2016-08-08 | 2016-12-21 | 暴风集团股份有限公司 | Based on picture group, video file is carried out the method and system of transcoding |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110602122A (en) * | 2019-09-20 | 2019-12-20 | 北京达佳互联信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
CN110996172A (en) * | 2019-12-17 | 2020-04-10 | 杭州当虹科技股份有限公司 | Method for quickly synthesizing 4K MXF file |
CN110996172B (en) * | 2019-12-17 | 2022-01-11 | 杭州当虹科技股份有限公司 | Method for quickly synthesizing 4K MXF file |
CN111343503A (en) * | 2020-03-31 | 2020-06-26 | 北京金山云网络技术有限公司 | Video transcoding method and device, electronic equipment and storage medium |
CN111343503B (en) * | 2020-03-31 | 2022-03-04 | 北京金山云网络技术有限公司 | Video transcoding method and device, electronic equipment and storage medium |
CN114245143A (en) * | 2020-09-09 | 2022-03-25 | 阿里巴巴集团控股有限公司 | Encoding method, device, system, electronic device and storage medium |
CN112637634A (en) * | 2020-12-24 | 2021-04-09 | 北京睿芯高通量科技有限公司 | High-concurrency video processing method and system for multi-process shared data |
CN114697675A (en) * | 2020-12-25 | 2022-07-01 | 扬智科技股份有限公司 | Decoding display system and memory access method thereof |
CN114697675B (en) * | 2020-12-25 | 2024-04-05 | 扬智科技股份有限公司 | Decoding display system and memory access method thereof |
CN112822494A (en) * | 2020-12-30 | 2021-05-18 | 稿定(厦门)科技有限公司 | Double-buffer coding system and control method thereof |
CN115297328A (en) * | 2022-10-10 | 2022-11-04 | 湖南马栏山视频先进技术研究院有限公司 | Multi-node parallel video transcoding method facing distributed cluster |
CN115297328B (en) * | 2022-10-10 | 2023-01-20 | 湖南马栏山视频先进技术研究院有限公司 | Multi-node parallel video transcoding method facing distributed cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108848384A (en) | A kind of efficient parallel code-transferring method towards multi-core platform | |
CN103621085B (en) | Reduce method and the computing system of the delay in video decode | |
CN102150425B (en) | System and method for decoding using parallel processing | |
US8170120B2 (en) | Information processing apparatus and information processing method | |
CN101895765B (en) | Transcoder, recorder, and transcoding method | |
US20070286289A1 (en) | Information-processing apparatus, information-processsing method, recording medium and program | |
WO2017107442A1 (en) | Video transcoding method and device | |
CN102301710A (en) | Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming | |
CN104205834A (en) | Method and apparatus for video encoding for each spatial sub-area, and method and apparatus for video decoding for each spatial sub-area | |
CN102447906A (en) | Low-latency video decoding | |
CN102741830A (en) | Systems and methods for a client-side remote presentation of a multimedia stream | |
CN1469645A (en) | Method and apparatus for regenerating image and image recording device | |
JP2007221323A (en) | Method for processing information, method for displaying thumbnail of moving picture, decoding device, and information processor | |
CN104469370A (en) | Video transcode method and device | |
CN102984465A (en) | Program synthesis system and method applicable to networked intelligent digital media | |
Heikkinen et al. | Distributed multimedia content analysis with MapReduce | |
CN100556140C (en) | Moving picture re-encoding apparatus, moving picture editing apparatus and method thereof | |
CN113271467B (en) | Ultra-high-definition video layered coding and decoding method supporting efficient editing | |
CN107079159A (en) | The method and apparatus of parallel video decoding based on multiple nucleus system | |
CN108886638A (en) | Transcriber and reproducting method and file creating apparatus and document generating method | |
CN101094368B (en) | Reproduction apparatus and reproduction method | |
CN106454369B (en) | Dynamic image prediction decoding method, dynamic image prediction decoding device | |
CN101296346B (en) | Image data recording/playback device, system, and method | |
Richter et al. | Parallelization and multi-threaded latency constrained parallel coding of JPEG XS | |
CN100518320C (en) | Signal reproducing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181120 |