CN109168032A - Processing method, terminal, server and storage medium of video data - Google Patents
Processing method, terminal, server and storage medium of video data
- Publication number
- CN109168032A (application number CN201811337105.0A)
- Authority
- CN
- China
- Prior art keywords
- video data
- target area
- data stream
- video image
- area information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video data processing method, terminal, server and storage medium, belonging to the technical field of data processing. In the embodiments of the invention, a first device obtains the target area information of at least one frame of original video image and, while generating a video data stream, makes the generated stream carry the corresponding target area information. After receiving the video data stream, a second device can extract the required target area information directly from the stream, sparing it the complex process of deriving the target area information from the relevant video images again. This greatly reduces data processing time and lightens the system load.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a video data processing method, terminal, server and storage medium.
Background art
With the continuous development of data processing technology, methods for processing video data keep multiplying. For example, to accommodate different network bandwidths or the processing capabilities of different terminals, video data may need to be transcoded; to serve different user demands, several video data streams may need to be mixed. While processing video data, a target area of the corresponding video image can be identified on demand — for instance, a region of interest can be recognized so that more bits are allocated to it during video encoding, improving the encoding quality.
At present, a common video data processing method works as follows: according to a preset recognition rule, target area recognition is performed on at least one frame of original video image, and the at least one frame of original video image is encoded based on the recognized target area so that more bits are allocated to the target area during encoding, yielding a video data stream. When this video data stream is later transcoded, it is first decoded to recover the corresponding video images; target area recognition is then performed on those images again according to the recognition rule, and, based on the newly recognized target area, the images are re-encoded at a different target bitrate, finally producing a target video data stream corresponding to the target bitrate.
With this processing method, target area recognition must be performed on the video images repeatedly while the at least one frame of original video image is encoded and re-encoded. Since target area recognition is complex and time-consuming, performing it repeatedly considerably increases the system load.
Summary of the invention
The embodiments of the invention provide a video data processing method, terminal, server and storage medium, which can solve the problem of having to perform target area recognition on video images repeatedly. The technical solution is as follows:
In one aspect, a video data processing method is provided, the method comprising:
obtaining at least one frame of original video image;
obtaining, based on the at least one frame of original video image, the target area information of the at least one frame of original video image;
encoding the at least one frame of original video image based on its target area information to generate a video data stream, the video data stream carrying the target area information of the at least one frame of original video image;
sending the video data stream to a second device.
In one possible implementation, encoding the at least one frame of original video image based on its target area information to generate a video data stream carrying that target area information comprises:
encoding the target area information together with the at least one frame of original video image to generate at least one first data packet carrying at least one target area identifier, the at least one target area identifier being obtained by encoding the at least one frame of original video image;
generating the video data stream based on the at least one first data packet carrying the at least one target area identifier.
In one possible implementation, encoding the at least one frame of original video image based on its target area information to generate a video data stream carrying that target area information comprises:
encoding the target area information of the at least one frame of original video image to generate at least one second data packet;
encoding the at least one frame of original video image to generate at least one first data packet;
inserting one second data packet after every preset number of first data packets to generate the video data stream.
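The second scheme above — a stream in which one target-area ("second") packet follows every preset number of encoded-video ("first") packets — can be sketched roughly as follows. The packet framing, type tags, and JSON payload here are illustrative assumptions for the sketch, not the patent's actual bitstream format:

```python
import json

VIDEO_PACKET = 0x01  # "first" packet: encoded video data (assumed type tag)
ROI_PACKET = 0x02    # "second" packet: encoded target-area info (assumed)

def make_packet(ptype: int, payload: bytes) -> bytes:
    # Assumed framing: 1-byte type + 4-byte big-endian length + payload
    return bytes([ptype]) + len(payload).to_bytes(4, "big") + payload

def build_stream(video_packets, roi_infos, preset_number: int) -> list:
    """Interleave one ROI packet after every `preset_number` video packets."""
    stream, roi_iter = [], iter(roi_infos)
    for i, vp in enumerate(video_packets, start=1):
        stream.append(make_packet(VIDEO_PACKET, vp))
        if i % preset_number == 0:
            roi = next(roi_iter, None)
            if roi is not None:
                stream.append(make_packet(ROI_PACKET,
                                          json.dumps(roi).encode()))
    return stream
```

With `preset_number=2` and four video packets, the resulting order is video, video, ROI, video, video, ROI — the second device can then locate the target area information without re-running recognition.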
In another aspect, a video data processing method is provided, the method comprising:
receiving a video data stream, the video data stream carrying the target area information of at least one frame of original video image;
extracting the target area information of the at least one frame of original video image from the video data stream;
decoding the video data stream to generate the video images corresponding to the video data stream;
re-encoding the video images corresponding to the video data stream based on the target area information corresponding to the video data stream, to generate a target video data stream.
In one possible implementation, extracting the target area information of the at least one frame of original video image based on the video data stream comprises:
extracting at least one target area identifier based on at least one field of at least one first data packet in the video data stream;
decoding the at least one target area identifier to generate the target area information of the at least one frame of original video image.
In one possible implementation, extracting the target area information of the at least one frame of original video image based on the video data stream comprises:
based on the at least one first data packet and the at least one second data packet in the video data stream, decoding, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image.
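The receiving side of such an interleaved layout can be sketched as the inverse operation: walk the stream, route each packet by its type tag, and decode the target-area payloads. This sketch assumes an illustrative framing (1-byte type tag of 0x01 for video and 0x02 for ROI, 4-byte big-endian length, then the payload) and a JSON ROI payload; none of these details come from the patent itself:

```python
import json

def split_stream(data: bytes):
    """Separate encoded-video ("first") packets from ROI ("second") packets."""
    video, roi = [], []
    i = 0
    while i < len(data):
        ptype = data[i]
        length = int.from_bytes(data[i + 1:i + 5], "big")
        payload = data[i + 5:i + 5 + length]
        # Route by the assumed type tag: 0x01 = video, anything else = ROI info
        (video if ptype == 0x01 else roi).append(payload)
        i += 5 + length
    return video, [json.loads(p) for p in roi]
```

The video packets then go to the decoder as usual, while the already-extracted ROI dictionaries feed the re-encoder directly — which is exactly the saving the abstract describes.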
In another aspect, a video data processing method is provided, the method comprising:
receiving at least two video data streams, each video data stream carrying the target area information of at least one frame of original video image;
extracting, from the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream;
decoding each video data stream to generate the video images corresponding to the at least two video data streams;
merging the video images corresponding to the at least two video data streams to generate a target video image;
re-encoding the target video image based on the target area information corresponding to the at least two video data streams, to generate a target video data stream.
In one possible implementation, extracting, based on the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream comprises:
extracting at least one target area identifier corresponding to the at least two video data streams, based on at least one field of at least one first data packet in each video data stream;
decoding the at least one target area identifier corresponding to each video data stream, to generate the target area information of the at least one frame of original video image corresponding to the at least two video data streams.
In one possible implementation, extracting, based on the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream comprises:
based on the at least one first data packet and the at least one second data packet in each video data stream, decoding, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image in the at least two video data streams.
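The merge-then-re-encode flow of this aspect can be illustrated with a toy side-by-side fusion of two decoded streams. Representing images as plain 2D lists of pixel values and target areas as (x, y, w, h) rectangles is a simplifying assumption for the sketch; the patent does not prescribe a fusion layout or ROI encoding:

```python
def mix_side_by_side(img_a, img_b, rois_a, rois_b):
    """Merge two equal-height images left/right and remap stream B's ROIs."""
    assert len(img_a) == len(img_b), "streams must share the same height"
    width_a = len(img_a[0])
    merged = [row_a + row_b for row_a, row_b in zip(img_a, img_b)]
    # Stream A's ROIs keep their coordinates; stream B's shift right by
    # width_a so the re-encoder can still weight them inside the fused
    # target video image.
    merged_rois = list(rois_a) + [(x + width_a, y, w, h)
                                  for (x, y, w, h) in rois_b]
    return merged, merged_rois
```

The point of carrying the ROI rectangles through the merge is that step 305 (re-encoding the target video image) can allocate extra bits to both streams' target areas without re-detecting them.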
In another aspect, a video data processing apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain at least one frame of original video image;
the obtaining module being further configured to obtain, based on the at least one frame of original video image, the target area information of the at least one frame of original video image;
a generating module, configured to encode the at least one frame of original video image based on its target area information to generate a video data stream, the video data stream carrying the target area information of the at least one frame of original video image;
a sending module, configured to send the video data stream to a second device.
In one possible implementation, the generating module is configured to:
encode the target area information together with the at least one frame of original video image to generate at least one first data packet carrying at least one target area identifier, the at least one target area identifier being obtained by encoding the at least one frame of original video image;
generate the video data stream based on the at least one first data packet carrying the at least one target area identifier.
In one possible implementation, the generating module is configured to:
encode the target area information of the at least one frame of original video image to generate at least one second data packet;
encode the at least one frame of original video image to generate at least one first data packet;
insert one second data packet after every preset number of first data packets to generate the video data stream.
In another aspect, a video data processing apparatus is provided, the apparatus comprising:
a receiving module, configured to receive a video data stream, the video data stream carrying the target area information of at least one frame of original video image;
an extracting module, configured to extract the target area information of the at least one frame of original video image based on the video data stream;
a decoding module, configured to decode the video data stream to generate the video images corresponding to the video data stream;
a re-encoding module, configured to re-encode the video images corresponding to the video data stream based on the target area information of the at least one frame of original video image, to generate a target video data stream.
In one possible implementation, the extracting module is configured to:
extract at least one target area identifier based on at least one field of at least one first data packet in the video data stream;
decode the at least one target area identifier to generate the target area information of the at least one frame of original video image.
In one possible implementation, the extracting module is configured to:
based on the at least one first data packet and the at least one second data packet in the video data stream, decode, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image.
In another aspect, a video data processing apparatus is provided, the apparatus comprising:
a receiving module, configured to receive at least two video data streams, each video data stream carrying the target area information of at least one frame of original video image;
an extracting module, configured to extract, based on the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream;
a decoding module, configured to decode each video data stream to generate the video images corresponding to the at least two video data streams;
a merging module, configured to merge the video images corresponding to the at least two video data streams to generate a target video image;
a re-encoding module, configured to re-encode the target video image based on the target area information corresponding to the at least two video data streams, to generate a target video data stream.
In one possible implementation, the extracting module is configured to:
extract at least one target area identifier corresponding to the at least two video data streams, based on at least one field of at least one first data packet in each video data stream;
decode the at least one target area identifier corresponding to each video data stream, to generate the target area information of the at least one frame of original video image corresponding to the at least two video data streams.
In one possible implementation, the extracting module is configured to:
based on the at least one first data packet and the at least one second data packet in each video data stream, decode, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image in the at least two video data streams.
In another aspect, a terminal is provided, the terminal comprising a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to perform the operations of the above video data processing method.
In another aspect, a server is provided, the server comprising a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to perform the operations of the above video data processing method.
In another aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, the instruction being loaded and executed by a processor to perform the operations of the above video data processing method.
In the embodiments of the present invention, a first device obtains the target area information of at least one frame of original video image and, while generating a video data stream, makes the generated stream carry the corresponding target area information. After receiving the video data stream, a second device can extract the required target area information directly from the stream, sparing it the complex process of deriving the target area information from the relevant video images again, which greatly reduces data processing time and lightens the system load.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a video data processing method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a video data processing method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a video data processing method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a video data processing method provided by an embodiment of the present invention;
Fig. 5 is a flowchart of encoding and transcoding video images provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a video data processing method provided by an embodiment of the present invention;
Fig. 7 is a flowchart of encoding and mixing video images provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention;
Fig. 11 is a structural block diagram of a terminal provided by an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a server provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a video data processing method provided by an embodiment of the present invention; the method can be applied to a first device. Referring to Fig. 1, the method includes:
101. Obtain at least one frame of original video image.
102. Based on the at least one frame of original video image, obtain the target area information of the at least one frame of original video image.
103. Encode the at least one frame of original video image based on its target area information to generate a video data stream, the video data stream carrying the target area information of the at least one frame of original video image.
104. Send the video data stream to a second device.
In some embodiments, encoding the at least one frame of original video image based on its target area information to generate a video data stream carrying that target area information includes:
encoding the target area information together with the at least one frame of original video image to generate at least one first data packet carrying at least one target area identifier, the at least one target area identifier being obtained by encoding the at least one frame of original video image;
generating the video data stream based on the at least one first data packet carrying the at least one target area identifier.
In some embodiments, encoding the at least one frame of original video image based on its target area information to generate a video data stream carrying that target area information includes:
encoding the target area information of the at least one frame of original video image to generate at least one second data packet;
encoding the at least one frame of original video image to generate at least one first data packet;
inserting one second data packet after every preset number of first data packets to generate the video data stream.
All of the above optional solutions can be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
Fig. 2 is a flowchart of a video data processing method provided by an embodiment of the present invention; the method can be applied to a second device. Referring to Fig. 2, the method includes:
201. Receive a video data stream, the video data stream carrying the target area information of at least one frame of original video image.
202. Extract the target area information of the at least one frame of original video image from the video data stream.
203. Decode the video data stream to generate the video images corresponding to the video data stream.
204. Re-encode the video images corresponding to the video data stream based on the target area information carried by the video data stream, to generate a target video data stream.
In some embodiments, extracting the target area information of the at least one frame of original video image based on the video data stream includes:
extracting at least one target area identifier based on at least one field of at least one first data packet in the video data stream;
decoding the at least one target area identifier to obtain the target area information of the at least one frame of original video image.
In some embodiments, extracting the target area information of the at least one frame of original video image based on the video data stream includes:
based on the at least one first data packet and the at least one second data packet in the video data stream, decoding, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image.
All of the above optional solutions can be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
Fig. 3 is a flowchart of a video data processing method provided by an embodiment of the present invention; the method can be applied to a second device. Referring to Fig. 3, the method includes:
301. Receive at least two video data streams, each video data stream carrying the target area information of at least one frame of original video image.
302. Extract, from the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream.
303. Decode each video data stream to generate the video images corresponding to the at least two video data streams.
304. Merge the video images corresponding to the at least two video data streams to generate a target video image.
305. Re-encode the target video image based on the target area information corresponding to the at least two video data streams, to generate a target video data stream.
In some embodiments, extracting, based on the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream includes:
extracting at least one target area identifier corresponding to the at least two video data streams, based on at least one field of at least one first data packet in each video data stream;
decoding the at least one target area identifier corresponding to each video data stream, to obtain the target area information of the at least one frame of original video image corresponding to the at least two video data streams.
In some embodiments, extracting, based on the at least two video data streams, the target area information of the at least one frame of original video image corresponding to each video data stream includes:
based on the at least one first data packet and the at least one second data packet in each video data stream, decoding, after every preset number of first data packets, the second data packet that follows them, to generate the target area information of the at least one frame of original video image in the at least two video data streams.
All of the above optional solutions can be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
Fig. 4 is a flowchart of a video data processing method provided by an embodiment of the present invention. The method is illustrated by an example in which a first device and a second device interact, where the first device has an encoding function and the second device has a transcoding function. Referring to Fig. 4, the method includes:
401. The first device obtains at least one frame of original video image.
In the embodiment of the present invention, the first device has a video-image acquisition function and an encoding function, and can obtain at least one frame of original video image through the acquisition function. The at least one frame of original video image is the video image originally captured by the first device, which has not yet been encoded and is waiting to be processed.
Taking the first device being a terminal as an example, a multimedia client, such as a live-streaming client, can be installed on the terminal, and the client can collect at least one frame of original video image in real time through the terminal's camera. The terminal may first obtain the at least one frame of original video image and then encode it. Alternatively, every time the terminal obtains one original video image, it may immediately encode that image through the corresponding encoding function.
The first device may also be a server, which can receive at least one frame of original video image sent by any terminal and encode the received images in real time based on the server's encoding function. The server may likewise first obtain the at least one frame of original video image and then encode it. The embodiment of the present invention does not limit here the concrete form of the first device or the specific process of obtaining the at least one frame of original video image.
402. The first device obtains, based on the at least one frame of raw video image, the target area information of the at least one frame of raw video image.
In this embodiment of the present invention, a target area is an image region on each raw video image that requires emphasis during processing. Based on the target area, when encoding each raw video image, the first device can analyze the target area with emphasis and allocate more bitrate to it, thereby increasing the encoding precision of the target area and improving the encoding quality. The target area information is information about the target area: it may indicate whether a corresponding macroblock on each raw video image belongs to the target area, or it may indicate the importance or weight of the corresponding macroblock on each raw video image. Of course, the target area information may also be other information related to the target area; the specific content of the target area information is not limited in this embodiment of the present invention.
Specifically, the target area may be a region of interest (ROI), that is, a region that the user needs to focus on, or the main part of the corresponding image. For example, the region of interest may be a face; of course, the target area may also be another preset region, which is not limited in this embodiment of the present invention. As shown in Fig. 5, during image processing, the first device may run a corresponding target area recognition algorithm on the above at least one frame of raw video image to identify the target area in each raw video image. The first device may outline the recognized target area in each raw video image with a box, a circle, an irregular polygon, or the like. In turn, the first device may extract, based on the recognized target area in each raw video image, the target area information corresponding to each target area.
Taking the Selective Search algorithm as an example, the extraction of target area information is described as follows: the first device may run the Selective Search algorithm on the above at least one frame of raw video image to perform an initial image segmentation on each raw video image, dividing each raw video image into at least one small candidate region, and then screen and merge the at least one candidate region corresponding to each raw video image, deleting candidate regions that do not meet the target area requirement and merging candidate regions that do.
For example, the similarity between the at least one candidate region may be computed based on parameters such as color, texture, size, and spatial overlap, and the similarity between each candidate region and the target areas stored in a database may also be computed; for instance, the similarity between each candidate region and the face regions stored in the database may be computed to determine whether the candidate region is a required face region. Finally, the target area corresponding to each raw video image may be obtained based on the candidate regions with higher similarity. In turn, the target area information corresponding to each target area may be obtained based on each target area. For example, if the similarity between a target area and the target areas stored in the database is high, that target area may be determined to be a more important target area, and the target area information corresponding to it may then indicate that it is an important area.
Of course, in other embodiments, the first device may also identify the target area in the above at least one frame of raw video image through another target area recognition algorithm and obtain the corresponding target area information, which may likewise be other information. The specific recognition algorithm and the specific content and form of the target area information are not limited in this embodiment of the present invention.
It should be noted that the first device may obtain the target area information of a raw video image through the corresponding target area recognition algorithm each time one raw video image is obtained. Of course, the first device may also first obtain some or all of the raw video images to be processed and then obtain their target area information through the corresponding target area recognition algorithm; this is not limited in this embodiment of the present invention.
403. The first device encodes the target area information of the at least one frame of raw video image, generating at least one target area identifier.
In this embodiment of the present invention, based on the target area information of the at least one raw video image obtained in step 402, the first device makes the video data stream generated after encoding the at least one frame of raw video image carry the target area information corresponding to each raw video image. In this way, in subsequent processing of the video data stream, when a relevant device needs the corresponding target area information, the required target area information can be extracted directly from the video data stream, avoiding the complex process of running the target area recognition algorithm on the associated video images again, greatly reducing the processing time of the video data and lowering the processing load of the system.
In one embodiment, the first device may encode the target area information corresponding to the at least one frame of raw video image and merge the generated at least one target area identifier into the final video data stream, so that the video data stream carries the target area information of the at least one frame of raw video image.
Specifically, the first device may compress the target area information corresponding to each raw video image, converting the target area information into corresponding binary digits, where the binary digits are the target area identifier corresponding to the target area information of each raw video image. The target area identifier may be used to indicate the importance level of the corresponding target area. For example, when the target area information indicates that the corresponding target area is the most important region, the target area identifier generated by encoding that target area information may be the digit "1"; when the target area information indicates that the corresponding target area is a normal region, the target area identifier may be the digit "0".
Of course, in other embodiments, the above target area identifier may also be used to indicate other target area information of the corresponding target area, and the corresponding target area information may also be identified in other ways; the specific content expressed by the target area identifier and the specific identification manner are not limited in this embodiment of the present invention.
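The mapping just described, from target area information to a binary target area identifier ("1" for a most important region, "0" for a normal region), can be sketched as follows; the bit-packing layout is an illustrative assumption:

```python
from enum import IntEnum

class RoiLevel(IntEnum):
    NORMAL = 0     # normal region -> identifier "0"
    IMPORTANT = 1  # most important region -> identifier "1"

def encode_roi_identifiers(levels: list[RoiLevel]) -> bytes:
    """Pack one 1-bit target area identifier per frame into bytes,
    MSB first; the last byte is zero-padded."""
    out = bytearray()
    for i, level in enumerate(levels):
        if i % 8 == 0:
            out.append(0)
        out[-1] |= int(level) << (7 - i % 8)
    return bytes(out)

def decode_roi_identifiers(data: bytes, count: int) -> list[RoiLevel]:
    """Inverse of encode_roi_identifiers for the first `count` frames."""
    return [RoiLevel((data[i // 8] >> (7 - i % 8)) & 1)
            for i in range(count)]
```

A richer identifier (e.g. a multi-bit importance weight per macroblock) would follow the same pattern with a wider field per entry.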
404. The first device encodes the at least one frame of raw video image, generating at least one first data packet.
In this embodiment of the present invention, based on the at least one frame of raw video image obtained in step 401, the first device may encode each raw video image, compressing the at least one frame of raw video image, which has a large data volume, into a video data stream with a smaller data volume, which is convenient for the transmission system to transmit and saves transmission time.
Specifically, through the encoding function, the first device may remove the redundant information of the above at least one frame of raw video image. For example, the first device may remove the spatial redundancy, temporal redundancy, and visual redundancy of each raw video image to compress the at least one frame of raw video image. The encoding may specifically include processes such as prediction, transform, quantization, and entropy coding, through which the first device can obtain at least one code corresponding to each raw video image.
Based on the at least one obtained code, the first device may arrange a set number of codes together according to a corresponding rule and packetize them, for example by performing NAL (Network Abstraction Layer) packetization on the codes, to form the first data packets. From the at least one code generated by encoding the at least one frame of raw video image, at least one corresponding first data packet can be obtained, where each first data packet may include at least one code; the number of codes in each first data packet is not limited in this embodiment of the present invention.
405. The first device inserts the corresponding at least one target area identifier into the at least one first data packet, generating the video data stream.
In this embodiment of the present invention, based on the at least one target area identifier obtained in step 403 and the at least one first data packet obtained in step 404, the first device may fill each target area identifier into the corresponding first data packet, so that the corresponding first data packet carries the corresponding target area identifier, and generate the video data stream based on the at least one target area identifier and the at least one first data packet, thereby achieving the purpose of carrying the target area information of the at least one frame of raw video image in the video data stream.
Specifically, the at least one first data packet generated by the first device includes first data packets generated from target areas and first data packets generated from non-target areas, and the first device may insert the at least one target area identifier into the first data packets generated from the target areas. For example, the first device may merge each target area identifier into the first data packet corresponding to it; of course, the first device may also insert each target area identifier at the last position of the corresponding first data packet, so that each first data packet generated from a target area carries the corresponding target area identifier.
Based on the above process, after the first device has inserted each generated target area identifier into the corresponding position of the corresponding first data packet, the first device may splice and packetize the at least one target area identifier and the at least one first data packet through a series of processes, ultimately generating the corresponding video data stream, which then carries the corresponding target area identifiers. During encoding, the first device may allocate more bitrate to the target areas in the at least one frame of raw video image, so that the encoding quality of the target areas is higher.
The above steps 403 to 405 constitute the process in which the first device generates the corresponding at least one target area identifier based on the target area information of the at least one frame of raw video image, generates the corresponding at least one first data packet based on the at least one frame of raw video image, and inserts the at least one target area identifier into the corresponding at least one first data packet, so that the generated video data stream carries the corresponding target area information.
Besides the process of steps 403 to 405, another process by which the generated video data stream can carry the corresponding target area information is introduced below:
(1) The first device encodes the target area information of the at least one frame of raw video image obtained in step 402, generating at least one second data packet. Each second data packet is composed of at least one corresponding code, where a code is the data obtained through compression by the encoding function of the first device. Specifically, the first device may apply processes such as prediction, transform, quantization, and entropy coding to the target area information of the at least one frame of raw video image, so as to remove the redundant information in the target area information and obtain at least one code corresponding to each piece of target area information; the first device may then arrange the at least one code corresponding to each piece of target area information together according to a set rule and packetize them, obtaining at least one second data packet corresponding to the at least one piece of target area information. The specific arrangement rule of the at least one code is not limited in the present invention.
(2) The first device encodes the at least one frame of raw video image obtained in step 401, generating at least one first data packet. This process is similar to step 404 above and is not repeated here.
(3) The first device inserts, based on the at least one first data packet, one second data packet after every preset number of first data packets, generating the video data stream. Specifically, based on the at least one second data packet obtained in step (1) and the at least one first data packet obtained in step (2), the first device may insert one second data packet at the last position of every preset number of first data packets, so that every preset number of first data packets carries one second data packet, where the preset number may be any positive integer set by the first device. Of course, some first data packets may also be set not to carry a second data packet. The specific value of the preset number, and which first data packets carry second data packets, are not limited in this embodiment of the present invention.
Based on the above process, after the first device has inserted each generated second data packet at the corresponding position after every preset number of first data packets, the first device may splice and packetize the at least one second data packet and the at least one first data packet through a series of processes, ultimately generating the corresponding video data stream, which then carries the corresponding second data packets. During encoding, the first device may allocate more bitrate to the target areas in the at least one frame of raw video image, so that the encoding quality of the target areas is higher.
The above steps (1) to (3) constitute the process in which the first device generates the corresponding at least one second data packet based on the target area information of the at least one frame of raw video image and inserts the at least one second data packet into the at least one first data packet generated from the at least one frame of raw video image, so that the generated video data stream finally carries the corresponding target area information.
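Steps (1) to (3) above amount to interleaving one second data packet after every preset number of first data packets; a minimal sketch, with the packet contents abstracted as byte strings:

```python
def interleave(first_packets: list[bytes], second_packets: list[bytes],
               preset_number: int) -> list[bytes]:
    """After every `preset_number` first data packets, insert the next
    second data packet (if any remain), mirroring steps (1)-(3)."""
    stream, si = [], 0
    for i, pkt in enumerate(first_packets, start=1):
        stream.append(pkt)
        if i % preset_number == 0 and si < len(second_packets):
            stream.append(second_packets[si])
            si += 1
    return stream
```

With preset number 2, two second packets are spread across five first packets, and any trailing first packets simply carry no second packet, as the description allows.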
In other embodiments, besides the above two methods, the generated video data stream may also be made to carry the corresponding target area information in other manners, which are not enumerated here one by one in the embodiments of the present invention.
It should be noted that the process of steps 403 to 405 encodes the target area information of the at least one frame of raw video image together with the at least one frame of raw video image, generating at least one first data packet that carries at least one target area identifier. In this process, the first device may generate the at least one first data packet based on the at least one frame of raw video image and its corresponding target area information at the same time; that is, the first device may insert the at least one target area identifier into the at least one code corresponding to the at least one frame of raw video image, so that the first device can packetize the at least one code and the at least one target area identifier together to generate the at least one first data packet. The specific manner of generating the at least one first data packet is not limited in this embodiment of the present invention.
406. The first device sends the video data stream to the second device.
In this embodiment of the present invention, as shown in Fig. 5, based on the video data stream obtained in step 405, the first device may send the video data stream to any second device. The first device may transmit the video data stream to the corresponding second device through a corresponding transmission system, which may be the Internet, terrestrial wireless broadcasting, satellite, or the like. Transmitting data in the form of a video data stream makes the data faster to transmit and more convenient to store, lightening the burden on the transmission system.
It should be noted that the second device may have a storage function, a decoding function, and a re-encoding function. The second device may be a terminal, in which case the terminal may decode and re-encode the video data stream through an application program having the decoding and re-encoding functions. The second device may also be a server, in which case the server may obtain the corresponding video data stream in real time and process the received video data stream in real time through the decoding and re-encoding processes on the server. The specific form of the second device is not limited in this embodiment of the present invention.
407. The second device receives the video data stream, which carries the target area information of the at least one frame of raw video image.
In this embodiment of the present invention, it can be seen from steps 401 to 405 that, in the process of encoding the at least one frame of raw video image, the first device also encodes the target area information extracted from the at least one frame of raw video image into the corresponding video data stream, so that the generated video data stream carries the target area information of the at least one frame of raw video image. Therefore, while receiving the video data stream from the first device, the second device also receives the target area information of the at least one frame of raw video image merged into the video data stream.
It should be noted that the second device may receive the video data stream in real time; that is, the second device may process the received video data stream synchronously while receiving it. Of course, the second device may also first receive all the video data streams sent by the first device and then process them accordingly; this is not limited in this embodiment of the present invention.
408. The second device decodes the at least one target area identifier in the video data stream, obtaining the target area information of the at least one frame of raw video image.
In this embodiment of the present invention, as shown in Fig. 5, the second device may transcode the received video data stream based on its decoding function and re-encoding function, where transcoding refers to converting the video data stream generated by the first device into another video data stream, so as to adapt to different network bandwidths, different terminal processing capabilities, different user demands, and so on. For example, the second device may transcode the above video data stream into a video data stream of a different video format, such as converting a video data stream in MPEG-2 (Moving Picture Experts Group) format into one in H.264 format; the second device may also change the bit rate of the video data stream received from the first device, so as to meet the playback demands of different devices; in addition, the second device may transcode the received video data stream so that the resolution of the video images corresponding to the video data stream changes before and after transcoding, for example converting high-definition video into standard-definition video. The specific use of the transcoding process is not limited in this embodiment of the present invention.
The essence of the above transcoding process is to first decode the received video data stream and then re-encode the decoded data. As can be seen from steps 403 to 405, the video data stream received by the second device includes both the data obtained by encoding the at least one frame of raw video image and the data obtained by encoding the corresponding target area information. Therefore, the second device can extract the corresponding target area information based on the video data stream, where extracting the target area information is a process of decoding the video data stream, that is, a process of decompressing the relevant data in the video data stream.
Corresponding to step 405, in one embodiment, the video data stream received by the second device may include at least one target area identifier, each obtained by compressing the target area information of the corresponding raw video image. Therefore, when the second device needs the corresponding target area information, it can decode the at least one target area identifier in the video data stream to extract the required target area information.
Specifically, each first data packet in the above video data stream includes at least one field, and the at least one field includes a header part and a body part, where the header may be the corresponding target area identifier. In the process of extracting the target area information of the at least one frame of raw video image from the video data stream, the second device may first extract the target area identifier corresponding to the header of the at least one field and then decode the target area identifier, so as to extract the target area information corresponding to each target area. The specific process by which the second device decodes the at least one target area identifier in the video data stream is not limited in this embodiment of the present invention.
The above process is described by taking as an example decoding the at least one target area identifier in the video data stream to extract the target area information of the corresponding at least one frame of raw video image. Another method of extracting the target area information of the at least one frame of raw video image from the video data stream is described below:
Corresponding to steps (1) to (3) in step 405, in one embodiment, the second device may, based on the at least one first data packet and the at least one second data packet in the video data stream, decode the second data packet after every preset number of first data packets, generating the target area information of the at least one frame of raw video image. Specifically, the video data stream may include at least one first data packet and at least one second data packet, where the at least one second data packet is obtained by encoding the target area information of the at least one frame of raw video image; the second device can obtain the required target area information by decoding the at least one second data packet.
The second device may inspect the above video data stream and detect the corresponding second data packet after every preset number of first data packets. Specifically, after every N first data packets, the second device may detect that the (N+1)-th data packet is a second data packet, where N may be any positive integer. Of course, the corresponding second data packet may also be located at another position among every preset number of first data packets, and the number of first data packets between every two second data packets detected by the second device may be any other quantity; this is not limited in this embodiment of the present invention. Based on its decoding function, the second device may decompress the second data packets, restoring them to the corresponding target area information and thereby achieving the purpose of extracting the target area information of the at least one frame of raw video image.
Based on the above process, the second device can extract the target area information carried in the video data stream more quickly, avoiding having to run the target area recognition algorithm on the video images again in subsequent processing to obtain the required target area information, which greatly reduces the data processing time and lowers the computational burden of the second device.
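The detection rule described above (after every N first data packets, the next packet is a second data packet) can be sketched as the inverse of the insertion in step (3); packet contents are abstracted as byte strings:

```python
def split_stream(stream: list[bytes], preset_number: int
                 ) -> tuple[list[bytes], list[bytes]]:
    """Walk the packet sequence: after every `preset_number` first data
    packets, the next packet is treated as a second data packet carrying
    encoded target area information; everything else is a first packet."""
    firsts, seconds, run = [], [], 0
    for pkt in stream:
        if run == preset_number:
            seconds.append(pkt)
            run = 0
        else:
            firsts.append(pkt)
            run += 1
    return firsts, seconds
```

Decompressing each entry of `seconds` would then yield the per-frame target area information, while `firsts` feeds the ordinary video decoding path.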
It should be noted that, besides the two methods introduced above by which the second device extracts the corresponding target area information from the received video data stream, the second device may also extract the corresponding target area information through other methods; the specific extraction method of the second device is not limited in this embodiment of the present invention.
409. The second device decodes the at least one first data packet in the video data stream, obtaining the video images corresponding to the at least one first data packet.
In this embodiment of the present invention, the video data stream is obtained by the first device encoding the at least one frame of raw video image it obtained. In the process of transcoding the received video data stream, the second device needs to decode the video data stream, restoring the at least one first data packet in the video data stream to the corresponding video images, and then process the corresponding video images based on parameters such as the resolution or format set by the second device, so as to obtain video images that meet the demand.
Specifically, corresponding to step 404, the second device may decode the at least one first data packet in the above video data stream through a corresponding decoding algorithm; for example, the second device may decode the at least one first data packet through the H.264 decoding algorithm. The second device may call the relevant functions of the decoding algorithm to obtain the packetization information of the video data stream, so as to read and parse the at least one first data packet in the video data stream, find the leading identifier of each first data packet, and then decode the data between every two leading identifiers, finally obtaining the video image corresponding to each piece of data. Based on the above process, the second device can successively restore the at least one first data packet in the video data stream to the corresponding at least one frame of video image, achieving the purpose of decoding the video data stream.
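The scan for leading identifiers described above resembles locating Annex B start codes in an H.264 byte stream; a minimal sketch assuming the three-byte start code 00 00 01 (the source does not commit to this exact framing):

```python
START_CODE = b"\x00\x00\x01"  # H.264 Annex B NAL unit start code

def split_nal_units(stream: bytes) -> list[bytes]:
    """Locate every leading identifier (start code) and return the data
    between consecutive ones -- one NAL unit payload per entry."""
    starts = []
    i = stream.find(START_CODE)
    while i != -1:
        starts.append(i)
        i = stream.find(START_CODE, i + len(START_CODE))
    units = []
    for n, s in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(stream)
        units.append(stream[s + len(START_CODE):end])
    return units
```

Each returned payload would then be handed to the entropy decoder and reconstruction stages to recover the corresponding video image.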
The above steps 408 to 409 constitute the process in which the second device decodes the received video data stream and generates the video images corresponding to the video data stream; the process includes both decoding the video data stream to obtain the corresponding target area information and decoding the data in the video data stream to obtain the corresponding video images. Of course, in other embodiments, the second device may also decode the video data stream through other decoding algorithms; the specific process of decoding the video data stream is not limited in this embodiment of the present invention.
It should be noted that, in the process of decoding the at least one first data packet in steps 408 to 409, the second device may simultaneously obtain the video images corresponding to the at least one first data packet and the corresponding target area information. The order in which the second device obtains the above video images and their corresponding target area information is not limited in this embodiment of the present invention.
410. The second device re-encodes the video images corresponding to the video data stream based on the target area information of the at least one frame of raw video image, generating a target video data stream.
In this embodiment of the present invention, based on the target area information of the at least one frame of raw video image obtained in step 408 and the corresponding video images obtained by decoding the video data stream in step 409, the second device may re-encode the corresponding video images. The re-encoding is ROI (Region of Interest) encoding: according to the above target area information, during re-encoding, more bitrate is allocated to the target areas in the video images that correspond to the target area information, so as to generate a target video data stream of higher quality.
Specifically, similar to the encoding process in step 404, the second device may apply processes such as prediction, transform, quantization, and entropy coding to the video images corresponding to the above video data stream according to parameters such as the set target format or target resolution, so as to remove the redundant information of the video images; finally, the second device can compress the video images corresponding to the above video data stream into at least one target code corresponding to parameters such as the set target format or target resolution.
Based on the at least one target code obtained above, the second device may arrange the at least one target code according to a corresponding rule and packetize it, ultimately generating the target video data stream corresponding to parameters such as the set target format or target resolution, thereby realizing the transcoding of the video data stream corresponding to the at least one frame of raw video image.
It should be noted that, in the process of re-encoding the above video images, the second device may also re-encode based on other parameters; the specific re-encoding parameters and process of the second device are not limited in this embodiment of the present invention.
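One common way to allocate more bitrate to the target area during re-encoding is to lower the quantization parameter (QP) for macroblocks that overlap the ROI; a minimal sketch under that assumption (the 16x16 macroblock size matches H.264, but the QP values and offsets are illustrative, not from the source):

```python
def qp_map(roi_mask, base_qp=30, roi_offset=-6, bg_offset=+2,
           block=16):
    """Per-macroblock QP map: macroblocks overlapping the ROI get a
    lower QP (finer quantization, more bits); background macroblocks
    get a higher QP. roi_mask is an HxW grid of 0/1 pixel flags."""
    h = (len(roi_mask) + block - 1) // block
    w = (len(roi_mask[0]) + block - 1) // block
    qps = [[base_qp + bg_offset] * w for _ in range(h)]
    for by in range(h):
        for bx in range(w):
            rows = roi_mask[by * block:(by + 1) * block]
            if any(any(r[bx * block:(bx + 1) * block]) for r in rows):
                qps[by][bx] = base_qp + roi_offset
    return qps
```

Feeding such a map into the quantization stage of the re-encoder realizes the "more bitrate to the target area" behavior described above without touching prediction, transform, or entropy coding.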
The above steps 407 to 410 constitute the process in which the second device transcodes the received video data stream. As shown in Fig. 5, during the transcoding, the second device can extract the corresponding target area information directly from the video data stream, avoiding the process of rerunning the target area recognition algorithm and greatly improving system performance. Of course, besides the transcoding process mentioned above, the second device may also implement transcoding through other methods, as long as the second device can extract the corresponding target area information directly from the video data stream; this is not limited in this embodiment of the present invention.
In the embodiments of the present invention, the first device obtains the target area information of at least one frame of original video image and, when generating the video data stream, makes the generated video data stream carry the corresponding target area information, so that after the second device receives the video data stream it can extract the required target area information directly from the video data stream. This spares the second device the complex process of deriving the target area information from the relevant video images again, greatly reducing the data processing time and the system load.
The above embodiment can be applied to a live video streaming scenario. Specifically, in live streaming, the streaming client can acquire at least one frame of original video image in real time through the camera of the terminal; the terminal can perform target area recognition on the at least one frame of original video image and encode it based on the obtained target area information. The terminal can send the video data stream generated by the above encoding to a server; the server can decode the video data stream based on the target area information carried in it, obtain the corresponding video images and the target area information they carry, and re-encode the video images, thereby transcoding the video data stream so that the video resolution, video format, or the like of the target video data stream generated by transcoding is changed to suit the different demands of users. The server can also send the format-converted target video data stream to other terminals, to suit the video playback and processing capabilities of different terminals. Besides the above live streaming scenario, the transcoding process can also be applied to other scenarios; the embodiments of the present invention do not limit the specific use of the transcoding process here.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which are not described here one by one.
Fig. 6 is a flowchart of a video data processing method provided by an embodiment of the present invention. The video data processing method is described through the interaction between a first device and a second device, where the first device has an encoding function and the second device has a stream-mixing function. Referring to Fig. 6, the method includes:
601. The first device obtains at least one frame of original video image.
602. Based on the at least one frame of original video image, the first device obtains the target area information of the at least one frame of original video image.
603. The first device encodes the target area information of the at least one frame of original video image to generate at least one target area identifier.
604. The first device encodes the at least one frame of original video image to generate at least one first data packet.
605. The first device inserts the corresponding at least one target area identifier into the at least one first data packet to generate the video data stream.
606. The first device sends the video data stream to the second device.
In the embodiments of the present invention, as shown in Fig. 7, steps 601 to 606 are similar to steps 401 to 406, and the embodiments of the present invention do not describe them again here.
607. The second device receives at least two video data streams, each video data stream carrying the target area information of at least one frame of original video image.
In the embodiments of the present invention, as shown in Fig. 7, the second device can have a storage function, a decoding function, a merging function and a re-encoding function. The second device can receive at least two video data streams from at least one first device, so as to perform stream-mixing processing on the received at least two video data streams. Stream-mixing processing refers to merging the video images corresponding to the at least two video data streams from different sources, finally combining the at least two video data streams into a single video data stream so as to meet the demands of users. In essence, stream-mixing processing is the process of decoding, merging and re-encoding the at least two video data streams.
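The decode-merge-re-encode essence of stream-mixing described above can be sketched as a simple pipeline. This is an illustrative skeleton only; `decode`, `merge` and `encode` are placeholders standing in for the real codec operations, which the patent does not specify at this level.

```python
# Hypothetical sketch of the stream-mixing pipeline: decode each input stream,
# merge the decoded frames position by position, re-encode into one stream.
def mix_streams(streams, decode, merge, encode):
    """streams: at least two input video data streams."""
    frame_lists = [decode(s) for s in streams]                 # decode each stream
    # zip pairs up the frames at the same position in every stream
    merged = [merge(frames) for frames in zip(*frame_lists)]   # merge frame-by-frame
    return encode(merged)                                      # re-encode into one stream
```

With toy stand-ins, `mix_streams(["ab", "cd"], list, "".join, "|".join)` would pair up `'a'` with `'c'` and `'b'` with `'d'` before "encoding" the result, mirroring how frames at the same position in each stream are merged.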
The second device can be a server with a stream-mixing function. The server can receive at least two video data streams from different multimedia clients, decode, merge and re-encode the at least two video data streams, and thereby mix them into a single target video data stream. Of course, the second device can also be a terminal, which can receive at least two video data streams sent by any other devices and merge them into a single target video data stream. The embodiments of the present invention do not limit the specific form of the second device here.
The second device may receive at least two video data streams from different first devices in real time and perform stream-mixing processing on them synchronously; that is, the second device can perform stream-mixing processing on the video data streams already received while continuing to receive video data streams from different sources. Of course, the second device may also first receive all the video data streams from the different sources and then perform stream-mixing processing based on all the received video data streams. The embodiments of the present invention do not limit the order in which the second device receives the video data streams and performs the stream-mixing processing.
It should be noted that one and the same second device may have both the stream-mixing function and the transcoding function. For example, a single second device may contain both a mixing system and a transcoding system, where the mixing system can perform stream-mixing processing on the received at least two video data streams and the transcoding system can perform transcoding processing on each received video data stream. Of course, the mixing system and the transcoding system may also be located in different second devices, where the second device with the mixing system can perform stream-mixing processing on the received at least two video data streams, and the second device with the transcoding system can perform transcoding processing on each received video data stream. The embodiments of the present invention do not limit whether a second device has both the mixing function and the transcoding function.
608. The second device decodes the at least one target area identifier in each video data stream to obtain the target area information of the at least one frame of original video image in the at least two video data streams.
609. The second device decodes the at least one first data packet in each video data stream to obtain the video image corresponding to the at least one first data packet in the at least two video data streams.
In steps 608 and 609 the second device needs to process all the received video data streams accordingly, as shown in Fig. 7, where the processing of each video data stream is similar to steps 408 and 409, and the embodiments of the present invention do not describe it again here.
610. The second device merges the video images corresponding to the at least two video data streams to generate a target video image.
In the embodiments of the present invention, as shown in Fig. 7, based on the video images corresponding to each of the at least two video data streams obtained in step 609, the second device can combine the video images corresponding to the at least two video data streams through a corresponding merging function, so that the video images of the at least two video data streams are combined into a whole; that is, a corresponding target video image is generated based on the at least one frame of video image.
Specifically, based on the at least one frame of video image corresponding to the at least two video data streams, the second device can, starting from the first video image of each video data stream, correspondingly merge the video images at the same position in the at least two video data streams. In addition, the second device can also correspondingly merge every N corresponding video images in the at least two video data streams, where N is a positive integer. Further, the second device can merge the at least one frame of video image corresponding to the at least two video data streams left-right or top-bottom, and it can also merge the at least one frame of video image in a "large frame with small frame" manner, so that the generated target video image takes a "picture-in-picture" form. Of course, besides the above merging methods, the second device can also merge the video images corresponding to the at least two video data streams in other ways to generate the target video image; the embodiments of the present invention do not limit the specific manner in which the second device generates the target video image.
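The two layouts mentioned above, side-by-side merging and "picture-in-picture", can be sketched on frames represented as 2D lists of pixel values. This representation and the function names are assumptions for illustration; a real implementation would operate on decoded frame buffers.

```python
# Hypothetical sketch of two of the merge layouts described above.
def merge_side_by_side(left, right):
    """Place two equal-height frames next to each other (left-right merge)."""
    return [lr + rr for lr, rr in zip(left, right)]

def merge_picture_in_picture(big, small, top=0, left=0):
    """Overlay a small frame onto a big one, yielding a 'picture-in-picture'."""
    out = [row[:] for row in big]            # copy so the big frame is untouched
    for r, row in enumerate(small):
        out[top + r][left:left + len(row)] = row
    return out
```

A top-bottom merge would simply concatenate the row lists of the two frames instead.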
611. Based on the target area information corresponding to the at least two video data streams, the second device re-encodes the target video image to generate a target video data stream.
In the embodiments of the present invention, as shown in Fig. 7, step 611 is similar to step 410 described above, and the embodiments of the present invention do not describe it again here.
In the embodiments of the present invention, the first device obtains the target area information of at least one frame of original video image and, when generating the video data stream, makes the generated video data stream carry the corresponding target area information, so that after the second device receives the video data stream it can extract the required target area information directly from the video data stream. This spares the second device the complex process of deriving the target area information from the relevant video images again, greatly reducing the data processing time and the system load.
The above embodiment can be applied to a live video streaming scenario. Specifically, during live streaming, stream-mixing processing can be applied to processes such as video interaction between a streamer and other users. In this process, the server can receive the video data streams sent by different multimedia clients and perform the above stream-mixing processing on the received video data streams from different sources, so that the video data streams from the different sources are merged into a single target video data stream. Besides the above live streaming scenario, the stream-mixing process can also be applied to other scenarios; the embodiments of the present invention do not limit the specific use of the stream-mixing processing here.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which are not described here one by one.
Fig. 8 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention. Referring to Fig. 8, the apparatus includes: an obtaining module 801, a generation module 802 and a sending module 803.
The obtaining module 801 is configured to obtain at least one frame of original video image;
the obtaining module 801 is also configured to obtain the target area information of the at least one frame of original video image based on the at least one frame of original video image;
the generation module 802 is configured to encode the at least one frame of original video image based on the target area information of the at least one frame of original video image to generate a video data stream, the video data stream carrying the target area information of the at least one frame of original video image;
the sending module 803 is configured to send the video data stream to a second device.
In some embodiments, the generation module 802 is configured to:
encode the target area information of the at least one frame of original video image and the at least one frame of original video image to generate at least one first data packet carrying at least one target area identifier, the at least one target area identifier being obtained by encoding for the at least one frame of original video image;
generate the video data stream based on the at least one first data packet carrying the at least one target area identifier.
In some embodiments, the generation module 802 is configured to:
encode the target area information of the at least one frame of original video image to generate at least one second data packet;
encode the at least one frame of original video image to generate at least one first data packet;
insert one second data packet after every preset number of first data packets to generate the video data stream.
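The packet layout just described, one second data packet (carrying the target area information) after every preset number of first data packets, can be sketched as a simple interleaving. The packet contents below are placeholders; a real stream would use the codec's binary packet syntax.

```python
# Hypothetical sketch: interleave area-information packets ("second data
# packets") into the coded-video packets ("first data packets") so that one
# second packet follows every `preset` first packets.
def interleave_packets(first_packets, second_packets, preset):
    stream, second = [], iter(second_packets)
    for i, pkt in enumerate(first_packets, 1):
        stream.append(pkt)
        if i % preset == 0:
            stream.append(next(second, None))  # insert an area-info packet
    # Drop the None fillers if the second packets run out early
    return [p for p in stream if p is not None]

stream = interleave_packets(["f1", "f2", "f3", "f4"], ["roi1", "roi2"], 2)
```

On the receiving side, a decoder can locate the area information at fixed intervals without scanning the whole stream, which is the point of the "every preset number" layout.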
In the embodiments of the present invention, the first device obtains the target area information of at least one frame of original video image and, when generating the video data stream, makes the generated video data stream carry the corresponding target area information, so that after the second device receives the video data stream it can extract the required target area information directly from the video data stream. This spares the second device the complex process of deriving the target area information from the relevant video images again, greatly reducing the data processing time and the system load.
It should be understood that when the video data processing apparatus provided by the above embodiment processes video data, the division into the above functional modules is only an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the video data processing apparatus provided by the above embodiment and the embodiments of the video data processing method belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not described again here.
Fig. 9 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention. Referring to Fig. 9, the apparatus includes: a receiving module 901, an extraction module 902, a decoding module 903 and a re-encoding module 904.
The receiving module 901 is configured to receive a video data stream, the video data stream carrying the target area information of at least one frame of original video image;
the extraction module 902 is configured to extract the target area information of the at least one frame of original video image based on the video data stream;
the decoding module 903 is configured to decode the video data stream to generate the video image corresponding to the video data stream;
the re-encoding module 904 is configured to re-encode the video image corresponding to the video data stream based on the target area information of the at least one frame of original video image and a target bitrate, to generate a target video data stream.
In some embodiments, the extraction module 902 is configured to:
extract at least one target area identifier based on at least one field of at least one first data packet in the video data stream;
decode the at least one target area identifier to obtain the target area information of the at least one frame of original video image.
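Extracting the target area identifier from a field of a first data packet, as the extraction module does above, can be sketched as follows. The packet layout (a dict with an assumed `"sei"` field) is purely illustrative; a real stream would carry the identifier in a binary syntax element of the packet.

```python
# Hypothetical sketch: collect target-area identifiers carried in a field of
# the first data packets. The "sei" field name is an assumption.
def extract_area_ids(packets, field="sei"):
    """Return the identifiers from the packets that carry one."""
    return [p[field] for p in packets if field in p]

ids = extract_area_ids([{"data": b"frame-bytes", "sei": 7},
                        {"data": b"frame-bytes"}])
```

Because the identifier sits in a packet field, the receiver reads it while demultiplexing, without decoding the video payload first, which is why the embodiment can skip rerunning target area recognition.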
In some embodiments, the extraction module 902 is configured to:
based on the at least one first data packet and at least one second data packet in the video data stream, decode the second data packet that follows every preset number of first data packets, to generate the target area information of the at least one frame of original video image.
In the embodiments of the present invention, the first device obtains the target area information of at least one frame of original video image and, when generating the video data stream, makes the generated video data stream carry the corresponding target area information, so that after the second device receives the video data stream it can extract the required target area information directly from the video data stream. This spares the second device the complex process of deriving the target area information from the relevant video images again, greatly reducing the data processing time and the system load.
It should be understood that when the video data processing apparatus provided by the above embodiment processes video data, the division into the above functional modules is only an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the video data processing apparatus provided by the above embodiment and the embodiments of the video data processing method belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not described again here.
Fig. 10 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present invention. Referring to Fig. 10, the apparatus includes: a receiving module 1001, an extraction module 1002, a decoding module 1003, a merging module 1004 and a re-encoding module 1005.
The receiving module 1001 is configured to receive at least two video data streams, each video data stream carrying the target area information of at least one frame of original video image;
the extraction module 1002 is configured to extract the target area information of the at least one frame of original video image corresponding to each video data stream, based on the at least two video data streams;
the decoding module 1003 is configured to decode each video data stream to generate the video images corresponding to the at least two video data streams;
the merging module 1004 is configured to merge the video images corresponding to the at least two video data streams to generate a target video image;
the re-encoding module 1005 is configured to re-encode the target video image based on the target area information corresponding to the at least two video data streams, to generate a target video data stream.
In some embodiments, the extraction module 1002 is configured to:
extract the at least one target area identifier corresponding to the at least two video data streams, based on at least one field of at least one first data packet in each video data stream;
decode the at least one target area identifier corresponding to each video data stream to obtain the target area information of the at least one frame of original video image corresponding to the at least two video data streams.
In some embodiments, the extraction module 1002 is configured to:
based on the at least one first data packet and at least one second data packet in each video data stream, decode the second data packet that follows every preset number of first data packets, to generate the target area information of the at least one frame of original video image in the at least two video data streams.
In the embodiments of the present invention, the first device obtains the target area information of at least one frame of original video image and, when generating the video data stream, makes the generated video data stream carry the corresponding target area information, so that after the second device receives the video data stream it can extract the required target area information directly from the video data stream. This spares the second device the complex process of deriving the target area information from the relevant video images again, greatly reducing the data processing time and the system load.
It should be understood that when the video data processing apparatus provided by the above embodiment processes video data, the division into the above functional modules is only an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the video data processing apparatus provided by the above embodiment and the embodiments of the video data processing method belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not described again here.
Fig. 11 is a structural block diagram of a terminal 1100 provided by an embodiment of the present invention. The terminal 1100 may be: a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The terminal 1100 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
In general, the terminal 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, the at least one instruction being executed by the processor 1101 to implement the video data processing method provided by the method embodiments of the present invention.
In some embodiments, the terminal 1100 optionally further includes: a peripheral device interface 1103 and at least one peripheral device. The processor 1101, the memory 1102 and the peripheral device interface 1103 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1103 by a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of: a radio frequency circuit 1104, a touch display screen 1105, a camera 1106, an audio circuit 1107, a positioning component 1108 and a power supply 1109.
The peripheral device interface 1103 can be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102 and the peripheral device interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices through electromagnetic signals. The radio frequency circuit 1104 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card and the like. The radio frequency circuit 1104 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: metropolitan area networks, the mobile communication networks of each generation (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include an NFC (Near Field Communication) related circuit, which is not limited in the present invention.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to acquire touch signals on or above the surface of the display screen 1105. The touch signal can be input to the processor 1101 as a control signal for processing. At this time, the display screen 1105 can also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, arranged on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, arranged respectively on different surfaces of the terminal 1100 or in a folding design; in still other embodiments, the display screen 1105 may be a flexible display screen arranged on a curved or folding surface of the terminal 1100. The display screen 1105 may even be set to a non-rectangular irregular shape, that is, a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
The camera component 1106 is used to capture images or video. Optionally, the camera component 1106 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions, or other fused shooting functions are realized. In some embodiments, the camera component 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to the combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals and input them to the processor 1101 for processing, or input them to the radio frequency circuit 1104 to realize voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, arranged respectively at different parts of the terminal 1100. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic position of the terminal 1100 to implement navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
The power supply 1109 is used to supply power to the various components in the terminal 1100. The power supply 1109 may be an alternating current, a direct current, a disposable battery or a rechargeable battery. When the power supply 1109 includes a rechargeable battery, the rechargeable battery can support wired charging or wireless charging. The rechargeable battery can also be used to support fast charging technology.
In some embodiments, the terminal 1100 further includes one or more sensors 1110. The one or more sensors 1110 include but are not limited to: an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115 and a proximity sensor 1116.
The acceleration sensor 1111 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1100. For example, the acceleration sensor 1111 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1101 can control the touch display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 can also be used to collect game or user motion data.
The gyroscope sensor 1112 can detect the body direction and rotation angle of the terminal 1100, and the gyroscope sensor 1112 can cooperate with the acceleration sensor 1111 to collect the user's 3D actions on the terminal 1100. Based on the data collected by the gyroscope sensor 1112, the processor 1101 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 1113 may be disposed on a side frame of the terminal 1100 and/or a lower layer of the touch display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the terminal 1100, the user's gripping signal on the terminal 1100 may be detected, and the processor 1101 performs left/right-hand recognition or a shortcut operation according to the gripping signal acquired by the pressure sensor 1113. When the pressure sensor 1113 is disposed on the lower layer of the touch display screen 1105, the processor 1101 controls an operable control on the UI interface according to the user's pressure operation on the touch display screen 1105. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 1114 is used to acquire the user's fingerprint: either the processor 1101 identifies the user's identity according to the fingerprint acquired by the fingerprint sensor 1114, or the fingerprint sensor 1114 itself identifies the user's identity according to the acquired fingerprint. When the identified identity is a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1114 may be disposed on the front, the back, or a side of the terminal 1100. When a physical button or a manufacturer logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical button or the manufacturer logo.
The optical sensor 1115 is used to acquire the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 according to the ambient light intensity acquired by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1105 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 1105 is turned down. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 according to the ambient light intensity acquired by the optical sensor 1115.
The proximity sensor 1116, also referred to as a distance sensor, is generally disposed on the front panel of the terminal 1100. The proximity sensor 1116 is used to acquire the distance between the user and the front of the terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 gradually decreases, the processor 1101 controls the touch display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 gradually increases, the processor 1101 controls the touch display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Figure 11 does not constitute a limitation on the terminal 1100, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
Figure 12 is a structural schematic diagram of a server provided by an embodiment of the present invention. The server 1200 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 1201 and one or more memories 1202, where the memory 1202 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1201 to implement the video data processing method provided by each of the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing device functions, which are not described in detail here.
In an exemplary embodiment, a computer-readable storage medium is further provided, for example a memory including instructions, where the above instructions may be executed by a processor in a terminal to complete the video data processing method in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (12)
1. A video data processing method, applied to a first device, the method comprising:
obtaining at least one frame of raw video image;
obtaining, based on the at least one frame of raw video image, target area information of the at least one frame of raw video image;
encoding the at least one frame of raw video image based on the target area information of the at least one frame of raw video image to generate a video data stream, the video data stream carrying the target area information of the at least one frame of raw video image; and
sending the video data stream to a second device.
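The first-device flow of claim 1 can be sketched as follows. This is a minimal illustration under two assumptions not made in the claim itself: that the target area information is a per-frame bounding box, and that the encoded stream can be modelled as a list of packets. Every name here (`detect_target_area`, `encode_with_target_areas`, the packet fields) is hypothetical, not the patent's actual implementation.

```python
def detect_target_area(frame):
    # Stand-in detector; a real system might run face or object detection.
    return {"x": 0, "y": 0, "w": 16, "h": 16}

def encode_with_target_areas(frames):
    """Encode each frame and attach its target-area info to the stream."""
    stream = []
    for index, frame in enumerate(frames):
        area = detect_target_area(frame)
        # One packet per frame, carrying both the payload and its area info.
        stream.append({"frame_index": index, "payload": frame, "area": area})
    return stream  # this stream would then be sent to the second device

stream = encode_with_target_areas([b"frame-0", b"frame-1"])
```

The point of the claim is that the area information travels inside the same stream as the video data, so the receiver never needs a side channel.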
2. The method according to claim 1, wherein the encoding the at least one frame of raw video image based on the target area information of the at least one frame of raw video image to generate a video data stream carrying the target area information of the at least one frame of raw video image comprises:
encoding the target area information of the at least one frame of raw video image together with the at least one frame of raw video image to generate at least one first data packet carrying at least one target area mark, the at least one target area mark being obtained by encoding the at least one frame of raw video image; and
generating the video data stream based on the at least one first data packet carrying the at least one target area mark.
3. The method according to claim 1, wherein the encoding the at least one frame of raw video image based on the target area information of the at least one frame of raw video image to generate a video data stream carrying the target area information of the at least one frame of raw video image comprises:
encoding the target area information of the at least one frame of raw video image to generate at least one second data packet;
encoding the at least one frame of raw video image to generate at least one first data packet; and
inserting one second data packet after every preset number of first data packets to generate the video data stream.
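The packet layout of claim 3 — one second (metadata) packet inserted after every `preset_number` first (video) packets — can be sketched as below. The string packet representation is an assumption for illustration only.

```python
def interleave_packets(first_packets, second_packets, preset_number):
    """Insert one second (metadata) packet after every `preset_number`
    first (video) packets, yielding the combined video data stream."""
    stream, metadata = [], iter(second_packets)
    for count, packet in enumerate(first_packets, start=1):
        stream.append(packet)
        if count % preset_number == 0:
            meta = next(metadata, None)  # tolerate fewer metadata packets
            if meta is not None:
                stream.append(meta)
    return stream

stream = interleave_packets(["v1", "v2", "v3", "v4"], ["m1", "m2"], 2)
# stream == ["v1", "v2", "m1", "v3", "v4", "m2"]
```

A fixed interleaving period like this lets the receiver locate the metadata packets by counting, without any extra signalling.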
4. A video data processing method, applied to a second device, the method comprising:
receiving a video data stream, the video data stream carrying target area information of at least one frame of raw video image;
extracting, based on the video data stream, the target area information of the at least one frame of raw video image;
decoding the video data stream to generate a video image corresponding to the video data stream; and
re-encoding the video image corresponding to the video data stream based on the target area information of the at least one frame of raw video image to generate a target video data stream.
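The second-device flow of claim 4 can be sketched as follows, assuming a hypothetical stream format of one packet per frame with an `area` field attached; the re-encode step is stubbed, since the claim does not fix a codec.

```python
def process_stream(stream):
    """Extract target-area info, decode the frames, then re-encode them
    using that info. The re-encode is stubbed as tagging each decoded
    frame with its region of interest."""
    areas = [packet["area"] for packet in stream]      # extraction step
    frames = [packet["payload"] for packet in stream]  # decode step (stub)
    # Re-encode: a real encoder might spend more bits inside each area.
    return [{"frame": f, "roi": a} for f, a in zip(frames, areas)]

target = process_stream([
    {"payload": b"frame-0", "area": (0, 0, 16, 16)},
    {"payload": b"frame-1", "area": (4, 4, 16, 16)},
])
```

The design choice the claim captures is that the receiver reuses the sender's area analysis instead of re-detecting regions of interest itself.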
5. The method according to claim 4, wherein the extracting, based on the video data stream, the target area information of the at least one frame of raw video image comprises:
extracting at least one target area mark based on at least one field of at least one first data packet in the video data stream; and
decoding the at least one target area mark to obtain the target area information of the at least one frame of raw video image.
6. The method according to claim 4, wherein the extracting, based on the video data stream, the target area information of the at least one frame of raw video image comprises:
based on at least one first data packet and at least one second data packet in the video data stream, decoding, after every preset number of first data packets, the second data packet that follows the preset number of first data packets, to generate the target area information of the at least one frame of raw video image.
7. A video data processing method, applied to a second device, the method comprising:
receiving at least two video data streams, each video data stream carrying target area information of at least one frame of raw video image;
extracting, based on the at least two video data streams, the target area information of the at least one frame of raw video image corresponding to each video data stream;
decoding each video data stream to obtain video images corresponding to the at least two video data streams;
merging the video images corresponding to the at least two video data streams to generate a target video image; and
re-encoding the target video image based on the target area information corresponding to the at least two video data streams to generate a target video data stream.
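The merge step of claim 7 can be sketched as follows. The claim does not specify how the images are combined; side-by-side stitching is one plausible choice, shown here with a frame modelled as a list of pixel rows. All names and the frame model are illustrative assumptions.

```python
def merge_streams(decoded_a, decoded_b):
    """Stitch each pair of decoded frames side by side into one target
    image, keeping both streams' target-area info for the re-encode."""
    merged = []
    for a, b in zip(decoded_a, decoded_b):
        # Concatenate rows horizontally to form the target video image.
        frame = [row_a + row_b for row_a, row_b in zip(a["frame"], b["frame"])]
        merged.append({"frame": frame, "areas": [a["area"], b["area"]]})
    return merged

merged = merge_streams(
    [{"frame": [[1, 2]], "area": (0, 0, 2, 1)}],
    [{"frame": [[3, 4]], "area": (0, 0, 2, 1)}],
)
# merged[0]["frame"] == [[1, 2, 3, 4]]
```

Note that in a real merge the second stream's area coordinates would need to be offset by the first frame's width; that bookkeeping is omitted here.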
8. The method according to claim 7, wherein the extracting, based on the at least two video data streams, the target area information of the at least one frame of raw video image corresponding to each video data stream comprises:
extracting, based on at least one field of at least one first data packet in each video data stream, at least one target area mark corresponding to the at least two video data streams; and
decoding the at least one target area mark corresponding to each video data stream to obtain the target area information of the at least one frame of raw video image corresponding to the at least two video data streams.
9. The method according to claim 7, wherein the extracting, based on the at least two video data streams, the target area information of the at least one frame of raw video image corresponding to each video data stream comprises:
based on at least one first data packet and at least one second data packet in each video data stream, decoding, after every preset number of first data packets, the second data packet that follows the preset number of first data packets, to generate the target area information of the at least one frame of raw video image in the at least two video data streams.
10. A terminal, comprising a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to implement the operations performed by the video data processing method according to any one of claims 1 to 9.
11. A server, comprising a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to implement the operations performed by the video data processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium, storing at least one instruction, the instruction being loaded and executed by a processor to implement the operations performed by the video data processing method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811337105.0A CN109168032B (en) | 2018-11-12 | 2018-11-12 | Video data processing method, terminal, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811337105.0A CN109168032B (en) | 2018-11-12 | 2018-11-12 | Video data processing method, terminal, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109168032A true CN109168032A (en) | 2019-01-08 |
CN109168032B CN109168032B (en) | 2021-08-27 |
Family
ID=64877084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811337105.0A Active CN109168032B (en) | 2018-11-12 | 2018-11-12 | Video data processing method, terminal, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109168032B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019953A (en) * | 2019-04-16 | 2019-07-16 | 中国科学院国家空间科学中心 | A kind of real-time quick look system of payload image data |
CN110602398A (en) * | 2019-09-17 | 2019-12-20 | 北京拙河科技有限公司 | Ultrahigh-definition video display method and device |
CN112468845A (en) * | 2020-11-16 | 2021-03-09 | 维沃移动通信有限公司 | Processing method and processing device for screen projection picture |
CN113096201A (en) * | 2021-03-30 | 2021-07-09 | 上海西井信息科技有限公司 | Embedded video image deep learning system, method, equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1291314A (en) * | 1998-03-20 | 2001-04-11 | 马里兰大学 | Method and apparatus for compressing and decompressing image |
CN101742289A (en) * | 2008-11-14 | 2010-06-16 | 北京中星微电子有限公司 | Method, system and device for compressing video code stream |
CN103024445A (en) * | 2012-12-13 | 2013-04-03 | 北京百度网讯科技有限公司 | Cloud video transcode method and cloud server |
CN104185078A (en) * | 2013-05-20 | 2014-12-03 | 华为技术有限公司 | Video monitoring processing method, device and system thereof |
CN104365095A (en) * | 2012-03-30 | 2015-02-18 | 阿尔卡特朗讯公司 | Method and apparatus for encoding a selected spatial portion of a video stream |
CN104427337A (en) * | 2013-08-21 | 2015-03-18 | 杭州海康威视数字技术股份有限公司 | Region of interest (ROI) video coding method and apparatus based on object detection |
WO2015041652A1 (en) * | 2013-09-19 | 2015-03-26 | Entropic Communications, Inc. | A progressive jpeg bitstream transcoder and decoder |
US20150365687A1 (en) * | 2013-01-18 | 2015-12-17 | Canon Kabushiki Kaisha | Method of displaying a region of interest in a video stream |
CN105493509A (en) * | 2013-08-12 | 2016-04-13 | 索尼公司 | Transmission apparatus, transmission method, reception apparatus, and reception method |
CN105898313A (en) * | 2014-12-15 | 2016-08-24 | 江南大学 | Novel video synopsis-based monitoring video scalable video coding technology |
CN105917649A (en) * | 2014-02-18 | 2016-08-31 | 英特尔公司 | Techniques for inclusion of region of interest indications in compressed video data |
CN107210041A (en) * | 2015-02-10 | 2017-09-26 | 索尼公司 | Dispensing device, sending method, reception device and method of reseptance |
CN108429921A (en) * | 2017-02-14 | 2018-08-21 | 北京金山云网络技术有限公司 | A kind of video coding-decoding method and device |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1291314A (en) * | 1998-03-20 | 2001-04-11 | 马里兰大学 | Method and apparatus for compressing and decompressing image |
CN101742289A (en) * | 2008-11-14 | 2010-06-16 | 北京中星微电子有限公司 | Method, system and device for compressing video code stream |
CN104365095A (en) * | 2012-03-30 | 2015-02-18 | 阿尔卡特朗讯公司 | Method and apparatus for encoding a selected spatial portion of a video stream |
CN103024445A (en) * | 2012-12-13 | 2013-04-03 | 北京百度网讯科技有限公司 | Cloud video transcode method and cloud server |
US20150365687A1 (en) * | 2013-01-18 | 2015-12-17 | Canon Kabushiki Kaisha | Method of displaying a region of interest in a video stream |
CN104185078A (en) * | 2013-05-20 | 2014-12-03 | 华为技术有限公司 | Video monitoring processing method, device and system thereof |
CN105493509A (en) * | 2013-08-12 | 2016-04-13 | 索尼公司 | Transmission apparatus, transmission method, reception apparatus, and reception method |
CN104427337A (en) * | 2013-08-21 | 2015-03-18 | 杭州海康威视数字技术股份有限公司 | Region of interest (ROI) video coding method and apparatus based on object detection |
WO2015041652A1 (en) * | 2013-09-19 | 2015-03-26 | Entropic Communications, Inc. | A progressive jpeg bitstream transcoder and decoder |
CN105917649A (en) * | 2014-02-18 | 2016-08-31 | 英特尔公司 | Techniques for inclusion of region of interest indications in compressed video data |
CN105898313A (en) * | 2014-12-15 | 2016-08-24 | 江南大学 | Novel video synopsis-based monitoring video scalable video coding technology |
CN107210041A (en) * | 2015-02-10 | 2017-09-26 | 索尼公司 | Dispensing device, sending method, reception device and method of reseptance |
CN108429921A (en) * | 2017-02-14 | 2018-08-21 | 北京金山云网络技术有限公司 | A kind of video coding-decoding method and device |
Non-Patent Citations (1)
Title |
---|
王明慧: "基于H.264的感兴趣区域视频编码研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019953A (en) * | 2019-04-16 | 2019-07-16 | 中国科学院国家空间科学中心 | A kind of real-time quick look system of payload image data |
CN110602398A (en) * | 2019-09-17 | 2019-12-20 | 北京拙河科技有限公司 | Ultrahigh-definition video display method and device |
CN112468845A (en) * | 2020-11-16 | 2021-03-09 | 维沃移动通信有限公司 | Processing method and processing device for screen projection picture |
CN113096201A (en) * | 2021-03-30 | 2021-07-09 | 上海西井信息科技有限公司 | Embedded video image deep learning system, method, equipment and storage medium |
CN113096201B (en) * | 2021-03-30 | 2023-04-18 | 上海西井信息科技有限公司 | Embedded video image deep learning method, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109168032B (en) | 2021-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109168032A (en) | Processing method, terminal, server and the storage medium of video data | |
CN104350745B (en) | 3D video coding based on panorama | |
JP7085014B2 (en) | Video coding methods and their devices, storage media, equipment, and computer programs | |
CN110213616B (en) | Video providing method, video obtaining method, video providing device, video obtaining device and video providing equipment | |
CN108810538A (en) | Method for video coding, device, terminal and storage medium | |
CN110062252A (en) | Live broadcasting method, device, terminal and storage medium | |
CN103460250A (en) | Object of interest based image processing | |
CN110062246B (en) | Method and device for processing video frame data | |
CN108966008A (en) | Live video back method and device | |
CN109120933A (en) | Dynamic adjusts method, apparatus, equipment and the storage medium of code rate | |
CN109285178A (en) | Image partition method, device and storage medium | |
CN108769826A (en) | Live media stream acquisition methods, device, terminal and storage medium | |
CN110493626A (en) | Video data handling procedure and device | |
CN110121084A (en) | The methods, devices and systems of port switching | |
CN108769738A (en) | Method for processing video frequency, device, computer equipment and storage medium | |
CN110149517A (en) | Method, apparatus, electronic equipment and the computer storage medium of video processing | |
CN111586413B (en) | Video adjusting method and device, computer equipment and storage medium | |
CN110572710B (en) | Video generation method, device, equipment and storage medium | |
CN110049326A (en) | Method for video coding and device, storage medium | |
CN111107357B (en) | Image processing method, device, system and storage medium | |
CN110177275B (en) | Video encoding method and apparatus, and storage medium | |
CN116703995B (en) | Video blurring processing method and device | |
CN111478915A (en) | Live broadcast data stream pushing method and device, terminal and storage medium | |
CN110087077A (en) | Method for video coding and device, storage medium | |
CN109714628A (en) | Method, apparatus, equipment, storage medium and the system of playing audio-video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||