CN113259712A - Video processing method and related device - Google Patents

Video processing method and related device

Info

Publication number
CN113259712A
Authority
CN
China
Prior art keywords
video
video frame
mask
texture
merged
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010090517.XA
Other languages
Chinese (zh)
Other versions
CN113259712B (en)
Inventor
郭晓彬
王海亮
林晓鑫
高原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd on 2020-02-13 (priority to CN202010090517.XA)
Publication of CN113259712A: 2021-08-13
Application granted; publication of CN113259712B: 2023-07-14
Legal status: Active


Classifications

    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a video processing method and a related device. The method includes: acquiring a video to be processed and the corresponding mask textures, where the video to be processed includes N video frames and any one of the N video frames is a target video frame; performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames, where the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame; and generating a video to be fused according to the N merged video frames.

Description

Video processing method and related device
Technical Field
The present application relates to the field of data processing, and in particular, to a video processing method and related apparatus.
Background
With the development of video processing technology, user personalized information can be fused into an existing video, and when the fused video is played the personalized information is displayed, which enhances the user's sense of immersion.
In the prior art, masking is the main technical means used to fuse user personalized information with an existing video. A mask file is added to support the fusion effect, and the mask textures needed for fusion are packed together in the mask file; however, because the video is decoded frame by frame, the mask file has to be read frequently, which makes smooth playback difficult.
If the entire mask file is read at one time, memory usage increases; if region decoding is performed instead, it places a burden on the Central Processing Unit (CPU) and on Input/Output (I/O).
Disclosure of Invention
In order to solve the above technical problem, the present application provides a video processing method, which can reduce CPU and memory consumption and improve the efficiency of video processing.
In view of this, the embodiment of the present application discloses the following technical solutions:
in a first aspect, an embodiment of the present application provides a video processing method, where the method includes:
acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
and generating a video to be fused according to the N merged video frames.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
the acquiring unit is used for acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
the merging unit is used for performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
and the generating unit is used for generating a video to be fused according to the N combined video frames.
In a third aspect, an embodiment of the present application provides a device for video processing, where the device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the video processing method of the first aspect according to instructions in the program code.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, where the computer program is used to execute the video processing method according to the first aspect.
According to the technical scheme, the video to be processed and the corresponding mask textures are obtained, the video frames and the mask textures are texture-merged according to the correspondence between them to obtain N merged video frames, and the video to be fused is generated from the N merged video frames. Because each merged video frame includes the mask texture corresponding to its video frame, the video to be fused obtained from the merged video frames already carries the mask textures, and no separate mask file is needed to store them; the CPU therefore does not need to process a mask file, which reduces CPU and memory consumption. On this basis, a Graphics Processing Unit (GPU) only needs to perform hardware decoding once to obtain the mask texture required by each video frame for personalized information fusion, which improves video decoding efficiency and the overall efficiency of the video processing procedure.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another video processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a method for generating a fusion video according to an embodiment of the present application;
fig. 5 is a schematic view of a scene to which a video processing method is applied according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 8 is a block diagram of a partial structure related to a terminal according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
In the related art, fusing personalized information into a video means placing all the mask textures corresponding to the N video frames of the video to be processed in one large texture to form a mask file. When fusing personalized information into the video, a corresponding mask texture has to be obtained from the mask file for each video frame. One possible way is to use the CPU to read the whole mask file at once and select the mask texture corresponding to each video frame from all the mask textures read in; in this way the CPU has to read a large amount of data at one time, memory occupancy rises, and the CPU's ability to handle other tasks at the same time is affected to a certain extent. Another possibility is to use the CPU to perform region decoding on the large texture to obtain the mask texture corresponding to each video frame, which places a certain burden on the CPU and on I/O.
In order to improve video processing efficiency, the embodiment of the application provides a video processing method, and the method does not need a mask file, but merges each video frame with a corresponding mask texture, and generates a video to be fused by using a plurality of merged video frames obtained by merging, so as to be used for personalized information fusion, thereby improving the processing efficiency of video fusion personalized information.
The video processing method provided by the embodiment of the application can be applied to a video processing device with video processing capability, such as a terminal device or a server. The method can be executed independently by the terminal device, executed independently by the server, or applied in a network scenario in which the terminal device and the server communicate and executed through their cooperation. The terminal device may be a mobile phone, a desktop computer, a portable computer, or the like; the server may be an application server or a Web server, and in actual deployment it may be an independent server or a server cluster.
An application scenario in which the video processing method provided by the embodiment of the present application can be applied is described below with reference to the drawings. In this application scenario, the video processing device is specifically a terminal device.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application. As shown in fig. 1, the application scenario includes: and the terminal equipment 101 is used for video processing.
The terminal device 101 first obtains a video to be processed and the mask textures corresponding to the video. A mask texture is used to block part of the image content so that only the image content of a specific area is displayed.
In the application scenario shown in fig. 1, the terminal device 101 acquires N video frames of a video to be processed, and correspondingly, the terminal device 101 acquires a mask texture corresponding to each video frame of the video, for example, the terminal device 101 acquires a target video frame and a mask texture corresponding to the target video frame.
The terminal device 101 performs texture merging on the N video frames and the corresponding mask textures, respectively, to obtain N merged video frames. The texture merging is to splice the video frame and the mask texture into one texture. That is, each video frame and its corresponding mask texture are merged with the video frame as the processing object of texture merging, which is different from putting all mask textures in the same mask file.
As shown in fig. 1, for a target video frame, a terminal device 101 performs texture merging on the target video frame and a corresponding mask texture thereof to obtain a merged video frame, where the merged video frame includes: a target video frame and a mask texture. And the terminal equipment performs batch processing on the N video frames according to the same processing process to obtain N combined video frames.
Since the merged video frame includes the mask texture corresponding to the video frame, the mask file is not required to store the mask texture of the video. Therefore, when the personalized information of the video is fused subsequently, the CPU is not required to read the mask file, and the consumption of the CPU and the memory is reduced.
The terminal device 101 encodes the N merged video frames to generate a to-be-fused video, and when performing personalized information fusion based on the to-be-fused video, the GPU may be used to perform a decoding process on the to-be-fused video, and perform personalized texture fusion on the decoded to-be-fused video, thereby realizing personalized fusion of the video and improving the decoding efficiency and the working efficiency of video processing.
The following describes a video processing method provided by the present application in detail by embodiments.
Referring to fig. 2, fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure. With reference to fig. 3, fig. 3 is a schematic flowchart of another video processing method according to an embodiment of the present disclosure. For convenience of description, the video processing method is described with a terminal device as an execution subject of the video processing method. As shown in fig. 2, the video processing method includes the steps of:
s201: acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame.
Because an ordinary video only contains video frame data, the terminal device cannot directly process that video frame data to fuse personalized information into the video. Therefore, when the terminal device acquires the video to be processed, it also needs to acquire the corresponding mask textures, which are used to block the content of specific areas in the video frames and to display the personalized information to be fused, so that the personalized information can be fused into the video.
Each time the terminal device acquires one video frame, it correspondingly acquires the mask texture corresponding to that video frame. The mask texture may be image data of any shape, and the mask texture corresponding to each video frame may contain multiple pieces of mask information. As shown in fig. 3, for the target video frame, the corresponding mask texture includes: a text mask, avatar mask 1 and avatar mask 2. The number and shape of the mask textures corresponding to each video frame are not limited and can be set according to actual requirements.
The terminal equipment acquires N pieces of video frame data included in the video to be processed and mask texture information corresponding to the N pieces of video frames, and based on the information, the terminal equipment can perform subsequent personalized information fusion processing on the video.
S202: according to the correspondence between the video frames and the mask textures, performing texture merging on the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame.
In the related art, all mask textures corresponding to the video to be processed are placed in the same mask file; reading them with the CPU consumes CPU and memory resources and affects video processing efficiency. The present application therefore provides a feasible implementation in which the mask texture corresponding to each individual video frame of the video to be processed is taken as the processing object.
For each video frame, the terminal device may perform texture merging on the video frame and the mask texture corresponding to the video frame according to the correspondence between the video frame and the mask texture, so as to obtain a merged video frame. If the video to be processed comprises N video frames, texture merging is carried out on the N video frames respectively to obtain N merged video frames.
Texture merging of a video frame and a mask texture can be understood as splicing the video frame and the mask texture together into a single texture, and this texture is the merged video frame. The merged video frame thus includes the video frame and the mask texture corresponding to that video frame.
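As a purely illustrative sketch of what such a splice could look like (not the claimed implementation; the side-by-side layout, the array shapes and the use of numpy are assumptions made for clarity):

```python
import numpy as np

def merge_frame_and_mask(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Splice a video frame and its mask texture into one texture.

    frame: H x W x 3 colour image; mask: h x w single-channel mask texture.
    The frame is placed on the left and the mask on the right; any agreed
    layout recorded in configuration information would work equally well.
    """
    h_f, w_f = frame.shape[:2]
    h_m, w_m = mask.shape[:2]
    canvas = np.zeros((max(h_f, h_m), w_f + w_m, 3), dtype=np.uint8)
    canvas[:h_f, :w_f] = frame
    canvas[:h_m, w_f:w_f + w_m] = mask[..., None]  # broadcast mask to 3 channels
    return canvas
```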
Because the merged video frame contains the mask texture required by the video frame, when the personalized information is fused with the video, the personalized information fusion can be carried out on each merged video frame. Based on the N merged video frames, the N merged video frames can be subjected to personalized information fusion, so that the personalized information fusion of the video to be processed is realized.
Another feasible way of performing the above texture merging of a video frame and its mask texture is as follows. As shown in fig. 3, taking the target video frame as an example, the terminal device may perform the following steps:
s301: acquiring a color pixel image and a transparency pixel image of the target video frame;
s302: performing data compression on the transparency pixel image;
s303: and carrying out texture combination according to the color pixel image, the compressed transparency pixel image and the mask texture corresponding to the target video frame to obtain a combined video frame corresponding to the target video frame.
The terminal device separately obtains a color pixel image and a transparency pixel image of the target video frame, where the color pixel image is the original image of the target video frame and the transparency pixel image can be regarded as a transparency mask of the target video frame. The mask texture can be regarded as a mask over the video frame: by placing the mask texture on the video frame, the video frame can be cropped and fused. The mask texture can be placed at the same location on the video frame.
The color pixel image may be an RGB pixel image, and the transparency pixel image may be an ALPHA pixel image. The ALPHA value of the ALPHA pixel image generally ranges from 0 to 255: when ALPHA is 0, the transparency pixel image is not displayed at all; when ALPHA is 255, it is completely displayed; and when 0 < ALPHA < 255, the transparency pixel image is partially transparent and lets the content of the layer below show through. By adjusting the ALPHA value of the transparency pixel image, the terminal device can fuse the color pixel image with transparency pixel images of different ALPHA values to obtain video frame images with different fusion effects.
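For illustration, the standard per-pixel alpha compositing described above could be sketched as follows (a minimal sketch; the function name and the use of numpy are assumptions, not part of the patented method):

```python
import numpy as np

def alpha_blend(foreground: np.ndarray, background: np.ndarray,
                alpha: np.ndarray) -> np.ndarray:
    """Per-pixel alpha compositing with ALPHA in [0, 255].

    ALPHA == 0 shows only the background, ALPHA == 255 shows only the
    foreground, and intermediate values let the lower layer show through.
    """
    a = alpha.astype(np.float32)[..., None] / 255.0
    return (a * foreground + (1.0 - a) * background).astype(np.uint8)
```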
Experimental studies show that the naked eye is insensitive to changes in the ALPHA channel data of an image. Therefore, the terminal device can apply appropriate compression to the transparency pixel image without affecting the visual effect perceived by the naked eye. Typically, the transparency pixel image can be compressed to 0.5 times the size of the original image, so that fusing the compressed transparency pixel image with the color pixel image does not degrade the visual effect of the original target video frame image.
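A minimal sketch of such a compression step is given below; the use of OpenCV's resize and the choice of INTER_AREA resampling are assumptions, since the text only states that the transparency pixel image is compressed (typically to 0.5 times the original):

```python
import cv2  # assumption: OpenCV is used for resizing; any resampler would do
import numpy as np

def compress_alpha(alpha: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Downscale the ALPHA channel; the eye is far less sensitive to it
    than to colour, so halving each side costs little visible quality."""
    h, w = alpha.shape[:2]
    return cv2.resize(alpha, (int(w * factor), int(h * factor)),
                      interpolation=cv2.INTER_AREA)
```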
The merged video frame obtained by the terminal device therefore includes the color pixel image, the compressed transparency pixel image, and the mask texture corresponding to the target video frame, as shown in fig. 3. Thus, on the premise of not affecting the visual effect of the video, compressing the transparency pixel image effectively reduces memory usage.
Generally, a video is obtained by compressing a number of video frames. Existing video compression coding standards include the JPEG standard, the H.264 standard, the AVS standard, and so on; videos to be processed acquired under different standards therefore have different formats, and the format of the video to be processed is not limited here. Because video quality is lost during compression and encoding of the video frames, edge effects can occur where mask textures are placed in the video frames. That is, noise from neighbouring content appears at the abrupt transitions where the mask texture is placed, which can affect the visual experience of watching the video to a certain extent.
In a feasible manner, taking the target video frame as an example, a blank pixel area may be reserved around the position of the mask texture in the merged video frame corresponding to the target video frame. That is, in the merged video frame, the corresponding video frame image pixel values around the mask texture are set to 255.
In the mask texture, a pixel value of 0, i.e. black, represents full occlusion. Therefore, even if black noise is present at the edge of the mask texture, it does not cause aliasing of the image colours, and the transition between the video frame image and the mask texture is smoother.
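The following sketch illustrates reserving such a blank area when the mask texture is written into the merged frame; the 2-pixel border width and the helper name are assumptions, since the text only states that a blank pixel area (value 255) is reserved around the mask texture:

```python
import numpy as np

def place_mask_with_border(canvas: np.ndarray, mask: np.ndarray,
                           top: int, left: int, border: int = 2) -> None:
    """Write a mask texture into the merged-frame canvas, surrounded by a
    blank (value 255) guard band so that compression noise at the mask edge
    does not bleed into neighbouring content."""
    h, w = mask.shape[:2]
    t, l = max(top - border, 0), max(left - border, 0)
    canvas[t: top + h + border, l: left + w + border] = 255      # blank guard band
    canvas[top: top + h, left: left + w] = (
        mask[..., None] if canvas.ndim == 3 else mask)           # the mask itself
```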
S203: and generating a video to be fused according to the N merged video frames.
Based on the above S202, the terminal device obtains N merged video frames of the video to be processed, and encodes the N merged video frames to generate the video to be fused. The video to be fused comprises: video frame data of a video to be processed and mask texture information corresponding to each video frame. If the personalized information needs to be fused into the video to be processed, the terminal equipment can perform personalized information fusion by using the video to be fused obtained after the video processing, and the problem that the personalized information fusion cannot be performed on the video is solved.
It can be understood that, according to different personalized requirements, the mask texture set in different video frames may be different as the video is played. For example, the mask textures included in different video frames are different; the same mask texture is placed at different video frame locations, or is of different size.
In a feasible manner, the terminal device may determine a corresponding profile for the N merged video frames. The configuration file comprises mask texture configuration information corresponding to each merged video frame, and corresponding mask textures are set in each video frame according to the configuration information. And then, the terminal equipment generates a video to be fused by using the configuration file and the N merged video frames.
The video to be fused may be stored in the form of a file; for example, it may be stored as a VAP-type file under the mp4 compression coding standard, and a vapc box can be extended in the VAP file to carry the configuration information. In practical applications, different file types can be used according to different requirements, and no limitation is imposed here.
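As an illustration only, attaching configuration information to an mp4 container as an extra box could look like the sketch below. The JSON payload, the decision to append the box at file level, and the function name are assumptions; the actual vapc box format used by VAP files is not specified here:

```python
import json
import struct

def append_config_box(mp4_path: str, config: dict, box_type: bytes = b"vapc") -> None:
    """Append a custom ISO-BMFF box carrying JSON configuration to an mp4 file.

    An ISO-BMFF box is a 4-byte big-endian size (including the 8-byte header)
    followed by a 4-byte type and the payload; unknown boxes are skipped by
    ordinary mp4 parsers.
    """
    payload = json.dumps(config).encode("utf-8")
    box = struct.pack(">I", 8 + len(payload)) + box_type + payload
    with open(mp4_path, "ab") as f:
        f.write(box)
```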
In another feasible mode, the terminal device may further obtain audio data corresponding to the video to be processed, and generate a video to be fused according to the audio data and the N merged video frames, so as to perform personalized information fusion. And the audio data is added into the video to be fused, so that the playing effect of the video to be fused is enhanced, and the ornamental value of the video is improved.
Because the video to be fused contains both the configuration information and the mask texture information, the terminal device can perform personalized information fusion using a single video file to be fused: no separate mask file is needed to store the mask textures and no separate configuration file is needed to store the configuration information, which reduces the terminal device's file-management complexity.
In the video processing method provided by the above embodiment, the video to be processed and the corresponding mask textures are obtained, the video frames and the mask textures are texture-merged according to the correspondence between them to obtain N merged video frames, and the video to be fused is generated from the N merged video frames. Because each merged video frame includes the mask texture corresponding to its video frame, the video to be fused obtained from the merged video frames already carries the mask textures, and no separate mask file is needed to store them; the CPU therefore does not need to process a mask file, which reduces CPU and memory consumption. After the video to be fused is obtained with the video processing method provided above, personalized information fusion using the video to be fused is introduced below.
Referring to fig. 4, fig. 4 is a schematic flowchart of a method for generating a fusion video according to an embodiment of the present application. For convenience of description, the personalized information fusion process is introduced to the video by taking the terminal device as an execution subject. In fig. 4, the method comprises the steps of:
s401: and carrying out video decoding on the video to be fused to obtain the N combined video frames.
Based on the video to be fused obtained in S203, the video to be fused does not carry personalized information, and therefore, the video to be fused needs to be processed to obtain a video with personalized information.
The video to be fused carries the video frame data and the mask texture information of the original video and is generated in the form of the merged video frame, so that the terminal equipment can decode the video to be fused to obtain N merged video frames. Based on each merged video frame, the terminal device may perform personalized information fusion on each merged video frame.
In practical application, the terminal device obtains the VAP file, and may use the GPU to decapsulate the video to be fused stored in the mp4 format, so as to obtain corresponding video information. Based on the above, if the VAP file includes configuration information and audio data, the terminal device may further obtain corresponding audio information and configuration information after performing decoding processing.
By decoding the video to be fused only once with the GPU, the terminal device can obtain the N decoded video frames, the corresponding mask textures, the configuration information and the audio information, which simplifies the handling of multiple data streams, speeds up video processing, and reduces the amount of data the CPU has to read.
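For illustration, decoding the fused video once and splitting each merged frame back into its video-frame region and mask-texture region could be sketched as follows. The use of the PyAV library, the left/right layout and the frame_width parameter are assumptions that mirror the earlier merge sketch; the method described here relies on GPU hardware decoding:

```python
import av  # assumption: PyAV for demuxing/decoding in this sketch

def decode_merged_frames(path: str, frame_width: int):
    """Decode the fused video once and split every merged frame into the
    original video-frame region and the mask-texture region."""
    with av.open(path) as container:
        for frame in container.decode(video=0):
            merged = frame.to_ndarray(format="rgb24")
            video_part = merged[:, :frame_width]   # original frame content
            mask_part = merged[:, frame_width:]    # mask textures (and alpha)
            yield video_part, mask_part
```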
S402: and fusing the acquired personalized textures into the N combined video frames to obtain a fused video.
The terminal device fuses the personalized textures into the N merged video frames; this can be understood as the terminal device superimposing, according to the mask texture in each merged video frame, the personalized texture corresponding to that mask texture at the position where the mask texture is located. For example, if a merged video frame contains a text mask, the terminal device places the acquired personalized text texture at the position of the text mask in the merged video frame.
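A rough sketch of this mask-guided fusion is shown below; the region tuple taken from the configuration information, the resizing with OpenCV and the function name are assumptions made for illustration:

```python
import cv2  # assumption: OpenCV used only to resize textures to the mask region
import numpy as np

def fuse_personalized_texture(video_frame: np.ndarray, mask: np.ndarray,
                              personalized: np.ndarray,
                              region: tuple) -> np.ndarray:
    """Blend a personalized texture into the frame where the mask allows it.

    region = (top, left, height, width): mask position from the configuration.
    mask: single-channel 0-255 opacity; 255 shows the personalized texture,
    0 keeps the original frame content.
    """
    top, left, h, w = region
    tex = cv2.resize(personalized, (w, h)).astype(np.float32)
    weight = cv2.resize(mask, (w, h)).astype(np.float32)[..., None] / 255.0
    patch = video_frame[top: top + h, left: left + w].astype(np.float32)
    out = video_frame.copy()
    out[top: top + h, left: left + w] = (weight * tex +
                                         (1.0 - weight) * patch).astype(np.uint8)
    return out
```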
The merged video frame fused with the personalized information includes the video frame, the mask texture and the personalized texture, so by processing the N merged video frames the terminal device obtains a fused video that contains the personalized information.
S403: and playing the video according to the fused video.
The fusion video carries personalized information, so that the visual effect with personalized design can be observed by playing the fusion video.
The embodiment provides a video personalized fusion method, which is based on the video processing method provided by the embodiment of the application, and the personalized textures are fused into N decoded merged video frames by decoding the video to be fused to generate a fused video. Because the fused video carries the personalized information, the expected personalized playing effect can be obtained by playing the fused video.
Based on the above, referring to fig. 5, fig. 5 is a scene schematic diagram of an application video processing method according to an embodiment of the present application.
The terminal device extracts frames from the acquired video to be processed together with the mask textures: each time one video frame is extracted as a PNG image, for example the PNG image shown in fig. 5, the mask texture corresponding to that PNG image is also acquired. The video to be processed includes N video frames.
As shown in fig. 5, after extracting the RGB pixel image and the ALPHA pixel image of the PNG image and the mask texture corresponding to the PNG image, the terminal device performs texture merging on the RGB pixel image, the ALPHA pixel image and the mask texture, and calculates the layout positions of the three in the same large texture, thereby obtaining the PNG image with the information of the three merged. Based on this, the terminal device performs batch processing on each video frame in the video to be processed, that is, performs the above-mentioned operations of extracting, merging and calculating on each video frame. And then, the terminal device configures the N PNG images obtained by combination, the acquired audio data and the configuration file in the same VAP file to generate the video to be fused in the mp4 format.
The terminal device decapsulates the VAP file in the mp4 format and reads the audio/video information and the VAP configuration information contained in it. It then parses the information carried by the video to be processed (for example, parsing a video in the H.264 format), decodes the video with the GPU, and obtains the N video frames of the video to be processed. The terminal device sends the personalized textures acquired from outside, such as user avatar textures and text textures, together with the N video frames and the configuration information into a rendering pipeline, synthesizes the personalized textures in the ALPHA channel, and at the same time crops and fuses the resulting textures to generate a personalized fused video, which is played on the screen.
In practical applications, after the terminal device decodes the video to be fused with the GPU, the user's attributes may be added directly during the drawing of each video frame using the Open Graphics Library (OpenGL), and the fusion of the video animation with the user attributes is realized through a custom animation script.
The following provides test data for implementing the video personalized information fusion method by using the video processing method (the scheme) and the mask file (the old scheme) provided by the embodiment of the present application, and compares the two schemes to explain the beneficial effects of the video processing method provided by the embodiment of the present application.
The test parameters include:
(1) testing equipment: a mobile phone;
(2) the video to be tested: 20 frames, resolution 672 × 1504;
(3) the number of masks corresponding to each video frame: 4;
the parameter comparison between the scheme and the traditional scheme is shown in table 1:
TABLE 1 Comparison of scheme parameters
Scheme         File size    Number of files    Memory usage    CPU occupation
This scheme    2.4M         1                  18M             13%
Old scheme     5.9M         7                  28M             89%
Comparing the file size data of the two schemes, the video processing method provided by the embodiment of the application has the advantages of smaller data size to be read and higher video processing speed. Analysis shows that the method and the device can compress the transparency pixel image extracted according to the video frame, and reduce data carried by the video on the premise of not influencing the visual effect of the video.
Comparing the number of files of the two schemes shows that the number of files in the embodiment of the application is 1, far fewer than in the old scheme. The present application encodes multiple materials, such as the mask information, the configuration information and the audio information, into one stream to obtain a single file; compared with the old scheme, which needs several files to store the different materials, this reduces the complexity of material management.
Comparing the CPU and memory occupation of the two schemes shows that the CPU and memory occupation of the present scheme is far smaller than that of the old scheme. Because the video to be fused carries the mask information, no mask file is needed to store the mask information separately and the mask textures do not need to be processed by the CPU; the mask textures required by all video frames can be obtained with a single hardware decode, which improves video decoding efficiency and performance, reduces CPU and memory consumption, and improves the efficiency of video processing.
Aiming at the video processing method described above, the embodiment of the present application further provides a corresponding video processing apparatus.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. As shown in fig. 6, the video processing apparatus 600 includes an acquisition unit 601, a merging unit 602, and a generation unit 603; wherein the content of the first and second substances,
the acquiring unit 601 is configured to acquire a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
a merging unit 602, configured to perform texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
a generating unit 603, configured to generate a video to be fused according to the N merged video frames.
Optionally, the merging unit 602 is further configured to:
acquiring a color pixel image and a transparency pixel image of the target video frame;
performing data compression on the transparency pixel image;
and carrying out texture combination according to the color pixel image, the compressed transparency pixel image and the mask texture corresponding to the target video frame to obtain a combined video frame corresponding to the target video frame.
Optionally, a blank pixel area is reserved around the masking texture in the merged video frame corresponding to the target video frame.
Optionally, the apparatus further comprises: a determination unit;
the determining unit is configured to determine a configuration file corresponding to the N merged video frames;
the generating unit is specifically configured to:
and generating a video to be fused according to the N combined video frames and the configuration file.
Optionally, the obtaining unit is further configured to:
acquiring audio data corresponding to the video to be processed;
the generation unit is further configured to:
and generating a video to be fused according to the N combined video frames and the audio data.
Optionally, the apparatus further comprises: a decoding unit, a fusion unit and a playing unit;
the decoding unit is used for performing video decoding on the video to be fused to acquire the N combined video frames;
the fusion unit is used for fusing the acquired personalized textures into the N merged video frames to obtain a fused video;
and the playing unit is used for playing the video according to the fusion video.
For the video processing method provided by the foregoing embodiment, the embodiment of the present application further provides a server and a terminal device for executing the video processing method, and the video processing device is described below with reference to the accompanying drawings.
Referring to fig. 7, fig. 7 is a schematic diagram of a server 1400 according to an embodiment of the present application, where the server 1400 may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 1422 (e.g., one or more processors) and a memory 1432, one or more storage media 1430 (e.g., one or more mass storage devices) for storing applications 1442 or data 1444. Memory 1432 and storage media 1430, among other things, may be transient or persistent storage. The program stored on storage medium 1430 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a server. Still further, a central processor 1422 may be disposed in communication with storage medium 1430 for executing a series of instruction operations on storage medium 1430 on server 1400.
The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 7.
The CPU 1422 is configured to perform the following steps:
acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
and generating a video to be fused according to the N merged video frames.
Optionally, the CPU 1422 may further execute the method steps of any specific implementation of the video processing method in this embodiment.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a video processing terminal device according to an embodiment of the present application. For convenience of explanation, only the parts related to the embodiments of the present application are shown, and specific technical details are not disclosed. The terminal device can be any terminal device, including a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), and the like:
fig. 8 is a block diagram of a partial structure of a terminal provided in an embodiment of the present application. Referring to fig. 8, the terminal includes: a Radio Frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, and a power supply 1590. Those skilled in the art will appreciate that the configuration shown in fig. 8 does not constitute a limitation of the video processing terminal device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 8:
the memory 1520 may be used to store software programs and modules, and the processor 1580 implements various functional applications of the terminal and data processing by operating the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 1520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 1580 is a control center of the terminal, connects various parts of the entire tablet pc using various interfaces and lines, and performs various functions of the tablet pc and processes data by operating or executing software programs and/or modules stored in the memory 1520 and calling data stored in the memory 1520, thereby integrally monitoring the tablet pc. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It is to be appreciated that the modem processor may not be integrated into the processor 1580.
In the embodiment of the present application, the terminal includes a memory 1520 that can store the program code and transmit the program code to the processor.
The processor 1580 included in the terminal may execute the method for processing a video according to the instructions in the program code.
The embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute the video processing method provided by the foregoing embodiment.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of video processing, the method comprising:
acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
and generating a video to be fused according to the N merged video frames.
2. The method according to claim 1, wherein, for the target video frame, the performing texture merging on the video frame and the mask texture according to the correspondence between the video frame and the mask texture to obtain N merged video frames comprises:
acquiring a color pixel image and a transparency pixel image of the target video frame;
performing data compression on the transparency pixel image;
and carrying out texture combination according to the color pixel image, the compressed transparency pixel image and the mask texture corresponding to the target video frame to obtain a combined video frame corresponding to the target video frame.
3. The method according to claim 1, wherein a blank pixel area is reserved around a mask texture in the merged video frame corresponding to the target video frame.
4. The method according to any one of claims 1-3, wherein after texture merging the video frame and the mask texture to obtain N merged video frames according to the correspondence between the video frame and the mask texture, the method further comprises:
determining a configuration file corresponding to the N merged video frames;
the generating a video to be fused according to the N merged video frames comprises:
and generating a video to be fused according to the N combined video frames and the configuration file.
5. The method according to any one of claims 1-3, further comprising:
acquiring audio data corresponding to the video to be processed;
the generating a video to be fused according to the N merged video frames comprises:
and generating a video to be fused according to the N combined video frames and the audio data.
6. The method according to any one of claims 1-3, further comprising:
performing video decoding on the video to be fused to acquire the N merged video frames;
fusing the acquired personalized textures into the N combined video frames to obtain a fused video;
and playing the video according to the fused video.
7. A video processing apparatus characterized in that the apparatus comprises an acquisition unit, a merging unit, and a generation unit:
the acquiring unit is used for acquiring a video to be processed and corresponding mask textures; the video to be processed comprises N video frames; any one of the N video frames is a target video frame;
the merging unit is used for performing texture merging on the video frames and the mask textures according to the correspondence between the video frames and the mask textures to obtain N merged video frames; the merged video frame corresponding to the target video frame is obtained by merging the target video frame and the mask texture corresponding to the target video frame;
and the generating unit is used for generating a video to be fused according to the N combined video frames.
8. The apparatus of claim 7, wherein the merging unit is further configured to:
acquiring a color pixel image and a transparency pixel image of the target video frame;
performing data compression on the transparency pixel image;
and carrying out texture combination according to the color pixel image, the compressed transparency pixel image and the mask texture corresponding to the target video frame to obtain a combined video frame corresponding to the target video frame.
9. A video processing apparatus, characterized in that the apparatus comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-6 according to instructions in the program code.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1 to 6.
CN202010090517.XA 2020-02-13 2020-02-13 Video processing method and related device Active CN113259712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010090517.XA CN113259712B (en) 2020-02-13 2020-02-13 Video processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010090517.XA CN113259712B (en) 2020-02-13 2020-02-13 Video processing method and related device

Publications (2)

Publication Number Publication Date
CN113259712A true CN113259712A (en) 2021-08-13
CN113259712B CN113259712B (en) 2023-07-14

Family

ID=77219850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010090517.XA Active CN113259712B (en) 2020-02-13 2020-02-13 Video processing method and related device

Country Status (1)

Country Link
CN (1) CN113259712B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060092168A1 (en) * 2004-11-02 2006-05-04 Microsoft Corporation Texture-based packing, such as for packing 8-bit pixels into two bits
CN101371274A (en) * 2005-12-30 2009-02-18 意大利电信股份公司 Edge comparison in video sequence partition
CN105357466A (en) * 2015-11-20 2016-02-24 小米科技有限责任公司 Video communication method and video communication device
CN106780642A (en) * 2016-11-15 2017-05-31 网易(杭州)网络有限公司 The generation method and device of dense fog shade textures
CN107995440A (en) * 2017-12-13 2018-05-04 北京奇虎科技有限公司 A kind of video caption textures generation method and device
CN108124170A (en) * 2017-12-12 2018-06-05 广州市动景计算机科技有限公司 A kind of video broadcasting method, device and terminal device
GB201911506D0 (en) * 2018-10-12 2019-09-25 Adobe Inc Video inpainting via confidence-weighted motion estimation
CN110675310A (en) * 2019-07-02 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113259712B (en) 2023-07-14

Legal Events

• PB01: Publication
• REG: Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40052720)
• SE01: Entry into force of request for substantive examination
• GR01: Patent grant