CN112839236A - Video processing method, device, server and storage medium - Google Patents


Info

Publication number
CN112839236A
CN112839236A
Authority
CN
China
Prior art keywords
video
audio stream
target
user
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011632492.8A
Other languages
Chinese (zh)
Inventor
周甜甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011632492.8A priority Critical patent/CN112839236A/en
Publication of CN112839236A publication Critical patent/CN112839236A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2187: Live feed
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/439: Processing of audio elementary streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202: End-user interface for requesting content on demand, e.g. video on demand

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure relates to a video processing method and apparatus, a server, and a storage medium. The video processing method includes: receiving a first input from a user on a playing interface of a first video, the first input indicating a target audio stream; in response to the first input, and in the case that the audio streams of the first video include a first audio stream, separating the first audio stream from the first video to obtain a second video; and synthesizing a target video from the second video and the target audio stream. The video processing method, apparatus, server, and storage medium can solve the problem that videos published by video publishers cannot meet the personalized needs of video audiences.

Description

Video processing method, device, server and storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a video processing method, an apparatus, a server, and a storage medium.
Background
With the rapid development of internet technology and intelligent electronic devices, users can watch videos that interest them through client applications.
At present, in the related art, the audio stream of a video is usually selected by the video publisher. After the video publisher publishes the video, the audio stream in the video cannot be changed, so the personalized needs of users cannot be met.
Disclosure of Invention
The present disclosure provides a video processing method, apparatus, server and storage medium, so as to at least solve the problem in the related art that videos published by video publishers cannot meet personalized requirements of video audiences.
The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video processing method, including:
receiving a first input from a user on a playing interface of a first video, the first input indicating a target audio stream; in response to the first input, and in the case that the audio streams of the first video include a first audio stream, separating the first audio stream from the first video to obtain a second video; and synthesizing a target video from the second video and the target audio stream.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including:
a receiving module configured to receive a first input from a user on a playing interface of a first video, the first input indicating a target audio stream; a separation module configured to, in response to the first input and in the case that the audio streams of the first video include a first audio stream, separate the first audio stream from the first video to obtain a second video; and a synthesis module configured to synthesize a target video from the second video and the target audio stream.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the video processing method as described in the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of a server, enable the server to perform the video processing method according to the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein the instructions of the computer program product, when executed by a processor of a server, enable the server to perform the video processing method according to the first aspect.
The technical solution provided by the embodiments of the disclosure brings at least the following beneficial effects:
In the embodiments of the disclosure, in response to a first input from a user on a playing interface of a first video, in the case that the audio streams of the first video include a first audio stream, the first audio stream is separated from the first video to obtain a second video, and a target video is then synthesized from the second video and the target audio stream indicated by the first input. In this way, the target audio stream determined by the user's input replaces the original first audio stream in the video, and a new target video is synthesized. Because the target audio stream in the synthesized target video is selected by the user according to the user's own needs, the user can set, in the watched video, the audio the user wants to hear. This enriches the viewing functions of the video and meets the personalized needs of the user.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram illustrating a user viewing a video published by a video publisher according to an example embodiment.
Fig. 2 is a schematic diagram of a video processing method, an apparatus, an electronic device and a storage medium application environment according to an example embodiment.
Fig. 3 is a flow diagram illustrating a video processing method according to an example embodiment.
Fig. 4 is a schematic diagram of a live interface for displaying prompt information according to an embodiment of the present application.
Fig. 5 is a flow diagram illustrating another video processing method according to an example embodiment.
Fig. 6 is a block diagram illustrating a video processing apparatus according to an example embodiment.
Fig. 7 is a block diagram illustrating another video processing apparatus according to an example embodiment.
FIG. 8 is a block diagram illustrating a server in accordance with an exemplary embodiment.
Fig. 9 is a block diagram illustrating an apparatus for video processing according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the following, a specific implementation manner of a user watching a video published by a publisher in the related art is described by taking fig. 1 as an example.
FIG. 1 is a diagram illustrating a user viewing a video published by a video publisher according to an example embodiment.
As shown in fig. 1, when a video publisher publishes a video through the user terminal 10, the publisher may add target background music to the video to be published and transmit it through the network 300 to each user terminal 20 (i.e., the devices of the video audience). The target background music cannot be altered after the video is published. If the published video is a live video, that is, the video publisher is the anchor of a live broadcast room, the background music heard by the video audience watching the live video through a user terminal 20 is the target background music selected by the publisher. Likewise, if the published video is a short video, the background music heard by the audience watching it through a user terminal 20 is the target background music selected by the publisher.
Therefore, the audio stream of a video is usually selected by the video publisher, and after the video is published the audio stream cannot be changed, so the personalized needs of users cannot be met.
The disclosure provides a video processing method and apparatus, an electronic device, and a storage medium. In response to a first input from a user on a playing interface of a first video, in the case that the audio streams of the first video include a first audio stream, the first audio stream is separated from the first video to obtain a second video, and a target video is then synthesized from the second video and the target audio stream indicated by the first input. Because the target audio stream in the synthesized target video is selected by the user according to the user's own needs, the user can set, in the watched video, the audio the user wants to hear. This enriches the viewing functions of the video and meets the personalized needs of the user.
Fig. 2 is a schematic view of an application environment of a video processing method and apparatus, an electronic device, and a storage medium according to one or more embodiments of the present disclosure. As shown in fig. 2, the server 100 is communicatively connected to one or more user terminals 200 via a network 300 for data communication or interaction. The server 100 may be a web server, a database server, or the like. A user terminal 200 may be, but is not limited to, a personal computer (PC), a smart phone, a tablet computer, a personal digital assistant (PDA), or the like. The network 300 may be a wired or wireless network.
Specifically, in the case that the first video is a live video, each of the user terminals 200 may be an anchor device or an audience device, and a live broadcast room may be constructed using the software and hardware resources of the server 100 and the user terminals 200. During live broadcasting, the anchor device among the user terminals 200 sends the live content to the server 100, and the server 100 forwards the live content to the audience devices among the user terminals 200. In this way, the anchor user publishes live content to the live broadcast room, and audience users watch that content by logging into the live broadcast room.
In addition, during live broadcasting the anchor device may select the background music (i.e., the first audio stream) of the live broadcast room, and each user terminal 200 may choose to replace the background music of the live content it plays with other background music (i.e., the target audio stream). In this way, the background music of the live content played on the user's own terminal is replaced with the background music the user wants to hear.
The following will explain the video processing method provided by the embodiment of the present disclosure in detail.
The video processing method provided by the embodiments of the present disclosure can be applied to the user terminal 200. For convenience of description, the embodiments of the present disclosure take the user terminal 200 as the execution subject unless otherwise specified. It is to be understood that the described execution subject does not constitute a limitation of the disclosure.
Next, a video processing method provided by the present disclosure will be described first.
Fig. 3 is a flow diagram illustrating a video processing method according to an example embodiment.
As shown in fig. 3, the video processing method may include the following steps.
S310, receiving a first input of a user to a playing interface of the first video, wherein the first input is used for indicating a target audio stream.
S320, responding to the first input, and under the condition that the audio stream of the first video comprises the first audio stream, separating the first audio stream from the first video to obtain a second video.
And S330, synthesizing the target video according to the second video and the target audio stream.
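The three steps S310 to S330 can be illustrated with a minimal sketch that models a video as a plain dictionary of picture frames plus audio streams. This is only an illustration of the described flow, not the patent's implementation: a production system would demux and remux real container files (for example with a tool such as FFmpeg), and all names and data shapes below are assumptions.

```python
def separate_first_audio(first_video, first_audio_id):
    """S320: strip the first audio stream from the first video, yielding the second video."""
    remaining = [a for a in first_video["audio_streams"] if a["id"] != first_audio_id]
    return {"frames": first_video["frames"], "audio_streams": remaining}

def synthesize_target(second_video, target_audio):
    """S330: mux the target audio stream into the second video, yielding the target video."""
    return {"frames": second_video["frames"],
            "audio_streams": second_video["audio_streams"] + [target_audio]}

# A toy "first video": picture frames plus the anchor's voice and the
# publisher-selected background music (the first audio stream).
first_video = {"frames": ["f0", "f1"],
               "audio_streams": [{"id": "voice"}, {"id": "bgm-1"}]}

# S310: the user's first input indicates the target audio stream.
target_audio = {"id": "bgm-2"}

second_video = separate_first_audio(first_video, "bgm-1")
target_video = synthesize_target(second_video, target_audio)
print([a["id"] for a in target_video["audio_streams"]])  # -> ['voice', 'bgm-2']
```

Note that only the first audio stream is removed in S320; any other audio (here, the anchor's voice) is retained in the second video.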
Specific implementations of the above steps will be described in detail below.
In the embodiment of the disclosure, in response to a first input of a user to a play interface of a first video, in a case that an audio stream of the first video includes the first audio stream, the first audio stream is separated from the first video to obtain a second video, and then a target video is synthesized according to the second video and a target audio stream indicated by the first input. Because the target audio stream in the synthesized target video is selected by the user according to the requirement of the user, the user can set the audio which the user wants to hear in the watched video according to the personalized requirement of the user, the watching function of the video is enriched, and the personalized requirement of the user is met.
First, consider S310.
In some embodiments of the present disclosure, before the step of receiving the first input of the user to the playing interface of the first video, the following steps may be further included:
receiving a second input of the user to the playing interface; in response to the second input, displaying at least one audio stream identification, the audio stream identification including a target audio stream identification, the target audio stream identification indicating a target audio stream;
correspondingly, the step of receiving the first input of the user to the play interface of the first video may specifically include: a first input of a user identification for a target audio stream is received.
Specifically, first, in response to a second input from the user on the playing interface, at least one audio stream identifier including the target audio stream identifier is displayed. Here, feature information of the first video (such as its release time, video topic, and video category) may be obtained, audio streams related to the first video may be determined from this feature information, and the identifiers of those audio streams may be displayed for the user to choose from. Then, a first input from the user on the target audio stream identifier is received, and the target audio stream indicated by that identifier is determined, so that the target audio stream and the second video can be synthesized into the target video.
As shown in FIG. 4, in response to a second input to the playback interface by the user, at least one audio stream identification including the target audio stream identification is displayed: background music 1, background music 2, background music 3, and background music 4. Then, a first input of the background music 2 (target audio stream identifier) by the user is received, so as to synthesize the second video with the target audio stream corresponding to the target audio stream identifier, and obtain a target video.
Wherein the at least one audio stream identification referred to above may be used to indicate an audio stream local to the electronic device. The target audio stream identification may be used to indicate a target audio stream local to the electronic device. In the case that the target audio stream is a local resource of the electronic device, it may be first detected whether the target audio stream uploaded by the user meets the network security specification.
Therefore, by displaying at least one audio stream identifier in response to the user's second input on the playing interface, and then receiving the user's first input on the target audio stream identifier, the user can select the target audio stream to be applied to the first video, which improves the user experience.
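The selection of related audio streams from the first video's feature information can be sketched as a simple filter over an audio library. The matching rule (by category only) and all names are illustrative assumptions; the disclosure does not prescribe a particular matching algorithm.

```python
def candidate_audio_ids(video_features, audio_library):
    """Return identifiers of audio streams related to the first video,
    here matched only on the video's category for illustration."""
    return [track["id"] for track in audio_library
            if track["category"] == video_features["category"]]

# Hypothetical audio library with per-track category tags.
audio_library = [{"id": "bgm-1", "category": "emotion"},
                 {"id": "bgm-2", "category": "emotion"},
                 {"id": "bgm-3", "category": "sports"}]

# Feature information of the first video (cf. release time, topic, category above).
print(candidate_audio_ids({"category": "emotion"}, audio_library))  # -> ['bgm-1', 'bgm-2']
```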
In other embodiments of the present disclosure, after the step of receiving the first input from the user on the playing interface of the first video, the method may further include the following step: in response to the first input, displaying a target video, where the target video is obtained by replacing the first audio stream in the first video with the target audio stream.
Specifically, replacement request information is sent to a preset server, the replacement request information requesting replacement of the first audio stream in the first video; and response information returned by the preset server is received, the response information including the target video in which the first audio stream has been replaced with the target audio stream.
Next, consider S320.
In some embodiments of the present disclosure, before the step of separating the first audio stream from the first video, the following steps may be further included:
identifying a source of an audio stream in a first video; an audio stream of a target source is determined as a first audio stream.
Specifically, the sources of the audio stream in the video may include: the existing audio stream in the video before the video publisher publishes the video; an audio stream (i.e., background music) added to the video when the video publisher publishes the video. When a video publisher publishes a video, the source of the audio stream added to the video is usually the video application publishing the video.
Therefore, the first audio stream in the first video can be quickly and accurately identified by identifying the source of the audio stream in the first video and determining the audio stream of the target source as the first audio stream.
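Identifying the first audio stream by its source can be sketched as scanning the streams' source tags, on the stated assumption that background music added at publish time is tagged with the publishing video application as its source. The source labels below are hypothetical.

```python
TARGET_SOURCE = "video_app"  # illustrative label for music added by the video application

def find_first_audio(audio_streams, target_source=TARGET_SOURCE):
    """Return the audio stream whose source is the target source, or None."""
    for stream in audio_streams:
        if stream.get("source") == target_source:
            return stream
    return None

# The anchor's voice comes from capture; the background music was added by the app.
streams = [{"id": "voice", "source": "capture"},
           {"id": "bgm-1", "source": "video_app"}]
print(find_first_audio(streams)["id"])  # -> bgm-1
```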
In some embodiments of the present disclosure, before the step of separating the first audio stream from the first video, the following steps may be further included:
sending first request information to a preset server, wherein the first request information is used for requesting to identify a first audio stream in a first video; receiving first response information returned by a preset server, wherein the first response information comprises an identifier corresponding to a first audio stream; and determining the first audio stream according to the identification in the first response information.
Specifically, first request information requesting identification of the first audio stream in the first video is first sent to a preset server. After receiving the first request information, the preset server identifies the source of each audio stream in the first video, determines the audio stream of the target source as the first audio stream, and returns first response information including the identifier corresponding to the first audio stream. The user terminal then receives the first response information and determines the first audio stream according to the identifier it contains.
Therefore, the first request information for requesting to identify the first audio stream in the first video is sent to the preset server to obtain the identifier corresponding to the first audio stream, so that the first audio stream can be determined quickly and accurately according to the identifier in the first response information.
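The first request/response exchange can be sketched with the server side as a plain function. The message fields and the server's in-memory index are assumptions for illustration; the patent does not specify a wire format.

```python
def handle_first_request(request, server_index):
    """Preset server: identify the first audio stream of the requested video
    by its source tag and return its identifier in the first response."""
    video = server_index[request["video_id"]]
    first = next(s for s in video["audio_streams"] if s["source"] == "video_app")
    return {"video_id": request["video_id"], "first_audio_id": first["id"]}

# Hypothetical server-side index of published videos.
server_index = {"v1": {"audio_streams": [{"id": "voice", "source": "capture"},
                                         {"id": "bgm-1", "source": "video_app"}]}}

# Client sends the first request information; server returns the first response information.
response = handle_first_request({"video_id": "v1"}, server_index)
print(response["first_audio_id"])  # -> bgm-1
```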
In some embodiments of the present disclosure, after the step of receiving the first input of the user to the playing interface of the first video, the following steps may be further included:
sending second request information to a preset server, wherein the second request information is used for requesting a target video; and receiving second response information returned by the preset server, wherein the second response information comprises the target video.
Specifically, after the second request information is sent to the preset server, the preset server separates the first audio stream from the first video to obtain the second video when the audio stream of the first video includes the first audio stream. Then, the target video is synthesized from the second video and the target audio stream. Finally, second response information including the target video returned by the preset server can be received.
Therefore, the target video can be quickly acquired by sending the second request information for requesting the target video to the preset server.
In some embodiments of the present disclosure, the first video referred to above is a live video or a short video.
Specifically, it may be detected whether the audio stream of the live video is a single audio stream or a superposition of multiple audio streams, such as the anchor's voice and the background music (i.e., the first audio stream). Since the audio stream of the background music selected by the anchor can be directly recognized, it can be stripped out and replaced with the target audio stream. If the audio stream of the live video is a single background music track, it can be replaced with the target audio stream directly. The processing of a short video is the same and is not repeated here.
Wherein, the first video is a live video, and before the step of separating the first audio stream from the first video to obtain the second video, the method may further include the following steps: caching the first video with preset duration to obtain a third video;
correspondingly, separating the first audio stream from the first video to obtain the second video comprises: and separating the first audio stream from the third video to obtain a second video.
Because a live broadcast proceeds in real time, a live video stream of a preset duration needs to be cached at the user terminal to obtain the third video; the first audio stream is separated from the cached third video, and subsequent synthesis is then performed.
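The client-side caching can be sketched as a rolling buffer that accumulates a preset duration of the live stream (the third video) before the first audio stream is separated from it. The one-second chunk granularity and the preset duration of three seconds are illustrative assumptions.

```python
from collections import deque

PRESET_SECONDS = 3  # hypothetical preset caching duration

class LiveBuffer:
    """Caches incoming live chunks until the preset duration is reached."""
    def __init__(self, preset_seconds=PRESET_SECONDS):
        self.chunks = deque()
        self.preset = preset_seconds

    def push(self, chunk):
        self.chunks.append(chunk)

    def ready(self):
        return len(self.chunks) >= self.preset

    def pop_third_video(self):
        """Return the cached segment (the third video) and clear the buffer."""
        third = list(self.chunks)
        self.chunks.clear()
        return third

def separate(third_video, first_audio_id):
    """Strip the first audio stream from every cached chunk, yielding the second video."""
    return [{"video": c["video"],
             "audio": [a for a in c["audio"] if a != first_audio_id]}
            for c in third_video]

buf = LiveBuffer()
for t in range(3):  # three one-second live chunks arrive
    buf.push({"video": f"frame{t}", "audio": ["voice", "bgm-1"]})
if buf.ready():
    second_video = separate(buf.pop_third_video(), "bgm-1")
print(second_video[0]["audio"])  # -> ['voice']
```

After separation, the target audio stream would be muxed into each cached segment and playback continues from the synthesized segments, which is why the replacement can keep pace with the live broadcast.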
Finally, consider S330.
Appropriate background music gives the user a sense of immersion when watching a video. For example, when watching an emotional program, adding emotional songs lets the music blend naturally into the emotional story in the video.
The target video is synthesized from the second video and the target audio stream. Because the target audio stream in the synthesized target video is selected by the user according to the user's own needs, the user can set, in the watched video, the audio the user wants to hear. This enriches the viewing functions of the video and meets the personalized needs of the user.
In some embodiments of the present disclosure, after the step of synthesizing the target video, the following steps may be further included:
receiving a publishing instruction for the target video from the user; and in response to the publishing instruction, publishing the target video.
After synthesizing the target video from the second video and the target audio stream, the target video may be published in response to a publication instruction of the target video by a user.
In some embodiments of the present disclosure, after the step of synthesizing the target video, the following steps may be further included: and playing the target video.
When the first video is a live video, the first audio stream of the first video is replaced with the target audio stream while the user watches the live video in real time. That is, as the live video proceeds, the step of synthesizing the target video from the second video and the target audio stream proceeds along with it. The user can thus listen in real time to the audio the user wants to hear while watching the live video, which improves the user experience.
In summary, in the embodiments of the disclosure, in response to a first input from a user on a playing interface of a first video, in the case that the audio streams of the first video include a first audio stream, the first audio stream is separated from the first video to obtain a second video, and a target video is then synthesized from the second video and the target audio stream indicated by the first input. Because the target audio stream in the synthesized target video is selected by the user according to the user's own needs, the user can set, in the watched video, the audio the user wants to hear. This enriches the viewing functions of the video and meets the personalized needs of the user.
Based on the above video processing method, the present disclosure further provides another video processing method, described with reference to fig. 5. Fig. 5 is a flow diagram illustrating another video processing method according to an exemplary embodiment.
For convenience of description, the embodiment shown in fig. 5 is described with the server 100 as the execution subject, unless otherwise specified. It should be understood that this is not to be construed as limiting the disclosure.
As shown in fig. 5, the video processing method may include the following steps.
S510, receiving second request information sent by a client, where the second request information is used to request that the first audio stream in the first video be replaced with the target audio stream.
S520, under the condition that the audio stream of the first video includes the first audio stream, separating the first audio stream from the first video to obtain a second video.
S530, synthesizing the target video according to the second video and the target audio stream.
S540, sending second response information to the client, where the second response information includes the target video.
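The four steps S510–S540 can be sketched end-to-end as a single server-side handler. The message dictionaries and helper names below are assumptions for illustration; a production server would operate on real media containers rather than string placeholders.

```python
# Minimal sketch of the server-side flow S510-S540.
def handle_second_request(request, videos):
    """S510: receive request; S520: separate; S530: synthesize; S540: respond."""
    video = videos[request["video_id"]]        # look up the first video
    first_audio = request["first_audio"]
    target_audio = request["target_audio"]

    # S520: if the first audio stream is present, separate it out to get
    # the audio tracks of the 'second video'.
    if first_audio in video["audio"]:
        second_audio = [a for a in video["audio"] if a != first_audio]
    else:
        second_audio = list(video["audio"])

    # S530: synthesize the target video from the second video and the
    # target audio stream.
    target_video = {"frames": video["frames"],
                    "audio": second_audio + [target_audio]}

    # S540: return the second response information containing the target video.
    return {"status": "ok", "target_video": target_video}

videos = {"v1": {"frames": ["f0", "f1"], "audio": ["voice", "bgm"]}}
resp = handle_second_request(
    {"video_id": "v1", "first_audio": "bgm", "target_audio": "song"}, videos)
print(resp["target_video"]["audio"])  # ['voice', 'song']
```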
In the embodiment of the present disclosure, when the audio stream of the first video includes the first audio stream, the first audio stream is separated from the first video to obtain the second video, and the target video is then synthesized from the second video and the target audio stream. Because the target audio stream is the replacement requested by the client in the second request information, the synthesized target video lets the user hear the audio they want in the video being watched, which enriches the video viewing experience and meets the user's personalized needs.
Specific implementations of the above steps are described below.
First, S510 is described.
Specifically, an audio stream in a video may come from two sources: an audio stream already present in the video before the publisher uploads it, or an audio stream (i.e., background music) added when the publisher uploads the video. An added audio stream usually comes from the video application through which the video is published.
Here, the first audio stream may be the audio stream added when the video publisher publishes the video, and the target audio stream may be the audio stream selected by the user.
Next, S520 is described.
When the audio stream of the first video includes the first audio stream, the first audio stream is separated from the first video to obtain the second video.
Specifically, it may be detected whether the audio stream of the first video is a single stream or a superposition of multiple streams, for example the publisher's own audio overlaid with background music (i.e., the first audio stream). Since the background music selected by the publisher when uploading the first video can be identified directly, its audio stream (the first audio stream) can be stripped from the first video directly to obtain the second video.
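The detect-and-strip step just described can be sketched as follows; the function name and the use of string identifiers for streams are illustrative assumptions (the disclosure only says the publisher-selected background music is directly identifiable, e.g. from upload metadata).

```python
# Sketch: if the first video carries more than one audio stream and the
# publisher's background music is known, strip it to obtain the audio
# tracks of the 'second video'; a single-stream video is left unchanged.
def strip_known_bgm(audio_streams, known_bgm):
    """Return the audio streams remaining after removing the
    publisher-selected background music, when present."""
    if len(audio_streams) <= 1:
        return list(audio_streams)        # single stream: nothing to strip
    return [a for a in audio_streams if a != known_bgm]

print(strip_known_bgm(["voice", "bgm"], "bgm"))  # ['voice']
print(strip_known_bgm(["voice"], "bgm"))         # ['voice']
```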
In the case where the first video is a live video, before S520 the method may further include: buffering the first video for a preset duration to obtain a third video.
Correspondingly, separating the first audio stream from the first video to obtain the second video includes: separating the first audio stream from the third video to obtain the second video.
Because live broadcasting happens in real time, a live video stream of the preset duration first needs to be buffered to obtain the third video; the first audio stream is then separated from the buffered third video before the subsequent synthesis.
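The buffering step can be sketched as below; the chunk representation, timing model, and class name are assumptions for illustration, since the disclosure specifies only "a preset duration".

```python
# Sketch: accumulate live chunks until a preset duration is buffered,
# then hand the buffered span off as the 'third video' for separation.
from collections import deque

class LiveBuffer:
    """Accumulate live chunks until a preset duration has been buffered."""
    def __init__(self, preset_seconds):
        self.preset = preset_seconds
        self.chunks = deque()
        self.buffered = 0.0

    def push(self, chunk, seconds):
        self.chunks.append(chunk)
        self.buffered += seconds

    def pop_third_video(self):
        """Return the buffered chunks once the preset duration is reached,
        or None while still filling."""
        if self.buffered < self.preset:
            return None
        third, self.chunks = list(self.chunks), deque()
        self.buffered = 0.0
        return third

buf = LiveBuffer(preset_seconds=2.0)
buf.push("c0", 1.0)
assert buf.pop_third_video() is None     # still filling
buf.push("c1", 1.0)
print(buf.pop_third_video())  # ['c0', 'c1']
```

The preset duration trades latency for robustness: a larger buffer smooths separation and synthesis, but delays how quickly the viewer hears the replacement audio.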
Next, S530 is described.
The target video is synthesized from the second video and the target audio stream. Because the target audio stream is selected by the user, the target video is synthesized according to the user's personalized needs.
Finally, S540 is described.
Second response information including the target video is sent to the client.
It can be seen that, in this embodiment, when the audio stream of the first video includes the first audio stream, the first audio stream is separated from the first video to obtain the second video, and the target video is then synthesized from the second video and the target audio stream requested by the client. The synthesized target video thus lets the user hear the audio they want in the video being watched, enriching the viewing experience and meeting the user's personalized needs.
Based on the video processing method, the disclosure also provides a video processing device. This is explained with reference to fig. 6.
Fig. 6 is a block diagram illustrating a video processing apparatus according to an exemplary embodiment. Referring to fig. 6, the video processing apparatus 600 may include a first receiving module 610, a first separation module 620, and a first compositing module 630.
The first receiving module 610 is configured to receive a first input of a user on the playing interface of a first video, where the first input indicates a target audio stream.
A first separation module 620 configured to perform, in response to the first input, in a case where the first audio stream is included in the audio stream of the first video, separating the first audio stream from the first video, resulting in a second video.
A first compositing module 630 configured to perform compositing the target video from the second video and the target audio stream.
In the embodiment of the present disclosure, in response to a first input of the user on the playing interface of the first video, the video processing apparatus 600 separates the first audio stream from the first video when the audio stream of the first video includes the first audio stream, obtains the second video, and synthesizes the target video from the second video and the target audio stream indicated by the first input. Because the target audio stream is selected by the user, the user can set the audio they want to hear in the video being watched, which enriches the video viewing experience and meets the user's personalized needs.
In some embodiments of the present disclosure, the video processing apparatus 600 may further include:
an identification module configured to perform identifying a source of an audio stream in a first video.
A first determination module configured to determine the audio stream of a target source as the first audio stream.
In some embodiments of the present disclosure, the video processing apparatus 600 may further include:
the first sending module is configured to execute sending first request information to a preset server, wherein the first request information is used for requesting to identify a first audio stream in a first video.
The first receiving module 610 is further configured to perform receiving of first response information returned by the preset server, where the first response information includes an identifier corresponding to the first audio stream.
A second determination module configured to perform determining the first audio stream according to the identification in the first response information.
In some embodiments of the present disclosure, the video processing apparatus 600 may further include:
the first sending module is configured to execute sending of second request information to a preset server, wherein the second request information is used for requesting a target video;
the first receiving module 610 is further configured to perform receiving of second response information returned by the preset server, where the second response information includes the target video.
In some embodiments of the present disclosure, the first receiving module 610 is further configured to perform receiving a second input of the user to the playing interface.
A display module configured to perform displaying, in response to the second input, at least one audio stream identification, the audio stream identification including a target audio stream identification indicating the target audio stream.
The first receiving module 610 is further configured to perform receiving a first input of the target audio stream identification by the user.
In some embodiments of the present disclosure, the first video is a live video or a short video.
In some embodiments of the present disclosure, the video processing apparatus 600 may further include:
a playing module configured to execute playing the target video.
In some embodiments of the present disclosure, the first receiving module 610 is further configured to receive a publication instruction of the user for the target video.
Accordingly, the video processing apparatus 600 may further include:
a publishing module configured to publish the target video in response to the publication instruction.
Based on another video processing method shown in fig. 5, the present disclosure also provides another video processing apparatus. This is explained with reference to fig. 7.
Fig. 7 is a block diagram illustrating a video processing apparatus according to an example embodiment. Referring to fig. 7, the video processing apparatus 700 may include a second receiving module 710, a second separating module 720, a second synthesizing module 730, and a second transmitting module 740.
The second receiving module 710 is configured to receive second request information sent by a client, where the second request information is used to request that the first audio stream in the first video be replaced with the target audio stream.
A second separating module 720, configured to perform, in a case that the first audio stream is included in the audio stream of the first video, separating the first audio stream from the first video to obtain a second video.
A second compositing module 730 configured to perform compositing a target video from the second video and the target audio stream.
A second sending module 740 configured to send second response information to the client, where the second response information includes the target video.
In the embodiment of the present disclosure, when the audio stream of the first video includes the first audio stream, the apparatus separates the first audio stream from the first video to obtain the second video, and then synthesizes the target video from the second video and the target audio stream requested by the client. The synthesized target video lets the user hear the audio they want in the video being watched, enriching the viewing experience and meeting the user's personalized needs.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 8 is a block diagram illustrating a server according to an exemplary embodiment. Referring to fig. 8, an embodiment of the present disclosure further provides a server including a processor 810, a communication interface 820, a memory 830, and a communication bus 840, where the processor 810, the communication interface 820, and the memory 830 communicate with one another via the communication bus 840.
The memory 830 is used for storing instructions executable by the processor 810.
The processor 810, when executing the instructions stored in the memory 830, implements the following steps:
receiving a first input of a user to a playing interface of a first video, wherein the first input is used for indicating a target audio stream; responding to the first input, and under the condition that the audio stream of the first video comprises the first audio stream, separating the first audio stream from the first video to obtain a second video; and synthesizing the target video according to the second video and the target audio stream.
It can be seen that, by applying the embodiment of the present disclosure, in response to a first input of the user to the playing interface of the first video, when the audio stream of the first video includes the first audio stream, the first audio stream is separated from the first video to obtain the second video, and the target video is then synthesized from the second video and the target audio stream indicated by the first input. Because the target audio stream is selected by the user, the user can set the audio they want to hear in the video being watched, which enriches the video viewing experience and meets the user's personalized needs.
Fig. 9 is a block diagram illustrating an apparatus for data processing according to an exemplary embodiment. For example, the apparatus 900 may be provided as a server. Referring to fig. 9, the apparatus 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. The application programs stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 922 is configured to execute the instructions to perform the video processing method described in any of the above embodiments.
The apparatus 900 may also include a power component 926 configured to perform power management of the apparatus 900, a wired or wireless network interface 950 configured to connect the apparatus 900 to a network, and an input/output (I/O) interface 958. The apparatus 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
In some embodiments of the present disclosure, there is also provided a storage medium, wherein instructions of the storage medium, when executed by a processor of a server, enable the server to perform the video processing method according to any one of the above embodiments.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In some embodiments of the present disclosure, there is further provided a computer program product, wherein instructions of the computer program product, when executed by a processor of a server, enable the server to perform the video processing method according to any of the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video processing method, comprising:
receiving a first input of a user to a playing interface of a first video, wherein the first input is used for indicating a target audio stream;
responding to the first input, and under the condition that the audio stream of the first video comprises a first audio stream, separating the first audio stream from the first video to obtain a second video;
and synthesizing a target video according to the second video and the target audio stream.
2. The method of claim 1, wherein prior to said separating said first audio stream from said first video, said method further comprises:
identifying a source of an audio stream in the first video;
an audio stream of a target source is determined as the first audio stream.
3. The method of claim 1, wherein prior to said separating said first audio stream from said first video, said method further comprises:
sending first request information to a preset server, wherein the first request information is used for requesting to identify the first audio stream in the first video;
receiving first response information returned by the preset server, wherein the first response information comprises an identifier corresponding to the first audio stream;
determining the first audio stream according to the identifier in the first response information.
4. The method of claim 1, wherein after receiving a first input from a user to a play interface for a first video, the method further comprises:
sending second request information to a preset server, wherein the second request information is used for requesting the target video;
and receiving second response information returned by the preset server, wherein the second response information comprises the target video.
5. The method of claim 1, wherein prior to the receiving the first input from the user to the playback interface for the first video, the method further comprises:
receiving a second input of the user to the playing interface;
in response to the second input, displaying at least one audio stream identification, the at least one audio stream identification including a target audio stream identification that indicates the target audio stream;
the receiving of the first input of the user to the playing interface of the first video includes:
receiving a first input of the user for the target audio stream identification.
6. A video processing method, comprising:
receiving second request information sent by a user side, wherein the second request information is used for requesting to replace a first audio stream in a first video with a target audio stream;
under the condition that the audio stream of the first video comprises the first audio stream, separating the first audio stream from the first video to obtain a second video;
synthesizing a target video according to the second video and the target audio stream;
and sending second response information to the user side, wherein the second response information comprises the target video.
7. A video processing apparatus, comprising:
a first receiving module configured to execute receiving a first input of a playing interface of a first video from a user, wherein the first input is used for indicating a target audio stream;
a first separation module configured to perform, in response to the first input, in a case where a first audio stream is included in an audio stream of the first video, separating the first audio stream from the first video, resulting in a second video;
a first compositing module configured to perform compositing a target video from the second video and the target audio stream.
8. A video processing apparatus, comprising:
the second receiving module is configured to execute receiving of second request information sent by a user side, wherein the second request information is used for requesting to replace a first audio stream in the first video with a target audio stream;
a second separation module configured to separate the first audio stream from the first video to obtain a second video when the first audio stream is included in the audio stream of the first video;
a second compositing module configured to perform compositing a target video from the second video and the target audio stream;
a second sending module configured to execute sending second response information to the user side, wherein the second response information comprises the target video.
9. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video processing method of any of claims 1 to 6.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the video processing method of any one of claims 1 to 6.
CN202011632492.8A 2020-12-31 2020-12-31 Video processing method, device, server and storage medium Pending CN112839236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632492.8A CN112839236A (en) 2020-12-31 2020-12-31 Video processing method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN112839236A true CN112839236A (en) 2021-05-25

Family

ID=75925910

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017206398A1 (en) * 2016-05-31 2017-12-07 乐视控股(北京)有限公司 Method and device for video sharing
CN108566558A (en) * 2018-04-24 2018-09-21 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN108737845A (en) * 2018-05-22 2018-11-02 北京百度网讯科技有限公司 Processing method, device, equipment and storage medium is broadcast live
CN110971970A (en) * 2019-11-29 2020-04-07 维沃移动通信有限公司 Video processing method and electronic equipment
CN111770359A (en) * 2020-06-03 2020-10-13 苏宁云计算有限公司 Event video clipping method, system and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN106658200B (en) Live video sharing and acquiring method and device and terminal equipment thereof
CN104769956B (en) Information processing unit, information processing method
KR101774039B1 (en) Automatic media asset update over an online social network
CN111901674A (en) Video playing control and device
CN103763626A (en) Method, device and system for pushing information
US11930248B2 (en) Information processing apparatus, information processing method, transmission apparatus, and transmission method
JPWO2014010470A1 (en) Transmitting apparatus, information processing method, program, receiving apparatus, and application linkage system
CN110536147B (en) Live broadcast processing method, device and system
CN106792237B (en) Message display method and system
CN105451039A (en) Multimedia information interaction method and system
US10555049B2 (en) Music sharing and advertising
US20180124472A1 (en) Providing Interactive Content to a Second Screen Device via a Unidirectional Media Distribution System
CN110149528B (en) Process recording method, device, system, electronic equipment and storage medium
WO2014010469A1 (en) Reception device, information processing method, program, transmission device and application linking system
CN114449301B (en) Item sending method, item sending device, electronic equipment and computer-readable storage medium
CN113727136B (en) Live broadcast pushing method, system, device, equipment and storage medium
CN112839236A (en) Video processing method, device, server and storage medium
CN112637618B (en) Live broadcast playback video generation method and device, electronic equipment and computer medium
CN111309935B (en) Song recommendation method and device and computer storage medium
CN116800988A (en) Video generation method, apparatus, device, storage medium, and program product
CN113411636A (en) Live wheat-connecting method and device, electronic equipment and computer-readable storage medium
KR100632930B1 (en) Method and system for providing information of broadcasting relation product on the air
CN104049758A (en) Information processing method and electronic device
US20240251126A1 (en) Information processing apparatus, information processing method, transmission apparatus, and transmission method
CN112218109B (en) Multimedia resource acquisition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210525)