CN113473170B - Live audio processing method, device, computer equipment and medium - Google Patents

Live audio processing method, device, computer equipment and medium

Info

Publication number
CN113473170B
CN113473170B (application CN202110807055.3A)
Authority
CN
China
Prior art keywords
audio
audio stream
voice
stream
live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110807055.3A
Other languages
Chinese (zh)
Other versions
CN113473170A (en)
Inventor
何思远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202110807055.3A priority Critical patent/CN113473170B/en
Publication of CN113473170A publication Critical patent/CN113473170A/en
Application granted granted Critical
Publication of CN113473170B publication Critical patent/CN113473170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4398Processing of audio elementary streams involving reformatting operations of audio signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44222Analytics of user selections, e.g. selection of programs or purchase activity
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the application discloses a live audio processing method, device, computer equipment and medium, belonging to the technical field of computers. The method includes: during the live broadcasting process of a live broadcasting room, receiving multiple audio streams of a target song, the multiple audio streams including a voice audio stream of a target object, a mixed audio stream of the target object and an original singing audio stream; playing in the live broadcasting room based on a first audio stream; and, in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, switching the first audio stream to a second audio stream among the multiple audio streams and playing the second audio stream in the live broadcasting room. The method can switch among the voice audio stream, the mixed audio stream and the original singing audio stream, so that different sounds can be played in the live broadcasting room, thereby meeting the requirements of different audiences and improving the live broadcasting effect.

Description

Live audio processing method, device, computer equipment and medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a live audio processing method, a live audio processing device, computer equipment and a medium.
Background
With the development of audio processing technology, its applications have become increasingly broad. For example, in the live broadcast field, audio processing technology is used to process the host's voice, and the processed voice is played in the live broadcasting room.
In the related art, the host terminal mixes the host's voice with the accompaniment audio of a song and transmits the mixed audio to the viewer terminal, so that viewers in the live broadcasting room hear the host's voice and the song's accompaniment. However, in this way only the voice and accompaniment can be played in the live broadcasting room, the sound that is played is monotonous, and the requirements of different audiences cannot be met, which affects the live broadcasting effect.
Disclosure of Invention
The embodiment of the application provides a live broadcast audio processing method, a live broadcast audio processing device, computer equipment and a medium, realizes the switching among multiple paths of audio streams and improves the live broadcast effect. The technical scheme is as follows:
in one aspect, a live audio processing method is provided, the method including:
in a live broadcasting process of a live broadcasting room, receiving a plurality of paths of audio streams of a target song, wherein the plurality of paths of audio streams comprise a human voice audio stream of a target object, a mixed audio stream of the target object and an original singing audio stream, and the mixed audio stream is obtained by mixing the human voice of the target object and accompaniment audio of the target song;
Playing in the live broadcast room based on a first audio stream, wherein the first audio stream is any audio stream in the multi-channel audio stream;
and switching the first audio stream into a second audio stream in the multi-channel audio stream in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, and playing the second audio stream in the live broadcasting room, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing the audio quality of the corresponding audio fragment.
In one possible implementation manner, the switching the first audio stream to the second audio stream in the multiple audio streams in response to the audio quality information corresponding to the voice audio stream meeting a switching condition includes:
and switching the first audio stream to the second audio stream in response to the automatic switching function for the audio stream being in an on state and the audio quality information corresponding to the voice audio stream meeting the switching condition.
In another possible implementation manner, before the switching the first audio stream to a second audio stream in the multiple audio streams in response to the audio quality information corresponding to the voice audio stream meeting a switching condition and the playing based on the second audio stream in the live broadcasting room, the method further includes:
And analyzing the stream message corresponding to the voice audio stream to obtain the audio quality information of the audio fragment.
In another possible implementation manner, the playing in the live room based on the second audio stream includes:
determining a second timestamp adjacent to the first timestamp in the second audio stream based on a first timestamp of a currently played audio clip, the second timestamp being located after the first timestamp;
and when the playing of the currently played audio fragment is finished, playing the audio fragment corresponding to the second timestamp in the second audio stream in the live broadcasting room.
In another possible implementation, the method further includes:
and when playing based on the voice audio stream, responding to a triggering operation of a play control associated with the original singing audio stream, and playing based on the original singing audio stream.
In another aspect, a live audio processing method is provided, the method including:
in a live broadcasting process of a live broadcasting room, acquiring a voice sent by a target object according to a target song, and obtaining a voice audio stream of the target object;
mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
And sending the voice audio stream and the mixed audio stream to a target server, wherein the target server is used for acquiring an original singing audio stream of the target song, and sending the voice audio stream, the mixed audio stream and the original singing audio stream to a first terminal.
In one possible implementation, before the sending the voice audio stream and the mixed audio stream to the target server, the method further includes:
and respectively identifying each audio fragment in the voice audio stream to obtain audio quality information corresponding to the voice audio stream, wherein the audio quality information is used for representing the audio quality of the corresponding audio fragment.
In another aspect, a live audio processing system is provided, the live audio processing system including a first terminal, a target server, and a second terminal;
the second terminal is used for acquiring the voice of a target object according to a target song in a live broadcasting process of a live broadcasting room to obtain a voice audio stream of the target object, mixing the voice of the target object with accompaniment audio of the target song to obtain a mixed audio stream of the target object, and sending the voice audio stream and the mixed audio stream to the target server;
The target server is configured to obtain an original singing audio stream of the target song, and send the vocal audio stream, the mixed audio stream and the original singing audio stream to the first terminal;
the first terminal is configured to receive the vocal audio stream, the mixed audio stream and the original singing audio stream in the live broadcast process of the live broadcast room, and play in the live broadcast room based on a first audio stream, where the first audio stream is any one of the multiple audio streams;
the first terminal is configured to switch the first audio stream to a second audio stream in the multiple audio streams in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, play the second audio stream in the live broadcasting room, where the second audio stream is different from the first audio stream, and the audio quality information is used to represent audio quality of a corresponding audio segment.
In another aspect, a live audio processing apparatus is provided, the apparatus comprising:
the audio stream receiving module is used for receiving multiple paths of audio streams of a target song in a live broadcast process of a live broadcast room, wherein the multiple paths of audio streams comprise a human voice audio stream of the target object, a mixed audio stream of the target object and an original singing audio stream, and the mixed audio stream is obtained by mixing the human voice of the target object and accompaniment audio of the target song;
The playing module is used for playing based on the first audio stream in the live broadcasting room;
and the audio stream switching module is used for switching the first audio stream into a second audio stream in the multi-channel audio stream in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, playing the second audio stream in the live broadcasting room, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing the audio quality of the corresponding audio fragment.
In one possible implementation manner, the audio stream switching module is configured to switch the first audio stream to the second audio stream in response to an automatic switching function for an audio stream being in an on state, and audio quality information corresponding to the vocal audio stream meeting the switching condition.
In another possible implementation, the apparatus further includes:
and the quality information acquisition module is used for analyzing the stream message corresponding to the voice audio stream to obtain the audio quality information of each audio fragment.
In another possible implementation manner, the audio stream switching module includes:
a time stamp determining unit configured to determine, based on a first time stamp of a currently played audio clip, a second time stamp adjacent to the first time stamp in the second audio stream, the second time stamp being located after the first time stamp;
And the audio clip playing unit is used for playing the audio clip corresponding to the second time stamp in the second audio stream in the live broadcasting room when the playing of the audio clip currently played is finished.
In another possible implementation manner, the playing module is further configured to:
and when playing based on the voice audio stream, respond to a triggering operation of a play control associated with the original singing audio stream, and play based on the original singing audio stream.
In another aspect, a live audio processing apparatus is provided, the apparatus comprising:
the voice acquisition module is used for acquiring voice sent by a target object according to a target song in the live broadcasting process of a live broadcasting room to obtain voice audio streams of the target object;
the mixing module is used for mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
the audio stream sending module is used for sending the voice audio stream and the mixed audio stream to a target server, wherein the target server is used for obtaining an original singing audio stream of the target song and sending the voice audio stream, the mixed audio stream and the original singing audio stream to a first terminal.
In one possible implementation, the apparatus further includes:
the quality information acquisition module is used for respectively identifying each audio fragment in the voice audio stream to obtain audio quality information corresponding to the voice audio stream, and the audio quality information is used for representing the audio quality of the corresponding audio fragment.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one program code that is loaded and executed by the processor to implement the operations performed in the live audio processing method as described in the above aspects.
In another aspect, a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement operations performed in a live audio processing method as described in the above aspects is provided.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising program code stored in a computer readable storage medium, the program code being loaded and executed by a processor to implement the operations performed in the live audio processing method as described in the above aspects.
According to the method, the device, the computer equipment and the storage medium provided by the embodiment of the application, in the live broadcasting process of a live broadcasting room, the human voice audio stream, the mixed audio stream and the original singing audio stream corresponding to the target song are obtained, in the playing process based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the audio stream can be switched, the first audio stream is switched to the second audio stream different from the first audio stream, the switching among multiple paths of audio streams is realized, and multiple sounds can be played in the live broadcasting room, so that the requirements of different audiences are met, and the live broadcasting effect is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a live audio processing system according to an embodiment of the present application;
fig. 2 is a flowchart of a live audio processing method according to an embodiment of the present application;
Fig. 3 is a flowchart of another live audio processing method according to an embodiment of the present application;
fig. 4 is a flowchart of another live audio processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of switching audio streams according to an embodiment of the present application;
FIG. 6 is a schematic diagram of generating an audio stream according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a live audio processing device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of another live audio processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a live audio processing device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another live audio processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, the first switching condition may be referred to as a second switching condition and the second switching condition may be referred to as a first switching condition without departing from the scope of the present application.
The terms "at least one", "a plurality", "each", "any" and the like as used herein, at least one includes one, two or more, a plurality includes two or more, each means each of the corresponding plurality, and any one means any of the plurality. For example, the plurality of audio clips includes 3 audio clips, and each audio clip refers to each audio clip of the 3 audio clips, and any one of the 3 audio clips may be the first audio clip, the second audio clip, or the third audio clip.
Fig. 1 is a schematic diagram of a live audio processing system according to an embodiment of the present application. Referring to fig. 1, the live audio processing system includes at least one first terminal 101 (1 in fig. 1 as an example), a target server 102, and a second terminal 103. Each first terminal 101 and the target server 102 are connected through a wireless or wired network, and each second terminal 103 and the target server 102 are connected through a wireless or wired network.
The first terminal 101 and the second terminal 103 have installed thereon a target application served by the target server 102, by which the first terminal 101 and the second terminal 103 can realize functions such as data transmission, message interaction, and the like. Optionally, the first terminal 101 and the second terminal 103 are computers, mobile phones, tablet computers or other terminals. Alternatively, the target application is a target application in the operating systems of the first terminal 101 and the second terminal 103, or a target application provided for a third party. For example, the target application is a live application having a live function, but of course, the live application can also have other functions, such as a shopping function, a navigation function, a game function, and the like. Optionally, the target server 102 is a background server of the target application or a cloud server that provides services such as cloud computing and cloud storage.
Based on the live audio processing system provided in FIG. 1, the method provided in the embodiment of the application can be applied to a live singing scene. For example, a host sings in a live broadcasting room through the second terminal; the second terminal collects the host's voice and mixes it with the accompaniment audio of the song to obtain a mixed audio stream, and sends the voice audio stream corresponding to the voice and the mixed audio stream to the live broadcast server; the live broadcast server also obtains the original singing audio stream of the song and sends the voice audio stream, the mixed audio stream and the original singing audio stream to the first terminal used by a viewer in the live broadcasting room. After the first terminal receives the audio streams, it plays the audio clips in the voice audio stream by default, and can subsequently switch the voice audio stream to the mixed audio stream or the original singing audio stream according to the audio quality corresponding to the voice audio stream, or the viewer can manually switch to the audio they want to hear.
Fig. 2 is a flowchart of a live audio processing method according to an embodiment of the present application. The execution main body of the embodiment of the application is a first terminal. Referring to fig. 2, the method includes the steps of:
201. The first terminal receives the multipath audio streams of the target song in the live broadcasting process of the live broadcasting room.
The multi-path audio stream comprises a human voice audio stream of a target object, a mixed audio stream of the target object and an original singing audio stream, wherein the mixed audio stream is obtained by mixing the human voice of the target object and accompaniment audio of a target song. The target object is a live host, the target song is any song, the human voice audio stream is obtained by collecting human voice sent by the target object according to the target song, and the original audio stream is the original audio of the target song.
202. The first terminal plays in the live broadcast room based on a first audio stream, wherein the first audio stream is any audio stream in the multipath audio streams.
The first audio stream may be any one of a human audio stream, a mixed audio stream, or an original audio stream. The first audio stream comprises a plurality of audio fragments, and after the first terminal receives the multi-channel audio stream, the audio fragments in the first audio stream are played in the live broadcast room.
203. In response to the audio quality information corresponding to the voice audio stream meeting a switching condition, the first terminal switches the first audio stream to a second audio stream in the multipath audio streams and plays in the live broadcasting room based on the second audio stream, the second audio stream being different from the first audio stream.
Wherein the second audio stream is a different audio stream from the first audio stream in the multiple audio streams. For example, the first audio stream is a human voice audio stream, and the second audio stream is either a mixed audio stream or an original audio stream. The audio quality information is used to represent the audio quality of the corresponding audio clip in the human voice audio stream. The switching condition refers to a condition that a first audio stream is switched to a second audio stream, the first audio stream is a voice audio stream, and the voice audio stream is switched to an original voice audio stream or a mixed audio stream under the condition that the audio quality information meets different switching conditions; the first audio stream is an original singing audio stream, and the voice audio stream is switched to a voice audio stream or a mixed audio stream under the condition that the audio quality information meets different switching conditions; the first audio stream is a mixed audio stream, and the mixed audio stream is switched to a human voice audio stream or an original singing audio stream under the condition that the audio quality information meets different switching conditions.
In the embodiment of the application, the first terminal determines whether to switch the current first audio stream according to the audio quality of each audio fragment in the voice audio stream, and switches the first audio stream into the corresponding second audio stream under the condition that the audio quality information meets the switching condition, thereby realizing the switching of the audio streams.
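To make the overall flow concrete, the following is a minimal sketch in Python (names such as `AudioClip`, `play_clip` and `choose_target_stream` are hypothetical and not taken from the patent) of how a first terminal might play the current stream clip by clip while consulting the quality information carried by the voice audio stream:

```python
# Minimal sketch of the first terminal's play-and-check loop (hypothetical names).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AudioClip:
    timestamp: float      # absolute timestamp shared by all three streams
    pcm: bytes            # decoded audio samples
    quality_score: float  # reference score; meaningful only for the voice stream

def play_clip(clip: AudioClip) -> None:
    # Stand-in for handing the clip to the audio renderer.
    print(f"playing clip at t={clip.timestamp:.1f}s")

def choose_target_stream(recent_scores: List[float], current: str) -> str:
    # Stand-in for the switching conditions detailed in step 406 below;
    # returning `current` means "do not switch".
    return current

def play_loop(streams: Dict[str, List[AudioClip]], current: str = "vocal") -> None:
    recent_scores: List[float] = []
    for index in range(len(streams["vocal"])):
        play_clip(streams[current][index])
        # The quality information is always taken from the voice audio stream,
        # whichever stream is currently being played.
        recent_scores.append(streams["vocal"][index].quality_score)
        current = choose_target_stream(recent_scores, current)
```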
According to the method provided by the embodiment of the application, in the live broadcasting process of the live broadcasting room, the human voice audio stream, the mixed audio stream and the original singing audio stream corresponding to the target song are obtained, in the playing process based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the audio stream can be switched, the first audio stream is switched to the second audio stream different from the first audio stream, the switching among multiple paths of audio streams is realized, and multiple sounds can be played in the live broadcasting room, so that the requirements of different audiences are met, and the live broadcasting effect is improved.
Fig. 3 is a flowchart of another live audio processing method according to an embodiment of the present application. The execution main body of the embodiment of the application is a second terminal. Referring to fig. 3, the method includes the steps of:
301. The second terminal acquires the voice of the target object according to the target song in the live broadcasting process of the live broadcasting room, and obtains the voice audio stream of the target object.
The target object is a live host, the host sings a target song in a live broadcasting room, and the second terminal acquires the voice of the target object, so that a voice audio stream is obtained, and the voice of the host singing song is obtained.
302. And the second terminal mixes the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object.
The second terminal acquires the accompaniment audio of the target song, and mixes the acquired voice with the accompaniment audio to obtain a mixed audio stream, namely the mixed audio stream comprises the voice and the accompaniment audio.
303. The second terminal transmits the human voice audio stream and the mixed audio stream to the target server.
The target server also obtains an original singing audio stream of the target song, and sends a human voice audio stream, a mixed audio stream and the original singing audio stream to the first terminal, so that the first terminal receives three audio streams, and when playing based on the audio streams, the three audio streams can be switched.
According to the method provided by the embodiment of the application, the voice audio stream is obtained through the collected voice, the voice and the accompaniment audio of the target song are mixed to obtain the mixed audio stream, so that the voice audio stream and the mixed audio stream corresponding to the target song are obtained, the obtained voice audio stream and the mixed audio stream are sent to the first terminal through the target server, the target server also sends the obtained original voice audio stream to the first terminal, so that the subsequent first terminal can switch the audio stream based on the received multipath audio streams, and various sounds can be played in the live broadcasting room, and the live broadcasting effect is improved.
Fig. 4 is a flowchart of another live audio processing method according to an embodiment of the present application. The interaction main body of the embodiment of the application is a first terminal, a second terminal and a target server. Referring to fig. 4, the method includes the steps of:
401. The second terminal acquires the voice of the target object according to the target song in the live broadcasting process of the live broadcasting room, and obtains the voice audio stream of the target object.
The target object is a live host, and the target song is any song. While the target object is live broadcasting through the second terminal, the second terminal collects the voice emitted by the target object and processes it to obtain the voice audio stream. The second terminal may collect the voice emitted by the target object through a microphone.
In one possible implementation, the second terminal installs a target application, for example, the target application is a live application, a video application, or other applications that can be live. And the target object directly sings the target song through the target application, and the second terminal collects the voice sent by the target object.
In one possible implementation manner, the second terminal collects the voice into a sound card, the sound card processes the voice, that is, the sound card tunes the voice, and the second terminal then obtains the voice audio stream based on the processed voice.
In one possible implementation, the second terminal encodes the collected voice based on a target audio protocol to obtain the voice audio stream, where the voice audio stream includes a plurality of audio data packets and an audio header declaration. Each audio data packet contains an audio clip corresponding to the voice; the audio header declaration contains audio format information and decoding information and is used to initialize the decoder, so that the decoder decodes the audio data packets based on the audio header declaration to obtain the audio clips in the audio data packets. The audio header declaration only needs to be sent once, together with the first audio data packet, and does not need to be sent again as long as the audio format is not modified; if the audio format is modified, the audio header declaration is sent again, and the re-sent audio header declaration includes the modified audio format information and decoding information.
For example, the target audio protocol is RTMP (Real Time Messaging Protocol), a push protocol designed on the basis of the FLV (Flash Video, a streaming media format) container structure. FLV supports an audio format, a video format and a script information format, and also supports user-defined data types; on the basis of FLV, the data types of the multiple audio streams can be customized. For example, the respective data types defined for FLV are shown in Table 1 below:
TABLE 1
The live audio is the audio obtained after mixing the voice with the accompaniment audio. The identifier on the left in Table 1 is the identifier corresponding to each data type in the audio protocol.
The audio header declaration contains the data as follows:
the audio data packet contains the following data:
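The contents of Table 1 and of the header/packet listings are not reproduced in this text. Purely as an illustration of the structure described above (one header declaration that initializes the decoder, followed by data packets whose type tag distinguishes the voice, live/mixed and original singing streams), here is a sketch with hypothetical field names and type identifiers; they are not the values defined in the patent:

```python
# Hypothetical illustration only: field names and type identifiers are invented
# for this sketch and are not the values from Table 1 or the omitted listings.
from dataclasses import dataclass
from typing import Iterator, List, Union

@dataclass
class AudioHeaderDeclaration:
    codec: str             # e.g. "aac"
    sample_rate: int       # e.g. 44100
    channels: int          # e.g. 2
    decoder_config: bytes  # codec-specific initialization data

@dataclass
class AudioDataPacket:
    stream_type: int       # custom FLV-style tag, e.g. 0 = voice, 1 = live/mixed, 2 = original
    timestamp_ms: int      # absolute timestamp shared by the parallel streams
    payload: bytes         # one encoded audio clip

def push_stream(header: AudioHeaderDeclaration,
                packets: List[AudioDataPacket]) -> Iterator[Union[AudioHeaderDeclaration, AudioDataPacket]]:
    # The header declaration is sent once before the first data packet and is
    # only re-sent if the audio format changes (not modelled here).
    yield header
    yield from packets

def init_decoder(header: AudioHeaderDeclaration) -> None:
    # The receiver initializes its decoder from the header declaration and then
    # decodes every subsequent packet with that configuration.
    print(f"decoder ready: {header.codec}, {header.sample_rate} Hz, {header.channels} ch")
```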
in addition, in one possible implementation manner, after the second terminal obtains the voice audio stream, each audio segment in the voice audio stream is identified respectively to obtain audio quality information corresponding to the voice audio stream, and the audio quality information is added to a stream message corresponding to the voice audio stream, so that after the first terminal receives the stream message corresponding to the voice audio stream, the stream message can be parsed to obtain the audio quality information of each audio segment.
Alternatively, the audio quality information is expressed in the form of a reference score, with a larger reference score indicating a higher audio quality of the audio piece and a smaller reference score indicating a poorer audio quality of the audio piece.
Optionally, the second terminal may score each audio clip according to the matching degree between each audio clip in the vocal audio stream and the original song of the target song, to obtain the reference score. Wherein, the higher the matching degree, the higher the reference score; the lower the degree of matching, the lower the reference score. Alternatively, the second terminal may also acquire the reference score of each audio segment in other manners, which is not limited in the manner of acquiring the reference score according to the embodiment of the present application.
The above description takes the case where the second terminal acquires the audio quality information as an example. In another embodiment, the second terminal sends the voice audio stream to the target server, and the target server identifies each audio clip in the voice audio stream to obtain the audio quality information corresponding to the voice audio stream and adds the audio quality information to the stream message corresponding to the voice audio stream.
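As a rough sketch of this scoring step (how the matching degree against the original song is computed is left open in the text, so `matching_degree` below is a hypothetical placeholder), the per-clip reference scores could be attached to the stream message like this:

```python
# Sketch of attaching a per-clip reference score to the voice audio stream's
# metadata; names and the scoring scale are illustrative assumptions.
from typing import Dict, List, Sequence

def matching_degree(voice_clip: bytes, original_clip: bytes) -> float:
    """Placeholder: returns a similarity in [0, 1] (e.g. pitch/melody matching)."""
    return 0.0

def score_clips(voice_clips: Sequence[bytes],
                original_clips: Sequence[bytes]) -> List[float]:
    # Higher matching degree -> higher reference score (scaled to 0-100 here).
    return [100.0 * matching_degree(v, o)
            for v, o in zip(voice_clips, original_clips)]

def build_stream_message(scores: List[float]) -> Dict[str, object]:
    # The scores travel with the voice audio stream so the first terminal can
    # parse them without re-analysing the audio.
    return {"type": "voice_quality", "reference_scores": scores}
```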
402. And the second terminal mixes the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object, and sends the voice audio stream and the mixed audio stream to the target server.
The mixed audio stream includes the accompaniment audio of the voice and the target song, where the accompaniment audio is stored by the second terminal, or is sent to the second terminal by other devices, or is obtained by other manners, and the embodiment of the present application is not limited thereto.
In one possible implementation manner, because the voice and the accompaniment audio are two separate audio paths, directly mixing them makes it difficult to ensure that the volume of the voice and the volume of the accompaniment audio match each other, so the two volumes need to be adjusted before mixing. The second terminal displays a third volume control associated with the voice and a fourth volume control associated with the accompaniment audio on the live broadcast interface of the live broadcasting room; the target object can adjust the volume of the voice through the third volume control and adjust the volume of the accompaniment audio through the fourth volume control until the volume of the voice or the volume of the accompaniment audio reaches a suitable level. The second terminal adjusts the volume of the voice in response to an adjustment operation on the third volume control, and adjusts the volume of the accompaniment audio in response to an adjustment operation on the fourth volume control; the second terminal then mixes the voice with the accompaniment audio based on the adjusted volume of the voice and the adjusted volume of the accompaniment audio to obtain the mixed audio stream.
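A minimal sketch of the gain-adjusted mix, assuming the voice and the accompaniment have already been aligned as 16-bit PCM sample sequences and that the two gain values come from the third and fourth volume controls (function and parameter names are illustrative):

```python
# Rough sketch of gain-adjusted mixing of voice and accompaniment.
from typing import List, Sequence

def mix(voice: Sequence[int], accompaniment: Sequence[int],
        voice_volume: float = 1.0, accompaniment_volume: float = 0.6) -> List[int]:
    mixed = []
    for v, a in zip(voice, accompaniment):
        sample = int(v * voice_volume + a * accompaniment_volume)
        # Clamp to the 16-bit range to avoid clipping artifacts.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```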
In addition, in one possible implementation manner, the second terminal plays the corresponding audio clips in the live broadcasting room based on the obtained mixed audio stream, that is, the target object can hear the mixed audio of the voice and the accompaniment audio being played.
The above description takes the case where the second terminal performs the audio mixing as an example; in another embodiment, the second terminal sends the collected voice to the target server, and the target server mixes the voice with the accompaniment audio of the target song to obtain the mixed audio stream.
403. The target server obtains an original singing audio stream of a target song and sends a human voice audio stream, a mixed audio stream and the original singing audio stream to the first terminal.
In the embodiment of the application, the target server can identify the received voice audio stream or the mixed audio stream, determine the target song, and then acquire the original singing audio stream of the target song from the song database in the target server, or send a song acquisition request carrying the song identification of the target song to other computer equipment, and the other computer equipment sends the original singing audio stream of the target song to the target server. Of course, the target server may also acquire the original audio stream in other manners, which is not limited in this embodiment of the present application.
In addition, in the live broadcast process, any two paths of audio streams of the voice audio stream, the mixed audio stream and the original audio stream need to be switched, so that the time stamps corresponding to the voice audio stream, the mixed audio stream and the original audio stream need to be consistent, and absolute time stamps are adopted, namely, the first audio fragment in the voice audio stream, the mixed audio stream and the original audio stream corresponds to the same time stamp. For example, the audio clips in the vocal audio stream, the mixed audio stream, and the original audio stream all start from 0 seconds to 300 seconds.
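Assuming each clip carries its timestamp in seconds, the absolute-timestamp requirement can be expressed as the following small check (illustrative only):

```python
# Sketch of the absolute-timestamp requirement: the i-th clip of the voice,
# mixed and original streams must carry the same timestamp so that any two
# streams can be swapped at a clip boundary.
from typing import Dict, List

def timestamps_aligned(streams: Dict[str, List[float]]) -> bool:
    vocal, mixed, original = streams["vocal"], streams["mixed"], streams["original"]
    return all(v == m == o for v, m, o in zip(vocal, mixed, original))

# e.g. one clip per second from 0 s to 300 s in all three streams:
example = {name: [float(t) for t in range(0, 301)]
           for name in ("vocal", "mixed", "original")}
assert timestamps_aligned(example)
```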
404. The first terminal receives the voice audio stream, the mixed audio stream and the original audio stream sent by the target server.
In the embodiment of the application, in order to ensure that two paths of audio streams can be switched in time when the subsequent audio streams are switched, the first terminal receives three paths of audio streams sent by the target server, and the need of temporarily acquiring the audio streams from the target server during switching is avoided. Optionally, the first terminal establishes three audio buffers to store the voice audio stream, the mixed audio stream and the original audio stream respectively, so that the first terminal can directly acquire the audio stream stored in the buffers when switching the audio stream, thereby improving the switching efficiency.
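A possible way to organise the three buffers on the first terminal is sketched below (class and stream names are assumptions, not the patent's implementation); switching then amounts to reading subsequent clips from a different buffer:

```python
# Sketch of one buffer per received audio stream, so that a switch never has to
# fetch data from the target server on demand. Names are illustrative.
from collections import deque
from typing import Deque, Dict, Tuple

AudioClip = Tuple[float, bytes]  # (timestamp in seconds, encoded payload)

class StreamBuffers:
    def __init__(self) -> None:
        self.buffers: Dict[str, Deque[AudioClip]] = {
            "vocal": deque(), "mixed": deque(), "original": deque(),
        }

    def push(self, stream: str, clip: AudioClip) -> None:
        self.buffers[stream].append(clip)

    def pop_next(self, stream: str) -> AudioClip:
        # Switching streams simply means popping the next clip from a different
        # buffer; selecting the correct resume timestamp is illustrated after
        # FIG. 5 below.
        return self.buffers[stream].popleft()
```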
405. The first terminal plays in the live broadcast room based on the first audio stream.
After the first terminal receives the three audio streams sent by the target server, it can decode any one of the audio streams to obtain each audio clip in that audio stream, and play the audio clips in sequence according to their corresponding timestamps.
In one possible implementation manner, after the first terminal receives the multiple paths of audio streams sent by the target server for the first time, decoding the voice audio streams by default to obtain each audio segment in the voice audio streams, and sequentially playing each audio segment according to the time stamp corresponding to each audio segment.
406. And the first terminal responds to the condition that the audio quality information corresponding to the voice audio stream meets the switching condition, switches the first audio stream into the second audio stream, and plays the second audio stream in the live broadcasting room.
In the embodiment of the application, the first terminal continuously acquires the audio quality information corresponding to the voice audio stream in the playing process based on any audio stream, and switches the first audio stream into the second audio stream under the condition that the audio quality information meets the switching condition by judging whether the audio quality information corresponding to the voice audio stream meets the switching condition, wherein the second audio stream is different from the first audio stream, and continues to play based on the first audio stream under the condition that the audio quality information does not meet the switching condition.
The first audio stream may be a human voice audio stream, an original singing audio stream or a mixed audio stream, and under different switching conditions, the first audio stream may be switched to a corresponding second audio stream, so that the method includes the following switching situations:
first kind: the first audio stream is a human voice audio stream.
In one possible implementation manner, the first terminal switches the voice audio stream to the mixed audio stream and plays the mixed audio stream in the live broadcasting room under the condition that the audio quality information of a plurality of consecutive audio clips in the voice audio stream meets a first switching condition. The first switching condition means that the audio quality of a plurality of consecutive audio clips is higher than the reference audio quality, and the plurality of consecutive audio clips include the audio clip currently played. Optionally, when the audio quality information includes reference scores of the audio clips, the first terminal switches the voice audio stream to the mixed audio stream if the reference scores of the plurality of consecutive audio clips in the voice audio stream are greater than a first threshold. The first threshold is any value; for example, the first threshold is 20, and the voice audio stream is switched to the mixed audio stream when the reference scores of 2 consecutive audio clips are each greater than 20 points.
Optionally, the first terminal switches the vocal audio stream to the mixed audio stream if the audio quality information of the first number of audio pieces in the vocal audio stream satisfies the first switching condition. Wherein the first number of audio segments is a continuous plurality of audio segments, the first number being any number, for example the first number being 2, 3, 4 or other number.
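Using the example values above (2 consecutive clips, threshold of 20 points), the first switching condition could be checked roughly as follows; the function name and default parameters are illustrative:

```python
# Sketch of the first switching condition: the reference scores of the last
# `first_number` consecutive clips (including the one currently playing) must
# all exceed the first threshold. The values 2 and 20 follow the example above.
from typing import Sequence

def meets_first_condition(scores: Sequence[float],
                          first_number: int = 2,
                          first_threshold: float = 20.0) -> bool:
    if len(scores) < first_number:
        return False
    return all(s > first_threshold for s in scores[-first_number:])

# e.g. playing the voice stream with recent scores [15, 22, 31]:
# the last two scores both exceed 20, so switch to the mixed audio stream.
assert meets_first_condition([15, 22, 31]) is True
```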
Optionally, after switching the voice audio stream to the mixed audio stream, the first terminal continues to acquire the audio quality information of consecutive audio clips in the voice audio stream. If the audio quality information of subsequent consecutive audio clips still satisfies the first switching condition, the first terminal continues to play based on the mixed audio stream; if the audio quality information of subsequent consecutive audio clips no longer satisfies the first switching condition, that is, the audio quality information of a plurality of consecutive audio clips satisfies the second switching condition, the first terminal switches the mixed audio stream to the voice audio stream or the original singing audio stream. The embodiment of switching from the mixed audio stream to the voice audio stream or the original singing audio stream is described in the second case below.
In another possible implementation manner, the first terminal switches the voice audio stream to the original singing audio stream and plays in the live broadcasting room based on the original singing audio stream under the condition that the audio quality information of a plurality of consecutive audio clips in the voice audio stream meets a second switching condition. The second switching condition means that the audio quality of a plurality of consecutive audio clips is lower than the reference audio quality, and the plurality of consecutive audio clips include the audio clip currently played. Optionally, when the audio quality information includes reference scores of the audio clips, the first terminal switches the voice audio stream to the original singing audio stream if the reference scores of a plurality of consecutive audio clips in the voice audio stream are smaller than the first threshold. For example, the first threshold is 20, and the voice audio stream is switched to the original singing audio stream when the reference scores of 3 consecutive audio clips are each less than 20 points.
Optionally, the first terminal switches the vocal audio stream to the original audio stream if the audio quality information of the second number of audio pieces in the vocal audio stream satisfies the second switching condition. Wherein the second number of audio segments is a consecutive plurality of audio segments, the second number being any number, e.g. the second number being 2, 3, 4 or other number.
The second number may be the same as or different from the first number. For example, in order to let the audience in the live broadcasting room hear the host's singing, the trigger condition for switching to the mixed audio stream may be set to be more relaxed than the trigger condition for switching to the original singing audio stream, that is, the first number may be made smaller than the second number.
Optionally, after switching the voice audio stream to the original singing audio stream, the first terminal continues to acquire the audio quality information of consecutive audio clips in the voice audio stream. If the audio quality information of subsequent consecutive audio clips still satisfies the second switching condition, the first terminal continues to play based on the original singing audio stream; if the audio quality information of subsequent consecutive audio clips no longer satisfies the second switching condition, that is, the audio quality information of a plurality of consecutive audio clips satisfies the first switching condition, the first terminal switches the original singing audio stream to the voice audio stream or the mixed audio stream. The embodiment of switching from the original singing audio stream to the voice audio stream or the mixed audio stream is described in the third case below.
Second kind: the first audio stream is a mixed audio stream.
While playing based on the mixed audio stream, the first terminal acquires in real time the audio quality information of the audio clip in the voice audio stream corresponding to the currently played audio clip, and determines whether the mixed audio stream needs to be switched based on the audio quality information. The timestamp of the played audio clip is identical to the timestamp of the corresponding audio clip in the voice audio stream; for example, when the first terminal plays the 5th-second audio clip in the mixed audio stream, it acquires the audio quality information of the 5th-second audio clip in the voice audio stream. Under the condition that the audio quality information of a plurality of consecutive audio clips in the voice audio stream meets the second switching condition, the first terminal switches the mixed audio stream to the voice audio stream and plays in the live broadcasting room based on the voice audio stream, or switches the mixed audio stream to the original singing audio stream and plays in the live broadcasting room based on the original singing audio stream.
Optionally, the first terminal acquires audio quality information of a third number of consecutive audio clips in the vocal audio stream, and switches the mixed audio stream to the original audio stream under the condition that the audio quality of the third number of consecutive audio clips is not higher than the reference audio quality; or, in the third number of audio clips, the first terminal switches the mixed audio stream to a human voice audio stream in the case that the audio quality of the other audio clips except the last audio clip is not higher than the reference audio quality. Namely, when the second switching condition is that the audio quality of the continuous third number of audio clips is not higher than the reference audio quality, switching the mixed audio stream into an original singing audio stream; the second switching condition is to switch the mixed audio stream to a human voice audio stream when the audio quality of the other audio clips except the last audio clip in the third number of audio clips is not higher than the reference audio quality. The third number may be the same as or different from the first number and the second number.
For example, the third number is 5. The first terminal acquires the audio quality of 5 consecutive audio clips in the current voice audio stream; when the audio quality of the first 4 audio clips is not higher than the reference audio quality and the audio quality of the 5th audio clip is higher than the reference audio quality, the mixed audio stream is switched to the voice audio stream; when none of the audio quality of the 5 consecutive audio clips is higher than the reference audio quality, the mixed audio stream is switched to the original singing audio stream.
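A sketch of this decision, using the example above (third number = 5) and an illustrative function name; it returns the target stream or nothing if playback should stay on the mixed audio stream:

```python
# Decision made while the mixed audio stream is playing: if none of the last
# five voice-stream clips exceeds the reference quality, fall back to the
# original singing audio stream; if only the most recent clip recovers, switch
# to the voice audio stream. Names and defaults are illustrative.
from typing import Optional, Sequence

def target_from_mixed(qualities: Sequence[float], reference: float,
                      third_number: int = 5) -> Optional[str]:
    if len(qualities) < third_number:
        return None                      # keep playing the mixed audio stream
    window = qualities[-third_number:]
    if all(q <= reference for q in window):
        return "original"
    if all(q <= reference for q in window[:-1]) and window[-1] > reference:
        return "vocal"
    return None
```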
Third kind: the first audio stream is an original audio stream.
While playing based on the original singing audio stream, the first terminal acquires in real time the audio quality of the audio clip in the voice audio stream corresponding to the currently played audio clip, and determines whether the original singing audio stream needs to be switched based on the audio quality information. The timestamp of the played audio clip is identical to the timestamp of the corresponding audio clip in the voice audio stream; for example, when the first terminal plays the 5th-second audio clip in the original singing audio stream, it acquires the audio quality of the 5th-second audio clip in the voice audio stream. Under the condition that the audio quality information of a plurality of consecutive audio clips in the voice audio stream meets the first switching condition, the first terminal switches the original singing audio stream to the voice audio stream and plays in the live broadcasting room based on the voice audio stream, or switches the original singing audio stream to the mixed audio stream and plays in the live broadcasting room based on the mixed audio stream.
Optionally, the first terminal acquires audio quality of a fourth number of consecutive audio clips in the vocal audio stream, and switches the original audio stream into the mixed audio stream when the audio quality of the fourth number of consecutive audio clips is higher than the reference audio quality; or in the fourth number of audio clips, the first terminal switches the original audio stream to a vocal audio stream in case that the audio quality of the other audio clips except the last audio clip is higher than the reference audio quality. Namely, when the first switching condition is that the audio quality of the continuous fourth number of audio fragments is higher than the reference audio quality, switching the original singing audio stream into a mixed audio stream; the first switching condition is that when the audio quality of the other audio clips except the last audio clip in the fourth number of audio clips is higher than the reference audio quality, the original singing audio stream is switched to the human voice audio stream. The fourth number may be the same as or different from the first number, the second number, and the third number.
For example, the fourth number is 5. The first terminal acquires the audio quality of 5 consecutive audio clips in the current voice audio stream; when the audio quality of the first 4 audio clips is higher than the reference audio quality and the audio quality of the 5th audio clip is not higher than the reference audio quality, the original singing audio stream is switched to the voice audio stream; when the audio quality of all 5 consecutive audio clips is higher than the reference audio quality, the original singing audio stream is switched to the mixed audio stream.
The switching manner corresponding to the first switching condition and the switching manner corresponding to the second switching condition may each be implemented alone, or the two may be combined, that is, the switching conditions include both the first switching condition and the second switching condition. In the combined case: when playback is currently based on the voice audio stream and the first switching condition is satisfied, the voice audio stream is switched to the mixed audio stream; when playback is currently based on the voice audio stream and the second switching condition is satisfied, the voice audio stream is switched to the original singing audio stream; when playback is currently based on the mixed audio stream and the second switching condition is satisfied, the mixed audio stream is switched to the voice audio stream or the original singing audio stream; and when playback is currently based on the original singing audio stream and the first switching condition is satisfied, the original singing audio stream is switched to the voice audio stream or the mixed audio stream.
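The combined case can be summarised as a small dispatcher; the sketch below uses illustrative thresholds and clip counts (2 clips above 20 points, 3 clips below 20 points) and is not the patent's actual parameterisation:

```python
# Illustrative dispatcher for the combined switching conditions; thresholds and
# clip counts are example values, not prescribed ones.
from typing import Optional, Sequence

def meets_first_condition(scores: Sequence[float], n: int = 2, threshold: float = 20.0) -> bool:
    # Consecutive clips (including the current one) above the reference quality.
    return len(scores) >= n and all(s > threshold for s in scores[-n:])

def meets_second_condition(scores: Sequence[float], n: int = 3, threshold: float = 20.0) -> bool:
    # Consecutive clips (including the current one) below the reference quality.
    return len(scores) >= n and all(s < threshold for s in scores[-n:])

def next_stream(current: str, scores: Sequence[float]) -> Optional[str]:
    if current == "vocal":
        if meets_first_condition(scores):
            return "mixed"
        if meets_second_condition(scores):
            return "original"
    elif current == "mixed" and meets_second_condition(scores):
        return "vocal"       # or "original", per the variant chosen above
    elif current == "original" and meets_first_condition(scores):
        return "vocal"       # or "mixed", per the variant chosen above
    return None              # keep playing the current stream
```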
In one possible implementation, when the audio quality information meets the switching condition but playback has reached the last audio clip in the human voice audio stream, the first terminal displays a prompt message indicating that the audio stream switching failed. That is, when the last audio clip is being played, audio stream switching is not performed even if the audio quality satisfies the switching condition.
In one possible implementation, the first terminal must first enable the automatic switching function in the live broadcast room: automatic audio stream switching is performed only when the automatic switching function is enabled, and is not performed otherwise. That is, the first terminal switches the first audio stream to the second audio stream in response to the automatic switching function for the audio stream being in an on state and the audio quality information corresponding to the human voice audio stream meeting the switching condition.
Optionally, the live broadcast interface of the live broadcast room displayed by the first terminal includes an automatic switching control. When the automatic switching function is in the off state, the user triggers the automatic switching control, and the terminal, in response to the triggering operation on the automatic switching control, sets the automatic switching function to the on state, thereby enabling it.
The above embodiment describes the procedure in which the first terminal automatically switches audio streams according to the audio quality information; in another example, the user may switch audio streams manually. The live broadcast interface of the live broadcast room on the first terminal includes a play control corresponding to each audio stream. When the user triggers the play control associated with the second audio stream, the first terminal, in response to the triggering operation on that play control, switches the first audio stream to the second audio stream and plays in the live broadcast room based on the second audio stream. For example, while the first terminal is playing audio clips of the human voice audio stream, the user triggers the play control associated with the mixed audio stream, and the first terminal switches the human voice audio stream to the mixed audio stream and plays the audio clips of the mixed audio stream; or the user triggers the play control associated with the original singing audio stream, and the first terminal switches the human voice audio stream to the original singing audio stream and plays the audio clips of the original singing audio stream.
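A minimal sketch of this manual switching path might look as follows; player, current_stream and switch_to are hypothetical names, since the patent does not specify a client API.

```python
def on_play_control_triggered(player, target_stream):
    """Manual switch: when the viewer taps the play control associated with
    another audio stream, switch playback to that stream (sketch only)."""
    if target_stream != player.current_stream:
        # Continuity between the old and new stream is handled according to
        # the timestamp rule described below.
        player.switch_to(target_stream)
```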
For both the automatic and the manual switching manners described above, playback based on the switched second audio stream must preserve continuity between the audio before and after the switch; that is, the switch should follow the principle of continuously and stably increasing timestamps. For example, referring to fig. 5, the first terminal decodes the received human voice audio stream, mixed audio stream and original singing audio stream respectively, obtaining timestamps ID1, ID2, ID3 and ID4 for four audio clips of the human voice audio stream, T1, T2, T3 and T4 for four audio clips of the mixed audio stream, and F1, F2, F3 and F4 for four audio clips of the original singing audio stream, where ID1, T1 and F1 are equal, ID2, T2 and F2 are equal, ID3, T3 and F3 are equal, and ID4, T4 and F4 are equal; when switching, playback proceeds in order of monotonically increasing timestamps.
Therefore, in one possible implementation, during the switching process the first terminal determines, based on the first timestamp of the currently played audio clip, a second timestamp in the second audio stream that is adjacent to and later than the first timestamp, and when the currently played audio clip finishes, plays the audio clip corresponding to the second timestamp in the second audio stream in the live broadcast room, ensuring that the clip played after the switch is adjacent to the clip played before the switch.
For example, when switching from the human voice audio stream to the mixed audio stream, suppose the timestamps of the four audio clips in the human voice audio stream are ID1, ID2, ID3 and ID4, the clip corresponding to ID1 is currently playing, and the timestamps of the four audio clips in the mixed audio stream are T1, T2, T3 and T4. There are three cases when switching: in the first case, T1 < ID1 < T2 < ID2, and the audio clip corresponding to T2 is played after switching; in the second case, ID1 < ID2 < T1 < T2, so the audio clip corresponding to ID2 is played first, then the audio stream is switched and the audio clip corresponding to T1 is played; in the third case, T1 < T2 < T3 < T4 < ID1, and the audio stream is not switched.
For example, if the first timestamp of the audio clip in the human voice audio stream before switching is 50 seconds and the second timestamp adjacent to it in the original singing audio stream is 51 seconds, the audio clip corresponding to 51 seconds in the original singing audio stream is played when the current audio clip finishes. For another example, if the first timestamp of the audio clip in the human voice audio stream before switching is 50 seconds and the second timestamp adjacent to it in the original singing audio stream is 52 seconds, the audio clip corresponding to 52 seconds in the original singing audio stream is played after the current audio clip and the audio clip corresponding to 51 seconds have finished playing.
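The continuity rule of fig. 5 and the cases above reduce to selecting the first timestamp in the second audio stream that lies strictly after the timestamp of the currently played clip. A minimal sketch follows, assuming per-clip timestamps are available as sorted plain numbers; the names used are illustrative only.

```python
import bisect

def pick_next_clip(second_stream_timestamps, first_timestamp):
    """Pick the first timestamp in the second audio stream that lies strictly
    after the timestamp of the currently played clip. Returns None when no
    later clip exists (the "do not switch" case). Timestamps must be sorted
    in ascending order."""
    i = bisect.bisect_right(second_stream_timestamps, first_timestamp)
    return second_stream_timestamps[i] if i < len(second_stream_timestamps) else None


# Mirrors the cases above: T1 < ID1 < T2 gives T2, ID1 < T1 gives T1,
# and all of T1..T4 before ID1 gives None (no switch).
print(pick_next_clip([49, 51, 53, 55], 50))   # 51
print(pick_next_clip([52, 53, 54, 55], 50))   # 52
print(pick_next_clip([46, 47, 48, 49], 50))   # None
```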
In addition to the above manner of switching audio streams, when playing based on the human voice audio stream, the first terminal may, in response to a triggering operation on the play control associated with the original singing audio stream, also play based on the original singing audio stream. That is, the first terminal can play the human voice audio stream and the original singing audio stream at the same time. Optionally, the first terminal decodes the human voice audio stream and the original singing audio stream respectively to obtain the audio clips of each stream, mixes the decoded audio clips of the human voice audio stream with those of the original singing audio stream to obtain mixed audio, and plays the mixed audio.
Optionally, to ensure that the volume of the audio clips in the human voice audio stream matches the volume of the audio clips in the original singing audio stream, the live broadcast interface of the live broadcast room displayed by the first terminal includes a first volume control associated with the human voice audio stream and a second volume control associated with the original singing audio stream; the user can adjust these volume controls to adjust the volume of the corresponding audio stream. When the user performs an adjustment operation on the first volume control, the first terminal adjusts the volume of the human voice audio stream in response to that operation; when the user performs an adjustment operation on the second volume control, the first terminal adjusts the volume of the original singing audio stream in response to that operation.
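A possible way to combine a decoded human voice clip with the original singing clip of the same timestamp, honoring the two volume controls, is sketched below; the sample format, value range and function name are assumptions for illustration and not the decoder or mixer actually used by the first terminal.

```python
def mix_clips(vocal_clip, original_clip, vocal_volume=1.0, original_volume=1.0):
    """Mix one decoded human voice clip with the original singing clip of the
    same timestamp, applying the volumes set via the two volume controls.
    Clips are assumed to be equal-length lists of PCM samples in [-1.0, 1.0]."""
    mixed = [
        v * vocal_volume + o * original_volume
        for v, o in zip(vocal_clip, original_clip)
    ]
    # Clamp to the valid PCM range to avoid clipping artifacts.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```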
Referring to the schematic diagram of audio stream generation shown in fig. 6: in the related art, the sound card processes the collected human voice of the target object in a hard mixing manner, in which the sound card collects the human voice, acquires the accompaniment audio of the target song, and outputs only the mixed audio stream obtained by mixing the human voice and the accompaniment audio; this mixed audio stream is sent to the first terminal through the target server, so the first terminal receives the mixed audio stream but cannot receive the human voice alone. In the embodiment of the application, the sound card processes the collected human voice of the target object in a soft mixing manner, in which the sound card collects the human voice, acquires the accompaniment audio of the target song, and outputs both the human voice audio stream and the mixed audio stream; these are sent to the first terminal through the target server, and the target server also acquires the original singing audio stream and sends it to the first terminal.
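The difference between hard mixing and soft mixing can be shown with a short sketch: hard mixing yields only the mixed stream, while soft mixing keeps the captured human voice as a separate stream in addition to the mixed stream. The frame representation and names here are assumptions for illustration only.

```python
def soft_mix(voice_frames, accompaniment_frames):
    """Soft mixing as described above: keep the captured human voice as its
    own stream *and* mix it with the accompaniment, so that both a human
    voice stream and a mixed stream can be pushed upstream."""
    human_voice_stream = list(voice_frames)
    mixed_stream = [v + a for v, a in zip(voice_frames, accompaniment_frames)]
    return human_voice_stream, mixed_stream  # hard mixing would return only mixed_stream
```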
In the method provided by the embodiment of the application, during the live broadcast in the live broadcast room, the second terminal obtains the human voice audio stream from the collected human voice and mixes the human voice with the accompaniment audio of the target song to obtain the mixed audio stream, thereby obtaining the human voice audio stream and the mixed audio stream corresponding to the target song. The target server sends the human voice audio stream and the mixed audio stream to the first terminal, and also sends the acquired original singing audio stream to the first terminal, so the first terminal obtains the human voice audio stream, the mixed audio stream and the original singing audio stream. During playback based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first terminal can switch to a second audio stream different from the first audio stream, realizing switching among multiple audio streams, so that various sounds can be played in the live broadcast room, meeting the needs of different audiences and improving the live broadcast effect.
Fig. 7 is a schematic structural diagram of a live audio processing device according to an embodiment of the present application. Referring to fig. 7, the apparatus includes:
the audio stream receiving module 701 is configured to receive, during a live broadcast process in a live broadcast room, a plurality of audio streams of a target song, where the plurality of audio streams include a human voice audio stream of the target object, a mixed audio stream of the target object, and an original singing audio stream, where the mixed audio stream is obtained by mixing a human voice of the target object with an accompaniment audio of the target song;
A playing module 702, configured to play based on the first audio stream in the live broadcast room;
the audio stream switching module 703 is configured to switch the first audio stream to a second audio stream of the multiple audio streams in response to the audio quality information corresponding to the vocal audio stream meeting the switching condition, and play the second audio stream in the live broadcasting room, where the second audio stream is different from the first audio stream, and the audio quality information is used to represent the audio quality of the corresponding audio clip.
According to the device provided by the embodiment of the application, during the live broadcast in the live broadcast room, the human voice audio stream, the mixed audio stream and the original singing audio stream corresponding to the target song are obtained. During playback based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first audio stream can be switched to a second audio stream different from the first audio stream, realizing switching among multiple audio streams, so that various sounds can be played in the live broadcast room, meeting the needs of different audiences and improving the live broadcast effect.
In another possible implementation manner, referring to fig. 8, the audio stream switching module 703 is configured to switch the first audio stream to the second audio stream in response to the automatic switching function for the audio stream being in an on state and the audio quality information corresponding to the vocal audio stream meeting the switching condition.
In another possible implementation, referring to fig. 8, the apparatus further includes:
the quality information obtaining module 704 is configured to parse a stream packet corresponding to the voice audio stream to obtain audio quality information of each audio segment.
In another possible implementation, referring to fig. 8, the audio stream switching module 703 includes:
a timestamp determining unit 7031, configured to determine, based on a first timestamp of the currently played audio clip, a second timestamp adjacent to the first timestamp in the second audio stream, where the second timestamp is located after the first timestamp;
the audio clip playing unit 7032 is configured to play, in the live broadcast room, an audio clip corresponding to the second timestamp in the second audio stream when the playing of the currently played audio clip is finished.
In another possible implementation, the playing module 702 is further configured to:
when playing based on the voice audio stream, responding to the triggering operation of the playing control related to the original singing audio stream, and playing based on the original singing audio stream.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
It should be noted that: in the live audio processing apparatus provided in the foregoing embodiment, when processing live audio, only the division of the foregoing functional modules is used for illustration, and in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the live audio processing device and the live audio processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not repeated herein.
Fig. 9 is a schematic structural diagram of a live audio processing device according to an embodiment of the present application. Referring to fig. 9, the apparatus includes:
the voice acquisition module 901 is configured to acquire voice sent by a target object according to a target song in a live broadcast process of a live broadcast room, so as to obtain a voice audio stream of the target object;
a mixing module 902, configured to mix a human voice of a target object with accompaniment audio of a target song to obtain a mixed audio stream of the target object;
the audio stream sending module 903 is configured to send a vocal audio stream and a mixed audio stream to a target server, where the target server is configured to obtain an original vocal audio stream of a target song, and send the vocal audio stream, the mixed audio stream, and the original vocal audio stream to the first terminal.
According to the device provided by the embodiment of the application, the human voice audio stream is obtained from the collected human voice, and the human voice is mixed with the accompaniment audio of the target song to obtain the mixed audio stream, thereby obtaining the human voice audio stream and the mixed audio stream corresponding to the target song. The obtained human voice audio stream and mixed audio stream are sent to the first terminal through the target server, and the target server also sends the acquired original singing audio stream to the first terminal, so that the first terminal can subsequently switch among the received multiple audio streams, various sounds can be played in the live broadcast room, and the live broadcast effect is improved.
In another possible implementation, referring to fig. 10, the apparatus further includes:
the quality information obtaining module 904 is configured to identify each audio segment in the voice audio stream, so as to obtain audio quality information corresponding to the voice audio stream, where the audio quality information is used to represent audio quality of the corresponding audio segment.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
It should be noted that: in the live audio processing apparatus provided in the foregoing embodiment, when processing live audio, only the division of the foregoing functional modules is used for illustration, and in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the live audio processing device and the live audio processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not repeated herein.
The embodiment of the application also provides a terminal, which comprises a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to realize the operations executed in the live audio processing method of the embodiment.
Fig. 11 is a schematic structural diagram of a terminal 1100 according to an embodiment of the present application. The terminal 1100 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
The terminal 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one program code for execution by processor 1101 to implement the live audio processing method provided by the method embodiments of the present application.
In some embodiments, the terminal 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, a positioning assembly 1108, and a power supply 1109.
The peripheral interface 1103 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuitry, which is not limited by the present application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 1101 as a control signal for processing. At this time, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, disposed on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing, or inputting the electric signals to the radio frequency circuit 1104 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal 1100, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic location of the terminal 1100 to enable navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 1109 is used to supply power to various components in the terminal 1100. The power source 1109 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyroscope sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 may detect the magnitudes of accelerations on three coordinate axes of a coordinate system established with the terminal 1100. For example, the acceleration sensor 1111 may be configured to detect components of gravitational acceleration in three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1111. Acceleration sensor 1111 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal 1100, and the gyro sensor 1112 may collect a 3D motion of the user on the terminal 1100 in cooperation with the acceleration sensor 1111. The processor 1101 may implement the following functions based on the data collected by the gyro sensor 1112: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1113 may be disposed on a side frame of the terminal 1100 and/or under the display screen 1105. When the pressure sensor 1113 is disposed on a side frame of the terminal 1100, the user's grip signal on the terminal 1100 can be detected, and the processor 1101 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed under the display screen 1105, the processor 1101 controls the operability controls on the UI according to the pressure operation of the user on the display screen 1105. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1114 is used to collect a fingerprint of the user, and the processor 1101 identifies the identity of the user based on the collected fingerprint of the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the user is authorized by the processor 1101 to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1114 may be disposed at the front, rear, or side of the terminal 1100. When a physical key or vendor Logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical key or vendor Logo.
The optical sensor 1115 is used to collect the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 based on the intensity of ambient light collected by the optical sensor 1115. Specifically, when the intensity of the ambient light is high, the display luminance of the display screen 1105 is turned up; when the ambient light intensity is low, the display luminance of the display screen 1105 is turned down. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the intensity of ambient light collected by the optical sensor 1115.
A proximity sensor 1116, also referred to as a distance sensor, is provided on the front panel of the terminal 1100. The proximity sensor 1116 is used to collect a distance between the user and the front surface of the terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually decreases, the processor 1101 controls the display 1105 to switch from the bright screen state to the off screen state; when the proximity sensor 1116 detects that the distance between the user and the front surface of the terminal 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 11 is not limiting and that terminal 1100 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
The embodiment of the application also provides a server, which comprises a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to realize the operations executed in the live audio processing method of the embodiment.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1200 may have a relatively large difference due to different configurations or performances, and may include one or more processors (Central Processing Units, CPU) 1201 and one or more memories 1202, where at least one program code is stored in the memories 1202, and the at least one program code is loaded and executed by the processors 1201 to implement the methods provided in the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
The embodiment of the present application also provides a computer readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the operations performed in the live audio processing method of the above embodiment.
Embodiments of the present application also provide a computer program product or computer program comprising program code stored in a computer-readable storage medium. The program code is loaded and executed by a processor to implement the operations performed in the live audio processing method of the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the above storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is merely an alternative embodiment of the present application and is not intended to limit the embodiment of the present application, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the embodiment of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method of live audio processing, the method comprising:
in a live broadcasting process of a live broadcasting room, receiving a plurality of paths of audio streams of a target song, wherein the plurality of paths of audio streams comprise a human voice audio stream of a target object, a mixed audio stream of the target object and an original singing audio stream, and the mixed audio stream is obtained by mixing the human voice of the target object and accompaniment audio of the target song;
playing in the live broadcast room based on a first audio stream, wherein the first audio stream is any audio stream in the multi-channel audio stream;
and switching the first audio stream into a second audio stream in the multi-channel audio stream in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, and playing the second audio stream in the live broadcasting room, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing the audio quality of the corresponding audio fragment.
2. The method of claim 1, wherein switching the first audio stream to a second audio stream of the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream satisfying a switching condition comprises:
And switching the first audio stream to the second audio stream in response to the automatic switching function for the audio stream being in an on state and the audio quality information corresponding to the voice audio stream meeting the switching condition.
3. The method of claim 1, wherein before the switching the first audio stream to a second audio stream of the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream satisfying a switching condition and the playing in the live broadcast room based on the second audio stream, the method further comprises:
and analyzing the stream message corresponding to the voice audio stream to obtain the audio quality information of the audio fragment.
4. A method according to any of claims 1-3, wherein said playing in said live room based on said second audio stream comprises:
determining a second timestamp adjacent to the first timestamp in the second audio stream based on a first timestamp of a currently played audio clip, the second timestamp being located after the first timestamp;
and when the playing of the currently played audio fragment is finished, playing the audio fragment corresponding to the second timestamp in the second audio stream in the live broadcasting room.
5. The method according to claim 1, wherein the method further comprises:
and when playing based on the voice audio stream, responding to the triggering operation of a playing control related to the original voice audio stream, and playing based on the original voice audio stream.
6. A live audio processing method, the method comprising:
in a live broadcasting process of a live broadcasting room, acquiring a voice sent by a target object according to a target song, and obtaining a voice audio stream of the target object;
mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
the method comprises the steps of sending the voice audio stream and the mixed audio stream to a target server, wherein the target server is used for obtaining an original voice audio stream of a target song, sending the voice audio stream, the mixed audio stream and the original voice audio stream to a first terminal, enabling the first terminal to play in a live broadcasting room based on a first audio stream in multiple paths of audio streams, responding to audio quality information corresponding to the voice audio stream to meet a switching condition, switching the first audio stream into a second audio stream in the multiple paths of audio streams, playing in the live broadcasting room based on the second audio stream, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing audio quality of corresponding audio fragments, and the multiple paths of audio streams comprise the voice audio stream, the mixed audio stream and the original voice audio stream.
7. The method of claim 6, wherein prior to the sending the human voice audio stream and the mixed audio stream to a target server, the method further comprises:
and respectively identifying each audio fragment in the voice audio stream to obtain audio quality information corresponding to the voice audio stream, wherein the audio quality information is used for representing the audio quality of the corresponding audio fragment.
8. The live broadcast audio processing system is characterized by comprising a first terminal, a target server and a second terminal;
the second terminal is used for acquiring the voice of a target object according to a target song in a live broadcasting process of a live broadcasting room to obtain a voice audio stream of the target object, mixing the voice of the target object with accompaniment audio of the target song to obtain a mixed audio stream of the target object, and sending the voice audio stream and the mixed audio stream to the target server;
the target server is configured to obtain an original singing audio stream of the target song, and send the vocal audio stream, the mixed audio stream and the original singing audio stream to the first terminal;
The first terminal is configured to receive the vocal audio stream, the mixed audio stream and the original singing audio stream in a live broadcast process in the live broadcast room, and play in the live broadcast room based on a first audio stream, where the first audio stream is any one of multiple audio streams;
the first terminal is configured to switch the first audio stream to a second audio stream in the multiple audio streams in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, play the second audio stream in the live broadcasting room, where the second audio stream is different from the first audio stream, and the audio quality information is used to represent audio quality of a corresponding audio segment.
9. A live audio processing apparatus, the apparatus comprising:
the audio stream receiving module is used for receiving multiple paths of audio streams of a target song in a live broadcast process of a live broadcast room, wherein the multiple paths of audio streams comprise a human voice audio stream of the target object, a mixed audio stream of the target object and an original singing audio stream, and the mixed audio stream is obtained by mixing the human voice of the target object and accompaniment audio of the target song;
The playing module is used for playing based on a first audio stream in the live broadcasting room, wherein the first audio stream is any audio stream in the multiple paths of audio streams;
and the audio stream switching module is used for switching the first audio stream into a second audio stream in the multi-channel audio stream in response to the audio quality information corresponding to the voice audio stream meeting a switching condition, playing the second audio stream in the live broadcasting room, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing the audio quality of the corresponding audio fragment.
10. A live audio processing apparatus, the apparatus comprising:
the voice acquisition module is used for acquiring voice sent by a target object according to a target song in a live broadcasting process of a live broadcasting room to obtain voice audio streams of the target object;
the mixing module is used for mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
the audio stream sending module is configured to send the vocal audio stream and the mixed audio stream to a target server, where the target server is configured to obtain an original singing audio stream of the target song and send the vocal audio stream, the mixed audio stream, and the original singing audio stream to a first terminal, so that the first terminal plays in the live broadcasting room based on a first audio stream of multiple audio streams and, in response to audio quality information corresponding to the vocal audio stream meeting a switching condition, switches the first audio stream to a second audio stream of the multiple audio streams and plays in the live broadcasting room based on the second audio stream, where the second audio stream is different from the first audio stream, the audio quality information is used to represent the audio quality of corresponding audio segments, and the multiple audio streams include the vocal audio stream, the mixed audio stream, and the original singing audio stream.
11. A computer device comprising a processor and a memory, the memory having stored therein at least one program code that is loaded and executed by the processor to implement operations performed in a live audio processing method as claimed in any one of claims 1 to 5 or to implement operations performed in a live audio processing method as claimed in any one of claims 6 to 7.
12. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement operations performed in a live audio processing method as claimed in any one of claims 1 to 5 or to implement operations performed in a live audio processing method as claimed in any one of claims 6 to 7.
CN202110807055.3A 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium Active CN113473170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110807055.3A CN113473170B (en) 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110807055.3A CN113473170B (en) 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN113473170A CN113473170A (en) 2021-10-01
CN113473170B true CN113473170B (en) 2023-08-25

Family

ID=77880873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110807055.3A Active CN113473170B (en) 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN113473170B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000003183A (en) * 1999-06-07 2000-01-07 Yamaha Corp Karaoke machine
CN105161120A (en) * 2015-08-27 2015-12-16 广州酷狗计算机科技有限公司 Original and accompanying singing switching method and apparatus
CN106548792A (en) * 2015-09-17 2017-03-29 阿里巴巴集团控股有限公司 Intelligent sound box device, mobile terminal and music processing method
WO2017101260A1 (en) * 2015-12-15 2017-06-22 广州酷狗计算机科技有限公司 Method, device, and storage medium for audio switching
CN107093419A (en) * 2016-02-17 2017-08-25 广州酷狗计算机科技有限公司 A kind of dynamic vocal accompaniment method and apparatus
CN106024033A (en) * 2016-06-15 2016-10-12 北京小米移动软件有限公司 Playing control method and apparatus
WO2018130577A1 (en) * 2017-01-10 2018-07-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder, audio encoder, method for providing a decoded audio signal, method for providing an encoded audio signal, audio stream, audio stream provider and computer program using a stream identifier
CN108848394A (en) * 2018-07-27 2018-11-20 广州酷狗计算机科技有限公司 Net cast method, apparatus, terminal and storage medium
CN109348239A (en) * 2018-10-18 2019-02-15 北京达佳互联信息技术有限公司 Piece stage treatment method, device, electronic equipment and storage medium is broadcast live
WO2020078142A1 (en) * 2018-10-18 2020-04-23 北京达佳互联信息技术有限公司 Live streaming segment processing method and apparatus, electronic device and storage medium
CN110267081A (en) * 2019-04-02 2019-09-20 北京达佳互联信息技术有限公司 Method for stream processing, device, system, electronic equipment and storage medium is broadcast live

Also Published As

Publication number Publication date
CN113473170A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN108810576B (en) Live wheat-connecting method and device and storage medium
CN108683927B (en) Anchor recommendation method and device and storage medium
CN108093268B (en) Live broadcast method and device
CN108419113B (en) Subtitle display method and device
WO2019114514A1 (en) Method and apparatus for displaying pitch information in live broadcast room, and storage medium
CN109729372B (en) Live broadcast room switching method, device, terminal, server and storage medium
CN109874043B (en) Video stream sending method, video stream playing method and video stream playing device
CN112118477B (en) Virtual gift display method, device, equipment and storage medium
CN110139116B (en) Live broadcast room switching method and device and storage medium
CN109587549B (en) Video recording method, device, terminal and storage medium
CN111246236B (en) Interactive data playing method, device, terminal, server and storage medium
CN110418152B (en) Method and device for carrying out live broadcast prompt
CN113596516B (en) Method, system, equipment and storage medium for chorus of microphone and microphone
CN112583806B (en) Resource sharing method, device, terminal, server and storage medium
CN113271470B (en) Live broadcast wheat connecting method, device, terminal, server and storage medium
CN110958464A (en) Live broadcast data processing method and device, server, terminal and storage medium
CN111402844B (en) Song chorus method, device and system
CN111045945B (en) Method, device, terminal, storage medium and program product for simulating live broadcast
CN111010588B (en) Live broadcast processing method and device, storage medium and equipment
CN113556481B (en) Video special effect generation method and device, electronic equipment and storage medium
CN111726670A (en) Information interaction method, device, terminal, server and storage medium
CN114845129B (en) Interaction method, device, terminal and storage medium in virtual space
CN113473170B (en) Live audio processing method, device, computer equipment and medium
CN113141538B (en) Media resource playing method, device, terminal, server and storage medium
CN112492331B (en) Live broadcast method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant