CN116170613A

CN116170613A - Audio stream processing method, computer device and computer program product

Info

Publication number: CN116170613A
Application number: CN202211096763.1A
Authority: CN
Inventors: 雷勇; 黄斯亮; 张田博; 刘腾飞; 欧阳金凯; 王玉奎; 王磊; 李贤茂; 冯伟赞; 谢光前
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2022-09-08
Filing date: 2022-09-08
Publication date: 2023-05-26

Abstract

The application relates to the technical field of audio processing and provides an audio stream processing method, a computer device and a computer program product that enable an audience to quickly obtain high-quality online chorus audio during an online chorus. The method comprises the following steps: during the chorus, acquiring the dry audio streams collected by each of a plurality of singing ends in a virtual room for the same chorus content, and determining the audio stream quality parameters of each dry audio stream; based on the audio stream quality parameters of each dry audio stream, respectively acquiring audio stream quality information of each dry audio stream, and determining multiple target dry audio streams whose audio stream quality information meets a preset quality condition; and mixing the multiple target dry audio streams to obtain a chorus audio stream for the audience ends in the virtual room.

Description

Audio stream processing method, computer device and computer program product

Technical Field

The present invention relates to the field of audio processing technology, and in particular, to an audio stream processing method, a computer device, and a computer program product.

Background

With the development of real-time audio and video technology, users can take part in online chorus interaction through related applications, so that people in different places can sing together online. In the related art, when singers in different places sing together, they can sing the song alternately in sections, that is, each singer sings only his or her own section. This chorus mode is monotonous, and the audience cannot obtain high-quality online chorus audio.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an audio stream processing method, apparatus, computer device, computer readable storage medium, and computer program product.

In a first aspect, the present application provides an audio stream processing method. The method comprises the following steps:

during the chorus, acquiring the dry audio streams collected by each of a plurality of singing ends in a virtual room for the same chorus content, and determining the audio stream quality parameters of each dry audio stream;

based on the audio stream quality parameters of each dry audio stream, respectively acquiring audio stream quality information of each dry audio stream, and determining a plurality of paths of target dry audio streams of which the audio stream quality information meets a preset quality condition;

and mixing the multipath target dry audio streams to obtain a chorus audio stream facing the audience side in the virtual room.

In a second aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

In a third aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

In the audio stream processing method, the computer equipment and the computer program product, in the chorus process, the server side can acquire the dry audio streams which are acquired by the singing ends in the virtual room and aim at the same chorus content, determine the audio stream quality parameters of each dry audio stream, further respectively acquire the audio stream quality information of each dry audio stream based on the audio stream quality parameters of each dry audio stream, determine multiple paths of target dry audio streams of which the audio stream quality information meets the preset quality condition, and mix the multiple paths of target dry audio streams to obtain the chorus audio stream facing the audience end in the virtual room. In the scheme of the application, the audio content in the chorus audio stream can be enriched by mixing the multipath target dry audio streams which come from different singing ends and aim at the same chorus content, and the quality of the finally obtained chorus audio stream can be effectively ensured by screening and then mixing the multipath dry audio streams, so that a spectator can quickly acquire high-quality online chorus audio.

Drawings

FIG. 1 is an application environment diagram of an audio stream processing method in one embodiment;

FIG. 2 is a flow chart of a method of processing an audio stream according to an embodiment;

FIG. 3 is a flow chart illustrating steps for mixing dry audio streams in one embodiment;

FIG. 4 is a flowchart illustrating steps for aligning dry audio streams in one embodiment;

FIG. 5 is a flow chart of another method of audio stream processing in one embodiment;

FIG. 6 is a flowchart illustrating another step of aligning dry audio streams in one embodiment;

FIG. 7 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The audio stream processing method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. The application environment may include a server and a plurality of singing ends, and the server can communicate with each singing end through a network. The server may be provided with a corresponding data storage system, which can store the data to be processed by the server, such as the audio streams acquired from the singing ends. The data storage system may be integrated on the server, or deployed on a cloud or another network server.

In the embodiment of the application, the user can start the singing end and participate in chorus interaction of the virtual room, and in the chorus process, the singing end can collect a human voice signal generated when the user sings songs to obtain a corresponding dry audio stream, and then the dry audio stream can be uploaded to the server. Therefore, the server can acquire the dry audio streams which are acquired by the singing terminals and aim at the same chorus content in the virtual room, the server can further determine the audio stream quality parameters of all the dry audio streams, then evaluate the quality condition of all the dry audio streams based on the audio stream quality parameters of all the dry audio streams to obtain the audio stream quality information of all the dry audio streams, and determine multi-channel target dry audio streams with the audio stream quality information meeting the preset quality condition from the multi-channel dry audio streams. After obtaining the multiple target dry audio streams, the server may mix the multiple target dry audio streams to obtain a chorus audio stream for the viewer in the virtual room.

The singing ends and audience ends in the virtual room may be, but are not limited to, personal computers, notebook computers, smart phones, tablet computers, Internet of Things devices and portable wearable devices. The Internet of Things devices may be smart speakers, smart televisions, smart vehicle-mounted devices and the like, and the portable wearable devices may be smart watches, smart bracelets, head-mounted devices and the like. The server may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, an audio stream processing method is provided, which is illustrated by taking the application of the method to the server in fig. 1 as an example, and may include the following steps:

S201, during the chorus, acquiring the dry audio streams collected by each of the singing ends in the virtual room for the same chorus content, and determining the audio stream quality parameters of each dry audio stream.

The audio stream quality parameters may be parameters for evaluating the quality of an audio stream. For example, they may include parameters for determining how stalled or smooth the data transmission is, and parameters for determining the sound quality of the audio.

In a specific implementation, a user may launch a client and enter a virtual room, which may also be referred to as a virtual song room, through the client; multiple clients may enter the same virtual room for online real-time interaction. In this embodiment, a client in the virtual room that participates in the chorus interaction may be referred to as a singing end, and a client that does not participate in the chorus interaction may be referred to as an audience end. For example, after receiving a chorus interaction initiation request sent by a client with the host identity (i.e., the host) or a client with a manager identity (e.g., a room manager) in the virtual room, the server may send chorus invitation information to each client in the virtual room. If a user confirms participation in the chorus interaction, the client is triggered to send corresponding confirmation information, and the server may determine the client that sent the confirmation information to be a singing end. A client that does not accept the chorus invitation (e.g., the client replies with a rejection instruction or does not reply before a timeout) may be determined to be an audience end.
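The invitation handling described above can be illustrated with a minimal Python sketch. The `Reply` enumeration and the `assign_roles` function are hypothetical names introduced only for illustration; the application does not specify any message format.

```python
from enum import Enum


class Reply(Enum):
    """Hypothetical outcomes of a chorus invitation."""
    CONFIRM = "confirm"
    REJECT = "reject"
    TIMEOUT = "timeout"


def assign_roles(replies):
    """Split clients into singing ends and audience ends.

    replies maps a client id to its Reply. Only clients that confirmed
    become singing ends; rejections and timeouts become audience ends.
    """
    singing_ends, audience_ends = [], []
    for client_id, reply in replies.items():
        if reply is Reply.CONFIRM:
            singing_ends.append(client_id)
        else:
            audience_ends.append(client_id)
    return singing_ends, audience_ends
```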

During the chorus, the users participating in the chorus can sing the same content, that is, multiple users can sing together. While a user sings, the corresponding client can collect the current vocal signal to obtain the corresponding dry audio stream and upload it to the server. Thus, during the chorus, the server can acquire the dry audio streams collected by each of the multiple singing ends in the virtual room, and these dry audio streams may all be for the same chorus content.

After the multiple paths of dry audio streams from different singing ends are obtained, the server can further determine the audio stream quality parameters of each path of dry audio stream. Specifically, for example, after the singing end collects the dry audio stream, the singing end may correspondingly obtain at least part of the audio stream quality parameters of the dry audio stream, and when the dry audio stream is sent to the server, send the obtained at least part of the audio stream quality parameters to the server. Of course, the server may also analyze the dry audio stream after receiving the dry audio stream, and obtain an audio stream quality parameter of the dry audio stream according to the analysis result.

S202, based on the audio stream quality parameters of each dry audio stream, respectively acquiring the audio stream quality information of each dry audio stream, and determining multiple paths of target dry audio streams, wherein the audio stream quality information meets the preset quality condition.

After the audio stream quality parameters of each dry audio stream are obtained, the server can obtain, for each dry audio stream, its audio stream quality information based on those parameters. For example, a plurality of preset audio stream quality grades can be obtained, the obtained audio stream quality parameters can be compared with the audio stream quality parameters corresponding to each grade, and the audio stream quality information of the dry audio stream can be determined from the comparison result. As another example, the audio stream quality parameters of the dry audio streams can be ranked, and the audio stream quality information of each dry audio stream can be determined according to the ranking result.
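Both approaches, comparison against preset grades and ranking, can be sketched in Python as follows. The grade names, the threshold values, and the choice of round-trip time and packet loss rate as the compared parameters are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grade:
    """One preset quality grade (all values are illustrative assumptions)."""
    name: str
    max_rtt_ms: float
    max_loss_rate: float


# Hypothetical grade table, ordered from best to worst.
GRADES = (
    Grade("excellent", 80.0, 0.01),
    Grade("good", 200.0, 0.05),
)


def grade_stream(rtt_ms, loss_rate, grades=GRADES):
    """Return the first (best) grade whose limits the stream satisfies."""
    for grade in grades:
        if rtt_ms <= grade.max_rtt_ms and loss_rate <= grade.max_loss_rate:
            return grade.name
    return "poor"


def rank_streams(params):
    """Rank stream ids best first.

    params maps a stream id to (rtt_ms, loss_rate); lower values rank
    higher, with RTT compared before loss rate.
    """
    return sorted(params, key=lambda stream_id: params[stream_id])
```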

Then, the server may acquire the preset quality condition and determine whether the audio stream quality information of each dry audio stream meets it. The dry audio streams whose audio stream quality information meets the preset quality condition are determined to be the target dry audio streams; the other dry audio streams, whose audio stream quality information does not meet the preset quality condition, may be ignored for the time being and receive no subsequent processing.

S203, mixing the multiple target dry audio streams to obtain a chorus audio stream for the audience ends in the virtual room.

In this step, after the multiple target dry audio streams are obtained, they may be mixed and synthesized into a single audio stream, which can then be used as the chorus audio stream for each audience end in the virtual room. Further, after obtaining the chorus audio stream, the server can send it to each audience end in the virtual room, so that, while the online chorus takes place, the audience listening to it can obtain high-quality chorus audio in time.

In some examples, after obtaining the multiple dry audio streams, the server may mix all the received dry audio streams and send the mixed audio stream to the clients. In practice, however, differences in sound collection equipment, network transmission or the users' singing skills lead to differences in the audio stream quality of the dry audio streams, and if all the dry audio streams are mixed indiscriminately, the quality of the final mixed audio stream is lowered. In addition, when the number of singing ends exceeds a preset number, the server receives a large number of dry audio streams from the singing ends in a short time. If all of them are mixed, the time the audience end needs to acquire the chorus audio stream increases noticeably, and the large data volume may also cause stalling during transmission of the chorus audio stream, which prevents the audience from obtaining high-quality chorus audio and degrades the listening experience.

In the application, after receiving the multipath dry audio streams, the multipath target dry audio streams with excellent audio stream quality can be screened out from the multipath dry audio streams by acquiring the dry audio streams with the audio stream quality information meeting the preset quality condition, and then the multipath target dry audio streams are mixed, so that on one hand, the quality of the finally acquired chorus audio streams can be effectively ensured, and the situation that the dry audio streams with poor quality are sent to the audience side is avoided, and on the other hand, the time for acquiring the chorus audio streams is shortened and the data quantity of the chorus audio streams to be sent is reduced by reducing the number of the audio streams to be mixed, so that the subsequent audience side can be promoted to quickly acquire the chorus audio streams.

In this embodiment, in the chorus process, the server may acquire the dry audio streams for the same chorus content acquired by each of the multiple singing ends in the virtual room, determine the audio stream quality parameters of each dry audio stream, further may acquire the audio stream quality information of each dry audio stream based on the audio stream quality parameters of each dry audio stream, determine multiple target dry audio streams whose audio stream quality information meets the preset quality condition, and mix the multiple target dry audio streams to obtain the chorus audio stream for the audience end in the virtual room. In the scheme of the application, the audio content in the chorus audio stream can be enriched by mixing the multipath target dry audio streams which come from different singing ends and aim at the same chorus content, and the quality of the finally obtained chorus audio stream can be effectively ensured by screening and then mixing the multipath dry audio streams, so that a spectator can quickly acquire high-quality online chorus audio.

In one embodiment, after determining in S202 the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition, the method may further include the following step:

and returning at least one target dry audio stream in the multiple target dry audio streams to each singing end.

In a specific implementation, if the server directly returned all the obtained dry audio streams to the singing ends, it would put considerable pressure on the bandwidth and performance of the singing ends, and when the delays of the dry audio streams differ greatly, the multiple dry audio streams that a user hears through the singing end would sound noisy. Alternatively, if the server mixed the screened target dry audio streams after determining them and then sent the mixed chorus audio stream to the singing ends, the time for a singing end to acquire the audio of the other singing ends would increase noticeably; in one example, sending the chorus audio stream to the singing end adds 2-3 seconds of delay, which seriously affects the experience at the singing end.

Based on this, in this embodiment, after the multiple target dry audio streams are acquired, at least one of the multiple target dry audio streams may be returned to each singing end. For example, N (N is greater than or equal to 2) target dry audio streams may be selected from the multiple target dry audio streams and sent to the corresponding singing end. The target dry audio streams received by different singing ends may be the same, or they may be selected according to the bandwidth parameters of each singing end or configuration information preset by the singing end.
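As a rough illustration, the per-singer selection could look like the Python sketch below. It assumes that the target streams are already ordered best first, that a singing end does not need its own stream returned, and that `max_streams` has been derived from that singing end's bandwidth or preset configuration; none of these details are fixed by the application.

```python
def streams_to_return(singer_id, ranked_targets, max_streams):
    """Pick the target dry audio streams to send back to one singing end.

    ranked_targets is a list of (stream_id, owner_id) pairs, best first.
    The singer's own stream is skipped (an assumption), and at most
    max_streams of the remaining streams are returned.
    """
    others = [stream_id for stream_id, owner in ranked_targets if owner != singer_id]
    return others[:max_streams]
```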

In the embodiment, at least part of the target audio streams obtained after screening are sent to the singing end, so that the singing end can be ensured to timely acquire high-quality audio streams from other singing ends in the chorus process, and the singing experience of a user is ensured.

In one embodiment, as shown in fig. 3, the step of mixing the multiple target dry audio streams in S203 to obtain a chorus audio stream facing the viewer in the virtual room may include the following steps:

S301, determining a main dry audio stream and an auxiliary dry audio stream among the multiple target dry audio streams.

As an example, the main dry audio stream may be an audio stream that is emphasized as the main body among the multiple target dry audio streams, and the auxiliary dry audio stream may be an audio stream that needs to be weakened.

Specifically, in the process of multi-person chorus, the difference of factors such as intensity, timbre, tone and the like of sound can enhance the contrast of singing content and enrich the chorus effect finally presented. In this embodiment, after the multiple target dry audio streams are acquired, the main dry audio stream and the auxiliary dry audio stream in the multiple target dry audio streams may be determined. Specifically, for example, a main path dry audio stream in the multiple paths of target dry audio streams may be determined according to a preset main path dry audio stream screening rule, and then, other audio streams except the main path dry audio stream in the multiple paths of target dry audio streams may be determined as auxiliary path dry audio streams.

S302, reducing the volume of the auxiliary dry audio stream, and mixing the main dry audio stream with the volume-reduced auxiliary dry audio stream to obtain a chorus audio stream for the audience ends in the virtual room.

After the auxiliary dry audio stream is determined, its volume may be reduced; for example, the auxiliary dry audio stream may be attenuated using an existing algorithm. In an alternative embodiment, the amount of volume reduction may be determined according to the delay degree of the auxiliary dry audio stream and may be positively correlated with it, that is, the higher the delay degree, the greater the reduction in the volume of the auxiliary dry audio stream. This weakens the influence of high-delay auxiliary dry audio streams on the final chorus effect.

After the volume of the auxiliary dry audio stream is reduced, the main dry audio stream and the volume-reduced auxiliary dry audio stream can be mixed, for example according to a preset mixing algorithm, to obtain the chorus audio stream for the audience ends in the virtual room.
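A minimal Python sketch of S302 follows. The application does not name an attenuation or mixing algorithm, so the linear gain curve, the default values of `base_gain`, `min_gain` and `max_delay_ms`, and the sum-and-clip mixer are all assumptions chosen for simplicity.

```python
def attenuation_gain(delay_ms, max_delay_ms=400.0, base_gain=0.8, min_gain=0.2):
    """Gain for an auxiliary stream; a higher delay gives a lower gain.

    The gain falls linearly from base_gain at zero delay to min_gain at
    max_delay_ms and stays at min_gain beyond that.
    """
    ratio = min(max(delay_ms / max_delay_ms, 0.0), 1.0)
    return base_gain - ratio * (base_gain - min_gain)


def mix(main_streams, auxiliary_streams):
    """Mix main and attenuated auxiliary streams into one stream.

    main_streams: list of sample lists (floats in [-1.0, 1.0]).
    auxiliary_streams: list of (samples, delay_ms) pairs.
    Samples are summed position by position and clipped to [-1.0, 1.0].
    """
    tracks = [list(samples) for samples in main_streams]
    for samples, delay_ms in auxiliary_streams:
        gain = attenuation_gain(delay_ms)
        tracks.append([sample * gain for sample in samples])
    length = max((len(track) for track in tracks), default=0)
    mixed = []
    for i in range(length):
        total = sum(track[i] for track in tracks if i < len(track))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

Hard clipping is only the simplest way to keep the sum in range; a production mixer would more likely normalize or apply a limiter.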

In this embodiment, after the volume of the auxiliary path dry audio stream is reduced, the dry audio streams of the main path and the auxiliary path are mixed, so that the contrast between audio streams in the finally obtained chorus audio stream can be increased, the expression effect of the chorus audio stream is enhanced, and the quality of the chorus audio stream finally obtained by the audience terminal is improved.

In one embodiment, the determining the main channel dry audio stream in the multi-channel target dry audio stream in S301 may include the steps of:

acquiring the current chorus mode of the virtual room, and determining, from the plurality of singing ends, at least one target singing end that serves as a lead singer in the chorus mode; and determining the target dry audio stream from the target singing end as the main dry audio stream.

The chorus mode may be information indicating the manner or form of the chorus, such as a chorus including a treble part and a bass part, or a chorus led by a specified user.

In a specific implementation, a chorus mode may be selected for the virtual room. After the multiple target dry audio streams are determined, at least one target singing end serving as a lead singer in the current chorus mode may be determined from the plurality of singing ends; for example, the singing end associated with the user whom the current chorus mode designates as the lead singer is taken as the target singing end. The target dry audio stream from the target singing end may then be determined as the main dry audio stream.
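Under the assumption that the chorus mode has already been resolved into a set of lead singing ends, the split into main and auxiliary streams can be sketched in Python as follows. The function name and data shapes are hypothetical.

```python
def split_main_and_auxiliary(target_owner, lead_singers):
    """Split target streams by whether they come from a lead singer.

    target_owner maps a stream id to the id of the singing end it came
    from; lead_singers is the set of singing ends designated as lead
    singers by the current chorus mode.
    """
    main = [sid for sid, owner in target_owner.items() if owner in lead_singers]
    auxiliary = [sid for sid in target_owner if sid not in main]
    return main, auxiliary
```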

In this embodiment, selecting the main dry audio stream from the multiple target dry audio streams according to the chorus mode of the virtual room makes it possible to combine different interaction modes of the virtual room and to emphasize the voice of the designated singer during the chorus, which improves the interactivity and fun of the online chorus.

In another embodiment, the audio stream quality parameters may include a delay time, and determining the main dry audio stream among the multiple target dry audio streams in S301 may include the following steps: obtaining the delay time of each target dry audio stream from its audio stream quality parameters; and determining the target dry audio streams whose delay time is less than a time threshold and whose delay time proximity is greater than a proximity threshold as the main dry audio streams.

After the multiple target dry audio streams are obtained, the delay time of each target dry audio stream can be read from its audio stream quality parameters. The delay time of each target dry audio stream can then be compared with a preset time threshold to obtain the target dry audio streams whose delay time is less than the time threshold. For these streams, the delay time proximity between every two target dry audio streams can be determined, for example from the difference between their delay times, and the target dry audio streams whose delay time proximity is greater than the proximity threshold can be determined as the main dry audio streams.
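The two-stage selection can be sketched in Python as below. The application only says that proximity may be derived from the difference between delay times, so the `proximity` formula here is a hypothetical choice, as is the rule that a stream qualifies when it is close to at least one other low-delay stream.

```python
from itertools import combinations


def proximity(delay_a_ms, delay_b_ms):
    """Hypothetical closeness score in (0, 1]; 1.0 means identical delays."""
    return 1.0 / (1.0 + abs(delay_a_ms - delay_b_ms))


def pick_main_streams(delays_ms, time_threshold_ms, proximity_threshold):
    """Return the sorted ids of the streams chosen as main dry audio streams.

    delays_ms maps a stream id to its delay time. A stream is chosen when
    its delay is below time_threshold_ms and its proximity to at least one
    other low-delay stream exceeds proximity_threshold.
    """
    low_delay = {sid: d for sid, d in delays_ms.items() if d < time_threshold_ms}
    main = set()
    for a, b in combinations(low_delay, 2):
        if proximity(low_delay[a], low_delay[b]) > proximity_threshold:
            main.update((a, b))
    return sorted(main)
```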

In this embodiment, determining the target dry audio streams whose delay time is less than the time threshold and whose delay time proximity is greater than the proximity threshold as the main dry audio streams means that low-delay dry audio streams serve as the main dry audio streams, keeps the main dry audio streams in step with one another, and improves the quality of the chorus audio stream obtained subsequently.

In one embodiment, as shown in fig. 4, before S203, the method may further include the steps of:

S401, acquiring the accompaniment progress information carried by each target dry audio stream; the accompaniment progress information indicates the playback progress of the accompaniment of the chorus song when the corresponding singing end collected the target dry audio stream.

Specifically, before the chorus starts, the singing end may acquire the accompaniment of the song to be sung from the server in advance, and the plurality of singing ends may acquire a unified global clock, for example NTP (Network Time Protocol) time. When the chorus starts, the singing ends can use the global clock as a reference and agree to start playing the accompaniment of the chorus song at the same moment, thereby starting at approximately the same time and reducing the time difference between the moments at which different users start singing.
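One way a singing end could turn a shared clock into a synchronized start is sketched below in Python. The offset calculation is the standard NTP estimate from one request and reply exchange; the `seconds_until_start` helper and the idea of an agreed start timestamp distributed by the server are assumptions for illustration.

```python
def ntp_offset(t0, t1, t2, t3):
    """Estimate (server clock - local clock) in seconds from one exchange.

    t0: request sent (local clock)    t1: request received (server clock)
    t2: reply sent (server clock)     t3: reply received (local clock)
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def seconds_until_start(agreed_start, local_now, offset):
    """How long this singing end should wait before playing the accompaniment.

    agreed_start is expressed on the global clock; local_now is the local
    clock reading; offset comes from ntp_offset. Returns 0.0 if the agreed
    start has already passed.
    """
    global_now = local_now + offset
    return max(0.0, agreed_start - global_now)
```

Each singing end computes its own offset, so ends with different local clocks still wait until the same global instant.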

While the user sings, in addition to collecting the corresponding dry audio stream, the singing end can also determine the accompaniment progress information of the chorus song at the time the dry audio stream is collected, for example the current playback time of the accompaniment; the accompaniment progress can also be expressed as a percentage. When the dry audio stream is sent to the server, the corresponding accompaniment progress information can be sent with it. After determining the target dry audio streams, the server can read the accompaniment progress information carried by each of them.

S402, performing alignment processing on each target dry audio stream based on the accompaniment progress information to obtain aligned multiple target dry audio streams.

In practical application, when a user sings a song, the user often performs corresponding singing along with accompaniment, based on which, in this step, accompaniment progress information can be used as a reference standard for aligning different target dry audio streams, that is, alignment processing can be performed on each path of target dry audio streams based on accompaniment progress information, for example, target dry audio streams with the same accompaniment progress information can be aligned, thus, aligned multipath target dry audio streams can be obtained, and aligned target dry audio streams can be mixed later.
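A minimal Python sketch of progress-based alignment is given below. It assumes that each stream arrives as frames tagged with the accompaniment progress in milliseconds and that frames from different singing ends carry exactly matching progress values; a real system would probably need to tolerate small differences.

```python
from collections import defaultdict


def align_by_progress(streams):
    """Group frames from different streams by accompaniment progress.

    streams maps a stream id to a list of (progress_ms, frame) pairs.
    Returns a list of (progress_ms, {stream_id: frame}) tuples sorted by
    progress, so frames sharing a progress value can be mixed together.
    """
    slots = defaultdict(dict)
    for stream_id, frames in streams.items():
        for progress_ms, frame in frames:
            slots[progress_ms][stream_id] = frame
    return sorted(slots.items())
```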

In this embodiment, performing alignment processing on each target dry audio stream keeps the dry audio streams in step with one another in a multi-person online chorus, avoids a noisy and chaotic mix of different dry audio streams, and effectively improves the quality of the final chorus audio stream.

In one embodiment, the step of obtaining the audio stream quality information of each dry audio stream based on the audio stream quality parameters of each dry audio stream in S202 may include the following steps:

based on the audio stream quality parameters of each dry audio stream, respectively obtaining the sound quality detection result and the delay degree of each dry audio stream; and determining the audio stream quality information of each dry audio stream based on the sound quality detection result and the delay degree of each dry audio stream.

Specifically, the audio stream quality parameters may include parameters for determining how stalled or smooth the data transmission is, and parameters for determining the sound quality of the audio. The parameters for determining how stalled or smooth the data transmission is may include at least one of the following: round-trip time (RTT) of the audio stream data, uplink packet loss rate, and NTP time.

In this step, after the audio stream quality parameters of each dry audio stream are obtained, the sound quality detection result and the delay degree of each dry audio stream can be obtained based on those parameters. The sound quality detection result can indicate the sound quality of the dry audio stream itself, for example whether the amplitudes at the frequency points of the dry audio stream are uniform, balanced and full, whether the frequency response curve is flat, and whether the tone of the sound is accurate. The delay degree can indicate whether the data of the dry audio stream is transmitted smoothly.

Further, for each of the dry audio streams, the quality information of the dry audio stream can be determined based on the result of the sound quality detection and the degree of delay of the dry audio stream, for example, a dry audio stream having excellent sound quality and low delay can be determined to have excellent audio stream quality information, and a dry audio stream having poor sound quality and high delay can be determined to have poor audio stream quality information.
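The combination of the two factors can be sketched in Python as follows. The numeric thresholds, the use of a single sound quality score between 0 and 1, and the intermediate "fair" label for mixed cases are assumptions; the application only describes the "excellent" and "poor" cases.

```python
def delay_degree(rtt_ms, uplink_loss_rate, max_rtt_ms=150.0, max_loss_rate=0.03):
    """Classify transmission as "low" or "high" delay (thresholds assumed)."""
    if rtt_ms <= max_rtt_ms and uplink_loss_rate <= max_loss_rate:
        return "low"
    return "high"


def quality_info(sound_quality_score, delay, min_score=0.7):
    """Combine a sound quality score in [0, 1] with the delay degree.

    Good sound with low delay is "excellent", poor sound with high delay
    is "poor", and anything in between is labelled "fair" (an assumption).
    """
    good_sound = sound_quality_score >= min_score
    if good_sound and delay == "low":
        return "excellent"
    if not good_sound and delay == "high":
        return "poor"
    return "fair"
```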

In this embodiment, determining the audio stream quality information of each dry audio stream based on its sound quality detection result and delay degree provides a basis for subsequently screening out dry audio streams with excellent sound quality and smooth transmission, and effectively improves the quality of the final chorus audio stream.

In one embodiment, determining in S202 the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition may include the following steps:

determining a screening time interval for the target dry audio streams, and acquiring the current time and the historical time at which the target dry audio streams were last determined; and, when the time interval between the current time and the historical time satisfies the screening time interval, determining the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition.

In this step, a screening time interval for the target dry audio streams may be determined, and the current time and the historical time at which the target dry audio streams were last screened may be acquired. Illustratively, the screening time interval may be a predetermined length of time, such as 20 seconds or 30 seconds.

Then, the time interval between the current time and the historical time can be determined, and it can be judged whether this time interval satisfies the predetermined screening time interval. If so, the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition are determined again from the multiple dry audio streams; if not, waiting continues until the time interval between the current time and the historical time satisfies the screening time interval.

In this embodiment, the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition are determined only when the time interval between the current time and the historical time satisfies the screening time interval. This avoids errors caused by short-term fluctuation or change of the target dry audio streams, and improves both the short-term stability of the target dry audio streams and the quality of the chorus audio stream.
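The time-gated screening described above can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the class name `TargetStreamScreener`, the injected `clock`, and the reading of "satisfies the screening time interval" as "at least the interval has elapsed" are all hypothetical.

```python
import time


class TargetStreamScreener:
    """Re-selects target streams at most once per screening interval."""

    def __init__(self, interval_s=20.0, clock=time.monotonic):
        self.interval_s = interval_s  # e.g. 20 or 30 seconds, per the text
        self._clock = clock           # injectable for testing (assumption)
        self._last_time = None        # historical time of the last screening
        self._targets = []

    def maybe_rescreen(self, candidates, meets_quality):
        """Return the current targets, re-screening only when the interval has elapsed."""
        now = self._clock()
        due = self._last_time is None or now - self._last_time >= self.interval_s
        if due:
            self._targets = [s for s in candidates if meets_quality(s)]
            self._last_time = now
        # Between screenings the previous selection is kept, which damps
        # short-term fluctuation in the set of target streams.
        return list(self._targets)
```

Calling `maybe_rescreen` on every quality report is safe: calls that arrive before the interval has elapsed simply return the previous selection.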

In order to enable those skilled in the art to better understand the above steps, the embodiments of the present application are illustrated below with an example, but it should be understood that the embodiments of the present application are not limited thereto.

As shown in fig. 5, after the host user or a management user in the virtual room initiates a chorus, singer A, singer B, …, singer N who are to participate in the chorus may each send response information, and each singer's client, i.e. the singing end, may obtain the NTP time from the server and pull the accompaniment of the chorus song in advance to obtain a local accompaniment. When the chorus begins, each singing end can start playing the local accompaniment according to the globally unified NTP time.
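One way a singing end could turn the server-provided time into a synchronized start is sketched below. The application does not specify how the NTP time is exchanged, so the single request/response offset estimate (which assumes symmetric network delay) and the function names are assumptions made for illustration.

```python
def clock_offset(t_send, t_server, t_recv):
    """Estimate (server time - local time) from one request/response exchange.

    t_send and t_recv are local timestamps; t_server is the server's time.
    Assumes the network delay is the same in both directions.
    """
    return t_server - (t_send + t_recv) / 2.0


def local_start_time(global_start, offset):
    """Convert the globally agreed start time into the local clock."""
    return global_start - offset


def accompaniment_position(local_now, global_start, offset):
    """Seconds of accompaniment that should have played by local_now."""
    return max(0.0, (local_now + offset) - global_start)
```

With this arrangement every singing end starts the local accompaniment at the same global instant, and the accompaniment position can later be attached to the audio stream as progress information.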

In the chorus process, each singing end can collect the singer's voice signal for the chorus song while playing the accompaniment, send the collected voice signal to the server in the form of an audio stream, and send accompaniment progress information and other audio stream quality parameters to the server. The server can therefore take the audio streams received from the different singing ends as dry audio streams and determine the audio stream quality parameters of each dry audio stream. In one example, the server may include a real-time audio and video communication (Real-Time Communication) server and other background processing devices. The server may receive the dry audio streams and the audio stream quality parameters through the real-time audio and video communication server and pass them promptly to the background processing devices for processing and mixing, so as to implement online real-time chorus among multiple singing ends in different places and ensure the quality of data transmission.

After the server obtains the dry audio streams, it can select streams from the multiple dry audio streams and align them based on the audio stream quality parameters of each dry audio stream and a preset dry audio stream selection strategy, and then mix the aligned dry audio streams according to a chorus mixing algorithm. The server can promptly return the IDs of the selected dry audio streams to the singing ends, so that each singing end can promptly pull the high-quality dry audio streams by ID, which improves the singing experience.

In the mixing process, as shown in fig. 6, taking singers A, B, C, D and E as an example, the server may receive the corresponding dry audio streams A, B, C, D and E. Each dry audio stream may carry accompaniment progress information and flow into its own queue, and the queues are then aligned according to the accompaniment progress information; the alignment criterion may be that the offset between different queues is less than a set number of milliseconds (e.g., 300 milliseconds). The server then determines, according to the RTT or NTP time carried by each dry audio stream, the dry audio streams with little stuttering and low delay, designates 3 to 5 of them as main path dry audio streams and the others as auxiliary path dry audio streams, and then mixes the audio.
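The queue alignment and main/auxiliary designation described above might look like the following sketch. The frame representation (a list of `(progress_ms, frame)` pairs per stream), the trimming strategy, the use of RTT alone for ranking, and all function names are assumptions; only the 300 ms tolerance and the 3 to 5 main paths come from the text.

```python
ALIGN_TOLERANCE_MS = 300  # example tolerance given in the text


def align_queues(queues):
    """Drop leading frames so every queue starts at or after the latest head.

    queues maps stream_id -> list of (progress_ms, frame), sorted by progress.
    """
    latest_head = max((q[0][0] for q in queues.values() if q), default=None)
    if latest_head is None:
        return {sid: [] for sid in queues}
    return {sid: [f for f in q if f[0] >= latest_head] for sid, q in queues.items()}


def heads_within_tolerance(queues, tolerance_ms=ALIGN_TOLERANCE_MS):
    """Check the alignment criterion on the heads of the non-empty queues."""
    heads = [q[0][0] for q in queues.values() if q]
    return not heads or max(heads) - min(heads) < tolerance_ms


def split_main_aux(rtt_ms_by_stream, main_count=3):
    """Designate the lowest-RTT streams as main paths, the rest as auxiliary."""
    ranked = sorted(rtt_ms_by_stream, key=rtt_ms_by_stream.get)
    return ranked[:main_count], ranked[main_count:]
```

For the five-singer example, RTT values of 40, 120, 35, 300 and 60 ms for streams A to E would make C, A and E the main paths and B and D the auxiliary paths.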

After mixing, the chorus audio stream with the accompaniment added can be sent to each audience terminal in the virtual room, for example, the chorus audio stream can be sent to an interface machine of the real-time audio-video communication server, and the audience terminal in the virtual room can pull the chorus audio stream through the interface machine.
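The mixing that precedes this delivery can be illustrated with a small sketch in which auxiliary path streams are attenuated before being summed with the main path streams, in line with the volume reduction recited in claim 2. The gain value of 0.5, the float sample range of [-1, 1], the hard clipping, and the function name `mix_streams` are assumptions; the application does not disclose the chorus mixing algorithm itself.

```python
def mix_streams(main_streams, aux_streams, aux_gain=0.5):
    """Mix aligned streams of float samples in [-1.0, 1.0].

    main_streams and aux_streams are lists of equal-rate sample lists.
    Auxiliary streams are scaled by aux_gain before summation.
    """
    streams = main_streams + aux_streams
    if not streams:
        return []
    length = min(len(s) for s in streams)  # mix only the common span
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in main_streams)
        total += aux_gain * sum(s[i] for s in aux_streams)
        # Hard clipping keeps the result in range; a real mixer would
        # more likely use a limiter or normalization (assumption).
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

Accompaniment samples could be added in the same summation before the result is handed to the audience-side delivery path.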

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the dry audio streams. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an audio stream processing method.

It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the steps of the other embodiments described above are also implemented when the processor executes a computer program.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

In one embodiment, the computer program, when executed by a processor, also implements the steps of the other embodiments described above.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of audio stream processing, the method comprising:

2. The method of claim 1, wherein said mixing the multiple target dry audio streams to obtain a chorus audio stream that is directed to a viewer in the virtual room comprises:

determining a main path dry audio stream and an auxiliary path dry audio stream among the multiple target dry audio streams;

and reducing the volume of the auxiliary path dry audio stream, and mixing the main path dry audio stream with the volume-reduced auxiliary path dry audio stream to obtain a chorus audio stream facing the audience side in the virtual room.

3. The method of claim 2, wherein said determining a main path dry audio stream among the multiple target dry audio streams comprises:

acquiring a current chorus mode of the virtual room, and determining, from the multiple singing ends, at least one target singing end serving as a lead singer in the chorus mode;

and determining the target dry audio stream from the target singing end as a main path dry audio stream.

4. The method of claim 2, wherein the audio stream quality parameter comprises a delay time, and wherein said determining a main path dry audio stream among the multiple target dry audio streams comprises:

obtaining the delay time of each target dry audio stream from the audio stream quality parameters of that target dry audio stream;

and determining, as the main path dry audio streams, the multiple target dry audio streams whose delay times are less than a time threshold and whose degree of closeness in delay time is greater than a closeness threshold.

5. The method of claim 1, further comprising, prior to said mixing of said multiple target dry audio streams to obtain a chorus audio stream that is directed to a viewer in said virtual room:

acquiring accompaniment progress information carried by each path of target dry audio stream; the accompaniment progress information is used for indicating the playing progress of accompaniment of the chorus song when the corresponding singing end collects the target dry audio stream;

and aligning the target dry audio streams based on the accompaniment progress information to obtain aligned multiple target dry audio streams.

6. The method according to claim 1, wherein the obtaining the audio stream quality information of each dry audio stream based on the audio stream quality parameter of each dry audio stream, respectively, comprises:

respectively acquiring a tone quality detection result and a delay degree of each dry audio stream based on the audio stream quality parameters of each dry audio stream;

and determining the audio stream quality information of each dry audio stream based on the tone quality detection result and the delay degree of each dry audio stream.

7. The method of claim 1, wherein determining the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition comprises:

determining a screening time interval for the target dry audio streams, and acquiring the current time and the historical time at which the target dry audio streams were last determined;

and, when the time interval between the current time and the historical time satisfies the screening time interval, determining the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition.

8. The method according to any one of claims 1-7, further comprising, after said determining the multiple target dry audio streams whose audio stream quality information satisfies the preset quality condition:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.