CN113194335A

CN113194335A - Streaming media transmission method, transmission equipment and playing equipment

Info

Publication number: CN113194335A
Application number: CN202110734068.2A
Authority: CN
Inventors: 肖凯; 卢日; 吴振中
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-07-30
Anticipated expiration: 2041-06-30
Also published as: CN113194335B

Abstract

An embodiment of the invention provides a streaming media transmission method, transmission equipment and playing equipment. The streaming media transmission method is applied to first transmission equipment and a plurality of second transmission equipment, where the playing equipment accesses the first transmission equipment and the multiple media streams subscribed to by the playing equipment are stored in the plurality of second transmission equipment respectively. The first transmission equipment obtains, from the plurality of second transmission equipment, the audio characteristic values corresponding to the multiple media streams in a first time period, determines a group of media streams whose audio characteristic values meet set conditions, obtains the group of media streams from the corresponding second transmission equipment, and sends the group of media streams to the playing equipment. This scheme saves the network bandwidth that the first transmission equipment needs for pulling media streams and also reduces the processing pressure on the first transmission equipment.

Description

Streaming media transmission method, transmission equipment and playing equipment

Technical Field

The present invention relates to the field of streaming media technologies, and in particular, to a streaming media transmission method, a transmission device, and a playing device.

Background

With the continuous development of internet technology, currently, real-time transmission of media streams such as audio and video is required in many application scenarios, such as application scenarios of live broadcast, audio and video conference, online education, and the like.

Taking an online conference scene as an example, when many people participate in a conference, each participant needs to subscribe to the media streams (such as audio streams) of all other participants and acquire them locally for playing. As a result, the overhead of network bandwidth is large, and the overhead of processing resources and storage resources is also large.

Disclosure of Invention

The embodiment of the invention provides a streaming media transmission method, transmission equipment and playing equipment, which can realize low-cost transmission of media streams.

In a first aspect, an embodiment of the present invention provides a streaming media transmission method, which is applied to a first transmission device and a plurality of second transmission devices, where the first transmission device has access to a playing device, and multiple paths of media streams subscribed by the playing device are stored in the plurality of second transmission devices, respectively; the method comprises the following steps:

the first transmission equipment respectively acquires audio characteristic values corresponding to the multiple paths of media streams in a first time period from the multiple second transmission equipment;

the first transmission equipment determines a group of media streams of which the audio characteristic values meet set conditions;

the first transmission equipment acquires the group of media streams from second transmission equipment corresponding to the group of media streams;

the first transmission device sends the group of media streams to the playing device.

In a second aspect, an embodiment of the present invention provides a streaming media transmission apparatus, which is applied to a first transmission device and a plurality of second transmission devices, where the first transmission device has access to a playing device, and multiple paths of media streams subscribed by the playing device are stored in the plurality of second transmission devices respectively; the apparatus is located at the first transmission device and comprises:

a first obtaining module, configured to obtain, from the multiple second transmission devices, audio feature values corresponding to the multiple media streams in a first time period respectively;

the determining module is used for determining a group of media streams of which the audio characteristic values meet set conditions;

a second obtaining module, configured to obtain the set of media streams from a second transmission device corresponding to the set of media streams;

a sending module, configured to send the group of media streams to the playback device.

In a third aspect, an embodiment of the present invention provides a first transmission device, located in a distributed transmission network, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the streaming media transmission method according to the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of a transmission device, the processor is enabled to implement at least the streaming media transmission method according to the first aspect.

In a fifth aspect, an embodiment of the present invention provides a streaming media transmission method, which is applied to a first transmission device and a plurality of second transmission devices, where the first transmission device has access to a playing device, and multiple paths of media streams subscribed by the playing device are stored in the plurality of second transmission devices, respectively; the method comprises the following steps:

each second transmission device determines an audio characteristic value corresponding to a local path of media stream in a first time period;

and each second transmission device sends the determined audio characteristic value to the first transmission device so that the first transmission device determines a group of media streams of which the audio characteristic values meet set conditions, and sends the group of media streams acquired from the corresponding second transmission device to the playing device.

In a sixth aspect, an embodiment of the present invention provides a streaming media transmission apparatus, which is applied to a distributed transmission network, where the distributed transmission network includes a first transmission device and a plurality of second transmission devices, the first transmission device has a playback device connected thereto, and the plurality of second transmission devices respectively store multiple paths of media streams subscribed by the playback device; the apparatus is located in each second transmission device, and comprises:

the determining module is used for determining an audio characteristic value corresponding to a local path of media stream in a first time period;

and the sending module is used for sending the determined audio characteristic value to the first transmission equipment so as to enable the first transmission equipment to determine a group of media streams of which the audio characteristic values meet set conditions, and sending the group of media streams acquired from the corresponding second transmission equipment to the playing equipment.

In a seventh aspect, an embodiment of the present invention provides a second transmission device, located in a distributed transmission network, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the streaming media transmission method according to the fifth aspect.

In an eighth aspect, the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of a transmission device, the processor is enabled to implement at least the streaming media transmission method according to the fifth aspect.

In a ninth aspect, an embodiment of the present invention provides a streaming media transmission method, which is applied to a playing device, and the method includes:

when the number of the subscribed multi-path media streams is determined to be greater than a set threshold value, sending a first subscription signaling to a first transmission device which is accessed, so that the first transmission device obtains audio characteristic values of the multi-path media streams respectively corresponding to the first time period from a plurality of second transmission devices respectively, and determines a group of media streams of which the audio characteristic values meet set conditions, wherein the multi-path media streams subscribed by the playing device are stored in the plurality of second transmission devices respectively;

receiving session description protocol information corresponding to the group of media streams sent by the first transmission device;

processing session description protocol information corresponding to the group of media streams, wherein the session description protocol information comprises synchronization source identifiers corresponding to the group of media streams;

and receiving the group of media streams sent by the first transmission device, wherein different groups of media streams received in different scheduling periods all use the synchronization source identifiers corresponding to the group of media streams.

In a tenth aspect, an embodiment of the present invention provides a streaming media transmission apparatus, located in a playing device, including:

a sending module, configured to send a first subscription signaling to an accessed first transmission device when it is determined that the number of subscribed multiple media streams is greater than a set threshold, so that the first transmission device obtains audio feature values, corresponding to the multiple media streams, in a first time period from multiple second transmission devices, respectively, and determines a group of media streams whose audio feature values meet a set condition, where the multiple media streams subscribed by the playing device are stored in the multiple second transmission devices, respectively;

a receiving module, configured to receive session description protocol information corresponding to the group of media streams sent by the first transmission device;

a processing module, configured to process session description protocol information corresponding to the group of media streams, where the session description protocol information includes a synchronization source identifier corresponding to the group of media streams;

the receiving module is further configured to receive the group of media streams sent by the first transmission device, where different groups of media streams received in different scheduling periods all use synchronization source identifiers corresponding to the group of media streams.

In an eleventh aspect, an embodiment of the present invention provides a playback device, including: the device comprises a memory, a processor, a communication interface and a player; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the streaming media transmission method according to the ninth aspect.

In a twelfth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium, having executable codes stored thereon, which when executed by a processor of a playback device, cause the processor to implement at least the streaming media transmission method according to the ninth aspect.
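One way to read the ninth aspect is that the group of media streams selected in each scheduling period is always delivered under the same set of synchronization source identifiers (SSRCs) announced in the session description protocol information, so the playing device does not have to renegotiate when the selection changes. The following Python sketch illustrates that reading only. The class `SsrcSlots`, its method `update`, and the SSRC values are hypothetical and are not taken from the source.

```python
class SsrcSlots:
    """Map the currently selected streams onto a fixed set of SSRC 'slots'."""

    def __init__(self, slot_ssrcs):
        self._capacity = len(slot_ssrcs)
        self._free = list(slot_ssrcs)   # SSRCs not currently carrying a stream
        self._slot_of = {}              # stream ID -> SSRC it is forwarded under

    def update(self, selected):
        """Apply the selection of a new scheduling period and return the mapping."""
        selected = list(selected)
        if len(selected) > self._capacity:
            raise ValueError("more selected streams than SSRC slots")
        # Release the slots of streams that are no longer selected.
        for stream_id in [s for s in self._slot_of if s not in selected]:
            self._free.append(self._slot_of.pop(stream_id))
        # Streams that stay selected keep their slot; new ones take a free slot.
        for stream_id in selected:
            if stream_id not in self._slot_of:
                self._slot_of[stream_id] = self._free.pop(0)
        return dict(self._slot_of)


slots = SsrcSlots([1001, 1002])          # hypothetical SSRCs from the SDP
period_1 = slots.update(["B", "C"])      # B and C are selected
period_2 = slots.update(["B", "D"])      # C drops out, D takes over its slot
print(period_1, period_2)
```

In both periods the playing device sees only the two SSRCs it was told about, even though the underlying streams differ.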

The embodiment of the invention adopts a distributed transmission network to realize the transmission of the media stream. The distributed transmission network comprises a plurality of transmission devices. For a playing device that needs to subscribe to multiple media streams, the distributed transmission network comprises a first transmission device accessed by the playing device and a plurality of second transmission devices that respectively store the multiple media streams. The playing device pulls the subscribed media streams from the second transmission devices through the first transmission device that it accesses.

Specifically, each second transmission device periodically calculates an audio characteristic value (such as an audio energy value) of the media stream it is currently receiving and sends the calculated audio characteristic value to the first transmission device. The first transmission device therefore regularly receives a plurality of audio characteristic values corresponding to the plurality of media streams from the second transmission devices. According to the received audio characteristic values, the first transmission device screens out, from the plurality of media streams, a group of media streams that meets the set conditions (the group includes at least one media stream), pulls the group of media streams from the corresponding second transmission devices, and sends the pulled group of media streams to the playing device for playing. That is, the playing device outputs only the audio of this group of media streams; the remaining media streams are neither pulled from their second transmission devices nor sent to the playing device for playing.

Compared with a media stream, the control message carrying an audio characteristic value occupies less bandwidth, and each time the first transmission device pulls only the small number of media streams whose audio characteristic values meet the conditions, based on the audio characteristic values it has received. The network bandwidth consumed in pulling media streams is therefore significantly reduced. The playing device does not need to receive and play all of the media streams each time, so its resource consumption is also reduced.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of a streaming media transmission process according to an embodiment of the present invention;

Fig. 2a and fig. 2b are schematic diagrams of another streaming media transmission process provided by the embodiment of the invention;

Fig. 3 is an interaction flowchart of a streaming media transmission method according to an embodiment of the present invention;

Fig. 4 is an interaction flowchart of another streaming media transmission method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of an implementation corresponding to the embodiment shown in fig. 4;

Fig. 6 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a first transmission device corresponding to the streaming media transmission apparatus provided in the embodiment shown in fig. 6;

Fig. 8 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a second transmission device corresponding to the streaming media transmission apparatus provided in the embodiment shown in fig. 8;

Fig. 10 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention;

Fig. 11 is a schematic structural diagram of a playing device corresponding to the streaming media transmission apparatus provided in the embodiment shown in fig. 10.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.

The embodiment of the invention adopts a distributed transmission network to realize the transmission of the media stream. In practical application, the distributed transmission network may adopt a global real-time transmission network (GRTN for short) to realize real-time transmission of media streams and meet the real-time transmission requirements of application scenarios such as live broadcast, video conference, and online education. The media stream may be an audio stream, an audio-video stream, or the like.

The distributed transmission network comprises a plurality of transmission devices (or transmission nodes), and the transmission of media streams to playing devices located at different positions is realized through wireless communication capacity among different transmission devices. In the embodiment of the present invention, the playback device refers to a user terminal device that plays a media stream.

In the following, an online conference scenario involving multiple persons is taken as an example, and how to use a distributed transmission network to transmit a media stream generated during a conference is described with reference to fig. 1.

In fig. 1, it is assumed that user A, user B, user C, and user D are holding an online audio conference, in which each user needs to subscribe to the media streams (audio streams) of the other users in order to receive and play them. User A, as one of the participants, therefore needs to subscribe to the media streams of the other three participants.

Taking user A as an example, after the conference starts and user A joins it, assume that the playing device of user A accesses the transmission device 1 in the distributed transmission network, so as to subscribe to the media streams of the other participants through the transmission device 1. After the conference starts, the terminal devices of user B, user C, and user D begin to capture the media streams of their respective users and push the captured media streams to transmission devices in the distributed transmission network. Assume that the media stream of user B is pushed to the transmission device 2, the media stream of user C is pushed to the transmission device 3, and the media stream of user D is pushed to the transmission device 4. That is, the transmission device 2 is the source transmission device corresponding to the media stream of user B, the transmission device 3 is the source transmission device corresponding to the media stream of user C, and the transmission device 4 is the source transmission device corresponding to the media stream of user D.

Based on the above assumptions, after learning that user A needs to subscribe to the media streams of user B, user C, and user D, the transmission device 1 determines that the media streams of the three users have been pushed to the transmission device 2, the transmission device 3, and the transmission device 4 respectively, and establishes a back-to-source path with each of the three transmission devices, as shown in fig. 1.

The transmission device 1 pulls the media stream of user B from the transmission device 2 through its back-to-source path with the transmission device 2. Similarly, the transmission device 1 pulls the media stream of user C from the transmission device 3 through its back-to-source path with the transmission device 3, and pulls the media stream of user D from the transmission device 4 through its back-to-source path with the transmission device 4. Then, the transmission device 1 sends the three acquired media streams to the playing device of user A, and the playing device plays them.

Fig. 1 illustrates an online conference in which only four people participate. In practice, a department or an enterprise may hold an online conference with many participants, in which case each participant needs to subscribe to the media streams of all the other participants. For example, suppose an online conference attended by user A includes 20 people besides user A. If the transmission device 1 acquires the media streams of the 20 people and sends all of them to the playing device of user A, the 20 media streams are played simultaneously in the playing device, and the sound may become confused and noisy.

Therefore, in an optional embodiment, after the transmission device 1 acquires the media streams of the 20 people, it calculates the audio feature value of each of the 20 media streams, selects the N media streams whose audio feature values rank in the top N, and sends these N media streams to the playing device of user A for playing, so as to improve the listening experience. N is less than the total number of participants, and its value can be set according to actual requirements, for example 3 or 6. In practical applications, in an audio-video conference scene, generally only a small number of participants (corresponding to N) speak within a short time while the other participants do not speak, so the media streams of the participants who are not speaking need not be sent to the playing device of user A. In the embodiment of the present invention, the audio feature value mainly reflects the loudness of the sound, and may include, but is not limited to, an audio energy value, a decibel value, and the like.
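The top-N selection described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the audio feature values are already available as numbers keyed by a stream identifier; the function name `select_top_n` and the sample values are hypothetical.

```python
import heapq


def select_top_n(feature_values, n):
    """Return the IDs of the n streams with the largest audio feature values.

    feature_values maps a stream ID to its most recent audio feature value
    (for example an audio energy value). If fewer than n streams exist,
    all of them are returned, loudest first.
    """
    top = heapq.nlargest(n, feature_values.items(), key=lambda item: item[1])
    return [stream_id for stream_id, _ in top]


# Hypothetical feature values for four subscribed streams.
values = {"B": 0.82, "C": 0.10, "D": 0.47, "E": 0.03}
print(select_top_n(values, 2))  # ['B', 'D']
```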

However, in the above optional embodiment, the transmission device 1 still needs to pull the 20 media streams to the local device and calculate the audio feature values corresponding to the 20 media streams locally at set time intervals, which wastes bandwidth and incurs CPU overhead. Bandwidth is wasted because the transmission device 1 needs to pull all 20 media streams back from their sources even if only 3 media streams (N = 3) are finally output to the playing device. In addition, more than one playing device may access the transmission device 1, and each playing device may subscribe to a different number of media streams. The audio feature value of every media stream subscribed to by every playing device accessing the transmission device 1 needs to be calculated at the transmission device 1, so the amount of calculation is large and the processing pressure on the CPU of the transmission device 1 increases.

To this end, the embodiment of the present invention provides another optimized solution. In this solution, the calculation of the audio feature values of the media streams is separated from the selection and output of the media streams, so as to save network bandwidth and reduce the computational pressure on the transmission device that the playing device accesses.

The implementation of this solution is illustrated below in connection with fig. 2a and fig. 2b.

In fig. 2a and fig. 2b, it is still assumed that user A, user B, user C, and user D hold an audio-video conference, and the case in which user A subscribes to the media streams of the other three users is again taken as an example. As shown in fig. 2a, after the conference starts, the terminal device of user B continuously captures the media stream of user B. Assume that it sends a data packet to the transmission device 2 that it accesses once every 20 ms, each data packet containing the media stream data captured in the corresponding 20 ms. Similarly, the terminal device of user C sends data packets containing the media stream of user C to the transmission device 3 that it accesses once every 20 ms, and the terminal device of user D sends data packets containing the media stream of user D to the transmission device 4 that it accesses once every 20 ms.

Optionally, the transmission device 2, the transmission device 3, and the transmission device 4 are configured to calculate the audio feature value once every set time interval (for example, once every 100 ms), starting from when they begin to receive the above media streams.

Specifically, at each calculation time, an audio feature value may be calculated for each of the at least one data packet received within a preceding set time length, and the maximum or the average of the resulting audio feature values is then taken as the current calculation result. For example, taking the transmission device 2, assume that the current calculation time is T1 and the set time length is 60 ms. Given the assumed transmission frequency of the data packets, 3 data packets are received within 60 ms, so at time T1 the transmission device 2 may calculate the audio feature value of each of the 3 most recently received data packets, obtaining 3 audio feature values, and then take the largest of them (or their average) as the audio feature value of the media stream of user B at time T1. The calculation processes of the transmission device 3 and the transmission device 4 are the same and are not described again.
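The per-packet calculation and the max/average aggregation can be sketched as follows. This is an illustration under stated assumptions rather than the method of the embodiment: each packet is represented as a list of decoded PCM sample values, the audio feature value is taken to be the mean squared amplitude, and the names `packet_energy` and `window_feature` are hypothetical. The source does not specify how the energy value is computed.

```python
from collections import deque


def packet_energy(samples):
    """Mean squared amplitude of one packet's samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def window_feature(packets, mode="max"):
    """Aggregate per-packet energies into one feature value for the window."""
    energies = [packet_energy(p) for p in packets]
    if not energies:
        return 0.0
    if mode == "max":
        return max(energies)
    return sum(energies) / len(energies)  # mode == "mean"


# A 60 ms window at one packet per 20 ms holds the 3 most recent packets.
recent = deque(maxlen=3)
for pkt in ([0, 0, 0, 0], [100, -100, 100, -100],
            [10, -10, 10, -10], [50, -50, 50, -50]):
    recent.append(pkt)  # the oldest packet falls out of the window

print(window_feature(recent, "max"))   # 10000.0
print(window_feature(recent, "mean"))  # 4200.0
```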

Therefore, the transmission device 2, the transmission device 3, and the transmission device 4 each determine the current audio feature value of their corresponding media stream every 100 ms.

When the conference starts, the playing device of user A accesses the transmission device 1 and informs the transmission device 1 that it needs to subscribe to the media streams of the other three users, and the transmission device 1 establishes a back-to-source path with each of the transmission device 2, the transmission device 3, and the transmission device 4. The back-to-source path is used not only for the transmission of the media stream but also for the transmission of the audio feature values described above.

For example, as shown in fig. 2a, assume that at time T1 the audio feature values calculated by the transmission device 2, the transmission device 3, and the transmission device 4 are E1, E2, and E3 respectively, and that each of them sends its audio feature value to the transmission device 1 through its back-to-source path with the transmission device 1. The transmission device 1 sorts the three received audio feature values and determines the top 2 (in this example, N = 2); assume that the top 2 are the media streams of user B and user C. Based on this result, the transmission device 1 informs the transmission device 2, through the back-to-source path with the transmission device 2, to output the media stream of user B, informs the transmission device 3, through the back-to-source path with the transmission device 3, to output the media stream of user C, and informs the transmission device 4, through the back-to-source path with the transmission device 4, to stop outputting the media stream of user D. The transmission device 1 then sends the received media streams of user B and user C to the playing device of user A for playing; at this time, the playing device of user A does not receive or output the media stream of user D.

Assume that time T2 = T1 + 100 ms. When time T2 is reached, the transmission device 2, the transmission device 3, and the transmission device 4 again calculate the audio feature values of the media streams of user B, user C, and user D that they respectively receive. As shown in fig. 2b, assume that at time T2 the audio feature values calculated by the transmission device 2, the transmission device 3, and the transmission device 4 are E4, E5, and E6 respectively, and that each of them sends its audio feature value to the transmission device 1 through its back-to-source path with the transmission device 1. The transmission device 1 sorts the three received audio feature values and determines the top 2 (in this example, N = 2); assume that the top 2 are the media streams of user B and user D. Based on this result, the transmission device 1 continues to receive the media stream of user B output by the transmission device 2 through the back-to-source path with the transmission device 2, informs the transmission device 3, through the back-to-source path with the transmission device 3, to stop outputting the media stream of user C, and informs the transmission device 4, through the back-to-source path with the transmission device 4, to output the media stream of user D. The transmission device 1 then sends the received media streams of user B and user D to the playing device of user A for playing; at this time, the playing device of user A does not receive or output the media stream of user C.
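The transition between the two scheduling results (user B and user C at time T1, user B and user D at time T2) amounts to a set difference. The following Python sketch shows one way to derive which sources to notify; the function name `plan_changes` and the returned keys are hypothetical.

```python
def plan_changes(previous, current):
    """Compare two selections and say which streams to start, stop, or keep.

    previous: stream IDs selected in the last scheduling period
    current:  stream IDs selected in this scheduling period
    """
    previous, current = set(previous), set(current)
    return {
        "start": sorted(current - previous),  # notify the source to output
        "stop": sorted(previous - current),   # notify the source to stop
        "keep": sorted(previous & current),   # keep receiving as before
    }


# Selection at T1 was {B, C}; selection at T2 is {B, D}.
print(plan_changes({"B", "C"}, {"B", "D"}))
# {'start': ['D'], 'stop': ['C'], 'keep': ['B']}
```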

As can be seen from the foregoing examples, when pulling the multiple media streams subscribed to by a user, the transmission device accessed by that user's playing device does not directly pull every media stream over the corresponding back-to-source path. Instead, it first obtains the current audio feature values of the multiple media streams through the back-to-source paths, determines from the obtained audio feature values which media streams currently need to be obtained, and then pulls only those media streams, so that the bandwidth occupied by pulling media streams is reduced. Moreover, the audio feature values are not calculated locally; they only need to be screened (for example, sorted) locally, which reduces the local processing pressure.
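A rough back-of-the-envelope comparison illustrates the bandwidth argument. All figures in the following Python sketch are assumptions chosen for illustration (a 64 kbit/s audio stream and a 16-byte feature report every 100 ms); none of them comes from the source, and the function names are hypothetical.

```python
def pull_all_kbps(num_streams, stream_kbps):
    """Back-to-source bandwidth when every subscribed stream is pulled."""
    return num_streams * stream_kbps


def pull_selected_kbps(num_streams, n_selected, stream_kbps,
                       report_bytes, report_interval_ms):
    """Bandwidth when only n_selected streams are pulled and every source
    additionally sends one small feature report per interval."""
    reports_per_second = 1000 / report_interval_ms
    report_kbps = report_bytes * 8 * reports_per_second / 1000
    return n_selected * stream_kbps + num_streams * report_kbps


# Assumed figures: 20 subscribed streams, N = 3, 64 kbit/s per stream,
# 16-byte feature report every 100 ms.
print(pull_all_kbps(20, 64))
print(round(pull_selected_kbps(20, 3, 64, 16, 100), 1))
```

Under these assumed figures the feature reports add only a small fraction of the bandwidth of a single media stream, so the total is dominated by the N streams that are actually pulled.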

Still taking the scenario in which the four users hold an audio-video conference as an example, it should be noted that, for compatibility with the conventional media stream pulling process, optionally, at an initial stage, for example when the conference starts (assumed to be time T0), the transmission device 1 may pull the three media streams of user B, user C, and user D to the local device over its back-to-source paths with the transmission device 2, the transmission device 3, and the transmission device 4, calculate the current audio feature values of the three media streams locally once, and determine from the calculation result which media streams need to be output to the playing device of user A. Then, at time T1 = T0 + 100 ms, the transmission device 1 receives, through the back-to-source paths, the audio feature values that the three transmission devices calculated at time T1.

The situations illustrated in fig. 2a and fig. 2b can be summarized as follows: the streaming media transmission method provided by the embodiment of the present invention is applied to a distributed transmission network, the distributed transmission network comprises a first transmission device and a plurality of second transmission devices, the first transmission device is accessed by a playing device, and the multiple media streams subscribed by the playing device are respectively stored in the plurality of second transmission devices, that is, the media streams are initially pushed to the plurality of second transmission devices.

Based on this, in terms of the first transmission device accessing a certain playing device, the execution process of the first transmission device is as follows:

For each second transmission device storing one of the media streams, the execution process of each second transmission device is as follows:

In practical application, the audio characteristic value of the media stream may be configured to be calculated every set time interval, and the calculation target at each time is the media stream data received before the current calculation time within the set time length. For example, in the above example, the time T1 is a calculation time, which is separated from the last calculation time by a set time interval, and the calculation objects at the time T1 are: one or more packets containing the media stream received within 60ms before time T1. At this time, the first time period is as follows: the time period from 60ms before time T1 to time T1.
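
The windowed calculation described above can be sketched as follows. This is a hedged illustration, not the embodiment's implementation: the class `FeatureWindow`, the millisecond timestamps, and the use of mean-square energy as the audio feature value are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

WINDOW_MS = 60      # set time length from the example above
INTERVAL_MS = 100   # set time interval between two calculation times


class FeatureWindow:
    """Buffers recent audio packets and computes a feature value on demand."""

    def __init__(self) -> None:
        self._packets: Deque[Tuple[int, List[float]]] = deque()

    def on_packet(self, arrival_ms: int, samples: List[float]) -> None:
        """Record a packet's arrival time and its decoded samples."""
        self._packets.append((arrival_ms, samples))

    def feature_at(self, now_ms: int) -> float:
        """Compute the feature over packets received in the last WINDOW_MS."""
        # Drop packets that arrived before the start of the window.
        while self._packets and self._packets[0][0] < now_ms - WINDOW_MS:
            self._packets.popleft()
        samples = [s for t, pkt in self._packets if t <= now_ms for s in pkt]
        if not samples:
            return 0.0
        # Assumption: mean-square energy stands in for the audio feature value.
        return sum(s * s for s in samples) / len(samples)


window = FeatureWindow()
window.on_packet(1000, [0.5, 0.5])    # more than 60 ms before T1, ignored
window.on_packet(1050, [1.0, -1.0])
window.on_packet(1080, [0.0, 0.0])

t1 = 1100
value_t1 = window.feature_at(t1)                 # window is 1040 ms .. 1100 ms
value_t2 = window.feature_at(t1 + INTERVAL_MS)   # no packets in 1140 ms .. 1200 ms
print(value_t1, value_t2)
```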

It should be noted that, as described above, "obtaining the audio feature values corresponding to the multiple media streams in the first time period" is performed, where the first time period may be consistent or may not be completely consistent for the multiple media streams. For example, assuming that the set first time period is T1 to T2, the first time period corresponding to the audio feature value of one media stream may actually be T1+30ms to T2, and the first time period corresponding to the audio feature value of the other media stream may actually be T1+10ms to T2+30 ms.

The set condition that an audio feature value must meet may be, for example, that the audio feature value ranks within the topN, or that it is greater than a set feature threshold.
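
Both forms of the set condition can be expressed in one small helper. The sketch below is illustrative only; the name `select_streams` and its parameters are hypothetical, and allowing both conditions to be combined is an assumption of the example rather than something the embodiment states.

```python
from typing import Dict, List, Optional


def select_streams(
    feature_values: Dict[str, float],
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Return the ids of streams whose feature value meets the set condition."""
    # Order stream ids from the largest feature value to the smallest.
    ranked = sorted(feature_values, key=feature_values.get, reverse=True)
    if threshold is not None:
        # Threshold condition: keep values greater than the set feature threshold.
        ranked = [sid for sid in ranked if feature_values[sid] > threshold]
    if top_n is not None:
        # topN condition: keep the N largest remaining values.
        ranked = ranked[:top_n]
    return ranked


values = {"B": 0.9, "C": 0.1, "D": 0.6}
by_top_n = select_streams(values, top_n=2)
by_threshold = select_streams(values, threshold=0.7)
print(by_top_n, by_threshold)
```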

The group of media streams refers to the group of media streams corresponding to the first time period, and consists of at least one media stream, among the multiple media streams, whose audio feature value meets the set condition. The audio feature values of the media streams in a second time period may change (for example, the person who was speaking stops speaking and another person starts speaking), so the group of media streams to be pulled, which is determined from the audio feature values calculated in the next scheduling period, may change accordingly.

In the solution provided by the above embodiment of the present invention, the delivery of the audio feature value is separated from the subscription and delivery of the media stream containing the audio, and the audio feature value is periodically delivered as a message, so that the cost is very low; the media stream data is efficiently transmitted according to the requirement, and the bandwidth resource and the storage resource of the playing equipment are saved.

As described above, in a scenario where multiple persons participate in an audio/video conference, each participant needs to subscribe to the media streams of the other participants, and the above solution reduces both the bandwidth resources consumed and the processing pressure on the accessed transmission device when the subscribed multiple media streams are transmitted. An optional scheme for implementing media stream subscription is described below with reference to the following embodiments; with this scheme, the processing pressure on the user's playing device during media stream subscription can also be reduced.

Fig. 3 is an interactive flowchart of a streaming media transmission method according to an embodiment of the present invention, where the method still employs a distributed transmission network, where the distributed transmission network includes a first transmission device and a plurality of second transmission devices, the first transmission device has a playback device connected thereto, and the plurality of second transmission devices respectively store multiple channels of media streams subscribed by the playback device. Based on this assumption, as shown in fig. 3, the method may include the steps of:

301. When the playing device determines that the number of the subscribed multiple media streams is greater than a set threshold, the playing device sends a first subscription signaling to the first transmission device.

302. The first transmission equipment respectively obtains audio characteristic values corresponding to the multiple paths of media streams in a first time period from the multiple second transmission equipment, and determines a first group of media streams of which the audio characteristic values meet set conditions.

In practical application, still taking an audio-video conference as an example, after a user accesses the audio-video conference by operating his or her own playing device, the audio-video conference software in the playing device can learn the approximate number of participants. For example, the total number of participants may be displayed in the conference group, so that the number is known directly; or, assuming that all or most of the other participants have already accessed the conference, the approximate number can be learned from the number of participants who have accessed.

When the number of participants is found to be greater than a set threshold value, that is, the number of media streams to be subscribed is determined to be greater than the set threshold value, the playing device of the user may send a preset first subscription signaling to the accessed first transmission device.

The first subscription signaling is a new subscription signaling, different from the conventional subscription signaling.

A conventional subscription signaling corresponds one-to-one to a media stream and indicates which media stream needs to be subscribed; as many conventional subscription signalings are sent as there are media streams to be subscribed. The first transmission device processes the conventional subscription signaling and the first subscription signaling differently. The processing of the conventional subscription signaling is not described in this embodiment and is described in detail in the following embodiments.

In an optional embodiment, when the playing device already knows each media stream that needs to be subscribed, the first subscription signaling may carry identification information of each media stream that needs to be subscribed, where the identification information may be represented by a Uniform Resource Locator (URL) and a media stream identifier (MSID).

After receiving the first subscription signaling sent by the playing device, the first transmission device learns which media streams are subscribed, queries and determines the transmission devices to which these media streams are pushed, that is, the plurality of second transmission devices, and establishes back-to-source paths between itself and the plurality of second transmission devices. It then obtains, through the back-to-source paths, the audio feature value corresponding to each media stream in a first time period, and selects a first group of media streams whose audio feature values meet the set condition. For example, the media streams whose audio feature values are in the topN are selected to form the first group of media streams, where N is a preset value such as 3 or 6.

The second transmission devices are named for distinguishing from the first transmission devices, and if the subscribed multiple media streams are pushed to different transmission devices respectively, the second transmission devices correspond to the multiple media streams one to one.

The "first time period" may be the window of a set time length that ends at the time when the audio feature values of the multiple media streams are calculated for the first time. Assuming that the time of the first calculation is T0 and the set time length is 60ms, the audio feature values are calculated over the media stream data received in the 60ms before T0. In practice, a time at least 60ms after the conference is started may be set as the first calculation time. The above 60ms is only an example, and other values such as 200ms may be used.

303. The first transmission device sends session description protocol information corresponding to the first group of media streams to the playing device, wherein the session description protocol information contains synchronous source identifiers corresponding to the first group of media streams.

After determining the first group of media streams, the first transmission device sends Session Description Protocol (SDP for short) information corresponding to the first group of media streams to the playing device.

SDP is used to describe a multimedia communication session (session) including session establishment, session requests, and parameter negotiation. SDP is not used for the transmission of media data, but only for parameter negotiation of two communication terminals, including media type, format and all other session related attributes. The SDP describes the above initialization parameters in the form of a character string.

Briefly, the SDP information includes various negotiation parameter information related to media stream transmission between the first transmission device and the playing device, where the negotiation parameter information includes a synchronization source identifier (SSRC) corresponding to the media stream. An SSRC is used to identify a synchronization source, i.e. a media stream.

In practical application, the first transmission device may send an SDP packet to the playing device, where the SDP packet carries the SSRCs corresponding to the media streams in the first group of media streams.

304. The playing device processes the session description protocol information corresponding to the first group of media streams.

305. The first transmission device sends a first set of media streams to the playback device.

After receiving the SDP information, the playing device processes it. In short, the playing device performs local configuration according to the parameters carried in the SDP information, so that the media streams can be received correctly.

For the detailed processing of the SDP information, reference may be made to the related art; only two points are briefly described here. First, the playing device can parse N SSRCs from the SDP information and store the parsed SSRCs locally. Second, the playing device can create N media channels (audio tracks) corresponding to the first group of media streams according to the SDP information, where it is assumed that the first group of media streams contains N media streams. In practical application, the first transmission device may send N SDP packets to the playing device, where the N SDP packets respectively carry information such as the parameter negotiation information corresponding to the N media streams.
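
As an illustration of the first point, the following minimal sketch extracts SSRCs from an SDP text. It assumes that the SSRCs are declared with `a=ssrc:` attribute lines in the style of RFC 5576, which the embodiment does not state; the function `parse_ssrcs`, the sample SDP body, and the SSRC values are hypothetical.

```python
import re
from typing import List

# Matches the numeric id at the start of an "a=ssrc:<id> ..." attribute line.
SSRC_LINE = re.compile(r"^a=ssrc:(\d+)\b", re.MULTILINE)


def parse_ssrcs(sdp_text: str) -> List[int]:
    """Return the distinct SSRC ids declared in an SDP body, in order."""
    seen: List[int] = []
    for match in SSRC_LINE.finditer(sdp_text):
        ssrc = int(match.group(1))
        if ssrc not in seen:
            seen.append(ssrc)
    return seen


# Hypothetical SDP fragment describing two audio media channels (N = 2).
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=ssrc:1001 cname:slot1
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=ssrc:1002 cname:slot2
"""

local_ssrcs = parse_ssrcs(EXAMPLE_SDP)   # stored locally by the playing device
print(local_ssrcs)
```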

After the playing device finishes processing the SDP information, that is, after it is ready to receive the media streams, the first transmission device sends the N media streams pulled from the corresponding second transmission devices to the playing device for playing. It can be understood that, for any one of the media streams, the corresponding SSRC is carried in the Real-time Transport Protocol (RTP) packets encapsulating that media stream. After receiving the RTP packets corresponding to the N media streams, the playing device parses the SSRC from each RTP packet and compares it with the locally stored SSRCs. If it matches a locally stored SSRC, the playing device parses the media stream for playing; if it does not match, the RTP packet may be discarded and error prompt information may be fed back.
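
The SSRC comparison on the playing device can be sketched as follows. The sketch relies on the standard RTP fixed header, in which the SSRC is the 32-bit big-endian field at byte offset 8; the helper names `rtp_ssrc`, `accept_packet`, and `make_rtp` are hypothetical, and `make_rtp` exists only to build test packets.

```python
import struct
from typing import Optional, Set

RTP_HEADER_LEN = 12  # size of the fixed RTP header in bytes


def rtp_ssrc(packet: bytes) -> Optional[int]:
    """Return the SSRC of an RTP packet, or None if the packet is too short."""
    if len(packet) < RTP_HEADER_LEN:
        return None
    # The SSRC occupies bytes 8..11 of the fixed header, in network byte order.
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def accept_packet(packet: bytes, known_ssrcs: Set[int]) -> bool:
    """True if the packet's SSRC matches one stored from the SDP information."""
    ssrc = rtp_ssrc(packet)
    return ssrc is not None and ssrc in known_ssrcs


def make_rtp(ssrc: int, payload: bytes = b"") -> bytes:
    """Build a minimal RTP packet for demonstration (version 2, payload type 111)."""
    return struct.pack("!BBHII", 0x80, 111, 1, 0, ssrc) + payload


known = {1001, 1002}
print(accept_packet(make_rtp(1001), known))   # matching SSRC: parse and play
print(accept_packet(make_rtp(9999), known))   # unknown SSRC: discard
```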

306. In response to the arrival of the second time period, the first transmission device obtains the audio feature values corresponding to the multiple media streams in the second time period from the plurality of second transmission devices, determines a second group of media streams whose audio feature values meet the set condition, and modifies the synchronous source identifiers corresponding to the second group of media streams into the synchronous source identifiers corresponding to the first group of media streams.

307. The first transmission device transmits the second group of media streams modified by the synchronization source identifier to the playing device.

As described above, the value of N is often much smaller than the actual total number of participants. During the conference the speakers change constantly, so the N media streams output to the playing device should also change dynamically. Each second transmission device can therefore periodically calculate the current audio feature value of its media stream, so that the first transmission device can determine, from the audio feature values at different times, the N media streams that should be pulled at those times.

Specifically, assuming that the current second time period has arrived, at this time, the first transmission device may obtain the audio feature values of the media streams calculated by the second transmission devices in the scheduling period, and determine a second group of media streams whose audio feature values meet the set conditions, where the second group of media streams still includes N media streams, but the N media streams may not be completely the same as the N media streams included in the first group of media streams.

At this time, when the first transmission device sends the second group of media streams to the playing device, the SSRC corresponding to each media stream in the second group of media streams is modified to the SSRC corresponding to a media stream in the first group of media streams. For example, taking N = 4, the N SSRCs corresponding to the first group of media streams are respectively denoted as SSRC1, SSRC2, SSRC3, and SSRC4. If the original SSRCs corresponding to the N media streams in the second group of media streams are SSRCa, SSRCb, SSRCc, and SSRCd, they need to be modified to SSRC1, SSRC2, SSRC3, and SSRC4.

The first transmission device sends the second group of media streams with the modified SSRCs to the playing device. The playing device parses the SSRCs contained in the second group of media streams, finds that they match the SSRCs stored locally before, and plays the second group of media streams. It can thus be seen that, when the second group of media streams differs from the first group of media streams, the first transmission device has dynamically updated the media streams subscribed by the playing device based on the audio feature values of the media streams. This change is perceived only by the first transmission device and not by the playing device; that is, the playing device does not need to perform any additional processing and only needs to receive and play each group of media streams sent in turn by the first transmission device.
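
The SSRC modification at the first transmission device can be sketched as follows. This is an illustrative sketch only: the helper names `assign_slots` and `rewrite_ssrc` are hypothetical, and keeping a stream that stays selected on the SSRC it already used is a design assumption of the example that the embodiment does not specify. Packets are assumed to be plain RTP with at least the 12-byte fixed header.

```python
import struct
from typing import Dict, List


def assign_slots(
    slot_ssrcs: List[int],
    previous: Dict[int, int],
    new_group: List[int],
) -> Dict[int, int]:
    """Map each original SSRC in new_group to one of the negotiated SSRCs.

    slot_ssrcs are the SSRCs of the first group, already known to the player.
    previous maps original SSRC -> negotiated SSRC for the group sent so far.
    Assumes len(new_group) <= len(slot_ssrcs).
    """
    # Streams that remain selected keep the negotiated SSRC they already use.
    mapping = {src: previous[src] for src in new_group if src in previous}
    # Newly selected streams take the negotiated SSRCs that were freed.
    free = [s for s in slot_ssrcs if s not in mapping.values()]
    for src in new_group:
        if src not in mapping:
            mapping[src] = free.pop(0)
    return mapping


def rewrite_ssrc(packet: bytes, new_ssrc: int) -> bytes:
    """Return a copy of an RTP packet with its SSRC field (bytes 8..11) replaced."""
    return packet[:8] + struct.pack("!I", new_ssrc) + packet[12:]


slots = [1, 2, 3, 4]                                  # SSRC1..SSRC4
first = assign_slots(slots, {}, [10, 11, 12, 13])     # SSRCa..SSRCd, positional
second = assign_slots(slots, first, [10, 14, 12, 15])
print(first, second)
```

With an empty `previous` mapping the assignment is positional, which corresponds to the SSRCa to SSRC1, SSRCb to SSRC2 pattern of the example above.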

In addition, according to the above scheme, assuming that the total number of users participating in the audio/video conference is M, through the implementation of the above scheme, the playing device only needs to process SDP information corresponding to N media streams, where N is less than M, and does not need to process SDP information for other media streams, thereby reducing the overhead of performing SDP information processing at the playing end.

In practice, in a large online conference with many participants, most people do not speak within any short period of time. It is therefore unnecessary for each participant to subscribe to (acquire) the media streams of all other participants all the time; instead, the media streams of a small number (e.g., N) of participants can be dynamically subscribed at different times, which is why the playing device only needs to process N pieces of SDP information. In short, the playing device first completes the preparation for receiving media streams based on the SDP information of N media streams and establishes N media channels. The N media streams selected at different subsequent times then multiplex these N media channels.

As described above, in practical applications, when a user accesses a conference, many participants may not yet have accessed it, so the user's playing device cannot accurately know the overall number of participants in the conference. If the number of participants is small, the conventional subscription process can be used; if the number of participants is large, the scheme provided by the embodiment of the present invention can be used. Based on this, fig. 4 shows an alternative embodiment.

Fig. 4 is an interactive flowchart of another streaming media transmission method according to an embodiment of the present invention, where the method still employs a distributed transmission network, where the distributed transmission network includes a first transmission device and a plurality of second transmission devices, the first transmission device has a playback device connected thereto, and the plurality of second transmission devices respectively store multiple paths of media streams subscribed by the playback device. Based on this assumption, as shown in fig. 4, the method may include the steps of:

401. The playing device sends at least two second subscription signalings to the first transmission device, where the at least two second subscription signalings respectively correspond to at least two media streams in the multiple media streams.

402. The first transmission device sends the session description protocol information respectively corresponding to the at least two media streams to the playing device.

403. The playing device processes the received at least two pieces of session description protocol information.

404. The playing device determines that the number of the at least two media streams has reached the set threshold and that there are still media streams to be subscribed.

405. The playing device sends a first subscription signaling to the first transmission device.

406. The first transmission device obtains the audio feature values corresponding to the at least two media streams in a first time period from the second transmission devices corresponding to the at least two media streams, and determines a first group of media streams whose audio feature values meet the set condition.

407. The first transmission device sends session description protocol information corresponding to the first group of media streams to the playing device, where the session description protocol information contains the synchronous source identifiers corresponding to the first group of media streams.

408. The playing device processes the session description protocol information corresponding to the first group of media streams, and deletes the processing results of the session description protocol information corresponding to the at least two media streams.

409. The playing device sends another second subscription signaling to the first transmission device, where the another second subscription signaling corresponds to any one of the multiple media streams other than the at least two media streams.

410. The first transmission device sends the session description protocol information corresponding to that media stream to the playing device.

411. The playing device discards the session description protocol information corresponding to that media stream.

412. The first transmission device sends the first group of media streams to the playing device.

413. In response to the arrival of the second time period, the first transmission device obtains the audio feature values corresponding to the multiple media streams in the second time period from the plurality of second transmission devices, determines a second group of media streams whose audio feature values meet the set condition, and modifies the synchronous source identifiers corresponding to the second group of media streams into the synchronous source identifiers corresponding to the first group of media streams.

414. The first transmission device sends the second group of media streams with the modified synchronous source identifiers to the playing device.

For ease of understanding, the implementation of the above-described embodiment is illustrated in conjunction with fig. 5. In fig. 5, it is assumed that the threshold set in step 404 is 16. The second subscription signaling is denoted as sub (meaning subscribe), and subi denotes the second subscription signaling triggered for the ith media stream. onsub denotes the response signaling of the first transmission device to the second subscription signaling: onsubi denotes the response to subi, and the SDPi included in the response denotes the SDP information corresponding to the ith media stream. The first subscription signaling is denoted in fig. 5 as sub dummy audio, and its corresponding response signaling is denoted as onsub dummy audio.

Assuming that a user accesses the conference through the playing device and finds that other people have already accessed the conference, the playing device sequentially initiates subscriptions for the users who have accessed. As shown in fig. 5, the playing device first sends the sub1 subscription signaling, which carries identification information corresponding to the media stream of the first person (i.e., the first media stream), such as URL + MSID. After receiving sub1, the transmission device 1 feeds back the onsub1 response signaling, which carries the SDP1 corresponding to the first media stream. After receiving the response signaling, the playing device processes the SDP1 corresponding to the first media stream.

In parallel, the playing device initiates similar subscription signaling and performs the corresponding SDP processing for the other users who have accessed the conference. Fig. 5 assumes that this continues until the subscription for the 16th media stream is initiated; the subscription process for each of these media streams is similar and is not repeated here.

After the playing device completes the subscription processing of the 16th media stream, if it finds that there are further media streams to be subscribed, for example, it finds that at least 17 users have accessed the conference, then, as shown in fig. 5, the playing device sends the sub dummy audio subscription signaling to the first transmission device. After receiving this subscription signaling, the first transmission device establishes back-to-source paths with the second transmission devices corresponding to the first 16 media streams, obtains the audio feature values of these 16 media streams in the first time period from the corresponding second transmission devices, and determines a first group of media streams whose audio feature values meet the set condition. Assuming that a group of media streams contains 6 media streams, the first group of media streams includes 6 of the first 16 media streams. The first transmission device carries the SDP information corresponding to these 6 media streams, denoted as SDP_top6, in the response signaling to the sub dummy audio signaling. After receiving the SDP information corresponding to the 6 media streams, the playing device processes it and deletes the processing results of the SDP information corresponding to the first 16 media streams. In short, the playing device creates the media channels corresponding to the 6 media streams and deletes the 16 media channels that were previously created. The playing device also stores the 6 SSRCs corresponding to the 6 media streams.

Then, in response to the access of subsequent users, such as the 17th user and the 18th user, the playing device still initiates the corresponding subscriptions through sub signaling, so that the first transmission device knows which media streams still need to be subscribed. The first transmission device feeds back the SDP information corresponding to each such media stream in the same way as for the first 16 sub signalings; however, the playing device discards the SDP information received for these subsequent media streams, that is, it does not process it.
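
The behaviour of the playing device in the flow of fig. 5 can be sketched as a small state holder. This is a simplified illustration under assumptions: the class `PlayerSubscriber`, its method names, and the `kind` labels are hypothetical, signaling is modelled as strings appended to a list, and the sub dummy audio signaling is sent immediately before the subscription for the first stream beyond the threshold.

```python
from typing import List

THRESHOLD = 16  # set threshold assumed in fig. 5


class PlayerSubscriber:
    """Tracks the signaling the playing device sends and the SDP it keeps."""

    def __init__(self, threshold: int = THRESHOLD) -> None:
        self.threshold = threshold
        self.subscribed: List[str] = []   # stream ids subscribed so far
        self.sent: List[str] = []         # signaling sent, in order
        self.channels: List[str] = []     # media channels currently created
        self.dummy_mode = False           # True after sub dummy audio is sent

    def on_participant(self, stream_id: str) -> None:
        """A participant's stream became known: send the subscription signaling."""
        if len(self.subscribed) == self.threshold and not self.dummy_mode:
            # Threshold reached and another stream exists: switch schemes once.
            self.sent.append("sub dummy audio")
            self.dummy_mode = True
        self.subscribed.append(stream_id)
        self.sent.append(f"sub{len(self.subscribed)}")

    def on_sdp(self, kind: str, stream_ids: List[str]) -> None:
        """Handle SDP information; kind is 'top' for the first group, else per-stream."""
        if kind == "top":
            # Create channels for the first group and delete the earlier ones.
            self.channels = list(stream_ids)
        elif not self.dummy_mode:
            self.channels.extend(stream_ids)
        # Per-stream SDP received after the switch is discarded, not processed.


player = PlayerSubscriber()
for i in range(1, 17):
    player.on_participant(f"s{i}")
    player.on_sdp("stream", [f"s{i}"])
player.on_participant("s17")                       # triggers sub dummy audio
player.on_sdp("top", [f"s{i}" for i in range(1, 7)])
player.on_sdp("stream", ["s17"])                   # discarded
print(len(player.channels), player.sent[-2:])
```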

The subscription processing of the multi-path media stream is completed through the above process. For the transmission process of the media stream, reference may be made to the descriptions in the foregoing other embodiments, which are not described herein again.

In summary, according to the above scheme, in the subscription process of the media stream, the processing overhead of the playback device can be reduced.

The streaming media transmission scheme provided by the embodiment of the present invention can be applied to various application scenarios, such as an audio and video conference scenario, an online education scenario (in which teachers give lessons to students by live broadcast), and a live broadcast scenario with microphone linking between hosts and viewers.

The streaming media transmission apparatus according to one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these means can each be constructed using commercially available hardware components and by performing the steps taught in this disclosure.

Fig. 6 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention, where the apparatus is located in a first transmission device in the foregoing distributed transmission network. As shown in fig. 6, the apparatus includes: a first obtaining module 11, a determining module 12, a second obtaining module 13, and a sending module 14.

A first obtaining module 11, configured to obtain, from the multiple second transmission devices, audio feature values corresponding to the multiple media streams in a first time period respectively.

A determining module 12, configured to determine a set of media streams whose audio feature values meet a set condition.

A second obtaining module 13, configured to obtain the set of media streams from a second transmission device corresponding to the set of media streams.

A sending module 14, configured to send the set of media streams to the playback device.

Optionally, the first obtaining module 11 is specifically configured to: receiving a first subscription signaling sent by the playing device, where the first subscription signaling is sent by the playing device when it is determined that the number of the subscribed multiple media streams is greater than a set threshold; and respectively acquiring audio characteristic values corresponding to the multiple media streams in a first time period from the multiple second transmission devices based on the first subscription signaling.

Optionally, the sending module 14 is further configured to: send session description protocol information corresponding to the group of media streams to the playing device, so that the playing device processes the session description protocol information corresponding to the group of media streams, where the session description protocol information contains the synchronous source identifiers corresponding to the group of media streams.

Optionally, the first obtaining module 11 is further configured to: responding to the arrival of a second time period, respectively acquiring audio characteristic values corresponding to the multiple paths of media streams in the second time period from the multiple second transmission devices; determining another group of media streams of which the audio characteristic values meet set conditions; modifying the synchronous source identifier corresponding to the other group of media streams into the synchronous source identifier corresponding to the group of media streams; and sending the modified other group of media streams to the playing device.

Optionally, the first obtaining module 11 is further configured to: receiving at least two second subscription signaling sent by the playing device, where the at least two second subscription signaling correspond to at least two media streams in the multiple media streams, respectively; respectively sending session description protocol information corresponding to the at least two paths of media streams to the playing device so that the playing device processes the received at least two session description protocol information; receiving a first subscription signaling sent by the playing device, where the first subscription signaling is sent by the playing device when determining that the number of the at least two paths of media streams has reached the set threshold and determining that there are media streams to be subscribed; respectively acquiring audio characteristic values corresponding to the at least two media streams in a first time period from second transmission equipment corresponding to the at least two media streams based on the first subscription signaling; the session description protocol information corresponding to the group of media streams is further configured to enable the playing device to delete the processing result of the session description protocol information corresponding to the at least two media streams.

Optionally, the first obtaining module 11 may be further configured to: receive another second subscription signaling sent by the playing device, where the another second subscription signaling corresponds to any one of the multiple media streams other than the at least two media streams; and send the session description protocol information corresponding to that media stream to the playing device, where the playing device discards the session description protocol information corresponding to that media stream after receiving it.

The apparatus shown in fig. 6 may perform the steps performed by the first transmission device in the foregoing embodiment, and the detailed performing process and technical effect refer to the description in the foregoing embodiment, which are not described herein again.

In one possible design, the structure of the streaming media transmission apparatus shown in fig. 6 may be implemented as a first transmission device located in the distributed transmission network, as shown in fig. 7, where the first transmission device may include: a first processor 21, a first memory 22, a first communication interface 23. Wherein the first memory 22 has stored thereon executable code which, when executed by the first processor 21, makes the first processor 21 at least operable to carry out the steps performed by the first transmission device as in the previous embodiments.

Fig. 8 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention, where the apparatus is located in a second transmission device in the foregoing distributed transmission network, and as shown in fig. 8, the apparatus includes: a determining module 31 and a sending module 32.

The determining module 31 is configured to determine the audio feature value corresponding to a locally stored media stream in a first time period.

The sending module 32 is configured to send the determined audio feature value to the first transmission device, so that the first transmission device determines a group of media streams whose audio feature values meet a set condition, and sends the group of media streams obtained from the corresponding second transmission devices to the playing device.
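A minimal sketch of what the determining module 31 could compute is given below. This section does not fix a formula for the audio feature value, so the example assumes, purely for illustration, that it is the root-mean-square level of the decoded PCM samples that fall within the first time period; the function name `audio_feature_value` is hypothetical.

```python
import math
from typing import Sequence


def audio_feature_value(samples: Sequence[float]) -> float:
    """Return the RMS level of the samples in one time period.

    Assumption: the feature is a loudness-like scalar. An empty window
    (no audio received in the period) is reported as 0.0.
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

The second transmission device would send only this scalar to the first transmission device, which is far smaller than the media stream it summarizes.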

The apparatus shown in fig. 8 may perform the steps performed by the second transmission device in the foregoing embodiments. For the detailed execution process and technical effects, refer to the description in the foregoing embodiments; they are not repeated here.

In one possible design, the structure of the streaming media transmission apparatus shown in fig. 8 may be implemented as a second transmission device located in the distributed transmission network, as shown in fig. 9, where the second transmission device may include: a second processor 41, a second memory 42, a second communication interface 43. Wherein the second memory 42 has stored thereon executable code which, when executed by the second processor 41, makes the second processor 41 at least operable to carry out the steps performed by the second transmission device as in the previous embodiments.

Fig. 10 is a schematic structural diagram of a streaming media transmission apparatus according to an embodiment of the present invention, where the apparatus is located in a playing device, and as shown in fig. 10, the apparatus includes: a sending module 51, a receiving module 52 and a processing module 53.

A sending module 51, configured to send a first subscription signaling to an accessed first transmission device when it is determined that the number of the subscribed multiple media streams is greater than a set threshold, so that the first transmission device obtains, from multiple second transmission devices, audio feature values corresponding to the multiple media streams in a first time period, respectively, and determines a group of media streams whose audio feature values meet a set condition, where the multiple second transmission devices and the first transmission device are located in a distributed transmission network, and the multiple second transmission devices store the multiple media streams subscribed by the playing device, respectively.

A receiving module 52, configured to receive session description protocol information corresponding to the group of media streams sent by the first transmission device.

The processing module 53 is configured to process session description protocol information corresponding to the group of media streams, where the session description protocol information includes a synchronization source identifier corresponding to the group of media streams.

The receiving module 52 is further configured to receive the group of media streams sent by the first transmission device, where the different groups of media streams received in different scheduling periods all use the synchronization source identifiers corresponding to the group of media streams.
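Because every scheduling period reuses the same synchronization source identifiers, the playing device never has to renegotiate when the selected group changes; the first transmission device instead rewrites the identifier on the packets of a newly selected stream. The sketch below assumes the media is carried as RTP, whose 12-byte fixed header stores the SSRC in bytes 8 to 11 in network byte order. The function names are hypothetical.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # version/flags, sequence number, timestamp, SSRC


def read_ssrc(packet: bytes) -> int:
    """Return the SSRC stored in bytes 8..11 of an RTP packet."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    return struct.unpack("!I", packet[8:12])[0]


def rewrite_ssrc(packet: bytes, announced_ssrc: int) -> bytes:
    """Return a copy of the packet carrying the SSRC announced in the SDP.

    Only the SSRC field is replaced; the other header fields and the
    payload are copied unchanged.
    """
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    if not 0 <= announced_ssrc <= 0xFFFFFFFF:
        raise ValueError("SSRC must fit in 32 bits")
    return packet[:8] + struct.pack("!I", announced_ssrc) + packet[12:]
```

A real forwarder would likely also need to keep sequence numbers and timestamps continuous when the source behind an identifier changes; that is outside the scope of this sketch.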

Optionally, the sending module 51 is specifically configured to send at least two pieces of second subscription signaling to the first transmission device, where the at least two pieces of second subscription signaling respectively correspond to at least two of the multiple media streams. The processing module 53 is further configured to: if the session description protocol information respectively corresponding to the at least two media streams, sent by the first transmission device, is received through the receiving module 52, process the at least two pieces of received session description protocol information; and, if it is determined that the number of the at least two media streams has reached the set threshold and that there are still media streams to be subscribed, send the first subscription signaling to the first transmission device through the sending module 51, so that the first transmission device obtains the audio feature values corresponding to the at least two media streams in the first time period from the second transmission devices corresponding to the at least two media streams, and determines a group of media streams whose audio feature values meet the set condition. The processing module 53 is further configured to delete the processing results of the session description protocol information corresponding to the at least two media streams after the session description protocol information corresponding to the group of media streams, sent by the first transmission device, is received through the receiving module 52.

Optionally, the sending module 51 is further configured to send another piece of second subscription signaling to the first transmission device, where the other piece of second subscription signaling corresponds to any one media stream, among the multiple media streams, other than the at least two media streams. The receiving module 52 is further configured to receive the session description protocol information corresponding to that media stream, sent by the first transmission device. The processing module 53 is further configured to discard the session description protocol information corresponding to that media stream.
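The bookkeeping on the playing device side can be sketched as a small state holder. The class and method names below are hypothetical, SDP bodies are represented as plain strings, and the rule that per-stream SDP is discarded as soon as the first subscription signaling has been sent is an assumption about timing that the text above does not spell out.

```python
from typing import Dict, Optional


class PlayerSubscriptionState:
    """Tracks which SDP processing results the playing device keeps."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold                 # the "set threshold"
        self.per_stream_sdp: Dict[str, str] = {}   # processed per-stream SDP
        self.group_sdp: Optional[str] = None       # SDP of the selected group
        self.first_signaling_sent = False

    def on_stream_sdp(self, stream_id: str, sdp: str) -> bool:
        """Process a per-stream SDP; return False if it is discarded."""
        if self.first_signaling_sent:
            # Assumption: answers to later second subscription signaling
            # are dropped once group selection has been requested.
            return False
        self.per_stream_sdp[stream_id] = sdp
        return True

    def should_send_first_signaling(self, remaining_streams: int) -> bool:
        """True when the threshold is reached and streams are still pending."""
        return (
            not self.first_signaling_sent
            and len(self.per_stream_sdp) >= self.threshold
            and remaining_streams > 0
        )

    def mark_first_signaling_sent(self) -> None:
        """Record that the first subscription signaling has gone out."""
        self.first_signaling_sent = True

    def on_group_sdp(self, sdp: str) -> None:
        """Keep the group SDP and delete the earlier per-stream results."""
        self.per_stream_sdp.clear()
        self.group_sdp = sdp
```

With this shape the player holds at most `threshold` per-stream results before switching, and afterwards only the single group description.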

The apparatus shown in fig. 10 may perform the steps performed by the playing device in the foregoing embodiments. For the detailed execution process and technical effects, refer to the description in the foregoing embodiments; they are not repeated here.

In one possible design, the structure of the streaming media transmission apparatus shown in fig. 10 may be implemented as a playing device, that is, a terminal device with a media stream playing function, as shown in fig. 11, where the playing device may include: a third processor 61, a third memory 62, a third communication interface 63, and a player 64. The third memory 62 has stored thereon executable code which, when executed by the third processor 61, enables the third processor 61 to at least carry out the steps performed by the playing device as in the previous embodiments.

In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium, which stores executable code thereon, and when the executable code is executed by a processor of a transmission device, the processor is enabled to implement at least the streaming media transmission method as provided in the foregoing embodiment.

An embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of a playing device, the processor is enabled to implement at least the streaming media transmission method as provided in the foregoing embodiment.

The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer program product, which may be implemented on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM, optical storage, and the like.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A streaming media transmission method is characterized in that the method is applied to a distributed transmission network, the distributed transmission network comprises a first transmission device and a plurality of second transmission devices, the first transmission device is accessed with a playing device, and a plurality of paths of media streams subscribed by the playing device are respectively stored in the plurality of second transmission devices; the method comprises the following steps:

2. The method according to claim 1, wherein the obtaining, by the first transmission device, the audio feature values corresponding to the multiple media streams in the first time period from the multiple second transmission devices respectively comprises:

the first transmission device receives a first subscription signaling sent by the playing device, wherein the first subscription signaling is sent by the playing device when the playing device determines that the number of the subscribed multi-path media streams is greater than a set threshold value;

and the first transmission device respectively acquires, based on the first subscription signaling, the audio feature values corresponding to the multiple media streams in the first time period from the plurality of second transmission devices.

3. The method of claim 2, further comprising:

the first transmission device sends session description protocol information corresponding to the group of media streams to the playing device, so that the playing device processes the session description protocol information corresponding to the group of media streams, where the session description protocol information includes the synchronization source identifiers corresponding to the group of media streams, and the groups of media streams corresponding to different time periods all use these synchronization source identifiers.

4. The method of claim 3, further comprising:

in response to the arrival of a second time period, the first transmission device respectively acquires audio feature values corresponding to the multiple media streams in the second time period from the plurality of second transmission devices;

the first transmission device determines another group of media streams whose audio feature values meet the set condition;

the first transmission device modifies the synchronization source identifiers corresponding to the other group of media streams into the synchronization source identifiers corresponding to the group of media streams;

and the first transmission device sends the modified other group of media streams to the playing device.

5. The method of claim 3, wherein the receiving, by the first transmission device, of the first subscription signaling sent by the playing device comprises:

the first transmission device receives at least two second subscription signaling respectively sent by the playing device, wherein the at least two second subscription signaling respectively correspond to at least two paths of media streams in the multi-path media streams;

the first transmission device sends session description protocol information corresponding to the at least two media streams to the playing device respectively so that the playing device processes the received at least two session description protocol information;

the first transmission device receives a first subscription signaling sent by the playing device, wherein the first subscription signaling is sent by the playing device when the playing device determines that the number of the at least two paths of media streams has reached the set threshold value and determines that media streams needing subscription exist;

the acquiring, by the first transmission device, audio feature values corresponding to the multiple media streams in a first time period from the multiple second transmission devices respectively based on the first subscription signaling includes:

the first transmission device respectively acquires, based on the first subscription signaling, the audio feature values corresponding to the at least two media streams in the first time period from the second transmission devices corresponding to the at least two media streams;

the session description protocol information corresponding to the group of media streams is further configured to enable the playing device to delete the processing result of the session description protocol information corresponding to the at least two media streams.

6. The method of claim 5, further comprising:

the first transmission device receives another second subscription signaling sent by the playing device, where the another second subscription signaling corresponds to any one of the multiple media streams except the at least two media streams;

the first transmission device sends the session description protocol information corresponding to that media stream to the playing device, wherein the playing device discards this session description protocol information after receiving it.

7. A streaming media transmission method is characterized in that the method is applied to a first transmission device and a plurality of second transmission devices, the first transmission device is accessed with a playing device, and a plurality of channels of media streams subscribed by the playing device are respectively stored in the plurality of second transmission devices; the method comprises the following steps:

8. A streaming media transmission method is applied to a playing device, and the method comprises the following steps:

9. The method of claim 8, wherein sending a first subscription signaling to the accessed first transmission device when determining that the number of subscribed multi-path media streams is greater than a set threshold value comprises:

respectively sending at least two second subscription signaling to the first transmission device, wherein the at least two second subscription signaling respectively correspond to at least two media streams in the multi-path media streams;

if session description protocol information respectively corresponding to the at least two media streams, sent by the first transmission device, is received, processing the at least two pieces of received session description protocol information;

if it is determined that the number of the at least two media streams has reached the set threshold value and that there are media streams to be subscribed, sending a first subscription signaling to the first transmission device, so that the first transmission device obtains audio feature values corresponding to the at least two media streams in a first time period from the second transmission devices corresponding to the at least two media streams, and determines a group of media streams whose audio feature values meet the set condition;

after receiving the session description protocol information corresponding to the group of media streams sent by the first transmission device, the method further includes:

and deleting the processing result of the session description protocol information corresponding to the at least two media streams.

10. The method of claim 9, further comprising:

sending another second subscription signaling to the first transmission device, where the another second subscription signaling corresponds to any one of the media streams except the at least two media streams in the multi-path media stream;

receiving the session description protocol information corresponding to that media stream, sent by the first transmission device;

and discarding the session description protocol information corresponding to that media stream.

11. A transmission device, comprising: a memory, a processor, and a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the streaming media transmission method of any of claims 1 to 6, or the streaming media transmission method of claim 7.

12. A playing device, comprising: a memory, a processor, a communication interface, and a player; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the streaming media transmission method of any of claims 8 to 10.