CN115942021A - Audio and video stream synchronous playing method and device, electronic equipment and storage medium

Audio and video stream synchronous playing method and device, electronic equipment and storage medium

Info

Publication number
CN115942021A
CN115942021A
Authority
CN
China
Prior art keywords
audio
terminal
video
stream
playing
Prior art date
Legal status
Granted
Application number
CN202310142484.2A
Other languages
Chinese (zh)
Other versions
CN115942021B (en)
Inventor
郭晓
李向荣
刘杨
郑强
吕亚东
王栋
Current Assignee
CCTV New Media Culture Media (Beijing) Co., Ltd.
Original Assignee
CCTV New Media Culture Media (Beijing) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by CCTV New Media Culture Media (Beijing) Co., Ltd.
Priority to CN202310142484.2A
Publication of CN115942021A
Application granted
Publication of CN115942021B
Legal status: Active


Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure relates to a method and a device for synchronously playing audio and video streams, an electronic device, and a storage medium. The method, applied to a first terminal, comprises the following steps: acquiring, from a first server, an audio stream that corresponds to a video stream being played by a second terminal, where both streams carry a shared standard timestamp; receiving, from the first server, a video delay time determined by the second terminal for playing the video stream, where the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal; determining an audio delay time for the first terminal to play the audio stream based on the difference between the standard timestamp and a local timestamp of the first terminal; and playing the audio stream in synchronization with the video stream played by the second terminal based on the video delay time and the audio delay time. The disclosed method, device, electronic device, and storage medium address the difficulty of obtaining synchronized audio when video is played on a large screen, and enable synchronized playback of sound and picture across multiple terminals.

Description

Audio and video stream synchronous playing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of multimedia and communication technologies, and in particular, to a method and an apparatus for synchronously playing audio and video streams, an electronic device, and a storage medium.
Background
With the development of information and communication technology, information is transmitted in increasingly diverse ways; video usually conveys information through both its pictures and its accompanying audio.
In existing audio and video playback schemes, video pictures and audio are typically played on the same device. In some scenarios, however, the audio of that device is not readily accessible, which can impair the communication of the video's information. For example, at a car theater, a user may park near the screen and listen to the movie sound through an outdoor audio device or an in-car FM radio; at a city large screen, a user may watch the video content played on the large screen outdoors.
However, when video is played on such large screens as a car theater or a city large screen, it may be difficult to obtain a good listening experience. In a car theater, the sound quality of the outdoor audio or the in-car FM radio is poor and cannot restore the original spatial audio effect of the movie; at a city large screen, the user may be unable to hear the accompanying sound of the video synchronously in a noisy urban environment.
Disclosure of Invention
The present disclosure provides a method and an apparatus for synchronously playing audio and video streams, an electronic device, and a storage medium, so as to at least solve the problem in the related art that it may be difficult to obtain synchronized audio when video is played on a large screen. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an audio and video stream synchronous playing method applied to a first terminal, the audio and video stream synchronous playing method including: acquiring an audio stream from a first server, wherein the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first server, the audio stream comprises audio data and a standard timestamp, and the video stream comprises video data and the standard timestamp; receiving, from the first service end, a video delay time, determined by a second terminal, for playing a video stream by the second terminal, where the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal; determining an audio delay time for the first terminal to play the audio stream based on a difference between the standard timestamp and a local timestamp of the first terminal; and playing the audio stream synchronously with the video stream played by the second terminal based on the video delay time and the audio delay time.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio and video stream synchronous playing method applied to a first service end, where the audio and video stream synchronous playing method includes: transmitting an audio stream to a first terminal, and transmitting a video stream to a second terminal, wherein the video stream corresponds to the audio stream, the audio stream comprises audio data and standard time stamps, and the video stream comprises video data and the standard time stamps; and sending the video delay time of the second terminal for playing the video stream, which is determined by the second terminal, to the first terminal, so that the first terminal can play the audio stream synchronously with the second terminal based on the video delay time and the audio delay time of the first terminal for playing the audio stream, wherein the video delay time is the difference between the standard timestamp and the local timestamp of the second terminal, and the audio delay time is the difference between the standard timestamp and the local timestamp of the first terminal.
According to a third aspect of the embodiments of the present disclosure, there is provided an audio and video stream synchronous playing device applied to a first terminal, the audio and video stream synchronous playing device including: an obtaining unit, configured to obtain an audio stream from a first server, where the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first server, the audio stream includes audio data and a standard timestamp, and the video stream includes video data and the standard timestamp; a receiving unit configured to receive, from the first service end, a video delay time for playing a video stream by a second terminal, where the video delay time is determined by the second terminal, and the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal; a determining unit configured to determine an audio delay time for the first terminal to play the audio stream based on a difference between the standard timestamp and a local timestamp of the first terminal; a playing unit configured to play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an audio/video stream synchronization playing apparatus, which is applied to a first service end, the audio/video stream synchronization playing apparatus including: a stream transmission unit configured to transmit an audio stream to a first terminal and a video stream to a second terminal, wherein the video stream corresponds to the audio stream, the audio stream includes audio data and a standard time stamp, and the video stream includes video data and the standard time stamp; a time sending unit configured to send a video delay time, determined by a second terminal, for playing a video stream by the second terminal to the first terminal, so that the first terminal plays the audio stream synchronously with the playing of the video stream by the second terminal based on the video delay time and an audio delay time for playing the audio stream by the first terminal, where the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal, and the audio delay time is a difference between the standard timestamp and the local timestamp of the first terminal.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions, wherein the processor-executable instructions, when executed by the processor, cause the processor to perform the audio video stream synchronized playback method according to an exemplary embodiment of the present disclosure.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions of the computer-readable storage medium, when executed by a processor of a server, enable the server to execute the audio/video stream synchronous playing method according to the exemplary embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the first terminal may receive, from the first server, the video delay time on the second terminal, and may synchronize the audio stream it plays with the video stream played on the second terminal based on its own audio delay time and the second terminal's video delay time. This implements a sound-picture synchronization scheme across multiple terminals, enabling the corresponding audio content to be heard synchronously through the first terminal even when the video is played on a second terminal such as a large screen.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a schematic diagram of an example of a multi-terminal audio and video playback scenario, according to an example embodiment of the present disclosure.
Fig. 2 is a schematic flow chart diagram of an example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a step of acquiring an audio play address in an audio play method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of another example of a multi-terminal audio and video playback scene according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic flowchart of a step of playing an audio stream in synchronization with a video stream in an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic flowchart of an example of an audio-video stream synchronous playing method according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic diagram of an interface of a first terminal for manually adjusting sound-picture synchronization according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic flowchart of a step in which a terminal plays an audio stream in an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic diagram illustrating a decoding flow of an ADM audio file at a third terminal according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic diagram illustrating a decoding flow at a first terminal for an ADM audio file according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic diagram of an overall flow of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 12 is a schematic flowchart of another example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 13 is a schematic flowchart of another example of an audio-video stream synchronized playback method according to an exemplary embodiment of the present disclosure.
Fig. 14 is a schematic flowchart of still another example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 15 is a schematic block diagram of an example of an audio playback apparatus according to an exemplary embodiment of the present disclosure.
Fig. 16 is a schematic block diagram of another example of an audio playback apparatus according to an exemplary embodiment of the present disclosure.
Fig. 17 is a schematic block diagram of still another example of an audio playback apparatus according to an exemplary embodiment of the present disclosure.
Fig. 18 is a schematic block diagram of an example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure.
Fig. 19 is a schematic block diagram of another example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure.
Fig. 20 is a schematic block diagram of an electronic device according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plural ones of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; and (3) including A and B. Likewise, "performing at least one of step one and step two" covers three parallel cases: (1) performing step one; (2) performing step two; and (3) performing step one and step two.
In view of the foregoing, an audio playing method, an audio playing apparatus, an audio and video stream synchronous playing method, an audio and video stream synchronous playing apparatus, an electronic device, and a computer-readable storage medium according to exemplary embodiments of the present disclosure will be provided below with reference to the accompanying drawings.
It should be noted that, although the playing scenes of a car theater and a city large screen are described as examples, the application scenarios of the audio playing method and apparatus and the audio and video stream synchronous playing method and apparatus according to the present disclosure are not limited thereto; they may also be applied to any other scenario in which multiple terminals respectively play a video stream and an audio stream.
Fig. 1 is a schematic diagram of a multi-terminal audio and video playback scenario according to an exemplary embodiment of the present disclosure.
As shown in fig. 1, the audio and video playing scenario may include a first terminal 110, a second terminal 120, and a first server 210. The first terminal 110 may be, for example, but not limited to, a portable terminal such as a smart phone, a tablet computer, a notebook computer, a digital assistant, or a wearable device, and may include software running on the physical device, such as an application client, for playing audio. The second terminal 120 may be, for example, but not limited to, a city large screen or the large screen of a car theater, and may include software running on the physical device, such as an application client, for playing video.
The first server 210 may be, for example, but not limited to, a server for pushing a video stream and/or an audio stream. Here, the first server 210 may be an independently operating server, a distributed server, or a server cluster composed of a plurality of servers.
In addition, normal data transmission between any two of the first terminal, the second terminal, and the first server may be ensured by establishing a communication connection, for example through the network 300. The connection may use any technology, such as cellular data, Wi-Fi, or Bluetooth, which the present disclosure does not specifically limit.
According to an exemplary embodiment of the present disclosure, a first terminal may acquire a video identity of a video stream being played by a second terminal, then may transmit the video identity to a first server, and acquire an audio playing address of an audio stream corresponding to the video identity from the first server. The first terminal can acquire the audio stream from the first server based on the audio playing address and play the audio stream synchronously with the video stream played by the second terminal.
According to a first aspect of exemplary embodiments of the present disclosure, there is provided an audio playing method that may be applied to a first terminal, for example, the first terminal as shown in fig. 1.
As shown in fig. 2, an audio playing method according to an exemplary embodiment of the present disclosure may include the steps of:
in step S210, a video identity of a video stream being played by the second terminal may be obtained.
Here, the second terminal may be, for example, a large screen in a city or a large screen in a car theater, but it is not limited thereto, and it may be any video playback terminal. The user of the first terminal can view the video content played on the second terminal. In addition, the video stream played by the second terminal may be in any format, which may include pictures and audio, or may include only pictures without audio, and accordingly, the second terminal may have a display and a speaker, or may have only a display without a speaker. The video stream may be, for example, but not limited to, a movie, a television program, and the like.
The video identity may be a unique identifier corresponding to the video stream; it may be, for example, but not limited to, a two-dimensional code or the like.
As an example, a user of a first terminal may scan a two-dimensional code displayed on a second terminal, such as a car cinema screen or a city large screen, using a code scanning function of a client on the first terminal.
In step S220, the video identity may be sent to the first server, and an audio playing address of an audio stream corresponding to the video identity may be obtained from the first server.
In this step, the first terminal may send the obtained video identity to the first server, which may be, for example and without limitation, a server for pushing the video stream to the second terminal and/or pushing the audio stream to the first terminal. In one example, the first server may be a single server that pushes both the video stream and the audio stream corresponding to it; in another example, the first server may include a first sub-server and a second sub-server, where the first sub-server pushes the video stream and the second sub-server pushes the audio stream corresponding to the video stream.
Here, the audio stream may be, for example and without limitation, a spatial audio live stream generated based on the Audio Definition Model (ADM); such a stream can carry panoramic audio and achieve a better playing effect. However, the present disclosure is not so limited, and the audio stream may be in other formats.
In addition, the video stream and the audio stream described in the present disclosure may include, but are not limited to, a live video program with its corresponding audio, a prerecorded video program with its corresponding audio, and the like, which the present disclosure does not specifically limit.
In response to receiving the video identity sent by the first terminal, the first server can identify which video stream is being played on the second terminal, determine the audio stream corresponding to that video stream, and return the audio playing address of the audio stream to the first terminal. Where the first server includes a first sub-server and a second sub-server, the first terminal may send the video identity to the first sub-server, and the audio playing address is returned to the first terminal through the request flow described below.
As an example, as shown in fig. 3, in this step S220, the first terminal may acquire an audio play address by:
in step S310, a first play request for acquiring play information of the second terminal may be generated based on the video identity, where the first play request includes the video identity and the location information of the first terminal.
Here, the first terminal may generate the first play request based on the video identity of the video stream and a current location of the first terminal.
In step S320, the first play request may be sent to the first server, so that the first server determines, based on the first play request, the second terminal closest to the first terminal.
Here, since the first play request includes the location information of the first terminal, the first server may determine the second terminal closest to the first terminal based on it. In this way, in a scenario with multiple second terminals, the server can identify which second terminal the user of the first terminal is actually watching, so that the audio stream can be pushed accurately.
In particular, in some scenarios there may be multiple second terminals playing the same video stream; the locations of these second terminals may be fixed and known to the first server. The second terminals may play the same video stream synchronously or asynchronously; for example, in a city large screen scenario, the same program may be played on multiple large screens across the city.
In the asynchronous case, when the first server pushes the audio stream to the first terminal, it needs to know which second terminal the user of the first terminal is currently watching. Because the video identity corresponds to the video stream and is independent of any particular second terminal, the first server must determine not only which video stream the user is watching (from the video identity) but also which second terminal, so as to determine that terminal's playing progress and push an audio stream matched to it. In the synchronous case, although the first server pushes the video stream to the multiple second terminals with the same playing schedule, network delay, communication failures, and the like may cause different second terminals to receive the video stream at different times, producing differences in playing progress; the first server therefore still needs to know which second terminal the user is watching, so that it can push an audio stream matched to that terminal's playing progress while accounting for its network delay and communication failures.
Here, since it is difficult for the first server to directly determine the second terminal that the user is watching, according to an exemplary embodiment of the present disclosure, the first terminal may send the video identity together with its current location information, so that the first server can determine the second terminal the user is most likely watching from the first terminal's location. The first terminal thus does not need to first acquire the second terminal's location and then send it to the first server; this simplifies the first terminal's operation and lets the first server determine the watched second terminal at a lower communication cost.
In addition, where the first server includes a first sub-server and a second sub-server, the first play request may be sent to the first sub-server, which pushes the video stream. Since the first sub-server establishes communication with the second terminal in order to push the video stream to it and receives feedback information from it, the first terminal can directly request the video playing information of the second terminal from the first sub-server.
In step S330, the video playing information may be received from the first server.
The first server can determine the video playing information of the corresponding second terminal according to the first play request and send it to the first terminal. Here, the video playing information may include an identifier of the second terminal and an identifier of the audio stream corresponding to the video stream, so that after receiving it, the first terminal can determine which second terminal the user is watching and which audio stream accompanies the video content being watched.
In addition, the video playing information may further include, but is not limited to, a name of the second terminal, a video time delay for the second terminal to play the video stream, and the like. Here, the video time delay of the second terminal playing the video stream may be used by the first terminal or a third terminal (to be described below) to play the audio stream in synchronization with the second terminal playing the video stream, which will be described in detail below.
Taking the playing scene of a car theater or city large screen as an example, the first terminal may enable its location permission and initiate, to the first server, a first play request carrying the first terminal's location information in order to obtain the current car theater / city large screen information. After receiving a valid request, the first server may return the video playing information of the car theater / city large screen to the first terminal, which may include parameters such as the large-screen name, large-screen ID, sound-picture delay duration, and audio live stream ID.
In step S340, a second play request for obtaining an audio play address may be generated based on the video play information.
Here, the second play request may include an identification of the second terminal and an identification of the audio stream.
In step S350, the second play request may be sent to the first server, and the audio playing address may be obtained from the first server.
Specifically, the first terminal may send the second play request to the first server to request the audio playing address for playing the audio stream. Here, the audio stream may be a spatial audio live stream, and the audio playing address may be, for example, the play address of that live stream.
In addition, where the first server includes a first sub-server and a second sub-server, the second play request may be sent to the second sub-server, which pushes the audio stream and establishes communication with the first terminal to do so. The first play request and the second play request of the first terminal can thus be sent to different sub-servers, splitting the management of video playing and audio playing at the first server and relieving its load. In this way, the first sub-server and the second sub-server may be managed by different multimedia providers/operators; for example, the provider of the video stream may differ from the provider of the audio stream, allowing multi-party cooperation, a more flexible video and audio playing scheme, and targeted maintenance of the video or the audio. However, the exemplary embodiments of the present disclosure are not limited thereto, and the first server may also be a single server that pushes both the video stream and the audio stream and receives both the first play request and the second play request.
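For illustration only, the following sketch shows the two-request flow described above from the first terminal's side. It is not part of the original disclosure: the server URL, endpoint paths, and JSON field names are hypothetical placeholders.

```python
import requests

SERVER = "https://first-server.example.com"  # hypothetical first server

def get_audio_play_address(video_identity, lat, lon):
    # First play request: video identity plus terminal location, so the
    # server can pick the nearest second terminal (large screen).
    play_info = requests.post(
        f"{SERVER}/play-info",
        json={"videoId": video_identity, "location": {"lat": lat, "lon": lon}},
        timeout=5,
    ).json()
    # play_info is assumed to carry the screen ID, the audio stream ID, and
    # the screen's sound-picture delay duration, as described above.
    second_play_request = {
        "screenId": play_info["screenId"],
        "audioStreamId": play_info["audioStreamId"],
    }
    # Second play request: ask for the address of the spatial audio live stream.
    address_info = requests.post(
        f"{SERVER}/audio-address", json=second_play_request, timeout=5
    ).json()
    return address_info["audioPlayUrl"]
```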
In step S230, an audio stream may be obtained from the first server based on the audio playing address, and the audio stream may be played in synchronization with the video stream played by the second terminal.
In this step, the first terminal may obtain the audio stream from the first server based on the audio play address, for example, may obtain the audio stream from the second sub-server. In this way, the audio stream can be played on the first terminal, and the video stream can be played on the second terminal, so that a scheme of watching videos and audios by different terminals is realized, and a user can listen to the sound corresponding to the videos played on a large screen through the first terminal even in a scene such as an automobile cinema or a large screen in a city.
In addition, as described above, the audio stream according to the embodiment of the present disclosure may be a spatial audio live stream generated based on the Audio Definition Model (ADM). The ADM, defined in Recommendation ITU-R BS.2076-1 issued by the International Telecommunication Union, describes the structure of an audio metadata model; it can accurately describe the format and content of an audio file, ensure compatibility across systems, and specify how to generate a standards-defined panoramic audio file. Playing and rendering a spatial audio live stream differs from playing ordinary audio, so an audio stream played according to the exemplary embodiments of the present disclosure can achieve a better playing effect. However, existing audio playing technology lacks a design for rendering and playing spatial audio on sound systems such as vehicle-mounted terminals; the specific playing and rendering of the audio stream is therefore described in detail below.
In addition to the above, the video stream and audio stream playing scenario according to an exemplary embodiment of the present disclosure may further include a third terminal and a second server.
Specifically, as shown in fig. 4, the audio and video playing scene may include the first terminal 110, the second terminal 120, the third terminal 130, the first server 210, and the second server 220, where the first terminal 110, the second terminal 120, and the first server 210 have been described in detail above and are not described again here. The third terminal 130 may be, for example, but not limited to, an in-car terminal, and may likewise include software running on the physical device, such as an application client, for playing audio. The second server 220 may be, for example, but not limited to, a server for managing user information and user playing information; it may be an independently operating server, a distributed server, or a server cluster composed of a plurality of servers.
Normal data transmission between any two of the first terminal, the second terminal, the third terminal, the first server, and the second server may be ensured by establishing a communication connection, such as the network 310. The connection may use any technology, such as cellular data, Wi-Fi, or Bluetooth, which the present disclosure does not specifically limit.
In this example, the first terminal may further send the audio playing information to the second server, so that the second server sends the audio playing information to the third terminal, so as to allow the third terminal to obtain the audio stream from the first server based on the audio playing information, and play the audio stream synchronously with the video stream played by the second terminal.
Here, the audio play information may include the audio play address described above, or include both the audio play address and the audio play progress of the first terminal. In addition, the audio playing information may further include the video playing information received by the first terminal from the first service end, so that the third terminal can acquire the playing condition at the second terminal.
Specifically, the first terminal, such as the portable terminal, may upload the obtained live audio stream address, its audio playing progress (if playback has started), and parameters such as the user account to the second server for storage. After an audio playing client on a third terminal, such as the in-car terminal, logs in with the same user account, the third terminal may obtain the live audio stream address and the playing progress (if playback has started) from the second server. Having obtained these through the above mechanism, the third terminal can, like the first terminal, continue to play the audio stream in synchronization with the second terminal playing the video stream. The user can therefore choose to play the audio stream through either the first terminal or the third terminal, which suits both walking and driving scenarios and offers the user multiple multi-terminal playing options.
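For illustration only, the following sketch shows one way the hand-off through the second server could work, assuming an in-memory store keyed by user account; a real deployment would use a database or cache, and all field names here are hypothetical.

```python
from typing import Optional

# In-memory stand-in for the second server's storage, keyed by user account.
_play_info_store = {}

def upload_play_info(user_account, audio_play_url, progress_ms, video_play_info):
    """Called by the first terminal to store its audio playing information."""
    _play_info_store[user_account] = {
        "audioPlayUrl": audio_play_url,
        "progressMs": progress_ms,         # None if playback has not started
        "videoPlayInfo": video_play_info,  # e.g. screen ID, video delay time
    }

def fetch_play_info(user_account) -> Optional[dict]:
    """Called by the third terminal (e.g. the in-car client) after it logs in
    with the same user account, so it can resume synchronized playback."""
    return _play_info_store.get(user_account)
```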
According to the above audio playing scheme, the first terminal may send the video identity of the video stream played by the second terminal to the first server and obtain from it the audio playing address of the corresponding audio stream, so that the audio stream can be played on the first terminal. A user can thus listen to the audio stream on the first terminal while watching the video stream on the second terminal, realizing cooperative playing among multiple terminals and enabling the user to hear the corresponding audio content clearly through the first terminal even when the video is played on a second terminal such as a large screen.
Having described the scheme in which multiple terminals play the audio stream according to the exemplary embodiments of the present disclosure, the specific process of playing the audio stream in synchronization with the video stream and the specific scheme by which a terminal plays and renders the audio stream will now be described. It should be noted that the synchronized playback scheme and the audio stream playing and rendering scheme described below apply to the third terminal as well as the first terminal.
In an example, the audio stream includes audio data and a standard time stamp shared with the video stream, and as shown in fig. 5, the step of playing the audio stream synchronously with the playing of the video stream by the second terminal may include the steps of:
in step S510, the audio stream may be decoded, resulting in a standard timestamp.
Here, the standard timestamp is a timestamp added when the audio stream and the video stream are generated. Because the audio stream and the video stream share the standard timestamp, playing according to it ensures sound-picture synchronization between the video picture and the audio sound.
Specifically, in this step, in the case where the audio stream is acquired by the first terminal or the third terminal, the first terminal or the third terminal may perform a decapsulation process on the audio stream to generate a data object, where the data object may include a standard timestamp of the current data frame and audio data, and the audio data is compressed data to be decoded. A corresponding audio decoder may be selected to decode the compressed data, and Pulse Code Modulation (PCM) data is generated through the processing of the decoder, and the generated PCM data enters the buffer queue together with the previously parsed standard timestamp.
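For illustration only, a minimal sketch of the decode stage just described, assuming an already demultiplexed sequence of compressed frames and a decoder object with a decode-to-PCM method; the decoder interface is hypothetical.

```python
from collections import deque

def fill_buffer_queue(demuxed_frames, audio_decoder) -> deque:
    """demuxed_frames yields (standard_timestamp_ms, compressed_bytes);
    audio_decoder.decode() is a hypothetical call returning PCM bytes."""
    buffer_queue = deque()
    for standard_ts, compressed in demuxed_frames:
        pcm = audio_decoder.decode(compressed)   # compressed data -> PCM
        # PCM enters the queue together with the parsed standard timestamp.
        buffer_queue.append((standard_ts, pcm))
    return buffer_queue
```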
In step S520, a difference between the standard timestamp and the local timestamp may be processed to obtain an audio delay time.
In this step, the first terminal or the third terminal may perform timestamp alignment. Specifically, its audio decoder may decode the acquired audio stream, such as an ADM audio stream, continuously taking PCM data and standard timestamps from the buffer queue; the standard timestamp obtained by decoding may then be differenced with the local timestamp to obtain the audio delay time ΔT, that is, ΔT = standard timestamp obtained by decoding − local timestamp.
In step S530, the audio stream may be played in synchronization with the second terminal playing the video stream based on the audio delay time.
In this step, in response to the audio delay time being greater than 0, the audio stream may be played after waiting a time equal to the audio delay time; in response to the audio delay time being less than 0, audio data spanning a duration equal to the absolute audio delay time may be discarded from the audio stream, and the remaining audio stream may be played.
Specifically, if ΔT > 0, the audio decoding worker thread of the first terminal or the third terminal may sleep for a time equal to ΔT; this keeps playback aligned with the video stream being played by the second terminal. If ΔT < 0, audio data of a duration equal to |ΔT| may be discarded from the already decoded buffer queue.
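For illustration only, the following sketch outlines the timestamp alignment of steps S510-S530, assuming decoded frames arrive as (standard timestamp in milliseconds, PCM bytes) pairs in a buffer queue; the frame layout, the millisecond unit, and the function names are assumptions, not part of the disclosure.

```python
import time
from collections import deque

def play_aligned(buffer_queue: deque, now_ms, play_pcm):
    """buffer_queue holds (standard_timestamp_ms, pcm_bytes) pairs;
    now_ms() returns the terminal's local timestamp in milliseconds."""
    while buffer_queue:
        standard_ts, pcm = buffer_queue.popleft()
        delta_t = standard_ts - now_ms()       # audio delay time, deltaT
        if delta_t > 0:
            time.sleep(delta_t / 1000.0)       # audio early: wait deltaT
        elif delta_t < 0:
            continue                           # audio late: drop this frame
        play_pcm(pcm)
```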
In another example, the audio stream can be played synchronously with the video stream according to the audio and video stream synchronous playing method shown in fig. 6. It should be noted that the method shown in fig. 6 may be used to implement the synchronous playing of the audio stream with the video stream in the audio playing method shown in fig. 2, but it is not limited thereto; it may also be used in other scenarios where a video stream and an audio stream are played synchronously, or be implemented independently.
The audio and video stream synchronous playing method shown in fig. 6 may be applied to the first terminal, and specifically, the method may include the following steps:
in step S610, an audio stream may be acquired from the first server, where the audio stream corresponds to a video stream being played by the second terminal, the second terminal receives the video stream from the first server, the audio stream includes audio data and standard timestamps, and the video stream includes video data and standard timestamps.
As an example, the first terminal may obtain an audio stream from the first server by the method shown in fig. 2.
Specifically, as shown in fig. 2, the first terminal may obtain the audio stream from the first server by: step S210, acquiring a video identity of a video stream being played by a second terminal; step S220, sending the video identity to a first server, and acquiring an audio playing address of an audio stream corresponding to the video identity from the first server; step S230, acquiring an audio stream from the first server based on the audio playing address.
Here, the specific implementation of each step has been described in detail in the foregoing, and can be implemented in the same manner as described in the foregoing, so that the detailed description is omitted. However, the present exemplary embodiment is not limited thereto, and the first terminal may also acquire the audio stream to be played in synchronization with the video stream in other ways.
In step S620, the video delay time, determined by the second terminal, for the second terminal to play the video stream may be received from the first server, where the video delay time is the difference between the standard timestamp and the local timestamp of the second terminal.
Specifically, the audio/video encoder of the first server may add the standard timestamp to the video stream and the audio stream respectively during encoding and then push both streams: the video stream to a second terminal such as a city large screen or a car theater player, and the audio stream to the first terminal through the identity-acquisition flow described above. When the video decoder of the second terminal decodes the video stream, the second terminal obtains the standard timestamp and may difference it with its local timestamp to obtain the video delay time delayTime1. The first server may continuously obtain delayTime1 from the second terminal through a timer, according to a predetermined first timing task.
Meanwhile, the first terminal may also use a timer to obtain delayTime1 from the first server once per synchronization processing period, according to a predetermined second timing task. Here, the time interval of the first timing task may equal that of the second timing task, so that the video delay time is accurately relayed from the second terminal through the first server to the first terminal, keeping the first terminal's decoding synchronized with the second terminal.
In step S630, an audio delay time for the first terminal to play the audio stream may be determined based on a difference between the standard timestamp and the local timestamp of the first terminal.
In this step, the audio decoder of the first terminal may decode the audio stream, continuously obtaining PCM data and standard timestamps from the decoded audio frame buffer queue. The obtained standard timestamp may then be differenced with the local timestamp to obtain the audio delay time delayTime2, that is, delayTime2 = standard timestamp obtained by decoding − local timestamp.
In step S640, the audio stream may be played in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time.
In this step, the first terminal may difference the audio delay time delayTime2 and the video delay time delayTime1 to obtain the sound-picture asynchrony time ΔT, that is, ΔT = delayTime2 − delayTime1; ΔT is the amount that needs to be adjusted.
Specifically, in response to the sound-picture asynchrony time being greater than 0 (i.e., ΔT > 0), the audio stream may be played after waiting a time equal to ΔT; for example, the audio decoding worker thread may sleep for a time equal to ΔT. In response to the sound-picture asynchrony time being less than 0 (i.e., ΔT < 0), audio data of a duration equal to |ΔT| may be dropped from the audio stream, for example from the buffered audio frames, and the remaining audio stream may be played. In this way, synchronous playing of the audio stream and the video stream can be realized, so that the user hears the corresponding audio stream from the first terminal in sync while watching the video stream played by the second terminal.
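For illustration only, the following sketch outlines the cross-terminal synchronization of steps S620-S640; the frame layout and millisecond units are assumptions, and delayTime1 is taken as already fetched from the first server.

```python
import time

def sync_audio_to_video(audio_frames, delay_time1_ms, now_ms, play_pcm):
    """audio_frames yields (standard_timestamp_ms, pcm_bytes);
    delay_time1_ms is the second terminal's video delay time."""
    for standard_ts, pcm in audio_frames:
        delay_time2 = standard_ts - now_ms()      # audio delay time
        delta_t = delay_time2 - delay_time1_ms    # sound-picture offset deltaT
        if delta_t > 0:
            time.sleep(delta_t / 1000.0)          # audio ahead of video: wait
        elif delta_t < 0:
            continue                              # audio behind video: drop
        play_pcm(pcm)
```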
Furthermore, according to an exemplary embodiment of the present disclosure, the time interval of the second timing task may be adjusted in real time while the first terminal receives the video delay time from the first server.
Specifically, where the sound-picture asynchrony time is obtained by differencing the audio delay time and the video delay time, the video delay time may be received from the first server at a first time interval in response to the sound-picture asynchrony time satisfying a first preset condition; at a second time interval in response to it satisfying a second preset condition; and at a third time interval in response to it satisfying a third preset condition.
Here, the first preset condition may be that the sound-picture asynchrony time lies within a first preset interval, or that the frequency with which it falls outside the first preset interval stays below a preset frequency within a preset time. The second preset condition may be that the sound-picture asynchrony time obtained twice in succession is greater than or equal to the upper limit of the first preset interval, or less than or equal to its lower limit. The third preset condition may be that the sound-picture asynchrony time obtained twice in succession is greater than or equal to a first threshold, or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than its lower limit.
Here, the first time interval may be greater than the second time interval, and the second time interval greater than the third.
Specifically, during data transmission, the absolute value of the sound-picture asynchrony time ΔT may fluctuate due to network jitter and similar causes; if a large |ΔT| is found during a given synchronization pass, the first terminal's sound-picture desynchronization has evidently become noticeable since the previous pass.
For this, the first preset interval may represent the range in which sound-picture asynchrony is not easily perceived by humans, for example −100 ms < ΔT < 25 ms. When ΔT is greater than or equal to the interval's upper limit or less than or equal to its lower limit, for example ΔT ≥ 25 ms or ΔT ≤ −100 ms, the asynchrony is easy to perceive. Where ΔT is greater than or equal to the first threshold or less than or equal to the second threshold, for example ΔT ≥ 90 ms or ΔT ≤ −185 ms, the asynchrony is clearly felt and may even be unacceptable.
In order to solve the above problem, according to an exemplary embodiment of the present disclosure, a mechanism for adaptively adjusting a synchronization processing period may be added to adjust a time interval for acquiring the video delay time delayTime1 from the server.
For example, in the normal case where −100 ms < ΔT < 25 ms, or ΔT ≥ 25 ms or ΔT ≤ −100 ms only occasionally, the period keeps its default of 20 s; that is, the first terminal obtains the video delay time delayTime1 from the server every 20 s. If two or more consecutive synchronization passes find ΔT ≥ 25 ms or ΔT ≤ −100 ms, the period is adaptively shortened to 10 s, i.e., the first terminal obtains delayTime1 from the server every 10 s. If two or more consecutive passes find ΔT ≥ 90 ms or ΔT ≤ −185 ms, the period is adaptively shortened to 5 s, i.e., the first terminal obtains delayTime1 from the server every 5 s.
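For illustration only, the following sketch implements the adaptive period selection using the example thresholds (25 ms / −100 ms and 90 ms / −185 ms) and periods (20 s, 10 s, 5 s) given above; the exact counting of consecutive out-of-range passes is an assumption.

```python
def next_sync_period_s(delta_t_history_ms):
    """Return seconds until the next fetch of delayTime1 from the server."""
    last_two = delta_t_history_ms[-2:]

    def severe(dt):       # clearly perceptible, possibly unacceptable
        return dt >= 90 or dt <= -185

    def noticeable(dt):   # outside the hard-to-perceive interval
        return dt >= 25 or dt <= -100

    if len(last_two) == 2 and all(severe(dt) for dt in last_two):
        return 5
    if len(last_two) == 2 and all(noticeable(dt) for dt in last_two):
        return 10
    return 20             # default: -100 ms < deltaT < 25 ms, or only rare excursions
```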
According to the above synchronized playing scheme, the first terminal can receive the second terminal's video delay time from the first server and, based on its own audio delay time and that video delay time, synchronize the audio stream it plays with the video stream playing on the second terminal. This realizes a sound-picture synchronization scheme across multiple terminals: the corresponding audio content can be heard synchronously through the first terminal even when the video is played on a second terminal such as a large screen, solving the problem that synchronized audio listening is difficult to obtain when video is played on a large screen.
In addition, according to the exemplary embodiment of the present disclosure, a first terminal manual compensation mechanism may be added on the basis of a system automatic synchronization mechanism.
Specifically, in response to receiving an audio waiting instruction input by a user, the audio stream may be played after waiting for a duration corresponding to the waiting time in the audio waiting instruction; in response to receiving an audio discarding instruction input by a user, audio data with a duration corresponding to the discarding time in the audio discarding instruction may be discarded from the audio stream, and the audio stream after discarding the audio data may be played.
Here, the audio wait instruction may be an instruction to instruct local audio playback to wait for a specified wait time, and the audio discard instruction may be an instruction to discard audio data of the specified discard time from the local audio playback.
For example, as shown in fig. 7, the audio waiting instruction and the audio discarding instruction may be respectively input to the first terminal through a key control in the interactive interface of the first terminal.
Specifically, as shown in fig. 7, if the user subjectively feels that the video lags behind the audio, the user may click the "+" key to manually adjust the audio playing progress; the adjustment makes local audio playback wait, and the adjusted value is the time the local audio playback needs to wait. If the user subjectively feels that the audio lags behind the video, the user may click the "−" key to manually adjust the audio playing progress; the adjustment discards a certain duration of decoded audio data awaiting playback, and the adjusted value is the duration of audio data to discard from local playback.
As an example, the waiting time and the discarding time may be fixed and equal; in the example shown in fig. 7, the adjustment granularity is 50 ms, i.e., each click of the "+" key adds 50 ms of waiting and each click of the "−" key discards 50 ms of audio.
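For illustration only, the following sketch shows one possible handler for the "+" and "−" keys of fig. 7 with the 50 ms step from the example; the accumulated-offset design is an assumption, not the patent's stated implementation.

```python
STEP_MS = 50  # granularity of one key press, per the example in fig. 7

class ManualCompensator:
    """Accumulates the user's manual corrections on top of the automatic sync."""

    def __init__(self):
        self.offset_ms = 0

    def on_plus_key(self):
        # User feels video lags audio: make local audio playback wait longer.
        self.offset_ms += STEP_MS

    def on_minus_key(self):
        # User feels audio lags video: discard buffered audio to catch up.
        self.offset_ms -= STEP_MS

    def adjusted_delta_t(self, delta_t_ms):
        # Feed this into the same wait-or-drop logic as the automatic path.
        return delta_t_ms + self.offset_ms
```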
Several examples of implementing audio and video stream synchronous playing are described above, and specific schemes for playing and rendering audio streams will be described below.
As an example, the accompanying sound of a second terminal such as a city large screen or a car theater may be a 10-channel spatial audio file conforming to the ADM standard, with each channel's position described by format metadata. The spatial audio file is stored on the first server and obtained by the first terminal. Based on such a spatial audio file, a three-dimensional sound playing scheme may be employed to play the audio.
Specifically, as shown in fig. 8, the first terminal or the third terminal may play the audio stream by:
in step S810, a channel of a speaker for playing an audio stream may be determined.
For example, the audio renderer of the first terminal or the third terminal may first query the current terminal's speaker drivers to obtain the speaker type, which may have different channel layouts; for example, the first terminal's speakers may be stereo, while those of the third terminal, such as an in-car client, may be 5.1-channel or 7.1-channel.
In step S820, rendering of channel orientations may be performed on the audio data parsed from the audio stream, so as to obtain channel metadata of the audio data.
In step S830, spatial position information of each channel audio in the audio data may be determined based on the channel metadata.
In steps S820 and S830, audio frames whose timestamps have been calibrated by the synchronized playback method may enter the renderer for channel-orientation rendering, and the original spatial position information of each channel may be recovered by parsing each channel's format metadata.
In step S840, channel conversion may be performed on the channel audio according to the channel and spatial position information of the speaker, so as to obtain a converted channel audio.
In step S850, an audio file adapted to the channel of the speaker may be generated based on the converted channel audio.
In steps S840 and S850, the original channels containing spatial position information, such as the 10 channels above, may be converted by a channel conversion algorithm into the channels of the current device's speaker array, completing the input-to-output channel conversion, so that a playable PCM file adapted to the device's speaker type can be generated.
In step S860, the audio file may be played through the channel of the speaker.
In this step, the PCM file data may be written into a sound card buffer of the terminal for audio playback.
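For illustration only, the following sketch condenses steps S810-S860 into a gain-matrix channel conversion, assuming 10 input channels and a stereo speaker layout; the placeholder gains and the overall matrix approach are assumptions, not the patent's channel conversion algorithm.

```python
import numpy as np

def render_to_speakers(frames_in: np.ndarray, downmix: np.ndarray) -> np.ndarray:
    """frames_in: (n_samples, n_in) PCM; downmix: (n_in, n_out) gain matrix
    derived from each channel's spatial position and the speaker layout."""
    out = frames_in @ downmix          # input-to-output channel conversion
    peak = np.max(np.abs(out))
    if peak > 1.0:
        out = out / peak               # normalize to avoid clipping after mix
    return out

# Example: fold 10 ADM channels down to stereo with equal placeholder gains.
downmix_10_to_2 = np.full((10, 2), 1.0 / np.sqrt(10.0))
pcm_out = render_to_speakers(np.zeros((4800, 10), dtype=np.float32), downmix_10_to_2)
```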
Where the audio stream is a spatial audio stream generated based on the ADM, playing it with the playing method of the exemplary embodiments of the present disclosure can achieve a better playing effect and listening experience. Specific examples of the decoding flow of the ADM audio file at the third terminal and the first terminal are described in detail below with reference to figs. 9 and 10.
Fig. 9 is a schematic diagram illustrating a decoding flow of an ADM audio file at a third terminal according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, an ADM rendering program in a microprocessor (MCU) within the third terminal first reads the ADM audio file, then parses and decodes it, performs restoration and mixing of the digital audio signal of each audio object (i.e., each piece of audio data) according to the ADM file description, and encodes the result into a digital audio stream (e.g., a Dolby AC3 digital audio stream). The encoded audio signal (e.g., the Dolby AC3 signal) is then output through an Inter-IC Sound (I2S) bus interface to a digital sound card (e.g., a DAC sound card), transmitted via the digital audio transmission interface of the sound card (e.g., an S/PDIF (Sony/Philips Digital Interface Format) interface) to a decoder (e.g., a Dolby decoder), decoded by the decoder, and output via an output interface (e.g., an RCA interface) to external stereo speakers (e.g., without limitation, 5.1-channel speakers, including the FL, FR, CEN, SL, and SR channels).
Here, the third terminal, such as a vehicle-mounted terminal, decodes and renders the ADM audio stream itself, achieving better rendering and playing effects than existing vehicle-mounted playing modes; the stereo audio played in the vehicle is matched with the large-screen video picture, realizing panoramic-sound audio-visual viewing with audio and video played on different ends.
Fig. 10 is a schematic diagram illustrating a decoding flow of an ADM audio file by a first terminal according to an exemplary embodiment of the present disclosure.
Referring to fig. 10, the ADM rendering program in the first terminal first reads the ADM audio file, then parses and decodes it, realizes the restoration and mixing of the digital audio signal of each audio object according to the ADM file description, and encodes the result into a digital audio stream (e.g., a WAV-format digital audio stream). The stream is then output to a virtual sound card within the mobile client, which decodes it for output to the speaker or headphones of the mobile client device (e.g., without limitation, a left channel (L) and a right channel (R)). Here, the virtual sound card may be, for example, an Android sound card.
According to an exemplary embodiment of the present disclosure, the client may further receive, from the server, virtual scene position information of each piece of audio data, and generate an interactive interface based on the received information, where an icon corresponding to each piece of audio data is displayed at the virtual scene position corresponding to that audio data in the interactive interface.
Fig. 11 is a schematic diagram of an overall flow of an audio playing method according to an exemplary embodiment of the present disclosure.
As shown in fig. 11, the second terminal may play a video stream, which may be, for example but not limited to, pushed by the server. While the video stream is playing, a video identity for acquiring the audio stream corresponding to the video stream may be shown in the video picture, for example, a two-dimensional code of the program being played.
For the first terminal, the video identity may be obtained by scanning the two-dimensional code, and based on the video identity and the position information of the first terminal, a request may be made to a server (e.g., the first server) to obtain the video playing information of the second terminal, for example, information of the large screen. The first terminal may receive the video playing information of the second terminal from the server and parse out the relevant information of the second terminal; the video playing information may include, but is not limited to, the ID of the second terminal, the sound-picture delay time, the audio stream ID, and the like.
The first terminal may initiate a request to the server for acquiring the spatial audio stream based on the video playing information, and receive the audio stream playing address from the server. The first terminal may then acquire the audio stream based on that address, parse it, align the timestamps of the audio stream and the video stream using the synchronous playing methods described in the two examples above, and play the audio stream using the rendering and playing scheme described above.
In addition, the first terminal may also synchronize the acquired audio stream playing address to a server (for example, a second server), and may likewise synchronize its own playing progress to that server.
As for the server side, it may include, but is not limited to, the first server and the second server. The server may send the video playing information to the first terminal in response to the first terminal's request to obtain that information.
In addition, the server may also send the audio stream playing address to the first terminal in response to the first terminal's request to acquire the audio stream. The server (e.g., the second server) may also store the user play record from the first terminal, so that the user play record (e.g., the audio playing progress) can be sent to the third terminal in response to a request from the third terminal.
For the third terminal, it may be logged in with the same user account as the first terminal. The third terminal may send the account information of the logged-in user to the server and request the play record of that user account, thereby obtaining the audio stream playing address and the audio playing progress from the server. It may then acquire the audio stream based on the playing address, parse the audio stream starting from the audio frame at the audio playing progress, align the timestamps of the audio stream and the video stream using the synchronous playing methods described in the two examples above, and play the audio stream using the rendering and playing scheme described above.
The first terminal may be, but is not limited to, a mobile client; the second terminal may be, but is not limited to, a car cinema screen or a city large screen; the third terminal may be, but is not limited to, a vehicle-mounted client; and the server side may be implemented as, but is not limited to, a server.
According to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, the problem that the accompanying sound cannot be heard when watching a video on, for example, a city large screen can be solved, and large-screen accompanying sound with spatial-audio quality can be heard through a mobile phone or a vehicle; likewise, the problem of the low sound quality of the film's accompanying sound outside the car when watching a car cinema can be solved, and the film's accompanying sound with spatial-audio quality can be heard through a mobile phone or a vehicle.
According to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, a mobile client can pull, by scanning a code with a terminal such as a mobile phone, the ADM-format panoramic-sound live stream of a movie being played on a large screen, and the movie's spatial audio can be played on the mobile client by rendering the metadata of the ADM file and aligning the timestamps.
In addition, according to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, through the playing synchronization mechanism between the mobile phone/vehicle-mounted client and the account content, the panoramic-sound accompanying sound of a movie or a city large screen can be rendered and played at the vehicle-mounted client. The original panoramic-sound accompanying sound of the film can be played with the vehicle-mounted multi-channel speaker array (e.g., 5.1-channel or 7.1-channel), greatly improving the viewing experience of the car cinema or city large screen.
In addition, according to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, the rendering and playing of ADM-standard spatial audio in a vehicle-mounted sound system can be realized, enriching the application scenarios of ADM spatial audio with vehicle-mounted multi-channel sound systems.
In general, when the audio and the video of an audio-video program are decoded and played on different ends, it is difficult to achieve a good audio and video synchronization effect.
According to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, however, when the audio and the video of an audio-video program are decoded and played on different ends, a better synchronization effect can be realized through the automatic and manual timestamp alignment mechanisms between the server and the clients.
In addition, according to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiments of the present disclosure, when the audio stream acquired by code scanning with a terminal such as a mobile phone is played on the mobile phone and in the car, its timestamps can be aligned with those of the streaming playback on the large screen/cinema, realizing audio and video synchronization.
Fig. 12 is a schematic flowchart of another example of an audio playing method according to an exemplary embodiment of the present disclosure. This audio playing method may be applied to the first service end. As shown in fig. 12, the audio playing method may include:
in step S1210, a video identity of a video stream being played by a second terminal, acquired by a first terminal, may be received from the first terminal.
In step S1220, an audio play address of the audio stream corresponding to the video identity may be transmitted to the first terminal.
In step S1230, the audio stream may be transmitted to the first terminal based on the audio playing address, so that the first terminal plays the audio stream synchronously with the second terminal playing the video stream.
As an example, step S1220 may include the following steps (a sketch of this handshake is given after the list):
receiving, from the first terminal, a first playing request generated by the first terminal based on the video identity, wherein the first playing request is used for acquiring the playing information of the second terminal and comprises the video identity and the position information of the first terminal;
determining a second terminal closest to the first terminal based on the first playing request;
sending video playing information to a first terminal, wherein the video playing information comprises an identity of a second terminal and playing parameters of a video stream;
receiving, from the first terminal, a second playing request generated by the first terminal based on the video playing information, wherein the second playing request is used for acquiring the audio playing address and comprises the identity of the second terminal and the identity of the audio stream;
and sending the audio playing address to the first terminal based on the second playing request.
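The following sketch shows this two-request handshake on the first service end. The data classes, the in-memory registry, and the straight-line distance metric are illustrative assumptions, not the actual server implementation.

```python
from dataclasses import dataclass

@dataclass
class Screen:                        # a registered second terminal (large screen)
    id: str
    position: tuple[float, float]
    audio_stream_id: str
    play_params: dict

@dataclass
class FirstPlayRequest:              # request 1: for the second terminal's playing info
    video_id: str                    # video identity scanned by the first terminal
    terminal_position: tuple[float, float]

@dataclass
class SecondPlayRequest:             # request 2: for the audio playing address
    screen_id: str
    audio_stream_id: str

def distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

class FirstServer:
    def __init__(self, screens: list[Screen], audio_addresses: dict):
        self.screens = screens                  # registered second terminals
        self.audio_addresses = audio_addresses  # (screen_id, stream_id) -> playing address

    def handle_first_request(self, req: FirstPlayRequest) -> dict:
        # Pick the second terminal closest to the first terminal's position
        # (filtering by req.video_id is omitted for brevity).
        screen = min(self.screens,
                     key=lambda s: distance(s.position, req.terminal_position))
        return {"screen_id": screen.id,
                "audio_stream_id": screen.audio_stream_id,
                "play_params": screen.play_params}

    def handle_second_request(self, req: SecondPlayRequest) -> str:
        # Return the playing address of the audio stream for that screen.
        return self.audio_addresses[(req.screen_id, req.audio_stream_id)]
```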
As an example, the audio playing method may further include: receiving, from the third terminal, the audio playing information that the second server sent to the third terminal; and sending the audio stream to the third terminal based on the audio playing information, so that the third terminal plays the audio stream synchronously with the second terminal playing the video stream. Here, the audio playing information is sent from the first terminal to the second server, and includes an audio playing address, or the audio playing address and the audio playing progress of the first terminal.
By way of example, audio data and a standard time stamp common to the video stream may be included in the audio stream.
In this example, the step of the first terminal or the third terminal playing the audio stream in synchronization with the second terminal playing the video stream may include: decoding the audio stream to obtain a standard timestamp; performing difference processing on the standard timestamp and the local timestamp to obtain audio delay time; and playing the audio stream synchronously with the video stream played by the second terminal based on the audio delay time.
As an example, the step of playing the audio stream synchronously with the second terminal playing the video stream based on the audio delay time may include: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; and in response to the audio delay time being less than 0, discarding from the audio stream audio data of a duration equal to the audio delay time, and playing the audio stream after the audio data is discarded.
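The following is a minimal sketch of this wait-or-discard alignment. The stream, clock, and playback objects are assumed interfaces rather than a real decoder API, and timestamps are taken to be in milliseconds.

```python
import time

def play_aligned(audio_stream, local_clock_ms, play_pcm) -> None:
    """Assumed interfaces: audio_stream.next_frame() -> (pcm, standard_ts_ms);
    audio_stream.discard(duration_ms=...); local_clock_ms() -> current local
    timestamp in ms; play_pcm(pcm) outputs one decoded frame."""
    pcm, standard_ts_ms = audio_stream.next_frame()
    audio_delay_ms = standard_ts_ms - local_clock_ms()   # standard ts minus local ts
    if audio_delay_ms > 0:
        # Audio is ahead of the reference clock: wait before playing this frame.
        time.sleep(audio_delay_ms / 1000.0)
    elif audio_delay_ms < 0:
        # Audio is behind: discard decoded audio of the same duration to catch up.
        audio_stream.discard(duration_ms=-audio_delay_ms)
        pcm, standard_ts_ms = audio_stream.next_frame()
    play_pcm(pcm)
```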
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the sound channel direction of the audio data analyzed from the audio stream to obtain sound channel metadata of the audio data; determining spatial position information of each channel audio in the audio data based on the channel metadata; according to the sound channel and the space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain a converted sound channel audio; generating an audio file adapted to the sound channel of the speaker based on the converted sound channel audio; the audio file is played through the audio channel of the speaker.
In this exemplary embodiment, the configurations and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and therefore, the detailed description thereof is omitted here.
Fig. 13 is a schematic flowchart of another example of an audio-video stream synchronized playback method according to an exemplary embodiment of the present disclosure. The audio and video stream synchronous playing method may be applied to the first service end, and the audio and video stream synchronous playing method may be applied to, for example and without limitation, the step of playing the audio stream synchronously with the video stream in the audio playing method described above with reference to fig. 12.
As shown in fig. 13, the audio/video stream synchronous playing method may include the following steps:
in step S1310, an audio stream may be transmitted to the first terminal, and a video stream may be transmitted to the second terminal, wherein the video stream corresponds to the audio stream, the audio stream includes audio data and standard time stamps, and the video stream includes video data and standard time stamps.
In step S1320, the video delay time of the second terminal playing the video stream determined by the second terminal may be sent to the first terminal, so that the first terminal plays the audio stream synchronously with the second terminal playing the video stream based on the video delay time and the audio delay time of the first terminal playing the audio stream, where the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal, and the audio delay time is a difference between the standard timestamp and the local timestamp of the first terminal.
As an example, the step of the first terminal playing the audio stream synchronously with the second terminal playing the video stream based on the video delay time and the audio delay time may include: performing difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; and in response to the sound-picture asynchronous time being less than 0, discarding from the audio stream audio data of a duration equal to the audio delay time, and playing the audio stream after the audio data is discarded.
As an example, the audio/video stream synchronous playing method may further include: performing difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time meeting a first preset condition, sending the video delay time to the first terminal at a first time interval; in response to the sound-picture asynchronous time meeting a second preset condition, sending the video delay time to the first terminal at a second time interval; and in response to the sound-picture asynchronous time meeting a third preset condition, sending the video delay time to the first terminal at a third time interval.
Here, the first preset condition is that the sound-picture asynchronous time is within a first preset interval, or that the number of times the sound-picture asynchronous time falls outside the first preset interval within a preset duration is lower than a preset number. The second preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to the upper limit of the first preset interval, or less than or equal to the lower limit of the first preset interval. The third preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than the lower limit of the first preset interval.
Here, the first time interval is greater than the second time interval, which is greater than the third time interval.
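To make the three conditions concrete, here is a sketch of how the reporting interval for the video delay time might be selected. All threshold and interval values are illustrative assumptions; the text only specifies their ordering.

```python
FIRST_INTERVAL_S = 10.0    # used under the first preset condition (largest)
SECOND_INTERVAL_S = 2.0    # second preset condition
THIRD_INTERVAL_S = 0.5     # third preset condition (smallest)

LOW, HIGH = -40.0, 40.0    # assumed first preset interval for the sound-picture asynchronous time (ms)
T2, T1 = -120.0, 120.0     # assumed outer thresholds: T1 > HIGH and T2 < LOW, per the text

def pick_interval(history: list[float]) -> float:
    """history: recent sound-picture asynchronous times (audio delay - video delay), in ms."""
    last_two = history[-2:]
    # Checked from the tightest condition down, since the third condition implies the second.
    # Third condition: two consecutive values at or beyond the outer thresholds.
    if len(last_two) == 2 and all(v >= T1 or v <= T2 for v in last_two):
        return THIRD_INTERVAL_S
    # Second condition: two consecutive values at or beyond the first preset interval's limits.
    if len(last_two) == 2 and all(v >= HIGH or v <= LOW for v in last_two):
        return SECOND_INTERVAL_S
    # First condition: within the interval, or only rarely outside it.
    return FIRST_INTERVAL_S
```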
As an example, the audio/video stream synchronous playing method may further include: the first terminal responds to the received audio waiting instruction input by a user, and plays the audio stream after waiting for a time length corresponding to the waiting time in the audio waiting instruction; the first terminal responds to the audio discarding instruction input by the user, discards the audio data with the duration corresponding to the discarding time in the audio discarding instruction from the audio stream, and plays the audio stream after the audio data is discarded.
As an example, the audio and video stream synchronous playing method may further include: receiving a video identity of a video stream being played by a second terminal, which is acquired by a first terminal, from the first terminal; sending an audio playing address of the audio stream corresponding to the video identity to the first terminal; and executing the step of transmitting the audio stream to the first terminal based on the audio playing address.
In this exemplary embodiment, the configurations and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and therefore, the detailed description thereof is omitted here.
Fig. 14 is a schematic flowchart of still another example of an audio playing method according to an exemplary embodiment of the present disclosure. The audio playing method can be applied to the third terminal.
As shown in fig. 14, the audio playing method may include the steps of:
in step S1410, audio playing information may be received from the second server, where the audio playing information is sent from the first terminal to the second server, the audio playing information includes an audio playing address, or includes the audio playing address and an audio playing progress of the first terminal, the audio playing address is obtained by the first terminal, the first terminal obtains a video identifier of a video stream being played by the second terminal, sends the video identifier to the first server, and obtains an audio playing address of the audio stream corresponding to the video identifier from the first server.
In step S1420, an audio stream may be acquired from the first server based on the audio play information, and the audio stream may be played in synchronization with the second terminal playing the video stream.
As an example, the step of the first terminal sending the video identity to the first server, and acquiring the audio playing address of the audio stream corresponding to the video stream from the first server includes:
generating a first playing request for acquiring playing information of a second terminal based on the video identity, wherein the first playing request comprises the video identity and the position information of the first terminal;
sending the first playing request to a first service end so that the first service end can determine a second terminal closest to the first terminal based on the first playing request;
receiving video playing information from a first service terminal, wherein the video playing information comprises an identity of a second terminal and an identity of an audio stream corresponding to the video stream;
generating a second playing request for acquiring an audio playing address based on the video playing information, wherein the second playing request comprises an identity of a second terminal and an identity of the audio stream;
and sending the second playing request to the first service end, and acquiring an audio playing address from the first service end.
By way of example, audio data and a standard time stamp common to the video stream are included in the audio stream. In this example, the step of the first terminal or the third terminal playing the audio stream in synchronization with the second terminal playing the video stream may include: decoding the audio stream to obtain a standard timestamp; performing difference processing on the standard timestamp and the local timestamp to obtain audio delay time; and playing the audio stream synchronously with the video stream played by the second terminal based on the audio delay time.
As an example, the step of playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time may include: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, audio data of a time equal to the audio delay time is discarded from the audio stream, and the audio stream after discarding the audio data is played.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the sound channel direction of the audio data analyzed from the audio stream to obtain sound channel metadata of the audio data; determining spatial position information of each channel audio in the audio data based on the channel metadata; according to the sound channel and the space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain a converted sound channel audio; generating an audio file adapted to the sound channel of the speaker based on the converted sound channel audio; the audio file is played through the audio channel of the speaker.
In this exemplary embodiment, the configurations and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and therefore, the detailed description thereof is omitted here.
Fig. 15 is a schematic block diagram of an example of an audio playback apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 15, the audio playing apparatus is applied to a first terminal, and includes:
the identifier obtaining unit 1510 is configured to obtain a video identity of a video stream being played by the second terminal.
The identifier sending unit 1520 is configured to send the video identifier to the first server, and obtain an audio playing address of the audio stream corresponding to the video identifier from the first server.
The audio stream acquiring unit 1530 is configured to acquire an audio stream from the first server based on the audio play address, and play the audio stream in synchronization with the second terminal playing the video stream.
As an example, the identifier sending unit 1520 is further configured to: generate, based on the video identity, a first playing request for acquiring the playing information of the second terminal, where the first playing request comprises the video identity and the position information of the first terminal; send the first playing request to the first service end so that the first service end determines the second terminal closest to the first terminal based on the first playing request; receive video playing information from the first service end, where the video playing information comprises the identity of the second terminal and the identity of the audio stream corresponding to the video stream; generate, based on the video playing information, a second playing request for acquiring the audio playing address, where the second playing request comprises the identity of the second terminal and the identity of the audio stream; and send the second playing request to the first service end and acquire the audio playing address from the first service end.
As an example, the audio playing apparatus is further configured to:
send the audio playing information to the second server for the second server to send it to the third terminal, allowing the third terminal to obtain the audio stream from the first server based on the audio playing information and play the audio stream synchronously with the second terminal playing the video stream,
where the audio playing information comprises an audio playing address, or the audio playing address and the audio playing progress of the first terminal.
As an example, the audio stream includes audio data and a standard timestamp common to the video stream, wherein the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard timestamp; performing difference processing on the standard timestamp and the local timestamp to obtain audio delay time; and playing the audio stream synchronously with the video stream played by the second terminal based on the audio delay time.
As an example, playing the audio stream synchronously with the second terminal playing the video stream based on the audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; and in response to the audio delay time being less than 0, discarding from the audio stream audio data of a duration equal to the audio delay time, and playing the audio stream after the audio data is discarded.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the sound channel direction of the audio data analyzed from the audio stream to obtain sound channel metadata of the audio data; determining spatial position information of each channel audio in the audio data based on the channel metadata; according to the sound channel and the space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain a converted sound channel audio; generating an audio file adapted to the sound channel of the speaker based on the converted sound channel audio; the audio file is played through the channel of the speaker.
Fig. 16 is a schematic block diagram of another example of an audio playback apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 16, the audio playing apparatus is applied to a first service end, and includes:
the identifier receiving unit 1610 is configured to receive, from the first terminal, the video identity of the video stream being played by the second terminal, which is acquired by the first terminal.
The address transmitting unit 1620 is configured to transmit the audio play address of the audio stream corresponding to the video identity to the first terminal.
The audio stream transmitting unit 1630 is configured to transmit the audio stream to the first terminal based on the audio play address, so that the first terminal plays the audio stream synchronously with the second terminal playing the video stream.
As an example, the address sending unit 1620 is further configured to: receive, from the first terminal, a first playing request generated by the first terminal based on the video identity, where the first playing request is used for acquiring the playing information of the second terminal and comprises the video identity and the position information of the first terminal; determine the second terminal closest to the first terminal based on the first playing request; send video playing information to the first terminal, where the video playing information comprises the identity of the second terminal and the playing parameters of the video stream; receive, from the first terminal, a second playing request generated by the first terminal based on the video playing information, where the second playing request is used for acquiring the audio playing address and comprises the identity of the second terminal and the identity of the audio stream; and send the audio playing address to the first terminal based on the second playing request.
As an example, the audio playback apparatus is further configured to: receiving audio playing information sent to the third terminal by the second server from the third terminal, wherein the audio playing information is sent to the second server by the first terminal and comprises an audio playing address or comprises the audio playing address and the audio playing progress of the first terminal; and based on the audio playing information, sending the audio stream to the third terminal so that the third terminal can play the audio stream synchronously with the second terminal playing the video stream.
As an example, the audio stream includes audio data and a standard timestamp common to the video stream, wherein the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard timestamp; performing difference processing on the standard timestamp and the local timestamp to obtain audio delay time; and playing the audio stream synchronously with the video stream played by the second terminal based on the audio delay time.
As an example, playing the audio stream synchronously with the second terminal playing the video stream based on the audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; and in response to the audio delay time being less than 0, discarding from the audio stream audio data of a duration equal to the audio delay time, and playing the audio stream after the audio data is discarded.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the sound channel direction of the audio data analyzed from the audio stream to obtain sound channel metadata of the audio data; determining spatial position information of each channel audio in the audio data based on the channel metadata; according to the sound channel and the space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain a converted sound channel audio; generating an audio file adapted to the sound channel of the loudspeaker based on the converted sound channel audio; the audio file is played through the audio channel of the speaker.
Fig. 17 is a schematic block diagram of still another example of an audio playback apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 17, the audio playback apparatus is applied to a third terminal, and includes:
the information receiving unit 1710 is configured to receive audio playing information from the second server, where the audio playing information is sent to the second server by the first terminal and includes an audio playing address, or the audio playing address and the audio playing progress of the first terminal. The audio playing address is obtained by the first terminal as follows: the first terminal obtains the video identity of the video stream being played by the second terminal, sends the video identity to the first server, and obtains from the first server the audio playing address of the audio stream corresponding to the video identity;
the acquisition and playing unit 1720 is configured to acquire an audio stream from the first server based on the audio playing information, and play the audio stream in synchronization with the second terminal playing the video stream.
As an example, the sending, by the first terminal, a video identity to the first server, and obtaining, from the first server, an audio play address of an audio stream corresponding to the video stream includes: generating a first playing request for acquiring playing information of a second terminal based on the video identity, wherein the first playing request comprises the video identity and the position information of the first terminal; sending the first playing request to a first service end so that the first service end can determine a second terminal closest to the first terminal based on the first playing request; receiving video playing information from a first service end, wherein the video playing information comprises an identity of a second terminal and an identity of an audio stream corresponding to the video stream; generating a second playing request for acquiring an audio playing address based on the video playing information, wherein the second playing request comprises an identity of the second terminal and an identity of the audio stream; and sending the second playing request to the first service end, and acquiring an audio playing address from the first service end.
As an example, the audio stream includes audio data and a standard timestamp common to the video stream, wherein the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard timestamp; performing difference processing on the standard timestamp and the local timestamp to obtain audio delay time; and playing the audio stream synchronously with the video stream played by the second terminal based on the audio delay time.
As an example, playing an audio stream in synchronization with a second terminal playing a video stream based on an audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, audio data of a time equal to the audio delay time is discarded from the audio stream, and the audio stream after discarding the audio data is played.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the sound channel direction of the audio data analyzed from the audio stream to obtain sound channel metadata of the audio data; determining spatial position information of each channel audio in the audio data based on the channel metadata; according to the sound channel and the space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain a converted sound channel audio; generating an audio file adapted to the sound channel of the speaker based on the converted sound channel audio; the audio file is played through the channel of the speaker.
Fig. 18 is a schematic block diagram of an example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 18, the audio/video stream synchronous playing apparatus is applied to a first terminal, and includes:
the obtaining unit 1810 is configured to obtain an audio stream from a first server, where the audio stream corresponds to a video stream being played by a second terminal, and the second terminal receives the video stream from the first server, where the audio stream includes audio data and a standard timestamp, and the video stream includes video data and a standard timestamp.
The receiving unit 1820 is configured to receive, from the first service end, a video delay time for the second terminal to play the video stream, where the video delay time is determined by the second terminal, and the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal.
The determining unit 1830 is configured to determine an audio delay time for the first terminal to play the audio stream based on a difference between the standard timestamp and a local timestamp of the first terminal.
The playing unit 1840 is configured to play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time.
As an example, the playing unit 1840 is further configured to: perform difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time being greater than 0, play the audio stream after waiting for a time equal to the audio delay time; and in response to the sound-picture asynchronous time being less than 0, discard from the audio stream audio data of a duration equal to the audio delay time, and play the audio stream after the audio data is discarded.
As an example, the audio and video stream synchronous playing device is further configured to: perform difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time meeting a first preset condition, receive the video delay time from the first service end at a first time interval; in response to the sound-picture asynchronous time meeting a second preset condition, receive the video delay time from the first service end at a second time interval; and in response to the sound-picture asynchronous time meeting a third preset condition, receive the video delay time from the first service end at a third time interval. The first preset condition is that the sound-picture asynchronous time is within a first preset interval, or that the number of times the sound-picture asynchronous time falls outside the first preset interval within a preset duration is lower than a preset number; the second preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to the upper limit of the first preset interval, or less than or equal to the lower limit of the first preset interval; the third preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than the lower limit of the first preset interval; the first time interval is greater than the second time interval, and the second time interval is greater than the third time interval.
As an example, the audio and video stream synchronous playing device is further configured to: in response to receiving an audio waiting instruction input by the user, play the audio stream after waiting for the duration corresponding to the waiting time in the instruction; and in response to receiving an audio discarding instruction input by the user, discard from the audio stream the audio data of the duration corresponding to the discarding time in the instruction, and play the audio stream after the audio data is discarded.
As an example, the obtaining unit 1810 is configured to obtain an audio stream from a first server by: acquiring a video identity of a video stream being played by a second terminal; the video identity is sent to a first server, and an audio playing address of an audio stream corresponding to the video identity is obtained from the first server; and acquiring the audio stream from the first server based on the audio playing address.
Fig. 19 is a schematic block diagram of another example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 19, the apparatus for synchronously playing audio and video streams is applied to a first service end, and includes:
the stream transmission unit 1910 is configured to transmit an audio stream to a first terminal and a video stream to a second terminal, wherein the video stream corresponds to the audio stream, the audio stream includes audio data and standard time stamps, and the video stream includes video data and standard time stamps.
The time transmitting unit 1920 is configured to transmit the video delay time of the second terminal playing the video stream, which is determined by the second terminal, to the first terminal, so that the first terminal plays the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time of the first terminal playing the audio stream, wherein the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal, and the audio delay time is a difference between the standard timestamp and the local timestamp of the first terminal.
As an example, the step of the first terminal playing the audio stream synchronously with the second terminal playing the video stream based on the video delay time and the audio delay time may include: performing difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; and in response to the sound-picture asynchronous time being less than 0, discarding from the audio stream audio data of a duration equal to the audio delay time, and playing the audio stream after the audio data is discarded.
As an example, the audio and video stream synchronous playing device is further configured to: perform difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time; in response to the sound-picture asynchronous time meeting a first preset condition, send the video delay time to the first terminal at a first time interval; in response to the sound-picture asynchronous time meeting a second preset condition, send the video delay time to the first terminal at a second time interval; and in response to the sound-picture asynchronous time meeting a third preset condition, send the video delay time to the first terminal at a third time interval. The first preset condition is that the sound-picture asynchronous time is within a first preset interval, or that the number of times the sound-picture asynchronous time falls outside the first preset interval within a preset duration is lower than a preset number; the second preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to the upper limit of the first preset interval, or less than or equal to the lower limit of the first preset interval; the third preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than the lower limit of the first preset interval; the first time interval is greater than the second time interval, and the second time interval is greater than the third time interval.
As an example, the audio-video stream synchronous playing device is further configured to: the first terminal responds to the received audio waiting instruction input by the user, and plays the audio stream after waiting for the duration corresponding to the waiting time in the audio waiting instruction; the first terminal responds to the audio discarding instruction input by the user, discards the audio data with the duration corresponding to the discarding time in the audio discarding instruction from the audio stream, and plays the audio stream after the audio data is discarded.
As an example, the audio and video stream synchronous playing device is further configured to: receive, from the first terminal, the video identity of the video stream being played by the second terminal, acquired by the first terminal; send to the first terminal the audio playing address of the audio stream corresponding to the video identity; and perform the step of transmitting the audio stream to the first terminal based on the audio playing address.
With regard to the audio playing apparatus and the audio and video stream synchronous playing apparatus in the above embodiments, the specific manner in which each unit performs operations has been described in detail in the embodiments related to the method, and will not be described in detail here.
FIG. 20 is a block diagram illustrating an electronic device in accordance with an example embodiment. As shown in fig. 20, the electronic device 1000 includes a processor 101 and a memory 102 for storing processor-executable instructions. Here, the processor-executable instructions, when executed by the processor, cause the processor to perform an audio playing method or an audio-video stream synchronization playing method as described in the above exemplary embodiments.
By way of example, the electronic device 1000 need not be a single device but can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), alone or in combination. The electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a server that interfaces with local or remote devices (e.g., via wireless transmission).
In the electronic device 1000, the processor 101 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example and not limitation, processor 101 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
The processor 101 may execute instructions or code stored in the memory 102, wherein the memory 102 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
Memory 102 may be integrated with processor 101, e.g., with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 102 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 102 and the processor 101 may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., to enable the processor 101 to read files stored in the memory 102.
In addition, the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1000 may be connected to each other via a bus and/or a network.
In an exemplary embodiment, there may also be provided a computer-readable storage medium in which instructions, when executed by a processor of a server, enable the server to perform the audio playing method or the audio and video stream synchronous playing method described in the above exemplary embodiments. The computer-readable storage medium may be, for example, a memory including instructions; optionally, it may be: read-only memory (ROM), random-access memory (RAM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R LTH, BD-RE, Blu-ray or optical disk memory, a hard disk drive (HDD), a solid state disk (SSD), card memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid state disk, or any other device configured to store, and provide in a non-transitory manner, a computer program and any associated data, data files, and data structures to a processor or computer so that the computer program can be executed. The computer program in the computer-readable storage medium can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
In an exemplary embodiment, a computer program product may also be provided, which includes computer instructions that, when executed by a processor, implement the audio playing method or the audio and video stream synchronization playing method as described in the above exemplary embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method for synchronously playing audio and video streams is characterized by being applied to a first terminal and comprising the following steps:
acquiring an audio stream from a first server, wherein the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first server, the audio stream comprises audio data and a standard timestamp, and the video stream comprises video data and the standard timestamp;
receiving, from the first service end, a video delay time determined by a second terminal for playing a video stream by the second terminal, where the video delay time is a difference between the standard timestamp and a local timestamp of the second terminal;
determining an audio delay time for the first terminal to play the audio stream based on a difference between the standard timestamp and a local timestamp of the first terminal;
and playing the audio stream synchronously with the video stream played by the second terminal based on the video delay time and the audio delay time.
2. The method for synchronously playing the audio and video streams according to claim 1, wherein the playing the audio stream synchronously with the playing of the video stream by the second terminal based on the video delay time and the audio delay time for the first terminal to play the audio stream comprises:
performing difference processing on the audio delay time and the video delay time to obtain the sound-picture asynchronous time;
in response to the sound-picture asynchronous time being greater than 0, playing the audio stream after waiting a time equal to the audio delay time,
and in response to the sound-picture asynchronous time being less than 0, discarding the audio data with the same length as the audio delay time from the audio stream, and playing the audio stream after discarding the audio data.
3. The audio/video stream synchronous playing method according to claim 1, wherein the audio/video stream synchronous playing method further comprises:
performing difference processing on the audio delay time and the video delay time to obtain sound and picture asynchronous time;
receiving the video delay time from the first service terminal at a first time interval in response to the audio-video asynchronous time meeting a first preset condition;
receiving the video delay time from the first service terminal at a second time interval in response to the sound-picture asynchronous time meeting a second preset condition;
receiving the video delay time from the first service terminal at a third time interval in response to the audio-visual asynchronous time meeting a third preset condition,
the first preset condition is as follows: the number of times that the sound-picture asynchronous time is outside the first preset interval is lower than the preset number of times within the preset duration,
the second preset condition is as follows: the sound-picture asynchronous time obtained twice continuously is larger than or equal to the upper limit value of the first preset interval or smaller than or equal to the lower limit value of the first preset interval,
the third preset condition is as follows: the sound-picture asynchronous time obtained twice continuously is larger than or equal to a first threshold value, or is smaller than or equal to a second threshold value, the first threshold value is larger than the upper limit value of the first preset interval, the second threshold value is smaller than the lower limit value of the first preset interval,
wherein the first time interval is greater than the second time interval, which is greater than the third time interval.
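One way to read the three conditions of claim 3 is as a poll-rate controller: poll slowly while playback is mostly in sync, speed up when drift persists, and poll fastest when drift is large. The sketch below encodes that reading; all interval and threshold values are invented, and "obtained twice in succession" is taken to mean the two most recent samples.

```python
from collections import deque

# Hypothetical values; the patent leaves concrete numbers to the embodiment.
FIRST_INTERVAL_S = 10.0   # slow polling while largely in sync
SECOND_INTERVAL_S = 3.0
THIRD_INTERVAL_S = 1.0    # fastest polling under large drift
INTERVAL_LOW_MS, INTERVAL_HIGH_MS = -40, 40      # first preset interval
THRESHOLD_LOW_MS, THRESHOLD_HIGH_MS = -120, 120  # second/first thresholds

recent = deque(maxlen=2)  # the two most recent asynchronous times

def next_poll_interval(async_ms: int, out_of_interval_count: int,
                       preset_count: int) -> float:
    """Choose how often to fetch the video delay time from the server."""
    recent.append(async_ms)
    pair = list(recent)
    if len(pair) == 2:
        # Third condition: both samples beyond the wide thresholds.
        if all(t >= THRESHOLD_HIGH_MS for t in pair) or \
           all(t <= THRESHOLD_LOW_MS for t in pair):
            return THIRD_INTERVAL_S
        # Second condition: both samples outside the first preset interval.
        if all(t >= INTERVAL_HIGH_MS for t in pair) or \
           all(t <= INTERVAL_LOW_MS for t in pair):
            return SECOND_INTERVAL_S
    # First condition: rarely outside the interval within the window.
    if out_of_interval_count < preset_count:
        return FIRST_INTERVAL_S
    return SECOND_INTERVAL_S  # fallback when no condition clearly applies
```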
4. The method for synchronously playing audio and video streams according to claim 1, further comprising:
in response to receiving an audio waiting instruction input by a user, playing the audio stream after waiting for a duration corresponding to the waiting time in the audio waiting instruction;
and in response to receiving an audio discarding instruction input by a user, discarding, from the audio stream, audio data of a duration corresponding to the discarding time in the audio discarding instruction, and playing the audio stream after the audio data is discarded.
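Claim 4 layers a manual override on top of the automatic rule. A small sketch, with the instruction shapes invented for illustration:

```python
import time
from dataclasses import dataclass

@dataclass
class AudioWaitInstruction:      # hypothetical instruction shape
    wait_ms: int

@dataclass
class AudioDiscardInstruction:   # hypothetical instruction shape
    discard_ms: int

def handle_user_instruction(instruction, audio_buffer: bytearray,
                            bytes_per_ms: int) -> None:
    """Apply a user-entered wait or discard to the pending audio."""
    if isinstance(instruction, AudioWaitInstruction):
        time.sleep(max(0, instruction.wait_ms) / 1000.0)   # hold audio start
    elif isinstance(instruction, AudioDiscardInstruction):
        drop = max(0, instruction.discard_ms) * bytes_per_ms
        del audio_buffer[:drop]                            # skip ahead
```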
5. The method for synchronously playing audio and video streams according to claim 1, wherein the audio stream is acquired from the first server by:
acquiring a video identifier of the video stream being played by the second terminal;
sending the video identifier to the first server, and acquiring, from the first server, an audio playing address of the audio stream corresponding to the video identifier;
and acquiring the audio stream from the first server based on the audio playing address.
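The lookup flow of claim 5 could look like the sketch below; the base URL, endpoint path, and JSON field names are invented, since the patent does not fix a wire format.

```python
import requests  # third-party HTTP client

SERVER = "https://example.invalid/sync"  # hypothetical first-server base URL

def fetch_audio_stream(video_id: str) -> bytes:
    # Step 1: exchange the video identifier for the audio playing address.
    resp = requests.get(f"{SERVER}/audio-address",
                        params={"video_id": video_id})
    resp.raise_for_status()
    audio_url = resp.json()["audio_url"]  # hypothetical response field
    # Step 2: acquire the audio stream from the returned address.
    audio = requests.get(audio_url)
    audio.raise_for_status()
    return audio.content
```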
6. A method for synchronously playing audio and video streams, applied to a first server, the method comprising the following steps:
transmitting an audio stream to a first terminal and a video stream to a second terminal, wherein the video stream corresponds to the audio stream, the audio stream comprises audio data and a standard timestamp, and the video stream comprises video data and the standard timestamp;
and sending, to the first terminal, a video delay time, determined by the second terminal, for the second terminal to play the video stream, so that the first terminal plays the audio stream synchronously with the video stream played by the second terminal based on the video delay time and an audio delay time for the first terminal to play the audio stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal, and the audio delay time is the difference between the standard timestamp and a local timestamp of the first terminal.
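On the server side, claim 6 amounts to fanning out the two streams and relaying the second terminal's delay report to the first terminal. A minimal sketch, with the transport and message shape entirely assumed:

```python
import asyncio
import json

# Hypothetical registry: terminal id -> queue of pending messages.
subscribers: dict[str, asyncio.Queue] = {
    "first_terminal": asyncio.Queue(),
}

async def relay_video_delay(report_json: str) -> None:
    """Forward the second terminal's video delay to the first terminal."""
    report = json.loads(report_json)  # e.g. '{"video_delay_ms": 90}'
    await subscribers["first_terminal"].put(
        {"type": "video_delay",
         "video_delay_ms": report["video_delay_ms"]})
```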
7. The method for synchronously playing audio and video streams according to claim 6, wherein the first terminal playing the audio stream synchronously with the video stream played by the second terminal based on the video delay time and the audio delay time comprises:
performing difference processing on the audio delay time and the video delay time to obtain a sound-picture asynchronous time;
in response to the sound-picture asynchronous time being greater than 0, playing the audio stream after waiting for a duration equal to the audio delay time;
and in response to the sound-picture asynchronous time being less than 0, discarding, from the audio stream, audio data of a length equal to the audio delay time, and playing the audio stream after the audio data is discarded.
8. The method for synchronously playing audio and video streams according to claim 6, further comprising:
performing difference processing on the audio delay time and the video delay time to obtain a sound-picture asynchronous time;
sending the video delay time to the first terminal at a first time interval in response to the sound-picture asynchronous time satisfying a first preset condition;
sending the video delay time to the first terminal at a second time interval in response to the sound-picture asynchronous time satisfying a second preset condition;
sending the video delay time to the first terminal at a third time interval in response to the sound-picture asynchronous time satisfying a third preset condition,
the first preset condition is that, within a preset duration, the number of times the sound-picture asynchronous time falls outside a first preset interval is lower than a preset number of times;
the second preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to an upper limit of the first preset interval, or less than or equal to a lower limit of the first preset interval;
the third preset condition is that the sound-picture asynchronous time obtained twice in succession is greater than or equal to a first threshold, or less than or equal to a second threshold, the first threshold being greater than the upper limit of the first preset interval and the second threshold being less than the lower limit of the first preset interval,
wherein the first time interval is greater than the second time interval, which is greater than the third time interval.
9. The method for synchronously playing audio and video streams according to claim 6, further comprising:
the first terminal, in response to an audio waiting instruction input by a user, playing the audio stream after waiting for a duration corresponding to the waiting time in the audio waiting instruction;
and the first terminal, in response to an audio discarding instruction input by a user, discarding, from the audio stream, audio data of a duration corresponding to the discarding time in the audio discarding instruction, and playing the audio stream after the audio data is discarded.
10. The method for synchronously playing audio and video streams according to claim 6, further comprising:
receiving, from the first terminal, a video identifier of the video stream being played by the second terminal, the video identifier being acquired by the first terminal;
sending, to the first terminal, an audio playing address of the audio stream corresponding to the video identifier;
and performing the step of transmitting the audio stream to the first terminal based on the audio playing address.
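Server-side, claim 10 needs little more than an identifier-to-address mapping; a sketch with a placeholder table and URL:

```python
# Hypothetical mapping maintained by the first server.
AUDIO_ADDRESS_BY_VIDEO_ID = {
    "video-123": "https://example.invalid/streams/video-123/audio",
}

def audio_address_for(video_id: str) -> str | None:
    """Resolve the audio playing address for a given video identifier."""
    return AUDIO_ADDRESS_BY_VIDEO_ID.get(video_id)
```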
11. An audio and video stream synchronous playing device, applied to a first terminal, the device comprising:
an obtaining unit, configured to obtain an audio stream from a first server, where the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first server, the audio stream includes audio data and a standard timestamp, and the video stream includes video data and the standard timestamp;
a receiving unit configured to receive, from the first server, a video delay time for the second terminal to play the video stream, the video delay time being determined by the second terminal as the difference between the standard timestamp and a local timestamp of the second terminal;
a determining unit configured to determine an audio delay time for the first terminal to play the audio stream based on a difference between the standard timestamp and a local timestamp of the first terminal;
a playing unit configured to play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time.
12. An audio and video stream synchronous playing device, applied to a first server, the device comprising:
a stream transmission unit configured to transmit an audio stream to a first terminal and a video stream to a second terminal, wherein the video stream corresponds to the audio stream, the audio stream comprises audio data and a standard timestamp, and the video stream comprises video data and the standard timestamp;
a time sending unit configured to send, to the first terminal, a video delay time, determined by the second terminal, for the second terminal to play the video stream, so that the first terminal plays the audio stream synchronously with the video stream played by the second terminal based on the video delay time and an audio delay time for the first terminal to play the audio stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal, and the audio delay time is the difference between the standard timestamp and a local timestamp of the first terminal.
13. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions,
wherein the processor-executable instructions, when executed by the processor, cause the processor to perform the method for synchronously playing audio and video streams according to any one of claims 1 to 10.
14. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the method for synchronously playing audio and video streams according to any one of claims 1 to 10.
CN202310142484.2A 2023-02-17 2023-02-17 Audio and video stream synchronous playing method and device, electronic equipment and storage medium Active CN115942021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310142484.2A CN115942021B (en) 2023-02-17 2023-02-17 Audio and video stream synchronous playing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115942021A (en) 2023-04-07
CN115942021B (en) 2023-06-27

Family

ID=86656077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310142484.2A Active CN115942021B (en) 2023-02-17 2023-02-17 Audio and video stream synchronous playing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115942021B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792073A (en) * 2016-12-29 2017-05-31 北京奇艺世纪科技有限公司 Method, playback equipment and system that the audio, video data of striding equipment is synchronously played
CN108462896A (en) * 2018-03-23 2018-08-28 北京潘达互娱科技有限公司 Live data method for stream processing, device and electronic equipment
CN110177294A (en) * 2019-06-11 2019-08-27 亦非云互联网技术(上海)有限公司 Player audio and video synchronization method and system, storage medium and terminal
CN111757158A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Audio and video synchronous playing method, device, equipment and storage medium
EP3742742A1 (en) * 2019-04-04 2020-11-25 Wangsu Science & Technology Co., Ltd. Method, apparatus and system for synchronously playing message stream and audio/video stream
CN112261461A (en) * 2020-10-20 2021-01-22 深圳创维-Rgb电子有限公司 Bluetooth sound and picture synchronization method and device, display equipment and readable storage medium
CN112995730A (en) * 2021-03-30 2021-06-18 闻泰通讯股份有限公司 Sound and picture synchronous adjustment method and device, electronic equipment and medium
CN114040237A (en) * 2021-09-30 2022-02-11 茂佳科技(广东)有限公司 Audio and video synchronous playing method, terminal, multimedia playing system and medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant