CN115174980A - Audio and video synchronization method, device, equipment and medium based on security queue - Google Patents

Info

Publication number: CN115174980A
Application number: CN202210703135.9A
Authority: CN (China)
Prior art keywords: stream data, queue, audio, video, safety
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 李捷明, 荀海峰, 胡德凯, 岳凯
Current/Original Assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202210703135.9A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4


Abstract

The application discloses an audio and video synchronization method, device, equipment and medium based on a security queue, wherein the method comprises the following steps: decoding the target video to obtain video stream data and audio stream data corresponding to the target video; respectively storing the video stream data and the audio stream data in a first safety queue and a second safety queue; respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively; determining a first audio time difference of the target video according to the first playing time and the second playing time; and adjusting the first safety queue and the second safety queue according to the first audio time difference.

Description

Audio and video synchronization method, device, equipment and medium based on security queue
Technical Field
The application relates to the field of digital audio, in particular to an audio and video synchronization method, device, equipment and medium based on a security queue.
Background
A video usually includes a video stream and an audio stream. When the video is played, the two streams are decoded separately and then played in two different threads, which may cause audio and video to fall out of synchronization.
To solve this problem, the prior art generally employs a larger buffer area to cache the encoded data as well as the encoded image and audio frames. The uncertainty of the image content makes it difficult to determine the size of the buffer area: if the buffer is too small, the data overflows; if it is too large, memory space is wasted.
Disclosure of Invention
In order to solve the above problems, the present application provides an audio and video synchronization method, apparatus, device and medium based on a security queue, including:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video; storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively; respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively; determining a first audio time difference of the target video according to the first playing time and the second playing time; and adjusting the first safety queue and the second safety queue according to the first audio time difference.
In an example, the adjusting the first secure queue and the second secure queue according to the audio time difference specifically includes: if the first audio time difference is smaller than a first preset threshold value, performing fast dequeue operation through the first secure queue; and if the first audio time difference is larger than a second preset threshold value, executing the sleep operation through the second safety queue.
In an example, the performing, by the first secure queue, a fast dequeue operation specifically includes: determining a first redundant time length of the video stream data in the first secure queue according to the audio time difference; discarding a portion of the video stream data according to the first redundant time length and determining a second audio time difference after discarding the portion of the video stream data; and when the second audio time difference is larger than the first preset threshold value, stopping discarding the video stream data.
In an example, the executing the sleep operation through the second secure queue specifically includes: determining a second redundant time length of the audio stream data in the second secure queue according to the audio time difference; determining the sleeping time of the second safety queue according to the second redundant time length, and determining a third audio time difference after sleeping; and stopping the sleep operation of the second safety queue when the third audio time difference is smaller than the second preset threshold value.
In an example, the storing the video stream data and the audio stream data in a first secure queue and a second secure queue respectively specifically includes: determining enqueuing time corresponding to the video stream data and the audio stream data respectively according to decoding time corresponding to the video stream data and the audio stream data respectively; acquiring current storage states of the first secure queue and the second secure queue, and determining tail storage positions corresponding to the first secure queue and the second secure queue respectively; storing the video stream data to the tail storage position of the first secure queue according to the enqueuing time; and storing the audio stream data to the tail storage position of the second secure queue.
In an example, the obtaining a first playing time and a second playing time corresponding to the current video stream data and the current audio stream data respectively specifically includes: determining the relative playing time of the current video stream data relative to the starting point of the video stream data as the first playing time by analyzing the current video stream data; and determining the relative playing time of the current audio stream data relative to the starting point of the audio stream data as the second playing time by analyzing the current audio stream data.
In one example, the method further comprises: acquiring the decoding progress of the target video, wherein the decoding progress comprises a video stream decoding duration and an audio stream decoding duration; determining that the difference between the video stream decoding duration and the audio stream decoding duration is greater than a third preset threshold; and setting a third thread that is in an idle state to assist in decoding the target video.
The application also provides an audio and video synchronization device based on the security queue, which comprises: the decoding module is used for decoding the target video to obtain video stream data and audio stream data corresponding to the target video; the storage module is used for respectively storing the video stream data and the audio stream data in a first safety queue and a second safety queue; the playing module is used for taking out the video stream data and the audio stream data from the first safety queue and the second safety queue respectively through a pre-stored first thread and a pre-stored second thread; the analysis module is used for acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively; the time difference module is used for determining a first audio time difference of the target video according to the first playing time and the second playing time; and the adjusting module adjusts the first safety queue and the second safety queue according to the audio time difference.
The application also provides an audio and video synchronization device based on the security queue, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: decoding a target video to obtain video stream data and audio stream data corresponding to the target video; storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively; respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively; determining a first audio time difference of the target video according to the first playing time and the second playing time; and adjusting the first safety queue and the second safety queue according to the first audio time difference.
The present application further provides a non-transitory computer storage medium storing computer-executable instructions configured to: decode a target video to obtain video stream data and audio stream data corresponding to the target video; store the video stream data and the audio stream data in a first safety queue and a second safety queue respectively; respectively take out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquire first playing time and second playing time respectively corresponding to current video stream data and current audio stream data; determine a first audio time difference of the target video according to the first playing time and the second playing time; and adjust the first safety queue and the second safety queue according to the first audio time difference.
By the method, the decoded audio stream and video stream are cached in separate safety queues, and the audio and video playback threads fetch data from those queues during playing. At each processing step, the difference between the two relative playing times is compared: when the audio is ahead, the video safety queue performs a fast dequeue operation and discards part of its data so that the video quickly catches up with the audio to achieve synchronization; when the video is ahead, dequeuing of the video stream sleeps so that the audio can catch up, and once the error falls within a certain range the sleep stops and normal dequeue operation resumes, thereby achieving the purpose of audio and video synchronization.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of an audio and video synchronization method based on a security queue in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an audio and video synchronization apparatus based on a security queue in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an audio and video synchronization device based on a security queue in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow diagram of an audio and video synchronization method based on a security queue according to one or more embodiments of the present disclosure, where some input parameters or intermediate results in the flow allow manual intervention and adjustment to help improve accuracy. The method according to the embodiment of the present application may be implemented by a terminal device or a server, which is not particularly limited in this application. For convenience of understanding and description, the following embodiments are described in detail by taking a server as an example.
It should be noted that the server may be a single device, or may be a system composed of multiple devices, that is, a distributed server, which is not specifically limited in this application.
As shown in fig. 1, an embodiment of the present application provides a method, including:
s101: and decoding the target video to obtain video stream data and audio stream data corresponding to the target video.
Generally, when playing a video, different threads are used to play the audio and the video. Therefore, in order to synchronize the audio and video playback of the target video, the target video needs to be decoded to obtain the video stream data and audio stream data corresponding to it. The process comprises two steps: first determining the target video, namely acquiring the target video data, and then decoding the target video.
The target video data may be stored in a storage device of the computer device in advance, and when the target video needs to be decoded, the computer device may select the target video data from the storage device. Of course, the computer device may also obtain the target video data from other external devices. For example, the target video data is stored in the cloud, and when the target video needs to be decoded, the computer device may acquire the target video data from the cloud.
S102: and respectively storing the video stream data and the audio stream data in a first safety queue and a second safety queue.
After the video stream data and audio stream data are acquired, they are stored in a first safety queue and a second safety queue respectively. The safety queues serve the producer-consumer pattern in multithreading: multiple producers (producer threads) generate data while multiple consumers (consumer threads) consume it, so an intermediate space is needed for temporary storage, and thread-safe queues must be used to keep this process safe.
In one embodiment, when video stream data and audio stream data are respectively stored in a first secure queue and a second secure queue, first, according to decoding time corresponding to the video stream data and the audio stream data, enqueuing time corresponding to the video stream data and the audio stream data respectively is determined, then, current storage states of the first secure queue and the second secure queue are obtained, tail storage positions corresponding to the first secure queue and the second secure queue respectively are determined, and then, according to the enqueuing time, the video stream data are stored in the tail storage position of the first secure queue and the audio stream data are stored in the tail storage position of the second secure queue. And according to the first-in first-out rule of the queue, the output sequence of the data is ensured according to the time of the data entering the queue.
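As a rough illustration (not the patent's implementation), the storage step can be sketched in Python with the standard library's thread-safe `queue.Queue`, whose FIFO discipline matches the first-in first-out rule described above; the `StreamPacket` structure and its field names are hypothetical:

```python
import queue
from dataclasses import dataclass, field

@dataclass(order=True)
class StreamPacket:
    # Hypothetical packet layout: enqueue_time is derived from the packet's
    # decoding time, as described above; payload carries the stream data.
    enqueue_time: float
    payload: bytes = field(compare=False, default=b"")

# One thread-safe queue per stream; queue.Queue appends at the tail and is
# strictly FIFO, so packets leave in the order their enqueue times put them in.
video_queue: "queue.Queue[StreamPacket]" = queue.Queue(maxsize=256)
audio_queue: "queue.Queue[StreamPacket]" = queue.Queue(maxsize=256)

def enqueue_packet(q: "queue.Queue[StreamPacket]", pkt: StreamPacket) -> None:
    # put() blocks when the queue is full, which throttles the producer
    # (decoder) thread instead of overflowing a fixed-size buffer.
    q.put(pkt)
```

A bounded `maxsize` is one way the safe-queue design sidesteps the buffer-sizing problem noted in the background: back-pressure replaces overflow.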
S103: and respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread.
When playing the video, the audio stream data and the video stream data need to be played through two threads. In this case, a first thread and a second thread are required to take the video stream data and the audio stream data out of the first safety queue and the second safety queue, respectively. It should be noted that the first thread and the second thread are both in an idle state and can be invoked at any time to play the target video by taking data out of the safety queues.
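One possible shape for the two pre-stored playback threads of step S103, again only a sketch: each consumer thread blocks on its safety queue and hands dequeued data to a renderer. The `render` callbacks and the `None` sentinel are assumptions, not part of the patent.

```python
import queue
import threading

def playback_worker(q: queue.Queue, render) -> None:
    # Consumer loop for one stream: dequeue packets in FIFO order and hand
    # them to the renderer; a None sentinel ends playback.
    while True:
        pkt = q.get()
        if pkt is None:
            break
        render(pkt)

def start_playback(video_q: queue.Queue, audio_q: queue.Queue,
                   render_video, render_audio) -> list:
    # One pre-created thread per stream; both are idle until started here.
    threads = [
        threading.Thread(target=playback_worker, args=(video_q, render_video)),
        threading.Thread(target=playback_worker, args=(audio_q, render_audio)),
    ]
    for t in threads:
        t.start()
    return threads
```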
S104: and acquiring first playing time and second playing time corresponding to the current video stream data and the current audio stream data respectively.
When playing a video, a first playing time corresponding to video stream data and a second playing time corresponding to audio stream data are synchronously obtained through a thread, where it should be noted that the first playing time is a relative time of the video stream data with respect to starting playing, and the second playing time is a relative time of the audio stream data with respect to starting playing.
In one embodiment, when a first playing time and a second playing time corresponding to the current video stream data and the current audio stream data respectively are obtained, the relative playing time of the current video stream data relative to the starting point of the video stream data is determined as the first playing time by parsing the current video stream data, and the relative playing time of the current audio stream data relative to the starting point of the audio stream data is determined as the second playing time by parsing the current audio stream data.
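The relative playing times can be computed from each packet's presentation timestamp. The sketch below assumes timestamps expressed in a rational time base, as is common in container formats; the function and parameter names are illustrative, not the patent's.

```python
from fractions import Fraction

def relative_play_time(pts: int, time_base: Fraction, start_pts: int = 0) -> float:
    # Play time of the current packet relative to the stream's starting
    # point: the timestamp offset scaled by the stream's time base.
    return float((pts - start_pts) * time_base)
```

For a 1/1000 time base, for example, a packet with pts 3000 plays 3 seconds after the stream start.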
S105: and determining a first audio time difference of the target video according to the first playing time and the second playing time.
Audio and video playback falls out of synchronization because of differences in audio and video decoding time, rendering time and thread playing speed, and human hearing is far more sensitive than vision: one missing frame of audio is clearly perceptible, while one missing frame of video is hardly noticeable. Therefore, the key to audio and video synchronization is the audio playback. After the first playing time and the second playing time are obtained, the first audio time difference of the target video is determined according to them. The first audio time difference is the playing time difference between the audio stream data and the video stream data.
S106: and adjusting the first safety queue and the second safety queue according to the first audio time difference.
After the first audio time difference is known, the first safety queue and the second safety queue can be adjusted according to the first audio time difference, so that the playing speeds of video stream data and audio stream data are adjusted, and audio and video are synchronized.
In one embodiment, when the first safety queue and the second safety queue are adjusted according to the audio time difference, the first audio time difference needs to be compared with preset threshold values: if the first audio time difference is smaller than a first preset threshold value, a fast dequeue operation is performed through the first safety queue; if the first audio time difference is larger than a second preset threshold value, the sleep operation is executed through the second safety queue. Taking a first preset threshold value of -0.05s and a second preset threshold value of 0.05s as an example, if the first audio time difference is less than -0.05s, the audio is playing ahead at this time, and the first safety queue is required to perform the fast dequeue operation. If the first audio time difference is greater than 0.05s, the video is playing ahead at this time, and the sleep operation should be executed through the second safety queue.
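The threshold comparison can be condensed into a small decision function. The ±0.05 s defaults come from the example above; the assumption that the difference is computed as video playing time minus audio playing time is mine, not stated in the patent.

```python
def sync_action(audio_time_diff: float,
                low: float = -0.05, high: float = 0.05) -> str:
    # audio_time_diff: first playing time (video) minus second (audio),
    # an assumed sign convention.
    if audio_time_diff < low:
        return "fast_dequeue"  # audio is ahead: drop from the video queue
    if audio_time_diff > high:
        return "sleep"         # video is ahead: sleep so audio catches up
    return "in_sync"
```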
Further, when the fast dequeue operation is executed through the first safety queue, first, a first redundant time length of the video stream data in the first safety queue is determined according to the audio time difference; then, part of the video stream data is discarded according to the first redundant time length, and the second audio time difference after discarding is determined; when the second audio time difference is larger than the first preset threshold value, discarding of the video stream data stops. When the difference is small, the catch-up may be performed by a delay method; however, if the video is far behind the audio, it needs to catch up by dropping frames. Since video contains I-frames as well as P- and B-frames, frame dropping must not discard an I-frame, otherwise the subsequent pictures cannot be decoded. The portion to be discarded therefore needs to be checked to see whether it contains an I-frame.
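A minimal sketch of the fast dequeue with I-frame protection. This is a simplification of the step above: it only drops leading P/B-frames and stops at the first I-frame, whereas a real player would reason per group of pictures; frame representation and `frame_duration` are assumptions.

```python
def drop_video_frames(frames, audio_diff, frame_duration, threshold=-0.05):
    # frames: list of (frame_type, payload) waiting in the video queue.
    # Drop leading non-I frames; each drop advances the video position by
    # one frame_duration, so the audio time difference is recomputed as we
    # go and dropping stops once it rises above the threshold.
    i = 0
    while i < len(frames) and audio_diff <= threshold:
        frame_type, _ = frames[i]
        if frame_type == "I":
            break  # never discard an I-frame: later frames depend on it
        audio_diff += frame_duration
        i += 1
    return frames[i:], audio_diff
```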
In one embodiment, when the sleep operation is executed through the second safety queue, first, a second redundant time length of the audio stream data in the second safety queue is determined according to the audio time difference; the sleep time of the second safety queue is then determined according to the second redundant time length, and the third audio time difference after sleeping is determined; when the third audio time difference is smaller than the second preset threshold value, the sleep operation of the second safety queue stops. That is, while the first safety queue or the second safety queue is adjusted, the changed audio time difference needs to be recalculated in real time to prevent over-adjustment.
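The sleep side can be sketched the same way. The `max_sleep` clamp is my own guard against over-adjustment, echoing the real-time recalculation described above; the 0.05 s threshold again comes from the earlier example.

```python
import time

def sleep_to_catch_up(audio_diff: float,
                      threshold: float = 0.05,
                      max_sleep: float = 0.5) -> float:
    # The redundant time length is how far the difference exceeds the
    # second preset threshold; sleep that long (clamped) so the lagging
    # stream catches up, then the caller recomputes the difference.
    surplus = audio_diff - threshold
    nap = min(max(surplus, 0.0), max_sleep)
    time.sleep(nap)
    return nap
```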
In one embodiment, since there is a speed difference between the video stream data and the audio stream data during decoding and rendering, the decoding progress of the target video may also be obtained while the target video is played, where the decoding progress includes a video stream decoding duration and an audio stream decoding duration. And then judging whether the difference between the video stream decoding time length and the audio stream decoding time length is greater than a third preset threshold value, if so, calling a third thread in an idle state to assist in decoding the target video so as to accelerate the decoding speed and help audio and video synchronization.
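The decode-balance check of this embodiment reduces to a comparison; the 1.0 s default below is an assumed value, since the patent only speaks of a third preset threshold.

```python
def needs_decode_helper(video_decode_s: float,
                        audio_decode_s: float,
                        threshold_s: float = 1.0) -> bool:
    # True when one stream's decode duration lags the other's by more than
    # the third preset threshold, signalling that an idle third thread
    # should be set to assist in decoding the target video.
    return abs(video_decode_s - audio_decode_s) > threshold_s
```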
As shown in fig. 2, an embodiment of the present application further provides an audio and video synchronization apparatus based on a security queue, including:
the decoding module 201 decodes the target video to obtain video stream data and audio stream data corresponding to the target video.
The storage module 202 stores the video stream data and the audio stream data in a first secure queue and a second secure queue, respectively.
The playing module 203 takes out the video stream data and the audio stream data from the first secure queue and the second secure queue respectively through a pre-stored first thread and a pre-stored second thread.
The parsing module 204 obtains a first playing time and a second playing time corresponding to the current video stream data and the current audio stream data, respectively.
The time difference module 205 determines a first audio time difference of the target video according to the first playing time and the second playing time.
And the adjusting module 206 adjusts the first secure queue and the second secure queue according to the audio time difference.
As shown in fig. 3, an embodiment of the present application further provides an audio and video synchronization device based on a security queue, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video; storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively; respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquiring first playing time and second playing time respectively corresponding to current video stream data and current audio stream data; determining a first audio time difference of the target video according to the first playing time and the second playing time; and adjusting the first safety queue and the second safety queue according to the first audio time difference.
An embodiment of the present application further provides a non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video; storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively; respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread; acquiring first playing time and second playing time respectively corresponding to current video stream data and current audio stream data; determining a first audio time difference of the target video according to the first playing time and the second playing time; and adjusting the first secure queue and the second secure queue according to the first audio time difference.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the device and media embodiments, the description is relatively simple, as it is substantially similar to the method embodiments, and reference may be made to some description of the method embodiments for relevant points.
The device and the medium provided by the embodiment of the application correspond to the method one to one, so the device and the medium also have the similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the method are explained in detail above, so the beneficial technical effects of the device and the medium are not repeated herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. An audio and video synchronization method based on a security queue is characterized by comprising the following steps:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video;
storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively;
respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread;
acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively;
determining a first audio time difference of the target video according to the first playing time and the second playing time;
and adjusting the first safety queue and the second safety queue according to the first audio time difference.
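The flow of claim 1 can be sketched in Python, whose thread-safe `queue.Queue` serves as the "safety queue". The `(play_time, payload)` layout, the sample timestamps, the convention that the difference is video play time minus audio play time, and all names are illustrative assumptions, not from the patent; the first and second consumer threads are collapsed into direct `get` calls for brevity.

```python
import queue

# First and second safety queues for decoded video and audio stream data.
video_queue = queue.Queue()
audio_queue = queue.Queue()

# Steps 1-2: "decode" the target video and store each stream's units,
# modeled here as (play_time, payload) pairs.
for unit in [(0.08, "v0"), (0.12, "v1")]:
    video_queue.put(unit)
for unit in [(0.05, "a0"), (0.10, "a1")]:
    audio_queue.put(unit)

# Steps 3-5: take out the current units, read their playing times, and
# form the first audio time difference (sign convention is an assumption).
def first_audio_time_difference():
    first_play_time, _ = video_queue.get()   # current video stream data
    second_play_time, _ = audio_queue.get()  # current audio stream data
    return first_play_time - second_play_time

diff = first_audio_time_difference()
```

With the sample data above, `diff` is positive, meaning the video head is ahead of the audio head and the queues would be adjusted as in claim 2.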
2. The method of claim 1, wherein the adjusting the first safety queue and the second safety queue according to the first audio time difference specifically comprises:
if the first audio time difference is smaller than a first preset threshold value, performing a fast dequeue operation through the first safety queue;
and if the first audio time difference is larger than a second preset threshold value, executing a sleep operation through the second safety queue.
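The two-threshold dispatch of claim 2 can be sketched as below. The threshold values, the return labels, and the convention that the difference is video play time minus audio play time are assumptions of this sketch.

```python
def adjust(first_audio_time_difference, first_threshold=-0.1, second_threshold=0.1):
    """Dispatch per claim 2: choose which safety queue is adjusted."""
    if first_audio_time_difference < first_threshold:
        return "fast_dequeue_first_queue"   # drop stale video stream data
    if first_audio_time_difference > second_threshold:
        return "sleep_second_queue"         # pause audio-side consumption
    return "in_sync"
```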
3. The method according to claim 2, wherein the performing the fast dequeue operation through the first safety queue specifically comprises:
determining a first redundant time length of the video stream data in the first safety queue according to the first audio time difference;
discarding a portion of the video stream data according to the first redundant time length, and determining a second audio time difference after discarding the portion of the video stream data;
and when the second audio time difference is larger than the first preset threshold value, stopping discarding the video stream data.
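The discard loop of claim 3 can be sketched as follows: queued video units are dropped until the remaining head would bring the (second) audio time difference back above the first preset threshold. The unit layout, the threshold value, and the unlocked peek via `.queue[0]` are assumptions acceptable for a sketch only.

```python
import queue

def fast_dequeue(video_queue, current_audio_play_time, first_threshold=-0.1):
    """Claim 3, sketched: discard stale video stream data from the first
    safety queue, stopping once the second audio time difference exceeds
    the first preset threshold."""
    discarded = 0
    while not video_queue.empty():
        head_play_time, _ = video_queue.queue[0]   # peek without removing
        second_difference = head_play_time - current_audio_play_time
        if second_difference > first_threshold:
            break                       # stop discarding the video stream data
        video_queue.get()               # discard one unit of video stream data
        discarded += 1
    return discarded
```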
4. The method according to claim 2, wherein the performing the sleep operation through the second safety queue specifically comprises:
determining a second redundant time length of the audio stream data in the second safety queue according to the first audio time difference;
determining a sleep duration of the second safety queue according to the second redundant time length, and determining a third audio time difference after the sleep;
and stopping the sleep operation of the second safety queue when the third audio time difference is smaller than the second preset threshold value.
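A minimal sketch of the sleep-duration calculation in claim 4: sleep the second safety queue's consumer just long enough that the third audio time difference falls back below the second preset threshold. The linear mapping from redundant time length to sleep duration and the threshold value are assumptions.

```python
def sleep_duration(second_redundant_length, second_threshold=0.1):
    """Claim 4, sketched: how long to sleep the second safety queue so the
    remaining redundancy drops below the second preset threshold."""
    return max(0.0, second_redundant_length - second_threshold)
```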
5. The method of claim 1, wherein the storing the video stream data and the audio stream data in the first safety queue and the second safety queue respectively comprises:
determining enqueuing time corresponding to the video stream data and the audio stream data respectively according to decoding time corresponding to the video stream data and the audio stream data respectively;
acquiring current storage states of the first safety queue and the second safety queue, and determining tail storage positions respectively corresponding to the first safety queue and the second safety queue;
and storing the video stream data to the tail storage position of the first safety queue according to the enqueuing time, and storing the audio stream data to the tail storage position of the second safety queue.
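The tail-storage step of claim 5 can be sketched as below. `queue.Queue` always appends at the tail position, so enqueuing units ordered by decoding time preserves play order; the `(decode_time, play_time, payload)` layout is an assumption.

```python
import queue

def store_in_decode_order(safety_queue, decoded_units):
    """Claim 5, sketched: enqueue each unit at the tail storage position
    in the order given by its decoding time."""
    for _, play_time, payload in sorted(decoded_units):
        safety_queue.put((play_time, payload))
```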
6. The method according to claim 1, wherein the obtaining of the first playing time and the second playing time corresponding to the current video stream data and the current audio stream data respectively specifically comprises:
determining the relative playing time of the current video stream data relative to the starting point of the video stream data as the first playing time by analyzing the current video stream data;
and determining the relative playing time of the current audio stream data relative to the starting point of the audio stream data as the second playing time by analyzing the current audio stream data.
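The playing-time calculation of claim 6 reduces to a subtraction against the stream's starting point; the function name and argument layout are illustrative.

```python
def playing_time(current_play_time, stream_start_time):
    """Claim 6, sketched: the first/second playing time is the current
    unit's play time relative to the stream's starting point."""
    return current_play_time - stream_start_time
```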
7. The method of claim 1, further comprising:
acquiring the decoding progress of the target video, wherein the decoding progress comprises video stream decoding duration and audio stream decoding duration;
determining that the difference between the video stream decoding duration and the audio stream decoding duration is greater than a third preset threshold;
and setting a third thread that is in an idle state to assist in decoding the target video.
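The trigger condition of claim 7 can be sketched as a simple predicate; taking the absolute gap between the two decoding durations and the threshold value are assumptions of this sketch.

```python
def needs_assist_thread(video_decode_duration, audio_decode_duration,
                        third_threshold=0.5):
    """Claim 7, sketched: when the decoding-progress gap exceeds the third
    preset threshold, an idle third thread should be set to assist decoding."""
    gap = abs(video_decode_duration - audio_decode_duration)
    return gap > third_threshold
```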
8. An audio and video synchronization device based on a security queue, comprising:
the decoding module is used for decoding the target video to obtain video stream data and audio stream data corresponding to the target video;
the storage module is used for respectively storing the video stream data and the audio stream data in a first safety queue and a second safety queue;
the playing module is used for taking out the video stream data and the audio stream data from the first safety queue and the second safety queue respectively through a pre-stored first thread and a pre-stored second thread;
the analysis module is used for acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively;
the time difference module is used for determining a first audio time difference of the target video according to the first playing time and the second playing time;
and the adjusting module is used for adjusting the first safety queue and the second safety queue according to the first audio time difference.
9. An audio and video synchronization device based on a security queue, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video;
storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively;
respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread;
acquiring first playing time and second playing time corresponding to current video stream data and current audio stream data respectively;
determining a first audio time difference of the target video according to the first playing time and the second playing time;
and adjusting the first safety queue and the second safety queue according to the first audio time difference.
10. A non-transitory computer storage medium storing computer-executable instructions, the computer-executable instructions configured to:
decoding a target video to obtain video stream data and audio stream data corresponding to the target video;
storing the video stream data and the audio stream data in a first safety queue and a second safety queue respectively;
respectively taking out the video stream data and the audio stream data from the first safety queue and the second safety queue through a pre-stored first thread and a pre-stored second thread;
acquiring first playing time and second playing time respectively corresponding to current video stream data and current audio stream data;
determining a first audio time difference of the target video according to the first playing time and the second playing time;
and adjusting the first safety queue and the second safety queue according to the first audio time difference.
CN202210703135.9A 2022-06-21 2022-06-21 Audio and video synchronization method, device, equipment and medium based on security queue Pending CN115174980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210703135.9A CN115174980A (en) 2022-06-21 2022-06-21 Audio and video synchronization method, device, equipment and medium based on security queue


Publications (1)

Publication Number Publication Date
CN115174980A true CN115174980A (en) 2022-10-11

Family

ID=83487511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210703135.9A Pending CN115174980A (en) 2022-06-21 2022-06-21 Audio and video synchronization method, device, equipment and medium based on security queue

Country Status (1)

Country Link
CN (1) CN115174980A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140049689A1 (en) * 2011-12-05 2014-02-20 Guangzhou Ucweb Computer Technology Co., Ltd Method and apparatus for streaming media data processing, and streaming media playback equipment
CN107801080A (en) * 2017-11-10 2018-03-13 普联技术有限公司 A kind of audio and video synchronization method, device and equipment
CN109714634A (en) * 2018-12-29 2019-05-03 青岛海信电器股份有限公司 A kind of decoding synchronous method, device and the equipment of live data streams
CN113395523A (en) * 2021-06-11 2021-09-14 深圳市斯博科技有限公司 Image decoding method, device and equipment based on parallel threads and storage medium
CN113490047A (en) * 2021-07-21 2021-10-08 浪潮卓数大数据产业发展有限公司 Android audio and video playing method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHELDON_LEE: "Concurrency Series (III): Safe Queues", page 2, retrieved from the Internet <URL:https://blog.csdn.net/Sheldon_Lee/article/details/98499847> *
ZHANG, Senyong; GAO, Shuli; CHEN, Weigang: "Implementation of Audio and Video Decoding and Synchronized Playback for Embedded ***", Journal of Applied Sciences, no. 03, 30 May 2018 (2018-05-30) *
GAO, Ke; LIU, Xinsong; ZHAN, Ji: "Research on MPEG-4-Based Streaming Media Playback over the Internet", Journal of University of Electronic Science and Technology of China, no. 03, 25 June 2006 (2006-06-25) *

Similar Documents

Publication Publication Date Title
US11706483B2 (en) Video playing method and apparatus, and electronic device
US10067937B2 (en) Determining delay for language translation in video communication
US10097878B2 (en) Video playback method and control terminal thereof
WO2016015670A1 (en) Audio stream decoding method and device
US9438876B2 (en) Method for semantics based trick mode play in video system
US10129587B2 (en) Fast switching of synchronized media using time-stamp management
US9535768B2 (en) Managing multi-threaded operations in a multimedia authoring environment
KR100862630B1 (en) System and method for synchronizing video frames and audio frames
CN109167890B (en) Sound and picture synchronization method and device and display equipment
US20180192090A1 (en) Method of implementing audio and video live broadcast and server
EP3595324A1 (en) Method and device for adjusting video playback speed
US9584787B1 (en) Performance optimization for streaming video
WO2017080241A1 (en) Time-lapse photographing method and device
US10200433B2 (en) Client device, a method for receiving a streaming media data and a streaming media data transmission system
CN108307248B (en) Video broadcasting method, calculates equipment and storage medium at device
CN115174980A (en) Audio and video synchronization method, device, equipment and medium based on security queue
US10771733B2 (en) Method and apparatus for processing video playing
CN107087210B (en) Method and terminal for judging video playing state based on cache time
JP2011113454A (en) Information processing apparatus
CN108419092B (en) Method and device for determining same video
CN118138793B (en) IPTV-based audio and video playing control method, device, equipment and medium
CN117478958B (en) Video playing method, device, electronic equipment and storage medium
CN110493164B (en) Media packet processing method and device
CN110839010B (en) Streaming media data processing method, device and equipment
CN116320442A (en) Video stream data generation method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination