WO2016015670A1

WO2016015670A1 - Audio stream decoding method and device

Info

Publication number: WO2016015670A1
Application number: PCT/CN2015/085612
Authority: WO
Inventors: 邝锐强
Original assignee: 广州金山网络科技有限公司
Priority date: 2014-08-01
Filing date: 2015-07-30
Publication date: 2016-02-04
Also published as: CN104113777B; CN104113777A

Abstract

Disclosed are an audio stream decoding method and device. The method comprises: determining the number of audio frames currently cached in an audio stream buffer of an electronic device; when the number of frames is greater than a first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, discarding audio frames in the audio stream buffer after a preset length of time has elapsed; and decoding the audio frames in the audio stream buffer that were not discarded. Because frames are discarded only after the preset length of time once the buffer occupancy lies between the first quantity threshold and the buffer capacity, the number of times audio frames are discarded during decoding is reduced, and with it the number of times a plosive (popping) sound occurs.

Description

Audio stream decoding method and device

The present application claims priority to Chinese Patent Application No. 201410375254.1, filed with the Chinese Patent Office on August 1, 2014 and entitled "Audio stream decoding method and device", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of streaming media technologies, and in particular to an audio stream decoding method and device.

Background

To reduce the impact of an unstable network environment on the audio playback of a video file, an electronic device with an audio stream decoding function usually sets aside, before decoding the audio stream of the video file, a block of memory organized as a queue to serve as an audio stream buffer, such as the AAC (Advanced Audio Coding) buffer in FIG. 1. Because audio stream decoding consumes relatively few CPU (Central Processing Unit) resources, the electronic device usually uses the CPU to decode the audio stream in software.

When the network fluctuates for a long time, a large amount of the audio stream of the video file on the network device continuously floods into the electronic device. Because the decoding rate of the CPU of the electronic device is fixed and the caching capacity of the audio stream buffer is limited, the continuous influx of excess audio stream data during decoding inevitably leads to frame dropping.

In the prior art, when a large amount of the audio stream of the video file on the network device continuously floods into the electronic device, the audio stream buffer fills up quickly; at that point, audio frames at the tail or the head of the audio stream buffer queue are discarded.

However, the number of times audio frames are dropped is directly related to the sound quality of the video file: the more often audio frames are dropped, the more often a popping sound occurs during playback. Because the existing frame-dropping method discards audio frames that cannot be immediately decoded or cached only once the audio stream buffer is saturated, the electronic device cannot decode and cache a large, sustained influx of audio stream data in time, so frames are dropped many times and the popping sound occurs repeatedly.

Summary of the invention

To solve the above problem, the embodiments of the present application disclose an audio stream decoding method and device. The specific technical solutions are as follows:

An audio stream decoding method, comprising:

determining the number of audio frames currently cached in an audio stream buffer of an electronic device;

when the number of frames is greater than a first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, discarding audio frames in the audio stream buffer after a preset duration has elapsed; and

decoding the audio frames in the audio stream buffer that have not been discarded.

Preferably, the method further comprises:

when the number of frames reaches the total number of frames, immediately discarding audio frames in the audio stream buffer.

Preferably, after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer is equal to the first quantity threshold; or

after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer is equal to a second quantity threshold, the second quantity threshold being less than the first quantity threshold.

Preferably, determining the number of audio frames currently cached in the audio stream buffer of the electronic device comprises:

periodically determining, according to a preset statistical period, the number of audio frames currently cached in the audio stream buffer of the electronic device.

Preferably, the statistical period is greater than the preset duration.

Preferably, discarding the audio frames in the audio stream buffer comprises:

discarding audio frames starting from the tail of the queue of the audio stream buffer;

or

discarding audio frames starting from the head of the queue of the audio stream buffer.

Preferably, the audio frames originate from a video file that further includes video frames, and the method further comprises: decoding the video frames in the video file.

Preferably, decoding the video frames in the video file comprises:

detecting whether a digital signal processor (DSP) buffer is in an unsaturated state, wherein the DSP buffer is the input buffer of the digital signal processor and is used to cache video frame data;

if so, inserting blank frames into the DSP buffer until the DSP buffer reaches a saturated state; and

decoding the frame data in the DSP buffer.

Preferably, before detecting whether the DSP buffer is in an unsaturated state, the method further comprises:

detecting whether video stream data is currently being cached into the DSP buffer and, if not, performing the step of detecting whether the DSP buffer is in an unsaturated state, wherein the video stream data is video frame data in a pre-established video data buffer, and the pre-established video data buffer is used to cache video frame data originating from the network server side.

Preferably, detecting whether the DSP buffer is in an unsaturated state comprises:

detecting in real time whether the DSP buffer is in an unsaturated state;

or

periodically detecting, according to a preset detection period, whether the DSP buffer is in an unsaturated state.

Detecting whether the duration for which the DSP buffer has been in the unsaturated state exceeds a preset threshold.

Preferably, detecting whether the DSP buffer is in an unsaturated state comprises:

detecting whether the DSP buffer contains video frame data from the pre-established video data buffer and is not filled with video frame data.

Preferably, decoding the frame data in the DSP buffer comprises:

decoding the video frames in the DSP buffer that carry a network identifier, the video frames carrying the network identifier being frame data originating from a pre-established video stream buffer.
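The blank-frame padding and the network-identifier filtering described above can be illustrated with a minimal Python sketch. The `Frame` class, the `from_network` flag (standing in for the network identifier), the function names, and the buffer capacity of 4 are all assumptions made for illustration; the application does not specify how blank frames are represented or how large the DSP buffer is.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    payload: bytes
    from_network: bool  # hypothetical stand-in for the "network identifier"


def pad_with_blank_frames(dsp_buffer, capacity):
    """Append blank frames until the buffer holds `capacity` frames (saturated)."""
    while len(dsp_buffer) < capacity:
        dsp_buffer.append(Frame(payload=b"", from_network=False))
    return dsp_buffer


def frames_to_decode(dsp_buffer):
    """Select only the frames that carry the network identifier."""
    return [frame for frame in dsp_buffer if frame.from_network]


# Two real frames arrive, then the buffer is padded up to an assumed capacity of 4.
buf = [Frame(b"\x01", True), Frame(b"\x02", True)]
pad_with_blank_frames(buf, capacity=4)
real = frames_to_decode(buf)
```

In this sketch the padding frames exist only to bring the buffer to its saturated state; they are skipped when frames are selected for decoding.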

Preferably, the video frames of the video file carry timestamps and the audio frames of the video file carry timestamps; the method further comprises:

playing the decoding result of the video frames and the decoding result of the audio frames synchronously according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.
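The application does not say how the timestamp correspondence is established. As one possible reading, the sketch below pairs decoded video and audio frames whose timestamps are exactly equal; the function name, the `(timestamp, data)` tuple layout, and the equality rule are assumptions for illustration only.

```python
def pair_by_timestamp(video_frames, audio_frames):
    """Pair decoded video and audio frames that share a timestamp.

    Each input is a list of (timestamp, data) tuples. Exact timestamp
    equality is an assumption of this sketch, not a rule from the source.
    """
    audio_by_ts = {ts: data for ts, data in audio_frames}
    return [(ts, v, audio_by_ts[ts]) for ts, v in video_frames if ts in audio_by_ts]


video = [(0, "v0"), (40, "v1"), (80, "v2")]
audio = [(0, "a0"), (40, "a1"), (120, "a3")]
pairs = pair_by_timestamp(video, audio)
```

A real player would more likely tolerate small timestamp differences and drive playback from a shared clock; the sketch only shows the matching idea.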

An audio stream decoding device, comprising:

a frame number determining module, configured to determine the number of audio frames currently cached in an audio stream buffer of an electronic device;

a frame dropping module, configured to, when the number of frames is greater than a first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, discard audio frames in the audio stream buffer after a preset duration has elapsed; and

an audio frame decoding module, configured to decode the audio frames in the audio stream buffer that have not been discarded.

Preferably, the frame dropping module is further configured to:

Preferably, after the frame dropping module discards the audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer is equal to the first quantity threshold; or

after the frame dropping module discards the audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer is equal to a second quantity threshold, the second quantity threshold being less than the first quantity threshold.

Preferably, the frame number determining module is specifically configured to:

Preferably, the frame dropping module is specifically configured to:

or

Preferably, the audio frames originate from a video file that further includes video frames; the device further comprises a video frame decoding module.

Preferably, the video frame decoding module comprises:

a first detecting submodule, configured to detect whether a digital signal processor (DSP) buffer is in an unsaturated state, wherein the DSP buffer is used to cache video frame data;

a blank frame filling submodule, configured to, when the detection result of the first detecting submodule is yes, insert blank frames (EOS) into the DSP buffer until the DSP buffer reaches a saturated state; and

a video frame decoding submodule, configured to decode the frame data in the DSP buffer.

Preferably, the video frame decoding module further comprises:

a second detecting submodule, configured to detect whether video stream data is currently being cached into the DSP buffer and, if the detection result is no, to trigger the first detecting submodule to operate, wherein the video stream data is video frame data in a pre-established video data buffer, and the pre-established video data buffer is used to cache video frame data originating from the network server side.

Preferably, the first detecting submodule is specifically configured to:

or

Preferably, the video frame decoding submodule is specifically configured to:

Preferably, the video frames of the video file carry timestamps and the audio frames of the video file carry timestamps; the device further comprises:

a playing module, configured to play the decoding result of the video frames and the decoding result of the audio frames synchronously according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.

To achieve the above object, an embodiment of the present application further provides a storage medium, wherein the storage medium is used to store an application program, and the application program is used to execute, at runtime, the audio stream decoding method described in the present application.

To achieve the above object, an embodiment of the present application further provides an application program, wherein the application program is used to execute, at runtime, the audio stream decoding method described in the present application.

To achieve the above object, an embodiment of the present application further provides an electronic device, comprising:

a processor, a memory, a communication interface, and a bus;

the processor, the memory, and the communication interface are connected via the bus and communicate with one another over it;

the memory stores executable program code;

the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to:

With the above technical solution, the number of audio frames currently cached in the audio stream buffer of the electronic device can be determined; when the number of frames is greater than the first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, audio frames in the audio stream buffer are discarded after the preset duration has elapsed, and the audio frames in the audio stream buffer that have not been discarded are decoded.

Compared with the prior art, the embodiments of the present application discard audio frames after the preset duration has elapsed when the number of audio frames in the audio stream buffer is greater than the first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache. This reduces the number of times audio frames are dropped during audio decoding and therefore the number of times the popping sound occurs.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is an exemplary flowchart of a prior-art audio stream decoding method;

FIG. 2 is a flowchart of an audio stream decoding method according to an embodiment of the present application;

FIG. 3 is an exemplary frame-dropping diagram of a prior-art audio stream decoding method;

FIG. 4 is an exemplary frame-dropping diagram of the audio stream decoding method according to an embodiment of the present application;

FIG. 5 is a flowchart of another audio stream decoding method according to an embodiment of the present application;

FIG. 6 is a flowchart of one implementation of S204 in FIG. 5 according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an audio stream decoding device according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of another audio stream decoding device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of the video frame decoding module 704 in FIG. 8 according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

With the rapid development of network multimedia technology, increasingly diverse network multimedia files continue to enrich what people see and hear. With streaming media (such as video streams and audio streams), users no longer have to wait, as with non-streaming playback, until the entire multimedia file has been downloaded to the playback device before viewing its content; after a transmission delay of only a few seconds or a few tens of seconds, the content can be decoded and played on the playback device, which gives users a new audiovisual experience.

When the network environment is unstable, the audio stream on the network server side continuously floods into the electronic device in large quantities. Both the decoding capability of the electronic device and the caching capacity of the audio stream buffer are limited, so audio frames must be discarded. Audio playback quality is closely tied to the number of times audio frames are dropped: the more often frames are dropped, the more often a popping sound occurs during playback. To make playback of a network-side audio stream smoother, the embodiments of the present application provide an audio stream decoding method and device.

The audio stream decoding method provided by the embodiments of the present application is introduced first.

It should be noted that the method of the embodiments of the present application is applicable to an electronic device. In practice, the electronic device may be a notebook computer, a desktop computer, a tablet computer, a smartphone, or the like, which is not limited by the embodiments of the present application.

As shown in FIG. 2, an audio stream decoding method may include:

S201: determine the number of audio frames currently cached in the audio stream buffer of the electronic device.

To keep working normally when the network is unstable (for example, when network instability causes the Advanced Audio Coding (AAC) audio stream on the network server side to flood into the electronic device all at once), an audio stream buffer is usually allocated in advance in the central processing unit (CPU) of the electronic device to temporarily cache AAC audio stream data from the network server side. For ease of description, in the embodiments of the present application the allocated audio stream buffer is referred to simply as the AAC buffer, as shown in FIG. 1.

It can be understood that when the rate of the AAC audio stream on the network device side is greater than the decoding rate of the electronic device, some of the audio frames flowing into the electronic device cannot be decoded in time and are temporarily cached in the AAC buffer. At this point, the number of audio frames currently cached in the AAC buffer can be counted in order to decide whether a frame-dropping operation is needed.

Optionally, in a specific implementation of the embodiments of the present application, the number of audio frames currently cached in the audio stream buffer of the electronic device may be determined periodically according to a preset statistical period. For example, the number of audio frames cached in the AAC buffer is counted every 6 s.

It should be noted that the preset statistical period in the present application may be the system default statistical period of the electronic device, or a statistical period set by the user according to actual needs, which is not limited by the embodiments of the present application.

S202: when the number of frames is greater than the first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, discard audio frames in the audio stream buffer after the preset duration has elapsed.

It should be noted that the total number of audio frames that the audio stream buffer can cache is typically 25, and the first quantity threshold in the embodiments of the present application is 15. It may of course also be set according to actual needs, which is not limited by the embodiments of the present application.

In addition, based on practical experience, the preset duration in the embodiments of the present application is typically 5 s. It may of course also be set according to actual needs, which is not limited by the embodiments of the present application. Preferably, the preset duration is less than the statistical period.
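As a minimal sketch, the example values given above (capacity of 25 frames, first quantity threshold of 15, preset duration of 5 s, statistical period of 6 s) can be collected in one configuration object. The class and field names are hypothetical, and the validation enforces the preferred relationship between the preset duration and the statistical period as if it were mandatory, which is a choice of this sketch rather than a requirement of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscardConfig:
    """Illustrative parameters; names are assumptions, values come from the text."""
    capacity_frames: int = 25           # total frames the audio stream buffer can cache
    first_threshold: int = 15           # first quantity threshold
    preset_duration_s: float = 5.0      # wait before discarding
    statistical_period_s: float = 6.0   # how often the frame count is sampled

    def __post_init__(self):
        if self.first_threshold >= self.capacity_frames:
            raise ValueError("first threshold must be below the buffer capacity")
        if self.preset_duration_s >= self.statistical_period_s:
            raise ValueError("preset duration should be less than the statistical period")


config = DiscardConfig()
```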

Optionally, in one implementation of the present application, when audio frames in the audio stream buffer are discarded, they may be discarded starting from the tail of the queue of the audio stream buffer.

Optionally, in another implementation of the present application, when audio frames in the audio stream buffer are discarded, they may be discarded starting from the head of the queue of the audio stream buffer.
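The two discard directions can be sketched with Python's `collections.deque`, treating the left end as the head of the queue and the right end as the tail. The function name `discard_frames` and the use of integers as placeholder frames are assumptions for illustration.

```python
from collections import deque


def discard_frames(buffer, target_count, from_tail=True):
    """Remove frames until only `target_count` remain; return the discarded frames."""
    discarded = []
    while len(buffer) > target_count:
        # pop() removes from the tail (right end); popleft() removes from the head.
        discarded.append(buffer.pop() if from_tail else buffer.popleft())
    return discarded


head_buffer = deque(range(20))  # 20 placeholder frames; frame 0 is at the head
dropped_from_head = discard_frames(head_buffer, 15, from_tail=False)

tail_buffer = deque(range(20))
dropped_from_tail = discard_frames(tail_buffer, 15, from_tail=True)
```

Discarding from the head removes the oldest frames, while discarding from the tail removes the most recently cached ones; the application permits either.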

Optionally, after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer is equal to the first quantity threshold. For this case, the method of the present application is compared with the prior art using an example, as shown in FIG. 3 and FIG. 4. Typically, the decoder in the electronic device decodes audio frames at 25 frames per second and the audio stream buffer in the electronic device can hold 25 frames. Take n1 = 15 and the preset duration T = 5 seconds, and let the audio stream from the network server side arrive at 27 frames per second.

For ease of understanding, assume that the audio stream buffer is empty at second 0. As shown in FIG. 3, in the prior-art method, at the 1st second 27 audio frames arrive at the electronic device; the decoder of the electronic device can decode only 25 of them, and the remaining 2 frames are cached in the audio stream buffer. At the 2nd second another 27 audio frames arrive, the decoder can again decode only 25, and the audio stream buffer now holds 4 frames. Continuing in this way, at the 12th second the audio stream buffer holds 24 frames. At the 13th second another 27 audio frames arrive and the decoder can decode only 25, leaving 2 frames undecoded. Since the audio stream buffer can hold 25 frames and already holds 24, only 1 more frame can be cached. The buffer is now saturated, and the 1 frame that can be neither cached nor decoded must be discarded.

At the 14th second another 27 audio frames arrive and 2 frames cannot be decoded. Because the audio stream buffer has been saturated since the 13th second, these 2 frames can be neither decoded nor cached and must be discarded. It follows that after the 12th second, frames are dropped every second.
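The prior-art arithmetic above can be checked with a short simulation. This is a simplified model that follows the narration: each second the decoder consumes 25 of the arriving frames, and only the surplus is cached or dropped. The function name and the per-second granularity are assumptions of the sketch, and it assumes the arrival rate is at least the decode rate.

```python
def simulate_drop_when_full(seconds, arrival=27, decode=25, capacity=25):
    """Return a list of (buffered, dropped) pairs, one per second.

    Models the drop-only-when-full scheme; assumes arrival >= decode.
    """
    buffered = 0
    history = []
    for _ in range(seconds):
        surplus = arrival - decode     # frames that cannot be decoded this second
        room = capacity - buffered     # free slots left in the buffer
        cached = min(surplus, room)
        dropped = surplus - cached     # frames that can be neither decoded nor cached
        buffered += cached
        history.append((buffered, dropped))
    return history


history = simulate_drop_when_full(14)
print(history[11])  # 12th second: (24, 0)
print(history[12])  # 13th second: (25, 1)
print(history[13])  # 14th second: (25, 2)
```

Under these assumptions the buffer holds 24 frames at the 12th second, one frame is dropped at the 13th second, and two frames are dropped in every second after that, matching the description of FIG. 3.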

As shown in FIG. 4, in the method of the present application, at the 1st second 27 audio frames arrive at the electronic device; the decoder of the electronic device can decode only 25 of them, and the remaining 2 frames are cached in the audio stream buffer. Continuing in this way, at the 8th second the audio stream buffer holds 16 frames, which is greater than n1. After this condition has lasted for T = 5 seconds, the audio stream buffer holds 24 frames, and audio frames in the buffer are then discarded until the buffer holds 15 frames. Thereafter, audio frames need to be discarded only once every 5 seconds, rather than every second as in the prior art.

The specific parameters in the above example are given only for ease of understanding; the embodiments of the present application include but are not limited to these parameters. In practice, with other parameters the method of the embodiments of the present application can still reduce the number of times frames are dropped to some extent and thereby reduce the number of times the popping sound occurs.

Optionally, after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer is equal to a second quantity threshold, the second quantity threshold being less than the first quantity threshold. For example, if the first quantity threshold is 15, the second quantity threshold may be any integer less than 15. In this case the embodiments of the present application can still reduce the number of times audio frames are dropped; the derivation follows the example above and is not repeated here.

Sometimes the number of audio frames cached in the audio stream buffer reaches the total number of audio frames that the buffer can cache before the preset duration has elapsed. In that case, audio frames in the audio stream buffer may be discarded immediately.
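The discard rules of S202, including the immediate discard when the buffer fills early, can be summarised as a small decision function. This is a sketch under stated assumptions: the function name, the returned labels, and the idea of tracking how long the count has stayed above the threshold in a single `seconds_above_threshold` argument are all illustrative choices, not details given in the application.

```python
def discard_decision(frame_count, seconds_above_threshold,
                     first_threshold=15, capacity=25, preset_duration=5.0):
    """Return 'discard_now', 'discard', 'wait', or 'none' for the current buffer state."""
    if frame_count >= capacity:
        return "discard_now"   # buffer is full: discard immediately
    if frame_count > first_threshold:
        if seconds_above_threshold >= preset_duration:
            return "discard"   # preset duration has elapsed
        return "wait"          # above the threshold, preset duration not yet elapsed
    return "none"              # at or below the first quantity threshold


decisions = [
    discard_decision(16, 0),   # just crossed the threshold
    discard_decision(24, 5),   # still below capacity after the preset duration
    discard_decision(25, 1),   # buffer filled before the preset duration elapsed
    discard_decision(15, 10),  # not above the threshold
]
```

When the result is `'discard'` or `'discard_now'`, the caller would trim the buffer down to the first (or second) quantity threshold and then decode the remaining frames.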

S203，对所述音频流缓冲区中未被丢弃的音频帧进行解码。S203. Decode an audio frame that is not discarded in the audio stream buffer.

由此可见，本申请实施例方法可以确定电子设备的音频流缓冲区当前缓存的音频帧的帧数，当该帧数大于第一数量阈值，且小于所述音频流缓冲区能缓存的音频帧的总帧数时，在经过预设时长后，对所述音频流缓冲区内的音频帧做丢弃处理，对所述音频流缓冲区中未被丢弃的音频帧进行解码。It can be seen that the method of the embodiment of the present application can determine the number of frames of the audio frame currently buffered by the audio stream buffer of the electronic device, when the number of frames is greater than the first threshold, and is smaller than the audio frame that can be buffered by the audio stream buffer. After the preset number of frames, the audio frames in the audio stream buffer are discarded, and the audio frames that are not discarded in the audio stream buffer are decoded.

Optionally, in an embodiment of the present application, the audio frames originate from a video file, and the video file also includes video frames. As shown in FIG. 5, the method further includes:

S204: decode the video frames in the video file.

Optionally, in an embodiment of the present application, as shown in FIG. 6, S204 may include:

S204a: detect whether the digital signal processor (DSP) buffer is in an unsaturated state, where the DSP buffer is the input buffer of the digital signal processor and is used to cache video frame data.

A DSP (Digital Signal Processor) usually contains an input buffer and an output buffer. For ease of description, the input buffer of the digital signal processor is referred to in this embodiment as the DSP buffer; its main role is to temporarily cache the video frame data entering the DSP.

To keep working normally when the network is unstable (for example, when instability causes the H.264 video stream from the network server side to flood into the playback device all at once), a video stream buffer is usually allocated in advance in the driver or hardware of the video playback device to temporarily cache the H.264 video stream data from the network server side. For ease of description, this video stream buffer is referred to in this embodiment as the H.264 buffer.

It can be understood that the smaller the allocated buffer, the shorter the delay before the video stream reaches the DSP buffer and the shorter the playback delay, although playback may not be smooth; the larger the allocated buffer, the smoother the playback, but the longer the delay before the video stream reaches the DSP buffer.

Optionally, in one implementation of the present application, whether the DSP buffer is in an unsaturated state may be detected in real time.

Optionally, in another implementation of the present application, whether the DSP buffer is in an unsaturated state may be detected periodically according to a preset detection period, for example once every 1 s.

It should be noted that the preset detection period may be the default detection period of the playback device system or a detection period set by the user according to actual needs; this embodiment of the present application does not limit it.

S204b: insert blank frames into the DSP buffer until the DSP buffer reaches a saturated state.

A blank frame in this embodiment can be understood as a transparent frame: overlaying it on a video frame that has actual picture content does not affect how that video frame is displayed. In the H.264 encoding scheme, such a blank frame is also called an EOS frame.

As mentioned above, the DSP decoder can decode the video frames in the DSP buffer only when the DSP buffer is full. In this embodiment, when the DSP buffer is unsaturated because of network congestion or similar causes, blank frames can be inserted into it so that it quickly reaches the saturated state.

It should be noted that when the DSP buffer is saturated, the method of this embodiment does not continue to insert blank frames into the DSP buffer, which avoids the frame loss that would result from overfilling the DSP buffer.

S204c: decode the frame data in the DSP buffer.

When the DSP buffer reaches the saturated state, all of its frame data, including the blank frames, is forced out of the buffer, so that the frame data remaining in the DSP buffer is decoded.

It can be seen that when the DSP buffer is unsaturated and the residual video stream in it cannot be decoded immediately, this embodiment inserts blank frames, which do not affect playback of the video stream, into the DSP buffer so that it quickly reaches the saturated state, and then decodes the frame data that contains the residual video stream. The residual video stream data is thus decoded immediately without affecting its subsequent playback.
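
Steps S204b and S204c can be sketched as follows. This is an illustrative model only: the DSP buffer is represented as a plain Python list, a blank frame as a dictionary with a `blank` flag, and `pad_until_saturated`, `decode_buffer`, and the `decode` callback are hypothetical names rather than a real DSP interface.

```python
BLANK = {"blank": True}  # stand-in for a blank (EOS) frame; hypothetical representation


def pad_until_saturated(dsp_buffer, capacity):
    """Append blank frames until the buffer holds `capacity` frames.

    Returns the number of blank frames added. A buffer that is already
    saturated is left untouched, so it is never overfilled.
    """
    added = 0
    while len(dsp_buffer) < capacity:
        dsp_buffer.append(dict(BLANK))
        added += 1
    return added


def decode_buffer(dsp_buffer, decode):
    """Drain a saturated buffer, passing every frame (blank ones included) to `decode`."""
    decoded = [decode(frame) for frame in dsp_buffer]
    dsp_buffer.clear()
    return decoded
```

In this model a buffer holding one residual frame and a capacity of four receives three blank frames, after which the whole buffer is drained through the decoder.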

Optionally, in an embodiment of the present application, S204a may include:

It can be understood that, under normal circumstances, the H.264 video stream in the H.264 buffer takes only a very short time to be cached into the DSP buffer. The network is sometimes congested temporarily but soon recovers; in that case the DSP buffer may be unsaturated briefly and then become saturated again, without a significant effect on subsequent playback.

Based on this, a duration threshold that does not affect the overall viewing experience may be set in this embodiment. If the DSP buffer remains unsaturated for longer than the duration threshold, blank frames are inserted into the DSP buffer so that it quickly reaches the saturated state; if the DSP buffer is unsaturated for no longer than the duration threshold (as in the case above), blank frames need not be inserted into the DSP buffer.
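
The duration-threshold check can be sketched as a single predicate. The function name `should_insert_blank` and the convention that `unsaturated_since` is `None` while the buffer is saturated are assumptions made for illustration.

```python
def should_insert_blank(unsaturated_since, now, duration_threshold):
    """Return True only if the buffer has been unsaturated longer than the threshold.

    `unsaturated_since` is the time (in seconds) at which the buffer became
    unsaturated, or None while the buffer is saturated.
    """
    if unsaturated_since is None:
        return False
    return (now - unsaturated_since) > duration_threshold
```

A brief dip caused by temporary congestion therefore triggers no padding, while a longer stall does.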

Optionally, in an embodiment of the present application, S204c may include:

Decode the video frame data in the DSP buffer that carries a network identifier, where a video frame carrying a network identifier is frame data originating from the pre-established video stream buffer.

It can be understood that the video frame data sent from the network server side carries network identifiers such as timestamps. The video frame data carrying a network identifier in this embodiment can be understood as the video frame data sent from the network server side to the playback device; because frame data sent from the network server side is first cached in the H.264 buffer, it can also be understood as video frame data originating from the pre-established H.264 buffer.

In this embodiment, only the video frames that contain substantive content may be decoded, which preserves the original display effect of those video frames.
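
Selecting only the frames that carry a network identifier can be sketched as a filter. The sketch assumes, for illustration, that each frame is a dictionary and that the network identifier is a `timestamp` key; blank frames inserted locally have no such key.

```python
def frames_with_network_id(dsp_buffer):
    """Keep only frames that carry a network identifier (here, a non-None 'timestamp')."""
    return [frame for frame in dsp_buffer if frame.get("timestamp") is not None]
```

Blank frames are skipped by this filter, so only frames received from the network server side reach the decoder.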

Optionally, in an embodiment of the present application, before S204a the method may further include:

Detect whether video stream data is currently being cached into the DSP buffer.

In this embodiment, before detecting whether the DSP buffer is unsaturated, whether that detection is needed can be determined by detecting whether video stream data is currently being cached into the DSP buffer. If no video stream data is currently being cached into the DSP buffer, the network can be judged to be seriously congested; if some video stream data remains in the DSP buffer at that point, blank frames can be inserted into the DSP buffer so that the residual video stream data is decoded as soon as possible.
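
The pre-check described above can be sketched as follows. The function and parameter names are hypothetical; `incoming_active` stands for "video stream data is currently being cached into the DSP buffer", and `buffered` counts the frames already in the buffer.

```python
def needs_blank_padding(incoming_active, buffered, capacity):
    """Return True when residual frames are stuck in an unsaturated buffer.

    If data is still arriving, no padding is needed yet. If nothing is
    arriving and the buffer is partly filled, padding lets the residual
    frames be decoded.
    """
    if incoming_active:
        return False
    return 0 < buffered < capacity
```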

Optionally, in an embodiment of the present application, the video frames of the video file carry timestamps, and the audio frames of the video file carry timestamps; the method may further include:

Playing the decoding result of the video frames and the decoding result of the audio frames synchronously, according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.

Optionally, the method of this embodiment may also be applied to scenarios in which audio and video are played asynchronously.

It can be understood that some application scenarios with strict real-time requirements, such as police tracking criminals in real time or real-time remote control, place more weight on the timeliness of audio and video playback. In such scenarios the electronic device can play the decoded audio or video directly without synchronizing the two, which avoids the situation in which the audio stream cannot be played because the video stream is blocked by network problems, or the video stream cannot be played because the audio stream is blocked.
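
The difference between timestamp-based synchronous playback and asynchronous playback can be illustrated with a simplified sketch. It assumes decoded frames are `(timestamp, payload)` tuples and that matching timestamps are exactly equal, which a real player would relax; both function names are hypothetical.

```python
def pair_by_timestamp(video, audio):
    """Synchronous mode: pair decoded video and audio frames that share a timestamp."""
    audio_by_ts = {ts: payload for ts, payload in audio}
    return [(ts, v, audio_by_ts[ts]) for ts, v in video if ts in audio_by_ts]


def play_async(video, audio):
    """Asynchronous mode: each stream is played as decoded, with no cross-stream wait."""
    return list(video), list(audio)
```

In the synchronous mode a video frame without a matching audio frame is held back, whereas in the asynchronous mode a stalled stream does not block the other one.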

Corresponding to the method embodiments above, an embodiment of the present application further provides an audio stream decoding apparatus.

As shown in FIG. 7, the audio stream decoding apparatus may include:

a frame number determining module 701, configured to determine the number of audio frames currently cached in the audio stream buffer of the electronic device;

a frame dropping module 702, configured to discard audio frames in the audio stream buffer after a preset duration has elapsed, when the number of frames is greater than the first quantity threshold and less than the total number of audio frames the audio stream buffer can cache; and

an audio frame decoding module 703, configured to decode the audio frames in the audio stream buffer that have not been discarded.

Optionally, in an embodiment of the present application, the frame dropping module 702 is further configured to:

discard the audio frames in the audio stream buffer immediately when the number of frames reaches the total number of frames.

Optionally, in an embodiment of the present application, after the frame dropping module 702 discards audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer equals the first quantity threshold; or

after the frame dropping module 702 discards audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer equals the second quantity threshold, and the second quantity threshold is smaller than the first quantity threshold.

Optionally, in an embodiment of the present application, the frame number determining module 701 is specifically configured to:

periodically determine, according to a preset statistical period, the number of audio frames currently cached in the audio stream buffer of the electronic device.

In one implementation of this embodiment, the statistical period is greater than the preset duration.

Optionally, in an embodiment of the present application, the frame dropping module 702 is specifically configured to:

discard audio frames starting from the tail of the queue of the audio stream buffer;

or

discard audio frames starting from the head of the queue of the audio stream buffer.
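
The two discard orders named in the claims, from the queue head or from the queue tail, can be sketched with a double-ended queue. The sketch assumes the head (left end) holds the oldest frame; the function name `discard` is hypothetical.

```python
from collections import deque


def discard(queue, n, from_head=True):
    """Remove up to `n` frames from the head (oldest) or the tail (newest) of the queue."""
    for _ in range(min(n, len(queue))):
        if from_head:
            queue.popleft()
        else:
            queue.pop()
    return queue
```

Discarding from the head removes the oldest audio, which favors low latency, while discarding from the tail removes the most recently received audio; which is preferable depends on the application.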

Optionally, in an embodiment of the present application, the audio frames originate from a video file, and the video file also includes video frames. As shown in FIG. 8, the apparatus further includes:

a video frame decoding module 704, configured to decode the video frames in the video file.

Optionally, in an embodiment of the present application, as shown in FIG. 9, the video frame decoding module 704 includes:

a first detecting sub-module 704a, configured to detect whether the digital signal processor (DSP) buffer is in an unsaturated state, where the DSP buffer is used to cache video frame data;

a blank frame filling sub-module 704b, configured to insert blank (EOS) frames into the DSP buffer, when the detection result of the first detecting sub-module 704a is yes, until the DSP buffer reaches a saturated state; and

a video frame decoding sub-module 704c, configured to decode the frame data in the DSP buffer.

Optionally, in an embodiment of the present application, the video frame decoding module 704 further includes:

a second detecting sub-module, configured to detect whether video stream data is currently being cached into the DSP buffer and, if the detection result is no, to trigger the first detecting sub-module to operate, where the video stream data is the video frame data in a pre-established video data buffer, and the pre-established video data buffer is used to cache video frame data originating from the network server side.

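Optionally, in an embodiment of the present application, the first detecting sub-module 704a is specifically configured to:

detect in real time whether the DSP buffer is in an unsaturated state;

or

periodically detect, according to a preset detection period, whether the DSP buffer is in an unsaturated state.

Optionally, in an embodiment of the present application, the video frame decoding sub-module 704c is specifically configured to:

decode the video frames in the DSP buffer that carry a network identifier, where a video frame carrying a network identifier is frame data originating from the pre-established video stream buffer.

Optionally, in an embodiment of the present application, the video frames of the video file carry timestamps, and the audio frames of the video file carry timestamps; the apparatus further includes:

a playing module, configured to play the decoding result of the video frames and the decoding result of the audio frames synchronously, according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.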

Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.

For convenience of description, the apparatus above is described in terms of modules divided by function. Of course, when implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware.

To achieve the above purpose, an embodiment of the present application further provides a storage medium for storing an application program, where the application program is configured to execute, when run, the audio stream decoding method described in the embodiments of the present application. The audio stream decoding method described in the present application includes:

when the number of frames is greater than the first quantity threshold and less than the total number of audio frames the audio stream buffer can cache, discarding audio frames in the audio stream buffer after a preset duration has elapsed;

To achieve the above purpose, an embodiment of the present application further provides an application program configured to execute, when run, the audio stream decoding method described in the embodiments of the present application. The audio stream decoding method described in the present application includes:

The processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, in order to:

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.

A person of ordinary skill in the art can understand that all or some of the steps in the method implementations above can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within its scope of protection.

Claims

An audio stream decoding method, characterized in that the method comprises:

determining the number of audio frames currently cached in an audio stream buffer of an electronic device;

when the number of frames is greater than a first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache, discarding audio frames in the audio stream buffer after a preset duration has elapsed; and

decoding the audio frames in the audio stream buffer that have not been discarded.
The method according to claim 1, characterized by further comprising:

discarding the audio frames in the audio stream buffer immediately when the number of frames reaches the total number of frames.
The method according to claim 1 or 2, characterized in that, after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer equals the first quantity threshold; or

after the audio frames in the audio stream buffer are discarded, the number of audio frames in the audio stream buffer equals the second quantity threshold, and the second quantity threshold is smaller than the first quantity threshold.
The method according to claim 1 or 2, characterized in that determining the number of audio frames currently cached in the audio stream buffer of the electronic device comprises:

periodically determining, according to a preset statistical period, the number of audio frames currently cached in the audio stream buffer of the electronic device.
The method according to claim 4, characterized in that the statistical period is greater than the preset duration.
The method according to claim 1 or 2, characterized in that discarding audio frames in the audio stream buffer comprises:

discarding audio frames starting from the tail of the queue of the audio stream buffer;

or

discarding audio frames starting from the head of the queue of the audio stream buffer.
The method according to claim 1, characterized in that the audio frames originate from a video file, and the video file further includes video frames; the method further comprises: decoding the video frames in the video file.
The method according to claim 7, characterized in that decoding the video frames in the video file comprises:

detecting whether a digital signal processor (DSP) buffer is in an unsaturated state, wherein the DSP buffer is the input buffer of the digital signal processor and is used to cache video frame data;

if so, inserting blank frames into the DSP buffer until the DSP buffer reaches a saturated state; and

decoding the frame data in the DSP buffer.
The method according to claim 8, characterized in that, before detecting whether the DSP buffer is in an unsaturated state, the method further comprises:

detecting whether video stream data is currently being cached into the DSP buffer and, if not, performing the step of detecting whether the DSP buffer is in an unsaturated state, wherein the video stream data is the video frame data in a pre-established video data buffer, and the pre-established video data buffer is used to cache video frame data originating from a network server side.
The method according to claim 8, characterized in that detecting whether the DSP buffer is in an unsaturated state comprises:

detecting in real time whether the DSP buffer is in an unsaturated state;

or

periodically detecting, according to a preset detection period, whether the DSP buffer is in an unsaturated state.
The method according to claim 8, characterized in that detecting whether the DSP buffer is in an unsaturated state comprises:

detecting whether the duration for which the DSP buffer has been in the unsaturated state exceeds a preset threshold.
The method according to claim 8, characterized in that detecting whether the DSP buffer is in an unsaturated state comprises:

detecting whether the DSP buffer contains video frame data from the pre-established video data buffer and is not filled with video frame data.
The method according to claim 12, characterized in that decoding the frame data in the DSP buffer comprises:

decoding the video frames in the DSP buffer that carry a network identifier, wherein a video frame carrying a network identifier is frame data originating from the pre-established video stream buffer.
The method according to claim 7, characterized in that the video frames of the video file carry timestamps, and the audio frames of the video file carry timestamps; the method further comprises:

playing the decoding result of the video frames and the decoding result of the audio frames synchronously, according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.
An audio stream decoding apparatus, characterized in that the apparatus comprises:

a frame number determining module, configured to determine the number of audio frames currently cached in an audio stream buffer of an electronic device;

a frame dropping module, configured to discard audio frames in the audio stream buffer after a preset duration has elapsed, when the number of frames is greater than a first quantity threshold and less than the total number of audio frames that the audio stream buffer can cache; and

an audio frame decoding module, configured to decode the audio frames in the audio stream buffer that have not been discarded.
The apparatus according to claim 15, characterized in that the frame dropping module is further configured to:

discard the audio frames in the audio stream buffer immediately when the number of frames reaches the total number of frames.
The apparatus according to claim 15 or 16, characterized in that, after the frame dropping module discards audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer equals the first quantity threshold; or

after the frame dropping module discards audio frames in the audio stream buffer, the number of audio frames in the audio stream buffer equals the second quantity threshold, and the second quantity threshold is smaller than the first quantity threshold.
The apparatus according to claim 15 or 16, characterized in that the frame number determining module is specifically configured to:

periodically determine, according to a preset statistical period, the number of audio frames currently cached in the audio stream buffer of the electronic device.
The apparatus according to claim 18, characterized in that the statistical period is greater than the preset duration.
The apparatus according to claim 15 or 16, characterized in that the frame dropping module is specifically configured to:

discard audio frames starting from the tail of the queue of the audio stream buffer;

or

discard audio frames starting from the head of the queue of the audio stream buffer.
The apparatus according to claim 15, characterized in that the audio frames originate from a video file, and the video file further includes video frames; the apparatus further comprises: a video frame decoding module.
The apparatus according to claim 21, characterized in that the video frame decoding module comprises:

a first detecting sub-module, configured to detect whether a digital signal processor (DSP) buffer is in an unsaturated state, wherein the DSP buffer is used to cache video frame data;

a blank frame filling sub-module, configured to insert blank (EOS) frames into the DSP buffer, when the detection result of the first detecting sub-module is yes, until the DSP buffer reaches a saturated state; and

a video frame decoding sub-module, configured to decode the frame data in the DSP buffer.
The apparatus according to claim 22, characterized in that the video frame decoding module further comprises:

a second detecting sub-module, configured to detect whether video stream data is currently being cached into the DSP buffer and, if the detection result is no, to trigger the first detecting sub-module to operate, wherein the video stream data is the video frame data in a pre-established video data buffer, and the pre-established video data buffer is used to cache video frame data originating from a network server side.
The apparatus according to claim 22, characterized in that the first detecting sub-module is specifically configured to:

detect in real time whether the DSP buffer is in an unsaturated state;

or

periodically detect, according to a preset detection period, whether the DSP buffer is in an unsaturated state.
The apparatus according to claim 22, characterized in that the first detecting sub-module is specifically configured to:

detect whether the duration for which the DSP buffer has been in the unsaturated state exceeds a preset threshold.
The apparatus according to claim 22, characterized in that the first detecting sub-module is specifically configured to:

detect whether the DSP buffer contains video frame data from the pre-established video data buffer and is not filled with video frame data.
The apparatus according to claim 26, characterized in that the video frame decoding sub-module is specifically configured to:

decode the video frames in the DSP buffer that carry a network identifier, wherein a video frame carrying a network identifier is frame data originating from the pre-established video stream buffer.
The apparatus according to claim 15, characterized in that the video frames of the video file carry timestamps, and the audio frames of the video file carry timestamps; the apparatus further comprises:

a playing module, configured to play the decoding result of the video frames and the decoding result of the audio frames synchronously, according to the correspondence between the timestamps of the video frames and the timestamps of the audio frames.