CN107424620B - Audio decoding method and device - Google Patents

Audio decoding method and device

Info

Publication number
CN107424620B
CN107424620B (application CN201710625359.1A)
Authority
CN
China
Prior art keywords
decoding
audio
frames
voice
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710625359.1A
Other languages
Chinese (zh)
Other versions
CN107424620A (en)
Inventor
尚德建
胡小鹏
陈卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingshi Communication Technology Development Co ltd
Suzhou Keyuan Software Technology Development Co ltd
Suzhou Keda Technology Co Ltd
Original Assignee
Shanghai Lingshi Communication Technology Development Co ltd
Suzhou Keyuan Software Technology Development Co ltd
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lingshi Communication Technology Development Co ltd, Suzhou Keyuan Software Technology Development Co ltd, Suzhou Keda Technology Co Ltd filed Critical Shanghai Lingshi Communication Technology Development Co ltd
Priority to CN201710625359.1A priority Critical patent/CN107424620B/en
Publication of CN107424620A publication Critical patent/CN107424620A/en
Application granted granted Critical
Publication of CN107424620B publication Critical patent/CN107424620B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/167 - Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L2025/783 - Detection of presence or absence of voice signals based on threshold decision

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the field of multi-party audio and video conferencing and discloses an audio decoding method and an audio decoding device. The audio decoding method comprises the following steps: decoding an audio frame to obtain a decoding result; judging whether speech is present in the decoding result; and, when no speech is present, skipping N audio frames and returning to the step of decoding the audio frame that follows the N skipped frames to obtain a decoding result, wherein N is greater than or equal to 1. In an audio and video conference system with multiple accessed terminals, the decoding end performs frame-skipping decoding for the terminals that carry no sound, which effectively reduces the decoding load of the whole system, lowers the memory occupancy of the system, improves the sound quality of the decoded audio entering the mixing processing platform, and at the same time allows more access paths and a higher system utilization rate. Compared with the prior art, the audio decoding method provided by the invention does not need to perform acoustic-model detection and scoring on every audio frame, and therefore further reduces the occupancy of platform resources.

Description

Audio decoding method and device
Technical Field
The invention relates to the technical field of audio and video conferences, in particular to an audio decoding method and device.
Background
An audio and video conference is a teleconference that provides audio and video services; the system is a virtual conference realized through network communication technology. With the development of communication and internet technologies, remote multiparty audio/video conference systems have been rapidly popularized.
An audio and video conference system is a complete system that carries audio, video and data interaction. When an audio conference or an audio/video conference is held, the audio streams of the multiple accessed terminals must be mixed on the platform and sent back to the terminals so that the conference can proceed normally. Decoding the audio data of each accessed terminal before mixing is therefore a necessary step. However, because not every participant is speaking at any given time, the audio of every accessed terminal does not need to enter the mix; selecting only the terminals that carry sound for mixing can effectively reduce the load on the server and at the same time improve the quality of the mixed speech.
To achieve this, many mainstream audio/video conference vendors currently decide which terminals enter the mixing process by examining the energy information obtained after decoding; however, when many terminals are accessed, this leads to a high audio-decoding load and a serious waste of decoding resources. Other vendors have the terminal carry the audio energy information in the transmitted code stream and use it to decide which terminals are decoded and mixed, but this approach cannot remain compatible with both new and old terminals.
Chinese patent application publication No. CN106710606A discloses a speech processing method based on artificial intelligence, comprising the following steps: scoring the current frame of a speech packet to be decoded with an acoustic model; identifying, according to the score, whether the current frame is a quasi-silent frame; if it is, skipping the current frame during decoding and not decoding it; and if it is not, decoding the current frame. Although this speech processing method skips and does not decode some audio frames and thereby avoids redundant decoding, the following problems remain:
1. Even though part of the audio is not decoded, acoustic-model detection and scoring still has to be performed on every audio frame, which still occupies platform resources;
2. When the speech packets to be decoded contain alternating quasi-silent and pseudo-silent frames, this speech processing method inevitably makes the preceding and following speech incoherent and degrades the effect of speech playback.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is that, in the prior art, acoustic-model detection and scoring must be performed on every audio frame, which increases the occupancy of platform resources.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
the invention provides an audio decoding method, which comprises the following steps:
decoding the audio frame to obtain a decoding result;
judging whether the decoding result has voice or not;
and skipping N frames of audio frames when no voice exists, and returning to execute the step of decoding the audio frames after skipping the N frames of audio frames to obtain a decoding result, wherein N is more than or equal to 1.
Optionally, when there is no speech, further comprising:
judging whether the continuous decoding time is 0 or not;
when the continuous decoding time is 0, executing the skipping of the N frames of audio frames, and returning to execute the decoding of the audio frames after the skipping of the N frames of audio frames to obtain a decoding result;
and when the continuous decoding time is not 0, reducing the continuous decoding time and acquiring the next frame of audio frame for decoding.
Optionally, when speech is present, the continuous decoding time is set to an initial value, and subsequent audio frames are continuously decoded until the continuous decoding time is reduced to 0.
Optionally, the method further comprises:
judging whether the frame skipping mark is in an open state, wherein the open state is used for indicating that the audio frame is not decoded;
and when the frame skipping mark is not in an on state, executing the step of decoding the audio frame.
Optionally, the step of skipping N frames of audio frames when no speech is present and decoding the audio frame after skipping the N frames of audio frames comprises:
when no speech is present, acquiring the number of times that no speech has been decoded consecutively;
and determining the value of N according to that number, wherein the larger the number of consecutive no-speech decodes, the larger the value of N.
The present invention also provides an audio decoding apparatus comprising:
the decoding unit is used for decoding the audio frame to obtain a decoding result;
a first judging unit, configured to judge whether a speech exists in the decoding result;
and the decoding unit is also used for skipping N frames of audio frames when no voice exists, and returning to execute decoding on the audio frames after the N frames of audio frames are skipped, wherein N is greater than or equal to 1.
Optionally, the apparatus further comprises:
a second judgment unit configured to judge whether or not the continuous decoding time is 0 when no speech exists;
the decoding unit is configured to skip the N frames of audio frames and decode the audio frame after skipping the N frames of audio frames when the second determination unit determines that the continuous decoding time is 0.
The decoding unit is further configured to reduce the continuous decoding time and acquire a next frame of audio frame for decoding when the second determination unit determines that the continuous decoding time is not 0.
Optionally, the decoding unit is further configured to set the continuous decoding time to an initial value when speech is present, and continuously decode subsequent audio frames until the continuous decoding time is reduced to 0.
Optionally, the decoding unit includes:
the acquisition module is used for acquiring, when no speech is present, the number of times that no speech has been decoded consecutively;
and the determining module is used for determining the value of N according to that number, wherein the larger the number of consecutive no-speech decodes, the larger the value of N.
The present invention also provides an audio decoding terminal, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform the audio decoding method of any of claims 1 to 6.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the audio decoding method provided by the invention comprises the steps of firstly decoding an audio frame to obtain a decoding result; then judging whether the decoding result has voice; and skipping N frames of audio frames when no voice exists, and returning to execute decoding on the audio frames after skipping the N frames of audio frames to obtain a decoding result. The audio decoding method selects frame skipping decoding for the subsequent audio frame under the condition of no voice through judging whether the current audio frame has voice or not, so that when a plurality of terminals are accessed in an audio and video conference system, the decoding terminal performs frame skipping decoding on the terminal without voice, the decoding load of the whole system is effectively reduced, the memory occupancy rate of the system is reduced, the tone quality of the decoded audio when entering a sound mixing processing platform is improved, meanwhile, the number of access paths can be increased, and the system utilization rate is improved. Compared with the prior art, the audio decoding method provided by the invention does not need to perform acoustic model detection scoring on each audio frame, and further effectively reduces the occupancy rate of platform resources.
The audio decoding method provided by the invention also judges the continuous decoding time under the condition of no voice, when the continuous decoding time is 0, the step of frame skipping decoding is executed, otherwise, the continuous decoding time is reduced, and the next frame of audio frame is obtained for decoding. Therefore, when the complete voice of a certain access terminal is interrupted, the normal decoding of the audio frame without the voice can be ensured, so that a section of complete voice is obtained, the continuity of the voice is ensured, and the smooth proceeding of an audio and video conference is facilitated.
According to the audio decoding method provided by the invention, when the decoded audio frame is judged to have voice, the continuous decoding time is recovered to be an initial value, and the subsequent audio frame is continuously decoded until the continuous decoding time is reduced to 0. The recovery of the continuous decoding time ensures that the subsequent audio frames can be continuously decoded until the frame skipping decoding is executed when the time without voice exceeds the continuous decoding time, namely the continuous decoding time is reduced to 0, so that the continuity of voice is ensured on one hand, and on the other hand, the access terminal without voice is ensured not to continuously decode, thereby reducing the decoding load of the whole system.
The audio decoding method provided by the invention also comprises the steps of acquiring the times of continuously decoding the voice-free voice when the voice does not exist, determining the value of N according to the times of continuously decoding the voice-free voice, wherein the times of continuously decoding the voice-free voice is positively correlated with N. The frame number N of the skipped audio frames is increased along with the increase of the times of continuously decoding the voice-free frames and is reduced along with the reduction of the times of continuously decoding the voice-free frames, so the flexibility of the frame skipping decoding is obviously improved, the decoding load is further reduced, and the whole decoding system is further optimized.
The audio decoding device provided by the invention comprises a decoding unit and a first judgment unit, wherein the decoding unit is used for decoding an audio frame to obtain a decoding result, the first judgment unit is used for judging whether voice exists in the decoding result, when the voice does not exist, the decoding unit skips N frames of audio frames, and returns to execute decoding on the audio frame after skipping the N frames of audio frames, wherein N is more than or equal to 1. The audio decoding device judges whether the current audio frame has voice through the first judging unit, and selects frame skipping decoding to the subsequent audio frame through the decoding unit under the condition of no voice, so that when a multi-channel terminal is accessed in an audio and video conference system, the decoding end performs frame skipping decoding to the terminal without voice, the decoding load of the whole system is effectively reduced, the memory occupancy rate of the system is reduced, the tone quality of the decoded audio when entering a sound mixing processing platform is improved, meanwhile, the number of access paths can be increased, and the system utilization rate is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;
fig. 2 is a flowchart of a method of a specific example of an audio decoding method according to embodiment 1 of the present invention;
fig. 3 is a flowchart of a method of a specific example of an audio decoding method according to embodiment 2 of the present invention;
fig. 4 is a flowchart of a method of a specific example of an audio decoding method according to embodiment 3 of the present invention;
fig. 5 is a flowchart of a method of a specific example of an audio decoding method according to embodiment 3 of the present invention;
fig. 6 is a block diagram of a specific example of an audio decoding apparatus according to embodiment 4 of the present invention;
fig. 7 is a block diagram showing a specific example of an audio decoding apparatus according to embodiment 4 of the present invention;
fig. 8 is a block diagram showing a specific example of an audio decoding apparatus according to embodiment 4 of the present invention;
fig. 9 is a block diagram showing a specific example of an audio decoding apparatus according to embodiment 4 of the present invention;
fig. 10 is a block diagram showing a specific example of an audio decoding apparatus according to embodiment 4 of the present invention;
fig. 11 is a block diagram showing a specific example of an audio decoding terminal according to embodiment 5 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 shows a schematic view of an application scenario of an embodiment of the present invention, where an audio/video conference system includes a terminal side, a decoding platform, and a sound mixing processing platform, where there may be multiple terminals. Fig. 1 shows a situation where two terminals are accessed, that is, an audio and video conference system includes a decoding platform, a mixing processing platform, a first terminal, and a second terminal. The first terminal and the second terminal can be mobile clients such as smart phones and tablet computers, and can also be special audio and video conference system terminals.
The audio and video conference system works as follows:
the terminal side collects the audio data of the participants in real time, converts the audio data into audio frames, packages the audio frames and sends the audio frames to the decoding platform, the decoding platform decodes the received audio data, sends the processed audio to the audio mixing processing platform, and the audio mixing processing platform sends the audio data to each terminal side after carrying out audio mixing processing on the audio data.
Example 1
The embodiment provides an audio decoding method applied to a decoding platform, as shown in fig. 2, including the following steps:
and step S14, decoding the audio frame to obtain a decoding result.
In this embodiment, the audio frame may be decoded with a decoding method such as PCM, MP3, OGG or MPC decoding. PCM decoding is preferred in this embodiment: it gives the decoding process strong resistance to interference and is convenient to implement in software, with little or no additional cost.
In this embodiment, the decoding result includes voice information, energy information, or other related audio information, where the voice information refers to a human voice, the audio/video conference system takes human speech as a first requirement, and the energy information refers to volume information carried by an audio frame.
Step S15, judging whether the decoding result has voice; when no voice is present, step S13 is performed, otherwise, other operations are performed.
In this embodiment, another operation may be to obtain the next audio frame, and return to step S14.
And S13, skipping N frames of audio frames, and returning to execute the step S14 for the audio frames after skipping N frames of audio frames, wherein N is more than or equal to 1.
In the audio decoding method provided by this embodiment, whether the current audio frame contains speech is judged, and frame-skipping decoding is chosen for the subsequent audio frames when it does not. Thus, when multiple terminals are accessed in an audio and video conference system, the decoding end performs frame-skipping decoding for the terminals that carry no sound, which effectively reduces the decoding load of the whole system, lowers the memory occupancy of the system, improves the sound quality of the decoded audio entering the mixing processing platform, and at the same time allows more access paths and a higher system utilization rate. Compared with the prior art, this audio decoding method does not need to perform acoustic-model detection and scoring on every audio frame, and therefore further reduces the occupancy of platform resources.
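For illustration, the loop below sketches the Example 1 flow (steps S14, S15 and S13) in Python. The helpers decode_frame() and has_voice() and the fixed value of N are assumptions supplied by the caller, not part of the patent.

```python
# Minimal sketch of the Example 1 decoding loop, under the assumptions stated above.
N = 4  # assumed number of frames skipped when no speech is found

def decode_stream(frames, decode_frame, has_voice):
    decoded = []
    i = 0
    while i < len(frames):
        result = decode_frame(frames[i])   # step S14: decode the current frame
        if has_voice(result):              # step S15: is speech present in the result?
            decoded.append(result)
            i += 1                         # "other operations": fetch the next frame
        else:
            i += N + 1                     # step S13: skip N frames, then decode again
    return decoded
```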
Example 2
The embodiment provides an audio decoding method applied to a decoding platform, as shown in fig. 3, including the following steps:
and step S24, decoding the audio frame to obtain a decoding result. The same as step S14 in embodiment 1, which is not described herein again.
Step S25, judging whether the decoding result has voice; when no voice is present, step S27 is performed, otherwise step S26 is performed.
Step S26, setting the continuous decoding time to an initial value. Step S211 is then performed.
In this embodiment, the initial value of the continuous decoding time is a value greater than 0, for example 10 seconds or 15 seconds, and can be set freely according to the actual situation.
Step S27, determining whether the continuous decoding time is 0, if the continuous decoding time is 0, executing step S23, and if the continuous decoding time is not 0, executing step S210.
And S23, skipping N frames of audio frames, and returning to execute the step S24 for the audio frames after skipping N frames of audio frames, wherein N is more than or equal to 1. The same as step S13 in embodiment 1, which is not described herein again.
And step S210, reducing the continuous decoding time.
Step S211, acquiring the next frame of audio frame, and returning to step S24.
In the audio decoding method provided by this embodiment, when no speech is present, the continuous decoding time is further judged: when it is 0, the frame-skipping step is executed; otherwise, the continuous decoding time is reduced and the next audio frame is acquired for decoding. Thus, when the complete speech of an accessed terminal is briefly interrupted, the audio frames that contain no speech are still decoded normally, so that a complete segment of speech is obtained, the continuity of the speech is preserved, and the audio/video conference proceeds smoothly.
When the decoded audio frame is judged to contain speech, the continuous decoding time is restored to its initial value and subsequent audio frames are decoded continuously until the continuous decoding time drops to 0. Restoring the continuous decoding time ensures, on one hand, that subsequent audio frames keep being decoded until the speech-free interval exceeds the continuous decoding time and frame-skipping decoding is executed, which preserves the continuity of the speech; on the other hand, it ensures that an accessed terminal without sound does not keep being decoded, which reduces the decoding load of the whole system.
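A minimal Python sketch of the Example 2 flow (steps S24 to S211) follows; it assumes the continuous decoding time is counted in frames rather than seconds, and decode_frame(), has_voice() and the constants are illustrative assumptions.

```python
# Minimal sketch of Example 2: a hold-over "continuous decoding time" keeps decoding
# for a while after speech stops, so trailing frames of an utterance are not skipped.
INITIAL_HOLD = 10   # initial continuous decoding time (assumed, expressed in frames)
N = 4               # frames skipped once the continuous decoding time reaches 0

def decode_stream_with_hold(frames, decode_frame, has_voice):
    decoded, hold, i = [], INITIAL_HOLD, 0
    while i < len(frames):
        result = decode_frame(frames[i])   # S24: decode the current frame
        if has_voice(result):              # S25: speech present?
            hold = INITIAL_HOLD            # S26: reset the continuous decoding time
            decoded.append(result)
            i += 1                         # S211: fetch the next frame
        elif hold > 0:                     # S27: continuous decoding time not yet 0
            hold -= 1                      # S210: reduce the continuous decoding time
            decoded.append(result)
            i += 1                         # S211: fetch the next frame
        else:
            i += N + 1                     # S23: skip N frames, then decode again
    return decoded
```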
Example 3
The embodiment provides an audio decoding method applied to a decoding platform, as shown in fig. 4, including the following steps:
step S31, receiving the audio code stream sent by the terminal side;
step S32, determining whether the frame skipping flag is in an on state, where the on state is used to indicate that the audio frame is not decoded, if the frame skipping flag is in the on state, executing step S33, otherwise executing step S34.
In this embodiment, when the audio code stream sent by the terminal side is received for the first time, the frame skipping flag defaults to the off state.
And S33, skipping N frames of audio frames, and returning to execute the step S34 for the audio frames after skipping N frames of audio frames, wherein N is more than or equal to 1.
And step S34, decoding the audio frame to obtain a decoding result.
Step S35, judging whether the decoding result has voice; when no voice is present, step S37 is performed, otherwise step S36 is performed.
In this embodiment, as shown in fig. 5, step S35 specifically includes:
step S351, determining whether there is voice information in the decoding result, if there is voice information, it represents that there is voice, and executing step S36, otherwise, executing step S352.
Step S352 determines whether the energy information is higher than a preset threshold, if the energy information is higher than the preset threshold, the step S36 is executed, otherwise, the step S37 is executed, if the energy information is higher than the preset threshold, the speech is present.
Step S36, setting the continuous decoding time to an initial value. Step S311 is then performed.
Step S37, determining whether the continuous decoding time is 0, if the continuous decoding time is 0, executing step S33, and if the continuous decoding time is not 0, executing step S310.
Step S38, acquiring the number of times that no speech has been decoded consecutively.
Step S39, determining the value of N according to that number, wherein the larger the number of consecutive no-speech decodes, the larger the value of N.
Step S310, reducing the continuous decoding time.
Step S311, the next frame of audio frame is acquired, and the process returns to step S34.
In the audio decoding method provided by this embodiment, when the audio code stream reaches the decoding end, whether the frame-skipping flag of the decoding end is on is judged first; if it is, frame-skipping decoding is performed, and if not, the current frame is decoded directly. This raises the degree of automation of the decoding end and avoids unnecessary decoding.
In addition, the number of consecutive no-speech decodes is positively correlated with N: the number N of skipped audio frames grows as the number of consecutive no-speech decodes grows and shrinks as it shrinks, which markedly improves the flexibility of frame-skipping decoding, further reduces the decoding load, and further optimizes the whole decoding system.
When no voice information is present, the energy information is judged further, which avoids skipping and leaving undecoded an accessed audio frame that has a low volume but still belongs to human speech, and thereby improves the audio processing effect.
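The sketch below combines the Example 3 elements: the frame-skipping flag (steps S32/S33), the speech test based on voice information or an energy threshold (steps S351/S352), the continuous decoding time (steps S36, S37, S310), and a skip length N that grows with the number of consecutive no-speech decodes (steps S38/S39). The decode-result keys, the threshold value and the N schedule are assumptions for illustration only.

```python
# Minimal sketch of the Example 3 flow, under the assumptions stated above.
INITIAL_HOLD = 10
ENERGY_THRESHOLD = 0.01   # assumed preset threshold for step S352

def frame_has_speech(result):
    """S351/S352: voice information present, or energy above the preset threshold."""
    return result.get("voice", False) or result.get("energy", 0.0) > ENERGY_THRESHOLD

def skip_length(no_speech_runs):
    """S39: N grows with the count of consecutive no-speech decodes (assumed schedule)."""
    return min(1 + no_speech_runs, 16)

def decode_stream_adaptive(frames, decode_frame):
    decoded, hold, no_speech_runs, i = [], INITIAL_HOLD, 0, 0
    skip_flag = False                          # S32: off by default on first reception
    while i < len(frames):
        if skip_flag:
            i += skip_length(no_speech_runs)   # S33: skip N frames
            skip_flag = False
            continue
        result = decode_frame(frames[i])       # S34: decode the current frame
        if frame_has_speech(result):           # S35 (S351/S352)
            hold = INITIAL_HOLD                # S36: reset the continuous decoding time
            no_speech_runs = 0
            decoded.append(result)
            i += 1                             # S311: fetch the next frame
        elif hold > 0:                         # S37: continuous decoding time not yet 0
            hold -= 1                          # S310
            decoded.append(result)
            i += 1                             # S311
        else:
            no_speech_runs += 1                # S38: count this no-speech decode
            skip_flag = True                   # the skip (S33) happens on the next pass
            i += 1
    return decoded
```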
Example 4
The present embodiment provides an audio decoding apparatus for performing the audio decoding method in embodiment 1. As shown in fig. 6, the audio decoding apparatus includes:
a decoding unit 41, configured to decode the audio frame to obtain a decoding result;
a first judgment unit 42, configured to judge whether speech exists in the decoding result;
when there is no speech, the decoding unit 41 is further configured to skip the N frames of audio frames, and return to performing decoding on the audio frames after skipping the N frames of audio frames, where N is greater than or equal to 1.
As an alternative implementation of this embodiment, as shown in fig. 7, the audio decoding apparatus further includes:
a second judgment unit 43 for judging whether the continuous decoding time is 0 when no speech exists;
the decoding unit 41 is configured to skip the N-frame audio frame and decode the audio frame after skipping the N-frame audio frame when the second determination unit 43 determines that the continuous decoding time is 0.
The decoding unit 41 is further configured to reduce the continuous decoding time and acquire the next frame of audio frame for decoding when the second judging unit 43 judges that the continuous decoding time is not 0.
As an optional implementation manner of this embodiment, the decoding unit 41 is further configured to set the continuous decoding time as an initial value when speech exists, and continuously decode subsequent audio frames until the continuous decoding time is reduced to 0.
As an optional implementation of this embodiment, as shown in fig. 8, the audio decoding apparatus further includes:
a receiving unit 44, configured to receive an audio code stream sent by a terminal side;
a third judging unit 45, configured to judge whether the frame skip flag is in an on state, where the on state is used to indicate that the audio frame is not decoded;
the decoding unit 41 is further configured to decode the audio frame when the frame skip flag is not in the on state.
As an alternative implementation of this embodiment, as shown in fig. 9, the decoding unit 41 includes:
an obtaining module 411, configured to obtain, when no speech is present, the number of times that no speech has been decoded consecutively;
a determining module 412, configured to determine the value of N according to that number, where the larger the number of consecutive no-speech decodes, the larger the value of N.
As an alternative implementation of this embodiment, as shown in fig. 10, the first judging unit 42 includes:
a first determining subunit 421, configured to determine whether voice information exists in the decoding result;
the second determining subunit 422 is configured to determine whether the energy information is higher than a preset threshold when no voice information exists in the decoding result.
In the audio decoding apparatus provided by this embodiment, the first judging unit 42 determines whether the current audio frame contains speech, and the decoding unit 41 chooses frame-skipping decoding for the subsequent audio frames when no speech is present. Thus, when multiple terminals are accessed in an audio and video conference system, the decoding end performs frame-skipping decoding for the terminals that carry no sound, which effectively reduces the decoding load of the whole system, lowers the occupancy of system memory, improves the sound quality of the decoded audio entering the mixing processing platform, and at the same time allows more access paths and a higher system utilization rate.
In addition, the first judging subunit 421 and the second judging subunit 422 prevent audio frames that have a low volume but still belong to human speech from being skipped without decoding, which improves the audio processing effect.
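For illustration, the classes below show one way the Example 4 units could be composed in code; the names mirror the units and subunits of Figs. 6 to 10, but the structure, the decode-result keys and the default threshold are assumptions rather than the patent's implementation.

```python
# Minimal sketch of the Example 4 apparatus, under the assumptions stated above.
class FirstJudgingUnit:
    def __init__(self, energy_threshold=0.01):
        self.energy_threshold = energy_threshold   # preset threshold for the second judging subunit

    def has_speech(self, result):
        if result.get("voice", False):             # first judging subunit 421: voice information?
            return True
        return result.get("energy", 0.0) > self.energy_threshold  # second judging subunit 422

class DecodingUnit:
    def __init__(self, judge, decode_frame, n=4):
        self.judge = judge                # the first judging unit
        self.decode_frame = decode_frame  # the underlying per-frame decoder
        self.n = n                        # frames to skip when no speech is found

    def run(self, frames):
        decoded, i = [], 0
        while i < len(frames):
            result = self.decode_frame(frames[i])
            if self.judge.has_speech(result):
                decoded.append(result)
                i += 1
            else:
                i += self.n + 1           # skip N frames, then decode the following frame
        return decoded
```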
Example 5
The present embodiment provides an audio decoding terminal, as shown in fig. 11, the device includes one or more processors 51 and a memory 52, in which one processor 51 is taken as an example.
The audio decoding terminal may further include: and an audio image display (not shown) for displaying a level amplitude image of the audio. The processor 51, the memory 52 and the audio image display may be connected by a bus or other means, and the connection by the bus is exemplified in the figure.
The processor 51 may be a central processing unit (CPU). The processor 51 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. A general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The memory 52, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the audio decoding method in the embodiment of the present invention. The processor 51 executes various functional applications of the server and data processing, i.e., implements the audio decoding method in the above-described embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 52.
The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the audio decoding apparatus, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 optionally includes a memory located remotely from the processor 51, and these remote memories may be connected to the audio decoding apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 52 and, when executed by the one or more processors 51, perform the audio decoding method described in embodiment 1.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For details of the technique not described in detail in the embodiment, reference may be made to the related description in the embodiment shown in fig. 2.
Example 6
The present embodiment provides a non-transitory computer storage medium storing computer-executable instructions that can perform the audio decoding method described in embodiment 1. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims (8)

1. An audio decoding method, comprising the steps of:
decoding the audio frame to obtain a decoding result;
judging whether the decoding result has voice or not;
skipping N frames of audio frames when no voice exists, and returning to execute decoding on the audio frames after the N frames of audio frames are skipped to obtain a decoding result, wherein N is more than or equal to 1;
wherein the skipping of N frames of audio frames when no voice exists and the decoding of the audio frames after skipping the N frames of audio frames comprise:
when no voice exists, acquiring the number of times that no speech has been decoded consecutively;
and determining the value of N according to that number, wherein the larger the number of consecutive no-speech decodes, the larger the value of N.
2. The audio decoding method of claim 1, further comprising, when no speech is present:
judging whether the continuous decoding time is 0 or not;
when the continuous decoding time is 0, executing the skipping of the N frames of audio frames, and returning to execute the decoding of the audio frames after the skipping of the N frames of audio frames to obtain a decoding result;
and when the continuous decoding time is not 0, reducing the continuous decoding time, acquiring the next frame of audio frame, and returning to execute decoding on the audio frame to obtain a decoding result.
3. The audio decoding method of claim 2,
when speech is present, the continuous decoding time is set to an initial value, and subsequent audio frames are continuously decoded until the continuous decoding time is reduced to 0.
4. The audio decoding method of claim 1, further comprising:
judging whether the frame skipping mark is in an open state, wherein the open state is used for indicating that the audio frame is not decoded;
and when the frame skipping mark is not in an on state, executing the step of decoding the audio frame.
5. An audio decoding apparatus, comprising:
the decoding unit is used for decoding the audio frame to obtain a decoding result;
a first judging unit, configured to judge whether a speech exists in the decoding result;
the decoding unit being further configured to skip N frames of audio frames when no voice exists, and return to executing decoding on the audio frames after the N frames of audio frames are skipped, wherein N is greater than or equal to 1;
the decoding unit includes:
the acquisition module is used for acquiring, when no voice exists, the number of times that no speech has been decoded consecutively;
and the determining module is used for determining the value of N according to that number, wherein the larger the number of consecutive no-speech decodes, the larger the value of N.
6. The audio decoding apparatus according to claim 5, further comprising:
a second judgment unit configured to judge whether or not the continuous decoding time is 0 when no speech exists;
the decoding unit is configured to skip the N frames of audio frames and decode the audio frames after skipping the N frames of audio frames when the second determination unit determines that the continuous decoding time is 0;
the decoding unit is further configured to reduce the continuous decoding time and acquire a next frame of audio frame for decoding when the second determination unit determines that the continuous decoding time is not 0.
7. The audio decoding apparatus according to claim 6,
the decoding unit is further configured to set the continuous decoding time to an initial value when speech is present, and continuously decode subsequent audio frames until the continuous decoding time is reduced to 0.
8. An audio decoding terminal, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform the audio decoding method of any of claims 1 to 4.
CN201710625359.1A 2017-07-27 2017-07-27 Audio decoding method and device Active CN107424620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710625359.1A CN107424620B (en) 2017-07-27 2017-07-27 Audio decoding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710625359.1A CN107424620B (en) 2017-07-27 2017-07-27 Audio decoding method and device

Publications (2)

Publication Number Publication Date
CN107424620A CN107424620A (en) 2017-12-01
CN107424620B true CN107424620B (en) 2020-12-01

Family

ID=60431223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710625359.1A Active CN107424620B (en) 2017-07-27 2017-07-27 Audio decoding method and device

Country Status (1)

Country Link
CN (1) CN107424620B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292725B (en) * 2020-02-28 2022-11-25 北京声智科技有限公司 Voice decoding method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001344905A (en) * 2000-05-26 2001-12-14 Fujitsu Ltd Data reproducing device, its method and recording medium
EP1393301B1 (en) * 2001-05-11 2007-01-10 Koninklijke Philips Electronics N.V. Estimating signal power in compressed audio
KR100750115B1 (en) * 2004-10-26 2007-08-21 삼성전자주식회사 Method and apparatus for encoding/decoding audio signal
CN1980293A (en) * 2005-12-03 2007-06-13 鸿富锦精密工业(深圳)有限公司 Silencing processing device and method
CN104768025B (en) * 2015-04-02 2018-05-08 无锡天脉聚源传媒科技有限公司 A kind of video bad frame restorative procedure and device
CN106710606B (en) * 2016-12-29 2019-11-08 百度在线网络技术(北京)有限公司 Method of speech processing and device based on artificial intelligence

Also Published As

Publication number Publication date
CN107424620A (en) 2017-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant