CN109496333A - Frame loss compensation method and device - Google Patents


Info

Publication number
CN109496333A
Authority
CN
China
Prior art keywords
frame
historical
information
future
current frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780046044.XA
Other languages
Chinese (zh)
Inventor
高振东
肖建良
刘泽新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN109496333A

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A frame loss compensation method and device, comprising: receiving a speech code stream sequence; obtaining historical frame information and future frame information in the speech code stream sequence, where the speech code stream sequence includes frame information of a plurality of speech frames, the plurality of speech frames include at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame precedes the at least one current frame in the time domain, the at least one future frame follows the at least one current frame in the time domain, the historical frame information is the frame information of the at least one historical frame, and the future frame information is the frame information of the at least one future frame; and estimating the frame information of the at least one current frame according to the historical frame information and the future frame information, thereby improving the accuracy of frame loss compensation.

Description

Frame loss compensation method and device

Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a frame loss compensation method and device.
Background
In a Packet Switched (PS) voice call, such as VoLTE (Voice over Long Term Evolution), VoWiFi (Voice over Wireless Fidelity), and VoIP (Voice over Internet Protocol), voice data does not occupy a dedicated bandwidth resource, so bandwidth preemption and data congestion occur, which may cause delay jitter and frame loss and lead to voice interruptions and stutter. In order to reduce the voice interruptions and stutter caused by delay jitter, an Adaptive Jitter Buffer (AJB) is usually adopted in the call scheme to mitigate the effect of delay jitter within a certain time range.
In the prior art, the vocoder itself has a packet loss concealment (PLC) function and can estimate the code stream information of the currently lost frame from the information of the good frames (historical frames) preceding it. The code stream information of the currently lost frame includes formant spectrum information, pitch frequency, fractional pitch, adaptive codebook gain, fixed codebook gain, or energy. However, actual speech changes quickly: the phoneme, glottal, vocal tract, and oral cavity information involved in the pronunciation of each word changes constantly. Therefore, frame loss compensation using only historical frames is not accurate enough.
Content of application
The application provides a frame loss compensation method and equipment, which can improve the accuracy of frame loss compensation.
In a first aspect, an embodiment of the present application provides a frame loss compensation method, including:
firstly, receiving a voice code stream sequence; acquiring historical frame information and future frame information in a voice code stream sequence, wherein the voice code stream sequence comprises frame information of a plurality of voice frames, the plurality of voice frames comprise at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame is positioned in front of the at least one current frame in a time domain, the at least one future frame is positioned behind the at least one current frame in the time domain, the historical frame information is the frame information of the at least one historical frame, and the future frame information is the frame information of the at least one future frame; and estimating the frame information of at least one current frame according to the historical frame information and the future frame information, thereby improving the accuracy of frame loss compensation.
In one possible design, before estimating frame information of at least one current frame, the method may determine a type or a state of a speech frame in a speech code stream sequence, including: it is determined whether there is a good frame before at least one current frame, whether at least one good frame before a current frame is a silence frame, whether there is a valid future frame, etc. Aiming at different types or different states of voice frames in a voice code stream sequence, different compensation measures are taken for the current frame, so that a recovered signal is closer to an original signal, and a better frame loss compensation effect is achieved.
In one possible design, after the receiving device receives the sequence of speech codestreams through the interface, the sequence of speech codestreams may be stored in a buffer, such as a buffer of an AJB. And then decoding the frame information of the voice code stream sequence in the buffer area to obtain decoded historical frame information and undecoded future frame information in the buffer area.
In another possible design, the historical frame information includes formant spectrum information for the historical frame, and the future frame information includes formant spectrum information for the future frame. Formant spectrum information for at least one current frame may be determined based on formant spectrum information for historical frames and formant spectrum information for future frames. For example, formant spectral information is the excitation response of the vocal tract when uttering.
In another possible design, before determining the formant spectrum information of at least one current frame according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame, a frame state in the speech code stream sequence may be determined, including: it is determined how many frames are lost, whether there are future good frames, whether there are good frames before the current frame, etc. And then, calculating the formant spectrum information of the current frame loss by adopting different methods according to the frame state in the voice code stream sequence.
In another possible design, the historical frame information includes a pitch value of the historical frame and the future frame information includes a pitch value of the future frame. A pitch value of at least one current frame may be determined based on the pitch values of the historical frames and the pitch values of the future frames. For example, the pitch value is the fundamental frequency of vocal cord vibration during utterance, and the pitch period is the reciprocal of the pitch frequency.
In another possible design, before determining the pitch value of at least one current frame based on the pitch values of the historical frames and the pitch values of the future frames, a frame state of the speech code stream sequence may be determined, including: judging how many frames are lost, whether future good frames exist, whether good frames exist before the current frame and the like, and then calculating the pitch value of the current lost frame by adopting different methods according to the frame state of the voice code stream sequence.
In another possible design, the magnitude of the spectral tilt of at least one current frame is determined according to the magnitude of a time-domain signal obtained by decoding a historical frame; and determining the frame type of the at least one current frame according to the magnitude of the spectral tilt of the at least one current frame. For example, the time domain signal is a time domain representation of the decoded analog frame information.
In another possible design, pitch change states of a plurality of subframes in at least one current frame may be obtained, and the frame type of the at least one current frame is determined according to the pitch change states of the plurality of subframes. Voiced sound is mainly produced by vocal cord vibration, so a fundamental tone exists; since vocal cord vibration changes slowly, the pitch also changes slowly. Each subframe has a pitch value, so the pitch can be used to determine the frame type of the at least one current frame.
In another possible design, a frame type of the at least one current frame is determined, and at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame is determined based on the frame type. The current frame comprises a voice frame with fundamental tone and a voice frame with noise, the self-adaptive codebook gain is the energy gain of the fundamental tone part, and the fixed codebook gain is the energy gain of the noise part.
In another possible design, if the frame type is voiced, the adaptive codebook gain for at least one current frame is determined according to the adaptive codebook gain and the pitch period of one historical frame and the energy gain for at least one current frame, and the average of the fixed codebook gains for multiple historical frames is used as the fixed codebook gain for at least one current frame.
In another possible design, if the frame type is unvoiced, the fixed codebook gain of at least one current frame is determined according to the fixed codebook gain and the pitch period of one historical frame and the energy gain of at least one current frame, and the average value of the adaptive codebook gains of a plurality of historical frames is used as the adaptive codebook gain of at least one current frame.
In another possible design, the energy gain of the at least one current frame is determined according to the time domain signal size in the decoded historical frame information and the length of each subframe in the historical frame. Wherein the energy gain of the current frame comprises the energy gain of the current frame in voiced speech or the energy gain of the current frame in unvoiced speech.
In a second aspect, the present application provides a frame loss compensation apparatus, which is configured to implement the method and functions performed by the user equipment in the first aspect, and is implemented by hardware/software, where the hardware/software includes units corresponding to the functions.
In a third aspect, the present application provides a frame loss compensation device, including: a vocoder, a memory, and a communication bus, the memory being coupled to the vocoder through the communication bus. The communication bus is used to implement connection and communication between the vocoder and the memory, and the vocoder executes the program stored in the memory to implement the steps of the frame loss compensation method provided in the first aspect.
Yet another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
Yet another aspect of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the above-described aspects.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic diagram of a frame loss compensation system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a sequence of speech sequences provided by an embodiment of the present application;
fig. 3 is a schematic flow chart of a frame loss compensation method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of another frame loss compensation method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a frame loss compensation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a frame loss compensation device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic diagram of the architecture of a frame loss compensation system according to an embodiment of the present disclosure. The system can be applied to PS voice call scenarios (including but not limited to VoLTE, VoWiFi and VoIP), and can also be applied to Circuit Switched (CS) calls if buffering is added. The system comprises a base station and a receiving device. The receiving device may be a device providing a voice and/or data connection to a user, i.e., user equipment; it may be connected to a computing device such as a laptop or desktop computer, or it may be a standalone device such as a Personal Digital Assistant (PDA). A receiving device may also be called a system, subscriber unit, subscriber station, mobile station, remote station, access point, remote terminal, access terminal, user agent, or user device. A base station may be an access point, a NodeB, an evolved NodeB (eNB), or a 5G base station, and refers to a device in an access network that communicates over the air interface, through one or more sectors, with wireless terminals. By converting received air-interface frames to IP (Internet Protocol) packets, the base station may act as a relay between the wireless terminal and the rest of the access network, which may include an Internet Protocol network. The base station may also coordinate the management of attributes for the air interface. Both the base station and the user equipment of this embodiment can adopt the frame loss compensation method mentioned in the following embodiments and include corresponding frame loss compensation apparatuses to implement frame loss compensation for the speech signal. When either device receives a speech code stream sequence from the peer device, it can decode the sequence to obtain decoded frame information, and perform compensation and subsequent decoding for lost frames.
Fig. 2 is a schematic diagram of a speech code stream sequence according to an embodiment of the present application. As shown in fig. 2, at the current time T, the speech frames of a speech signal fall into three categories: historical frames (at least one frame before time T), current frames (at least one frame at time T), and future frames (at least one frame after time T), where time T is a unit or point of time within a period. For example, the historical frames include frames N-1, N-2, N-3, N-4, etc., the current frames are lost frames including frames N and N+1, the future frames include frames N+2, N+3, N+4, etc., and N is a positive integer greater than 4. The lost frames involved in the frame loss compensation of this embodiment may include frames lost in transmission, corrupted frames, frames that cannot be correctly received or decoded, or frames that are unusable for a particular reason.
Fig. 3 is a flowchart illustrating a frame loss compensation method according to an embodiment of the present application. As shown in fig. 3, the method in the embodiment of the present application includes:
s301, obtaining historical frame information and future frame information in the speech code stream sequence, wherein the speech code stream sequence comprises frame information of a plurality of speech frames, the speech frames comprise at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame is located before the at least one current frame in a time domain, the at least one future frame is located after the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame. At least one current frame may be a frame loss due to various reasons.
In a specific implementation, after the receiving device receives the speech code stream sequence through the interface, the sequence may be stored in a buffer in memory, for example the buffer of an AJB. The frame information of the speech code stream sequence in the buffer is then decoded to obtain the decoded historical frame information, while the future frame information in the buffer remains undecoded. For example, a decoded historical frame is a speech analog (time-domain) signal, and a historical frame before decoding is a speech digital signal. Although a future frame is not decoded, the frame loss compensation apparatus or system can parse it to obtain partially valid frame information, such as formant spectrum information and pitch values. Information or data of a plurality of frames, including the historical frames, the current frames and the future frames, is buffered in the buffer.
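As an illustration of this buffering scheme, the following is a minimal sketch in Python; the types, field names and buffer depth are hypothetical, since the embodiment does not prescribe any concrete data structure:

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    index: int                       # frame number N within the stream
    payload: Optional[bytes]         # encoded bits; None if the frame was lost
    decoded: Optional[list] = None   # time-domain samples once decoded

class JitterBuffer:
    """Hypothetical AJB-style buffer holding historical, current and future frames."""

    def __init__(self, depth: int = 8):
        self.frames: deque = deque(maxlen=depth)

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)

    def history(self, current: int) -> List[Frame]:
        # Historical frames precede the current frame and are already decoded.
        return [f for f in self.frames
                if f.index < current and f.decoded is not None]

    def future(self, current: int) -> List[Frame]:
        # Future frames follow the current frame; they are not decoded yet,
        # but their bitstreams can be parsed for ISF and pitch parameters.
        return [f for f in self.frames
                if f.index > current and f.payload is not None]
```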
S302, compensating the frame information of the at least one current frame according to the historical frame information and the future frame information.
In a specific implementation, before performing S302, the type or state of the speech frames in the speech code stream sequence may be determined, including: determining whether there is a good frame (i.e., a normal frame that can be used for compensation) before the lost frame, whether the good frame before the lost frame is a silence frame, whether there is a valid future frame, and so on. For different types or states of the speech frames in the speech code stream sequence, different compensation measures are taken for the current frame in S302, so that the recovered signal is closer to the original signal and a better frame loss compensation effect is achieved. Fig. 4 shows a specific frame loss compensation method; fig. 4 is a schematic flow chart of another frame loss compensation method provided in an embodiment of the present application. The method comprises the following steps:
S401, determine whether the current frame is a lost frame or a bad frame.
S402, if the current frame is a good frame, decode the good frame.
S403, if the current frame is a bad frame or a lost frame, determine whether the good frame before the current frame is a silence frame (SID).
S404, if the good frame before the current frame is a silence frame, directly decode the silence frame.
S405, if the good frame before the current frame is not a silence frame, determine whether there is a valid future frame after the current frame.
S406, if no valid future frame exists after the current frame, compensate the current frame according to the historical frame information.
S407, if there is a valid future frame, compensate the current lost frame according to the historical frame information and the future frame information; the specific implementation of this step is described in detail below.
S408, after compensating the frame information of the current frame, decode the compensated current frame.
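The sketch below wires steps S401-S408 together. The helper callables (decode, decode_sid, compensate_from_history, compensate_from_history_and_future) and the frame attributes (lost, is_sid) are assumed names for illustration; only the decision flow itself comes from the steps above:

```python
def conceal_frame(current, history, future,
                  decode, decode_sid,
                  compensate_from_history,
                  compensate_from_history_and_future):
    """Decision flow of S401-S408; the callables are assumed helpers."""
    if not current.lost:                          # S401/S402: good frame
        return decode(current)
    if history and history[-1].is_sid:            # S403/S404: previous good frame is SID
        return decode_sid(history[-1])
    if not future:                                # S405/S406: no valid future frame
        compensated = compensate_from_history(current, history)
    else:                                         # S407: use both directions
        compensated = compensate_from_history_and_future(current, history, future)
    return decode(compensated)                    # S408: decode the compensated frame
```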
This embodiment now describes step S407 in detail. The current lost frame may be compensated by jointly considering the historical frame information and the future frame information, as follows:
In one embodiment, the historical frame information includes formant spectrum information of the historical frame, and the future frame information includes formant spectrum information of the future frame. In the solution of this embodiment, the formant spectrum information of the at least one current frame may be determined according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame. The formant spectrum information is the excitation response of the vocal tract during utterance and includes Immittance Spectral Frequency (ISF) information; hereinafter, the formant spectrum information is expressed as an ISF vector.
For example, suppose frames N-2 and N-1 in the speech code stream sequence are good frames, frames N and N+1 are lost, and future frames N+2 and N+3 exist; the formant spectrum information of frame N is then calculated by first-order polynomial fitting:

ISF_i(N-1) = a + b × (N-1);

ISF_i(N+2) = a + b × (N+2);

Here the formant spectrum information of the N-1 frame and the N+2 frame is represented by formant spectrum information at a plurality of points; the formant spectrum information is processed by a filter, and each point represents a coefficient pair of the filter. ISF_i(N-1) is the formant spectrum information corresponding to the i-th point of the N-1 frame, and ISF_i(N+2) is the formant spectrum information corresponding to the i-th point of the N+2 frame. Solving these two equations gives b = (ISF_i(N+2) - ISF_i(N-1)) / 3 and a = ISF_i(N-1) - b × (N-1).

Substituting a and b into ISF_i(N) = a + b × N, where ISF_i(N) is the formant spectrum information corresponding to the i-th point of frame N, gives ISF_i(N) = (2 × ISF_i(N-1) + ISF_i(N+2)) / 3.
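A short sketch of this first-order (linear) interpolation, applied per ISF coefficient; the frame indices follow the example above:

```python
def interpolate_isf(isf_prev, isf_future, n_prev, n_future, n_lost):
    """Fit ISF_i(x) = a + b*x per coefficient and evaluate at the lost frame.

    With n_prev = N-1 and n_future = N+2 this reduces to
    ISF_i(N) = (2*ISF_i(N-1) + ISF_i(N+2)) / 3, as derived above."""
    out = []
    for y_prev, y_future in zip(isf_prev, isf_future):
        b = (y_future - y_prev) / (n_future - n_prev)  # slope
        a = y_prev - b * n_prev                        # intercept
        out.append(a + b * n_lost)
    return out

# Example usage for the scenario above (frames N-1 and N+2 good, frame N lost):
# isf_n = interpolate_isf(isf_of[N - 1], isf_of[N + 2], N - 1, N + 2, N)
```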
optionally, before determining the formant spectrum information of the at least one current frame according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame, a frame state in the speech code stream sequence may be determined, including: it is determined how many frames are lost, whether there are future good frames, whether there are good frames before the current frame, etc. And then, calculating the formant spectrum information of the current frame loss by adopting different methods according to the frame state in the voice code stream sequence.
For example, if the two frames before the current lost frame are good frames, 1-2 frames are lost, and there is no future good frame, first-order polynomial fitting as in the prior art is performed using the two frames before the current lost frame. If the frame before the current lost frame is a good frame, one or more frames are lost, and there is no future good frame, fitting is performed using the frame before the current lost frame and ISF_mean(i). If 3 or more frames are lost and there are future good frames, first-order polynomial fitting is performed using ISF_mean(i) and the future good frames. If the three frames before the current lost frame are good frames, 1-2 frames are lost, and there are future good frames, a first-order polynomial is used to fit the good frames before the current lost frame and the future good frames, as described in detail above. ISF_mean(i) is calculated as follows:
ISF_mean(i) = β × ISF_const_mean(i) + (1 - β) × ISF_adaptive_mean(i), i = 0, …, 15;

where β is a preset constant, ISF_const_mean(i) is the average value of formant spectrum information over a certain range, and ISF_adaptive_mean(i) is a corresponding adaptive average of formant spectrum information updated from recently received good frames.

After ISF_mean(i) is calculated, the formant spectrum information ISF_q(i) of the current lost frame is calculated from the formant spectrum information of the frame preceding the current lost frame and ISF_mean(i), according to the following formula:

ISF_q(i) = α × past_ISF_q(i) + (1 - α) × ISF_mean(i), i = 0, …, 15;

where past_ISF_q(i) is the formant spectrum information corresponding to the i-th point of the frame preceding the current lost frame, and α is a preset constant.
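A sketch of the two formulas above. The text only says α and β are preset constants, so the values used here are assumptions for illustration:

```python
ALPHA = 0.9   # assumed example value; the text only says "preset constant"
BETA = 0.75   # assumed example value

def conceal_isf(past_isf_q, isf_const_mean, isf_adaptive_mean):
    """ISF_q(i) = ALPHA*past_ISF_q(i) + (1-ALPHA)*ISF_mean(i), where
    ISF_mean(i) = BETA*ISF_const_mean(i) + (1-BETA)*ISF_adaptive_mean(i)."""
    isf_q = []
    for i in range(16):  # i = 0, ..., 15 as in the formulas above
        isf_mean = BETA * isf_const_mean[i] + (1 - BETA) * isf_adaptive_mean[i]
        isf_q.append(ALPHA * past_isf_q[i] + (1 - ALPHA) * isf_mean)
    return isf_q
```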
In another embodiment, the historical frame information includes a pitch value of the historical frame, and the future frame information includes a pitch value of the future frame. In this embodiment, the pitch value of the at least one current frame may be determined according to the pitch values of the historical frames and the pitch values of the future frames. For example, the pitch value is the pitch frequency of vocal cord vibrations at the time of utterance, and is the reciprocal of the pitch period. Or the pitch value is the pitch period.
For example, the speech code stream sequence includes four pitch values per frame. Suppose frames N-2 and N-1 are good frames, frames N and N+1 are lost, and frames N+2 and N+3 are good frames; the pitch values of frame N are then calculated by fitting a second-order polynomial to frames N-1 and N+2. The pitch values pitch_1(N-1), pitch_2(N-1), pitch_3(N-1), pitch_4(N-1) of frame N-1 and pitch_1(N+2), pitch_2(N+2), pitch_3(N+2), pitch_4(N+2) of frame N+2 are known, where pitch denotes the pitch value, N denotes the frame number, and the subscript denotes the position of the subframe within each frame (one pitch per subframe). The second-order polynomial is as follows:
Y = a_0 + a_1 × x + a_2 × x², where a_0, a_1 and a_2 are the coefficients of the fitted curve. According to the principle of minimizing the sum of squared deviations (least squares), the following normal equations are obtained:

| n       Σx_i     Σx_i²  |   | a_0 |   | Σy_i       |
| Σx_i    Σx_i²    Σx_i³  | × | a_1 | = | Σx_i · y_i |
| Σx_i²   Σx_i³    Σx_i⁴  |   | a_2 |   | Σx_i² · y_i|
where n is the total number of subframes of the N-1 frame and the N+2 frame (n = 8), x_i is the time point of the i-th subframe among the N-1 frame and the N+2 frame, and y_i is the pitch value corresponding to that time point. The time point of pitch_1(N-1) is defined as 4·(N-1)+1, that of pitch_2(N-1) as 4·(N-1)+2, and so on up to pitch_4(N+2), defined as 4·(N+2)+4. The eight known pairs, e.g. 4·(N-1)+1 with pitch_1(N-1) and 4·(N-1)+2 with pitch_2(N-1), are substituted to solve for a_0, a_1 and a_2. Then 4·N+1, …, 4·N+4 are each substituted as the variable x into Y = a_0 + a_1 × x + a_2 × x² with the solved coefficients, and the calculated Y values are used as the pitch values pitch_1(N), pitch_2(N), pitch_3(N), pitch_4(N) of frame N.
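A sketch of this second-order least-squares fit using numpy; np.polyfit solves the same normal equations shown above:

```python
import numpy as np

def fit_pitch_second_order(x_known, y_known, x_lost):
    """Least-squares fit of Y = a0 + a1*x + a2*x^2 to the known subframe
    pitch values, evaluated at the lost frame's subframe time points."""
    a2, a1, a0 = np.polyfit(x_known, y_known, deg=2)  # highest degree first
    x = np.asarray(x_lost, dtype=float)
    return a0 + a1 * x + a2 * x * x

# Subframe j of frame M sits at time point 4*M + j (j = 1..4), so for the
# example above with frames N-1 and N+2 known and frame N lost:
# x_known = [4*(N-1)+j for j in (1, 2, 3, 4)] + [4*(N+2)+j for j in (1, 2, 3, 4)]
# y_known = the eight corresponding pitch values
# pitch_n = fit_pitch_second_order(x_known, y_known, [4*N+j for j in (1, 2, 3, 4)])
```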
Optionally, before the pitch value of the at least one current frame is determined according to the pitch values of the historical frames and the pitch values of the future frames, the frame state of the speech code stream sequence may be determined, including: judging how many frames are lost, whether future good frames exist, whether good frames exist before the current lost frame, and so on; then the pitch value of the current lost frame is calculated by different methods according to the frame state of the speech code stream sequence.
For example, if the two frames before the current lost frame are good frames, 1-3 frames are lost, and there is no future good frame, a second-order polynomial is used to fit the pitch value of the current lost frame using the good frames before the current lost frame. The current lost frames include speech frames with pitch and speech frames with noise; if 4 or more frames are lost, the energy of the pitch component is reduced and only the pitch value of the noise component is compensated. If the three frames before the current lost frame are good frames, 1-3 frames are lost, and there are future good frames, a second-order polynomial is used to fit the pitch value of the current lost frame using the good frames before the current lost frame and the future good frames, as introduced above.
S303, judging the frame type of the current frame. Wherein the frame types include unvoiced and voiced. The voiced sound and the unvoiced sound have great difference in sounding characteristics, the current frame has different frame types, and the adopted frame loss compensation strategy is also different. The difference between voiced and unvoiced sounds is that voiced signals have a pronounced periodicity, which is caused by vocal cord vibrations. The periodicity detection may employ algorithms such as zero-crossing rate, correlation, spectral tilt (spectral tilt), or pitch change rate. The zero-crossing rate and the correlation calculation are applied in the prior art and are not described. The following description is presented in terms of determining the frame state of a speech signal by spectral tilt and pitch change rate.
In one embodiment, the magnitude of the spectral tilt of the at least one current frame may be determined according to the magnitude of the time-domain signal obtained by decoding the historical frames, and the frame type of the at least one current frame is determined according to the spectral tilt of the at least one current frame. The pitch frequency of a voiced speech signal is 500 Hz or less, and the periodic signal can be identified from the spectral tilt. The spectral tilt is calculated as follows:

tilt = r_1 / r_0, where r_0 = Σ_i s(i) · s(i) and r_1 = Σ_i s(i) · s(i-1);

where tilt is the magnitude of the spectral tilt of the current frame, s is the magnitude of the analog time-domain signal obtained by decoding the historical frames, and i is the time index of the time-domain signal in the time direction. For a speech coding sequence over a period of time, r_0 is fixed. Because unvoiced sound has a white-noise-like characteristic, its r_1 value is relatively small; if the value of tilt is smaller than a preset threshold, the frame type of the at least one current frame is determined to be unvoiced. For voiced sound, r_1 is relatively large; if the value of tilt is not less than the preset threshold, the frame type of the at least one current frame is determined to be voiced.
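A sketch of the tilt computation and the threshold decision; the threshold value is an assumption, since the text only calls it a preset threshold:

```python
def spectral_tilt(s):
    """tilt = r1 / r0 over the decoded time-domain signal of the historical
    frames, with r0 = sum(s[i]^2) and r1 = sum(s[i]*s[i-1])."""
    r0 = sum(x * x for x in s)
    r1 = sum(s[i] * s[i - 1] for i in range(1, len(s)))
    return r1 / r0 if r0 else 0.0

def frame_type_by_tilt(s, threshold=0.5):  # threshold: assumed example value
    # Unvoiced speech is white-noise-like, so r1 (and hence tilt) stays small.
    return "voiced" if spectral_tilt(s) >= threshold else "unvoiced"
```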
In another embodiment, pitch change states of a plurality of subframes in the at least one current frame may be obtained, and the frame type of the at least one current frame is determined according to the pitch change states of the plurality of subframes. Voiced sound is mainly produced by vocal cord vibration, so a fundamental tone exists; since vocal cord vibration changes slowly, the pitch also changes slowly. Each subframe has a pitch value, so the pitch is used to judge the frame type of the current lost frame.
Here pitch_change denotes the pitch change state over the 4 subframes in one frame, pitch(i) is the pitch value of the i-th subframe, and pitch(i+1) is the pitch value of the (i+1)-th subframe. If pitch_change is small, the at least one current frame is judged to be voiced; if the change is large, the at least one current frame is judged to be unvoiced. The judgment can be completed by comparing pitch_change with a preset threshold: if the threshold is reached, the speech signal is determined to be unvoiced, otherwise it is determined to be voiced. If extension to inter-frame judgment is desired, i can be extended over 0, 1, …, 7, so that the pitch change of 8 subframes across two frames can be judged.
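Since the exact pitch_change formula is not spelled out above, the sketch below uses one plausible realization (the accumulated relative difference between adjacent subframe pitch values); the threshold is likewise an assumed example:

```python
def pitch_change(pitches):
    """Accumulated relative change between adjacent subframe pitch values;
    one plausible realization, since the text does not give the formula."""
    return sum(abs(pitches[i + 1] - pitches[i]) / pitches[i]
               for i in range(len(pitches) - 1))

def frame_type_by_pitch(pitches, threshold=0.3):  # assumed example threshold
    # Voiced: slow vocal-cord vibration, hence slowly changing pitch.
    return "unvoiced" if pitch_change(pitches) >= threshold else "voiced"

# In-frame judgment uses the 4 subframe pitches of one frame; passing the
# 8 pitches of two frames extends the judgment across frames, as noted above.
```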
S304, adjusting the energy of at least one current frame.
In a specific implementation, a frame type of the at least one current frame may be determined, and at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame may be determined according to the frame type. The current frame comprises a voice frame of fundamental tone and a voice frame of noise, the self-adaptive codebook gain is the energy gain of the fundamental tone part, and the fixed codebook gain is the energy gain of the noise part.
In one embodiment, if the frame type is voiced, the adaptive codebook gain of the at least one current frame is determined according to the adaptive codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and the average value of the fixed codebook gains of a plurality of historical frames is used as the fixed codebook gain of the at least one current frame. Wherein, one history frame may be the latest history frame before the current frame.
In another embodiment, if the frame type is unvoiced, the fixed codebook gain of the at least one current frame is determined according to the fixed codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and the average value of the adaptive codebook gains of a plurality of historical frames is used as the adaptive codebook gain of the at least one current frame.
For example, if the number of currently lost frames does not exceed 3, the currently lost frames are enhanced in the energy adjustment, as follows. If the frame state of the current lost frame is voiced, the adaptive codebook gain of the current lost frame is determined from g_p(n-1), the adaptive codebook gain of the most recent historical frame, G_voice, the energy gain of the current lost frame in voiced speech, and T_c, the pitch period of the most recent historical frame; the fixed codebook gain of the current lost frame is g_c = median5(g_c(n-1), …, g_c(n-5)), where median5(g_c(n-1), …, g_c(n-5)) is the median of the fixed codebook gains of the last five historical frames. If the frame state of the current lost frame is unvoiced, the adaptive codebook gain of the current lost frame is g_p = median5(g_p(n-1), …, g_p(n-5)), where median5(g_p(n-1), …, g_p(n-5)) is the median of the adaptive codebook gains of the last five historical frames; the fixed codebook gain of the current lost frame is determined from g_c(n-1), the fixed codebook gain of the most recent historical frame, G_noise, the energy gain of the current lost frame in unvoiced speech, and T_c, the pitch period of the most recent historical frame.
For another example, if the number of currently lost frames exceeds 3, the currently lost frames are attenuated in the energy adjustment, as follows. The adaptive codebook gain of the current lost frame is g_p = P_p(state) × median5(g_p(n-1), …, g_p(n-5)), and the fixed codebook gain of the current lost frame is g_c = P_c(state) × median5(g_c(n-1), …, g_c(n-5)), where median5(g_p(n-1), …, g_p(n-5)) is the median of the adaptive codebook gains of the last five historical frames and median5(g_c(n-1), …, g_c(n-5)) is the median of the fixed codebook gains of the last five historical frames. For example, P_p(state) is a decay factor with P_p(1) = 0.98, P_p(2) = 0.96, P_p(3) = 0.75, P_p(4) = 0.23, P_p(5) = 0.05, P_p(6) = 0.01; and P_c(state) is a decay factor with P_c(1) = 0.98, P_c(2) = 0.98, P_c(3) = 0.98, P_c(4) = 0.98, P_c(5) = 0.98, P_c(6) = 0.70, state = {0, 1, 2, 3, 4, 5, 6}.
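A sketch of the attenuation rule with the decay-factor tables listed above; median5 is implemented as the median of the last five gains:

```python
# Decay-factor tables from the example above; state runs over the listed indices.
P_P = {1: 0.98, 2: 0.96, 3: 0.75, 4: 0.23, 5: 0.05, 6: 0.01}  # adaptive gain
P_C = {1: 0.98, 2: 0.98, 3: 0.98, 4: 0.98, 5: 0.98, 6: 0.70}  # fixed gain

def median5(gains):
    """Median of the last five historical gains (needs at least five values)."""
    return sorted(gains[-5:])[2]

def attenuated_gains(gp_history, gc_history, state):
    """Gains for a loss burst longer than 3 frames: scale the median of the
    last five historical gains by the state-dependent decay factors."""
    g_p = P_P[state] * median5(gp_history)
    g_c = P_C[state] * median5(gc_history)
    return g_p, g_c
```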
The energy gain of the at least one current frame may be determined according to the time-domain signal size in the decoded historical frame information and the length of each subframe in the historical frame. The energy gain of the current frame comprises the energy gain of the current lost frame in voiced speech or the energy gain of the current lost frame in unvoiced speech. Since future frames are not decoded at the current time, the energy gain of the current lost frame can only be determined from historical frame information. When the previous good frame is voiced, the energy gain G_voice of the current lost frame is calculated from S, the time-domain signal obtained by decoding the good frame preceding the current frame, whose length is 4·L_subfr, where L_subfr denotes the length of one subframe, T_c denotes the pitch period of the previous good frame, and i is the time index of the time-domain signal. To prevent the energy of the recovered frame from becoming unpredictable when G_voice is too large or too small, G_voice is limited to [0, 2].
When the previous good frame is unvoiced or noise, the energy gain G_noise of the current frame is calculated from S, the time-domain signal obtained by decoding the good frame preceding the current frame, where L_subfr denotes the length of one subframe and i is the time index of the time-domain signal. To prevent the energy of the recovered frame from becoming unpredictable when G_noise is too large or too small, G_noise is limited to [0, 2].
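A minimal sketch of the [0, 2] limiting applied to both G_voice and G_noise:

```python
def clamp_energy_gain(g, lo=0.0, hi=2.0):
    """Limit G_voice / G_noise to [0, 2] so that an extreme gain estimate
    cannot make the energy of the recovered frame unpredictable."""
    return max(lo, min(g, hi))
```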
In summary, in the embodiments of the present application, the historical frame information and the future frame information in the speech code stream sequence are first obtained, and then the formant spectrum information, pitch value, fixed codebook gain, adaptive codebook gain and energy of the current lost frame in the speech signal are estimated according to the historical frame information and the future frame information. Compensating lost frames using historical frame information and future frame information together improves the accuracy of frame loss compensation.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a frame loss compensation apparatus, which may be a vocoder and may include, for example, a receiving module 501, an obtaining module 502 and a processing module 503, where the detailed description of each module is as follows.
A receiving module 501, configured to receive a speech code stream sequence;
an obtaining module 502, configured to obtain historical frame information and future frame information in the speech code stream sequence, where the speech code stream sequence includes frame information of multiple speech frames, the multiple speech frames include at least one historical frame, at least one current frame, and at least one future frame, the at least one historical frame is located before the at least one current frame in a time domain, the at least one future frame is located after the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
a processing module 503, configured to estimate frame information of the at least one current frame according to the historical frame information and the future frame information.
Wherein the voice code stream sequence is stored in a buffer area;
optionally, the processing module 503 is specifically configured to: decoding frame information of a plurality of voice frames of the voice code stream sequence in the buffer area to obtain decoded historical frame information; obtaining the future frame information that is not decoded from the buffer.
Wherein the historical frame information comprises formant spectrum information of the at least one historical frame, the future frame information comprising formant spectrum information of the at least one future frame;
optionally, the processing module 503 is specifically configured to: and determining the formant spectrum information of the at least one current frame according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame.
Wherein the historical frame information includes a pitch value of the at least one historical frame, and the future frame information includes a pitch value of the at least one future frame;
optionally, the processing module 503 is specifically configured to: determining a pitch value of the at least one current frame based on the pitch value of the at least one historical frame and the pitch value of the at least one future frame.
Wherein the historical frame information comprises an energy of the at least one historical frame, the future frame information comprises an energy of the at least one future frame;
optionally, the processing module 503 is specifically configured to: determining the energy of the at least one current frame according to the energy of the at least one historical frame and the energy of the at least one future frame.
Optionally, the processing module 503 is specifically configured to: determining a frame type of the at least one current frame, the frame type comprising an unvoiced sound or a voiced sound;
determining at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame according to the frame type.
Optionally, the processing module 503 is further configured to determine a magnitude of the spectral tilt of the at least one current frame;
and determining the frame type of the at least one current frame according to the spectral tilt of the at least one current frame.
Optionally, the processing module 503 is further configured to obtain pitch change states of multiple subframes in the at least one current frame;
and determining the frame type of the at least one current frame according to the pitch change states of the plurality of subframes.
Optionally, the processing module 503 is specifically configured to: and if the frame type is voiced, determining the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average value of the fixed codebook gains of a plurality of historical frames as the fixed codebook gain of the at least one current frame.
Optionally, the processing module 503 is specifically configured to: and if the frame type is unvoiced, determining the fixed codebook gain of the at least one current frame according to the fixed codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average value of the adaptive codebook gains of a plurality of historical frames as the adaptive codebook gain of the at least one current frame.
Optionally, the processing module 503 is further configured to determine the energy gain of the at least one current frame according to the time-domain signal size in the decoded historical frame information and the length of each subframe in the historical frame.
It should be noted that, the specific functional implementation of each module may also correspond to the corresponding description of the method embodiment shown in fig. 3, and execute the method and the function executed in the foregoing embodiment.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a frame loss compensation device according to the present application. The device may include: at least one vocoder 601, e.g., an Adaptive Multi-Rate Wideband (AMR-WB) vocoder, at least one communication interface 602, at least one memory 603, and at least one communication bus 604. The communication bus 604 is used to implement connection and communication between these components. The communication interface 602 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The memory 603 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory), and may optionally be at least one storage device located remotely from the vocoder 601. The memory 603 stores a set of program codes and may further be used to store temporary data such as intermediate operation data of the vocoder 601. The vocoder 601 executes the program code in the memory 603 to implement the methods mentioned in the foregoing embodiments; for details, reference may be made to the descriptions of the foregoing embodiments. Further, the vocoder 601 can cooperate with the memory 603 and the communication interface 602 to perform the operations of the receiving device in the foregoing embodiments of the application. The vocoder 601 may specifically include a processor that executes the program code, such as a Central Processing Unit (CPU) or a Digital Signal Processor (DSP). For example, the communication interface 602 may be used to receive the speech code stream sequence.
It is understood that the memory 603 may not store program code, and in this case the vocoder 601 may include a hardware processor such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a hardware accelerator formed of integrated circuits that do not need to execute the program code. At this time, the memory 603 may be used only for storing temporary data such as intermediate operation data of the vocoder 601.
In the above embodiments, the functions of the method may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer or its internal processor. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Claims (23)

  1. A method for frame loss compensation, the method comprising:
    receiving a voice code stream sequence;
    acquiring historical frame information and future frame information in the voice code stream sequence, wherein the voice code stream sequence comprises frame information of a plurality of voice frames, the plurality of voice frames comprise at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame is positioned in front of the at least one current frame in a time domain, the at least one future frame is positioned behind the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information.
  2. The method of claim 1, further comprising: storing the voice code stream sequence in a buffer area;
    the acquiring of the historical frame information and the future frame information in the speech code stream sequence comprises:
    decoding frame information of a plurality of voice frames of the voice code stream sequence in the buffer area to obtain decoded historical frame information;
    obtaining the future frame information that is not decoded from the buffer.
  3. The method of claim 1 or 2, wherein the historical frame information comprises formant spectrum information of the at least one historical frame, the future frame information comprises formant spectrum information of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining formant spectrum information of the at least one current frame based on the formant spectrum information of the at least one historical frame and the formant spectrum information of the at least one future frame.
  4. A method according to any one of claims 1 to 3, wherein said historical frame information comprises a pitch value of said at least one historical frame, and said future frame information comprises a pitch value of said at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining a pitch value of the at least one current frame based on the pitch value of the at least one historical frame and the pitch value of the at least one future frame.
  5. The method of any of claims 1-4, wherein the historical frame information comprises an energy of the at least one historical frame, the future frame information comprises an energy of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining the energy of the at least one current frame according to the energy of the at least one historical frame and the energy of the at least one future frame.
  6. The method of any of claims 1-5, wherein said estimating frame information for the at least one current frame based on the historical frame information and the future frame information comprises:
    determining a frame type of the at least one current frame, the frame type comprising an unvoiced sound or a voiced sound;
    determining at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame according to the frame type.
  7. The method of claim 6, wherein said determining the frame type of the at least one current frame comprises:
    determining a magnitude of spectral tilt of the at least one current frame;
    and determining the frame type of the at least one current frame according to the spectral tilt of the at least one current frame.
  8. The method of claim 6, wherein said determining the frame type of the at least one current frame comprises:
    obtaining pitch change states of a plurality of subframes in the at least one current frame;
    and determining the frame type of the at least one current frame according to the pitch change states of the plurality of subframes.
  9. The method of any of claims 6-8, wherein said determining at least one of an adaptive codebook gain and a fixed codebook gain for the at least one current frame based on the frame type comprises:
    and if the frame type is voiced, determining the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average value of the fixed codebook gains of a plurality of historical frames as the fixed codebook gain of the at least one current frame.
  10. The method of any of claims 6-9, wherein said determining at least one of an adaptive codebook gain and a fixed codebook gain for the at least one current frame based on the frame type comprises:
    and if the frame type is unvoiced, determining the fixed codebook gain of the at least one current frame according to the fixed codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average value of the adaptive codebook gains of a plurality of historical frames as the adaptive codebook gain of the at least one current frame.
  11. The method of claim 9 or 10, wherein the method further comprises:
    and determining the energy gain of the at least one current frame according to the time domain signal size in the decoded historical frame information and the length of each subframe in the historical frame.
  12. A frame loss compensation apparatus, the apparatus comprising:
    the receiving module is used for receiving the voice code stream sequence;
    an obtaining module, configured to obtain historical frame information and future frame information in the speech code stream sequence, where the speech code stream sequence includes frame information of multiple speech frames, the multiple speech frames include at least one historical frame, at least one current frame, and at least one future frame, the at least one historical frame is located before the at least one current frame in a time domain, the at least one future frame is located after the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
    and the processing module is used for estimating the frame information of the at least one current frame according to the historical frame information and the future frame information.
  13. The apparatus of claim 12, wherein the sequence of speech codestreams is stored in a buffer;
    the acquisition module is specifically configured to:
    decoding frame information of a plurality of voice frames of the voice code stream sequence in the buffer area to obtain decoded historical frame information;
    obtaining the future frame information that is not decoded from the buffer.
  14. The apparatus of claim 12 or 13, wherein the historical frame information comprises formant spectrum information for the at least one historical frame, the future frame information comprising formant spectrum information for the at least one future frame;
    the processing module is specifically configured to:
    and determining the formant spectrum information of the at least one current frame according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame.
  15. The apparatus according to any one of claims 12 to 14, wherein said historical frame information comprises a pitch value of said at least one historical frame, and said future frame information comprises a pitch value of said at least one future frame;
    the processing module is specifically configured to:
    determining a pitch value of the at least one current frame based on the pitch value of the at least one historical frame and the pitch value of the at least one future frame.
  16. The apparatus of any of claims 12 to 15, wherein the historical frame information comprises an energy of the at least one historical frame, the future frame information comprises an energy of the at least one future frame;
    the processing module is specifically configured to:
    determining the energy of the at least one current frame according to the energy of the at least one historical frame and the energy of the at least one future frame.
  17. The apparatus according to any one of claims 12 to 16, wherein the processing module is specifically configured to:
    determine a frame type of the at least one current frame, the frame type being unvoiced or voiced; and
    determine at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame according to the frame type.
  18. The apparatus of claim 17, wherein the processing module is further configured to determine a spectral tilt of the at least one current frame; and
    determine the frame type of the at least one current frame according to the spectral tilt of the at least one current frame.
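
Illustrative sketch (not part of the claims): a common tilt measure, assumed here, is the first normalized autocorrelation coefficient of the frame's samples; voiced speech concentrates energy at low frequencies, so its tilt approaches +1, while unvoiced speech is flat or negative. The threshold value is an assumption.

```python
import numpy as np

def frame_type_from_tilt(samples: np.ndarray, threshold: float = 0.2) -> str:
    """First normalized autocorrelation coefficient as the tilt measure
    (assumed); classify as voiced when the tilt exceeds the threshold."""
    r0 = float(np.dot(samples, samples))
    r1 = float(np.dot(samples[:-1], samples[1:]))
    tilt = r1 / r0 if r0 > 0.0 else 0.0
    return "voiced" if tilt > threshold else "unvoiced"
```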
  19. The apparatus of claim 17, wherein the processing module is further configured to obtain pitch change statuses of a plurality of subframes in the at least one current frame; and
    determine the frame type of the at least one current frame according to the pitch change statuses of the plurality of subframes.
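
Illustrative sketch (not part of the claims): one plausible reading of the pitch change status test, assumed here, is that a pitch track that stays stable across the subframes indicates voicing. The jitter bound is an assumption.

```python
def frame_type_from_pitch_change(subframe_lags: list, max_jitter: float = 0.15) -> str:
    """Assumed rule: lags within a small relative jitter across subframes
    indicate a stable pitch track, i.e. a voiced frame."""
    mean_lag = sum(subframe_lags) / len(subframe_lags)
    jitter = max(abs(lag - mean_lag) for lag in subframe_lags) / mean_lag
    return "voiced" if jitter <= max_jitter else "unvoiced"
```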
  20. The apparatus according to any one of claims 17 to 19, wherein the processing module is specifically configured to:
    if the frame type is voiced, determine the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and take the average of the fixed codebook gains of a plurality of historical frames as the fixed codebook gain of the at least one current frame.
  21. The apparatus according to any one of claims 17 to 19, wherein the processing module is specifically configured to:
    if the frame type is unvoiced, determine the fixed codebook gain of the at least one current frame according to the fixed codebook gain and the pitch period of one historical frame and the energy gain of the at least one current frame, and take the average of the adaptive codebook gains of a plurality of historical frames as the adaptive codebook gain of the at least one current frame.
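
Illustrative sketch (not part of the claims): claims 20 and 21 (like claims 9 and 10 of the method) are mirror images of each other. The sketch below implements the voiced/unvoiced split; how the historical pitch period enters the derivation is not specified by the claims, so it is omitted here and the last frame's gain is simply rescaled by the energy gain, which is an assumption.

```python
def estimate_gains(frame_type: str,
                   last_adaptive_gain: float, last_fixed_gain: float,
                   hist_adaptive_gains: list, hist_fixed_gains: list,
                   energy_gain: float):
    """Voiced: predict the adaptive gain, average the fixed gains.
    Unvoiced: predict the fixed gain, average the adaptive gains."""
    if frame_type == "voiced":
        adaptive = last_adaptive_gain * energy_gain              # assumed scaling
        fixed = sum(hist_fixed_gains) / len(hist_fixed_gains)    # claimed averaging
    else:
        fixed = last_fixed_gain * energy_gain                    # assumed scaling
        adaptive = sum(hist_adaptive_gains) / len(hist_adaptive_gains)
    return adaptive, fixed
```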
  22. The apparatus of claim 20 or 21, wherein the processing module is further configured to determine the energy gain of the at least one current frame according to the time-domain signal magnitude in the decoded historical frame information and the length of each subframe in the historical frame.
  23. A frame loss compensation apparatus, comprising: a memory, a communication bus, and a vocoder, the memory coupled to the vocoder through the communication bus; wherein the memory is configured to store program code, and the vocoder is configured to call the program code to:
    receive a speech code stream sequence;
    acquire historical frame information and future frame information in the speech code stream sequence, wherein the speech code stream sequence comprises frame information of a plurality of speech frames, the plurality of speech frames comprise at least one historical frame, at least one current frame, and at least one future frame, the at least one historical frame is located before the at least one current frame in a time domain, the at least one future frame is located after the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame; and
    estimate the frame information of the at least one current frame according to the historical frame information and the future frame information.
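
Illustrative sketch (not part of the claims): the claimed estimation step end to end, with each parameter of the lost frame predicted from the same parameter on both sides of the gap. Field names ("lsf", "pitch", "energy") and the parameter set are assumptions; real codecs carry these parameters in codec-specific form.

```python
import numpy as np

def conceal_lost_frame(hist: dict, fut: dict, alpha: float = 0.5) -> dict:
    """Predict each parameter of the lost frame from the matching parameter
    of the neighbouring historical and future frames (hypothetical layout)."""
    lsf = alpha * np.asarray(hist["lsf"]) + (1.0 - alpha) * np.asarray(fut["lsf"])
    pitch = round(alpha * hist["pitch"] + (1.0 - alpha) * fut["pitch"])
    energy = float(np.sqrt(hist["energy"] * fut["energy"]))  # log-domain mix
    return {"lsf": lsf, "pitch": pitch, "energy": energy}

# One historical and one future frame bracketing a single lost frame.
hist = {"lsf": [0.3, 0.9, 1.5], "pitch": 58, "energy": 1.0e6}
fut = {"lsf": [0.4, 1.0, 1.6], "pitch": 62, "energy": 6.4e5}
print(conceal_lost_frame(hist, fut))
```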
CN201780046044.XA 2017-06-26 2017-06-26 A kind of frame losing compensation method and equipment Pending CN109496333A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/090035 WO2019000178A1 (en) 2017-06-26 2017-06-26 Frame loss compensation method and device

Publications (1)

Publication Number Publication Date
CN109496333A true CN109496333A (en) 2019-03-19

Family

ID=64740767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780046044.XA Pending CN109496333A (en) 2017-06-26 2017-06-26 A kind of frame losing compensation method and equipment

Country Status (2)

Country Link
CN (1) CN109496333A (en)
WO (1) WO2019000178A1 (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1659625A (en) * 2002-05-31 2005-08-24 沃伊斯亚吉公司 Method and device for efficient frame erasure concealment in linear predictive based speech codecs
JP2004239930A (en) * 2003-02-03 2004-08-26 Iwatsu Electric Co Ltd Method and system for detecting pitch in packet loss compensation
KR20050024651A (en) * 2003-09-01 2005-03-11 한국전자통신연구원 Method and apparatus for frame loss concealment for packet network
CN101147190A (en) * 2005-01-31 2008-03-19 高通股份有限公司 Frame erasure concealment in voice communications
CN101379551A (en) * 2005-12-28 2009-03-04 沃伊斯亚吉公司 Method and device for efficient frame erasure concealment in speech codecs
CN101009098A (en) * 2007-01-26 2007-08-01 清华大学 Sound coder gain parameter division-mode anti-channel error code method
US20090240490A1 (en) * 2008-03-20 2009-09-24 Gwangju Institute Of Science And Technology Method and apparatus for concealing packet loss, and apparatus for transmitting and receiving speech signal
CN102449690A (en) * 2009-06-04 2012-05-09 高通股份有限公司 Systems and methods for reconstructing an erased speech frame
CN101630242A (en) * 2009-07-28 2010-01-20 苏州国芯科技有限公司 Contribution module for rapidly computing self-adaptive code book by G723.1 coder
CN101894558A (en) * 2010-08-04 2010-11-24 华为技术有限公司 Lost frame recovering method and equipment as well as speech enhancing method, equipment and system
CN103325375A (en) * 2013-06-05 2013-09-25 上海交通大学 Coding and decoding device and method of ultralow-bit-rate speech
CN103714820A (en) * 2013-12-27 2014-04-09 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN106251875A (en) * 2016-08-12 2016-12-21 广州市百果园网络科技有限公司 The method of a kind of frame losing compensation and terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111836117A (en) * 2019-04-15 2020-10-27 深信服科技股份有限公司 Method and device for sending supplementary frame data and related components
CN111836117B (en) * 2019-04-15 2022-08-09 深信服科技股份有限公司 Method and device for sending supplementary frame data and related components
CN111554308A (en) * 2020-05-15 2020-08-18 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium
CN111711992A (en) * 2020-06-23 2020-09-25 瓴盛科技有限公司 Calibration method for CS voice downlink jitter
CN111711992B (en) * 2020-06-23 2023-05-02 瓴盛科技有限公司 CS voice downlink jitter calibration method
CN112489665A (en) * 2020-11-11 2021-03-12 北京融讯科创技术有限公司 Voice processing method and device and electronic equipment
CN112489665B (en) * 2020-11-11 2024-02-23 北京融讯科创技术有限公司 Voice processing method and device and electronic equipment
CN112634912A (en) * 2020-12-18 2021-04-09 北京猿力未来科技有限公司 Packet loss compensation method and device
CN112634912B (en) * 2020-12-18 2024-04-09 北京猿力未来科技有限公司 Packet loss compensation method and device

Also Published As

Publication number Publication date
WO2019000178A1 (en) 2019-01-03

Similar Documents

Publication Publication Date Title
CN109496333A (en) A kind of frame losing compensation method and equipment
US7778824B2 (en) Device and method for frame lost concealment
JP5571235B2 (en) Signal coding using pitch adjusted coding and non-pitch adjusted coding
JP5232151B2 (en) Packet-based echo cancellation and suppression
US9047863B2 (en) Systems, methods, apparatus, and computer-readable media for criticality threshold control
KR100581413B1 (en) Improved spectral parameter substitution for the frame error concealment in a speech decoder
KR101168648B1 (en) Method and apparatus for obtaining an attenuation factor
EP2140637B1 (en) Method of transmitting data in a communication system
CN107248411B (en) Lost frame compensation processing method and device
US8401865B2 (en) Flexible parameter update in audio/speech coded signals
SE521679C2 (en) Method and apparatus for suppressing noise in a communication system
TW201506909A Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
CN105408954B Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment with improved pitch lag estimation
KR20150054716A (en) Generation of comfort noise
KR101409305B1 (en) Attenuation of overvoicing, in particular for generating an excitation at a decoder, in the absence of information
JPH10190498A (en) Improved method generating comfortable noise during non-contiguous transmission
JP2006323230A (en) Noise level estimating method and device thereof
RU2707144C2 (en) Audio encoder and audio signal encoding method
KR101452635B1 (en) Method for packet loss concealment using LMS predictor, and thereof recording medium
KR102132326B1 (en) Method and apparatus for concealing an error in communication system
US20040138878A1 (en) Method for estimating a codec parameter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190319