MXPA96002143A - Improved adaptive codebook-based speech compression system - Google Patents

Improved adaptive codebook-based speech compression system

Info

Publication number
MXPA96002143A
Authority
MX
Mexico
Prior art keywords
gain
filter
adaptive
signal
code
Prior art date
Application number
MXPA/A/1996/002143A
Other languages
Spanish (es)
Other versions
MX9602143A (en)
Inventor
Kroon Peter
Original Assignee
Lucent Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/482,715 (US5664055A)
Application filed by Lucent Technologies Inc
Publication of MX9602143A
Publication of MXPA96002143A

Abstract

The present invention relates to a method for use in a speech processing system that includes a first portion comprising an adaptive codebook and a corresponding adaptive codebook amplifier, and a second portion comprising a fixed codebook coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier. The method comprises: determining the pitch filter gain based on a measure of periodicity of a speech signal; and amplifying samples of a signal in the pitch filter based on the determined pitch filter gain.

Description

IMPROVED ADAPTIVE CODEBOOK-BASED SPEECH COMPRESSION SYSTEM

Field of the Invention

The present invention relates generally to adaptive codebook-based speech compression systems, and more particularly to such systems operating to compress speech having a pitch period less than or equal to the adaptive codebook vector (subframe) length.

BACKGROUND OF THE INVENTION

Many speech compression systems employ a subsystem to model the periodicity of a speech signal.
Two such periodicity models in wide use in speech compression (or coding) systems are the pitch prediction filter (PPF) and the adaptive codebook (ACB). The ACB is fundamentally a memory that stores samples of past speech signals, or derivatives thereof such as speech residual or excitation signals (hereafter, speech signals). Periodicity is introduced (or modeled) by copying samples of the past speech signal (as stored in the memory) into the present to "predict" what the present speech signal will look like. The PPF is a simple IIR filter, typically of the form

$y(n) = x(n) + g_p\, y(n - M)$   (1)

where $n$ is a sample index, $y$ is the output, $x$ is the input, $M$ is a delay value of the filter, and $g_p$ is a scale factor (or gain). Because the current output of the PPF depends on a past output, periodicity is introduced by the PPF. Although either the ACB or the PPF can be used in speech coding, these periodicity models do not operate identically under all circumstances. For example, while a PPF and an ACB yield the same results when the pitch period of voiced speech is greater than or equal to the subframe (or codebook vector) size, this is not the case when the pitch period is less than the subframe size. This difference is illustrated by Figures 1 and 2, where the pitch period (or delay) is assumed to be 2.5 ms but the subframe size is 5 ms.

Figure 1 presents a conventional combination of a fixed codebook (FCB) and an ACB as used in a typical CELP speech compression system (this combination is used in both the encoder and the decoder of the CELP system). As shown in the figure, FCB 1 receives an index value, I, which causes the FCB to output a speech signal (excitation) vector of predetermined duration. This duration is referred to as a subframe (here, 5 ms). Illustratively, this speech excitation signal will consist of one or more main pulses located in the subframe.
For clarity of presentation, the output vector is assumed to have a single large pulse of unit magnitude. The output vector is scaled by a gain $g_c$ applied by amplifier 5. In parallel with the operation of FCB 1 and gain 5, ACB 10 generates a speech signal based on previously synthesized speech. In conventional fashion, the ACB searches its memory of past speech for samples that most closely match the original speech being coded. Such samples lie in the neighborhood of one pitch period (M) in the past from the present sample it is attempting to synthesize. Such past samples may not exist if the pitch is fractional; they may have to be synthesized by the ACB from surrounding speech sample values by linear interpolation, as is conventional. The ACB uses a past sample identified (or synthesized) in this way as the current sample. For clarity of explanation, the remainder of this discussion assumes that the pitch period is an integral multiple of the sample period and that past samples are identified by M for copying into the present subframe. The ACB outputs individual samples in this manner for the entire subframe (5 ms). All samples produced by the ACB are scaled by a gain $g_p$ applied by amplifier 15.
For current samples in the second half of the subframe, the past samples used as current samples are the samples in the first half of the subframe. This is because the subframe is 5 ms long, but the pitch period, M (the time span used to identify past samples for use as current samples), is 2.5 ms. Therefore, if the current sample to be synthesized is at the 4 ms point in the subframe, the past speech sample is at the 4 ms - 2.5 ms = 1.5 ms point in the same subframe. The output signals of the FCB and ACB amplifiers 5, 15 are summed at summing circuit 20 to yield an excitation signal for a conventional linear predictive (LPC) synthesis filter (not shown). A stylized representation of one subframe of this excitation signal produced by circuit 20 is also shown in Figure 1. Assuming pulses of unit magnitude before scaling, the codebook system yields several pulses in the 5 ms subframe: a first pulse of height $g_p$, a second pulse of height $g_c$, and a third pulse of height $g_p$. The third pulse is simply a copy of the first pulse created by the ACB. Note that there is no copy of the second pulse in the second half of the subframe, since the ACB memory does not include the second pulse (and the fixed codebook has only one pulse per subframe).

Figure 2 presents a periodicity model comprising an FCB 25 in series with a PPF 50. The PPF 50 comprises a summing circuit 45, a delay memory 35, and an amplifier 40. As with the system discussed above, an index, I, applied to FCB 25 causes the FCB to output an excitation vector corresponding to the index. This vector has one major pulse. The vector is scaled by amplifier 30, which applies gain $g_c$. The scaled vector is then applied to PPF 50, which operates according to equation (1) above. A stylized representation of the PPF 50 output signal is also presented in Figure 2.
The first pulse of the PPF output subframe is the result of a delay, M, applied to a major pulse (assumed to have unit amplitude) from the previous subframe (not shown). The next pulse in the subframe is the pulse contained in the FCB output vector scaled by amplifier 30. Then, due to the 2.5 ms delay 35, these two pulses are repeated 2.5 ms later, each scaled by amplifier 40. There are major differences between the output signals of the ACB and PPF implementations of the periodicity model. They appear in the second half of the synthesized subframes shown in Figures 1 and 2. First, the amplitudes of the third pulses differ: $g_p$ compared with $g_p^2$. Second, there is no fourth pulse in the output of the ACB model. Regarding this missing pulse, when the pitch period is less than the subframe size, the combination of an ACB and an FCB will not introduce a second fixed codebook contribution in the subframe. This differs from the operation of a pitch prediction filter in series with a fixed codebook.

Brief Description of the Invention

For speech coding systems that use an ACB model of periodicity, it has been proposed that a PPF be used at the output of the FCB. This PPF has a delay equal to the integer component of the pitch period and a fixed gain of 0.8. The PPF does insert the missing FCB pulse in the subframe, but with a gain value that is speculative. The gain is speculative because joint quantization of the ACB and FCB gains prevents the determination of an ACB gain for the current subframe until both ACB and FCB vectors have been determined. The inventor of the present invention has recognized that the fixed-gain aspect of the pitch loop added to an ACB-based synthesizer results in synthesized speech that is sometimes too periodic, giving the synthesized speech an unnatural "buzzy" quality.
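The difference between the two periodicity models of Figures 1 and 2 can be reproduced numerically. The following is a minimal Python sketch, not code from the patent or the G.729 Draft: the function names, the 8 kHz sample rate (so that 5 ms is 40 samples and 2.5 ms is 20 samples), the pulse positions, and the gain values are all assumptions chosen for illustration.

```python
SUBFRAME = 40   # 5 ms at an assumed 8 kHz sample rate
DELAY = 20      # 2.5 ms pitch period, shorter than the subframe


def acb_fcb_parallel(past, fcb, gp, gc, delay):
    """ACB in parallel with FCB (Figure 1): the ACB copies samples one
    pitch period back, then the two scaled vectors are summed."""
    v = []
    for n in range(len(fcb)):
        src = n - delay
        # Negative src reads the stored past; otherwise the ACB reuses
        # its own output from earlier in the same subframe.
        v.append(past[src] if src < 0 else v[src])
    return [gp * v[n] + gc * fcb[n] for n in range(len(fcb))]


def fcb_ppf_series(past_out, fcb, gp, gc, delay):
    """FCB followed by a PPF (Figure 2): y(n) = x(n) + gp * y(n - M),
    with x(n) = gc * c(n), as in equation (1)."""
    y = []
    for n in range(len(fcb)):
        src = n - delay
        prev = past_out[src] if src < 0 else y[src]
        y.append(gc * fcb[n] + gp * prev)
    return y


def pulses(signal):
    """Return {sample index: amplitude} for the nonzero samples."""
    return {n: a for n, a in enumerate(signal) if a != 0}


# Previous subframe: one unit pulse placed so that it reappears at n = 10.
past = [0.0] * SUBFRAME
past[30] = 1.0
# Current FCB vector: a single unit pulse at n = 15.
fcb = [0.0] * SUBFRAME
fcb[15] = 1.0

gp, gc = 0.5, 2.0  # illustrative gains only
acb_out = acb_fcb_parallel(past, fcb, gp, gc, DELAY)
ppf_out = fcb_ppf_series(past, fcb, gp, gc, DELAY)

print("ACB + FCB:", pulses(acb_out))
print("FCB -> PPF:", pulses(ppf_out))
```

With these assumed values the parallel model yields three pulses (heights $g_p$, $g_c$, $g_p$), while the series model yields four (heights $g_p$, $g_c$, $g_p^2$, $g_c g_p$), which is the missing-pulse behavior discussed above.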
The present invention solves a shortcoming of the proposed use of a PPF at the output of the FCB in systems that employ an ACB. The present invention provides a gain for the PPF that is not fixed, but adaptive based on a measure of periodicity of the speech signal. The adaptive PPF gain improves PPF performance because the gain is small when the speech signal is not very periodic and large when the speech signal is highly periodic. This adaptability avoids the "buzz" problem.

In accordance with an embodiment of the present invention, speech processing systems that include a first portion comprising an adaptive codebook and corresponding adaptive codebook amplifier, and a second portion comprising a fixed codebook coupled to a pitch filter, are adapted to delay the adaptive codebook gain; determine the pitch filter gain based on the delayed adaptive codebook gain; and amplify samples of a signal in the pitch filter based on the determined pitch filter gain. The adaptive codebook gain is delayed by one subframe. The delayed gain is used because the quantized gain for the adaptive codebook is not available until the fixed codebook gain is determined. The pitch filter gain equals the delayed adaptive codebook gain, except when the adaptive codebook gain is either less than 0.2 or greater than 0.8, in which case the pitch filter gain is set to 0.2 or 0.8, respectively. The limits guard against perceptually undesirable effects due to errors in estimating how periodic the excitation signal actually is.

Brief Description of the Drawings

Figure 1 presents a conventional combination of FCB and ACB systems as used in a typical CELP speech compression system, together with a stylized representation of one subframe of an excitation signal generated by the combination.

Figure 2 presents a periodicity model comprising an FCB and a PPF, together with a stylized representation of one subframe of a PPF output signal.
Figure 3 presents an illustrative embodiment of a speech encoder in accordance with the present invention.

Figure 4 presents an illustrative embodiment of a decoder in accordance with the present invention.

Detailed Description

I. Introduction to the Illustrative Embodiments

For clarity of explanation, the illustrative embodiments of the present invention are presented as comprising individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of the processors presented in Figures 3 and 4 may be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software that performs the operations discussed below, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided. The embodiments described below are suitable for use in many speech compression systems such as, for example, that described in a preliminary draft Recommendation G.729 to the ITU standards body (G.729 Draft), which is attached hereto as an appendix. This speech compression system operates at 8 kbit/s and is based on code-excited linear predictive (CELP) coding; see G.729 Draft section 2. The draft recommendation includes a complete description of the speech coding system, as well as the use of the present invention therein. See generally, for example, Figure 2 and the discussion in section 2.1 of the G.729 Draft.
With respect to an embodiment of the present invention, see the discussion in sections 3.8 and 4.1.2 of the G.729 Draft.

II. The Illustrative Embodiments

Figures 3 and 4 present illustrative embodiments of the present invention as used in the encoder and decoder of the G.729 Draft. Figure 3 is an augmented version of Figure 2 of the G.729 Draft, showing the detail of the illustrative encoder embodiment. Figure 4 is similar to Figure 3 of the G.729 Draft, augmented to show the details of the illustrative decoder embodiment. In the discussion that follows, reference will be made to sections of the G.729 Draft where appropriate. A general description of the G.729 Draft encoder is presented in section 2.1, and a general description of the decoder in section 2.2.

A. The Encoder

In accordance with the embodiment, an input speech signal (16-bit PCM at a sampling rate of 8 kHz) is provided to preprocessor 100. Preprocessor 100 high-pass filters the speech signal to remove undesirable low-frequency components and scales the speech signal to avoid processing overflow. See G.729 Draft section 3.1. The preprocessed speech signal, s(n), is then provided to linear prediction analyzer 105. See G.729 Draft section 3.2. Linear prediction (LP) coefficients, $a_i$, are provided to LP synthesis filter 155, which receives an excitation signal, u(n), formed from the combined output of the FCB and ACB portions of the encoder. The excitation signal is chosen by an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure provided by perceptual weighting filter 165. See G.729 Draft section 3.3. With respect to the ACB portion 112 of the embodiment, a signal representing the perceptually weighted distortion (error) is used by pitch period processor 170 to determine an open-loop pitch period (delay) used by adaptive codebook system 110.
The encoder uses the determined open-loop pitch period as the basis of a closed-loop pitch search. ACB 110 computes an adaptive codebook vector, v(n), by interpolating the past excitation at a selected fractional pitch. See G.729 Draft sections 3.4-3.7. Adaptive codebook gain amplifier 115 applies a scale factor $g_p$ to the output of ACB system 110. See G.729 Draft section 3.9.2. With respect to the FCB portion 118 of the embodiment, an index generated by mean squared error (MSE) search processor 175 is received by FCB system 120, and a codebook vector, c(n), is generated in response. See G.729 Draft section 3.8. This codebook vector is provided to PPF system 128 operating in accordance with the present invention (see discussion below). The output of PPF system 128 is scaled by FCB amplifier 145, which applies a scale factor $g_c$. The scale factor $g_c$ is determined in accordance with G.729 Draft section 3.9. The vectors output from the ACB and FCB portions 112, 118 of the encoder are summed at summer 150 and provided to the LP synthesis filter as discussed above.

B. The PPF System

As mentioned above, the PPF system addresses the shortcoming of the ACB system exhibited when the pitch period of the speech being synthesized is less than the subframe size, as well as the problem that a fixed PPF gain is too large for speech that is not very periodic.
PPF system 128 includes a switch 126 that controls whether PPF 128 contributes to the excitation signal. If the delay, M, is less than the subframe size, L, then switch 126 is closed and PPF 128 contributes to the excitation. If M ≥ L, switch 126 is open and PPF 128 does not contribute to the excitation. A switch control signal K is set when M < L. Note that the use of switch 126 is merely illustrative. Many alternative designs are possible, including, for example, a switch used to bypass PPF 128 entirely when M ≥ L. The delay used by the PPF system is the integer portion of the pitch period, M, as computed by pitch period processor 170. The memory of delay processor 135 is cleared before PPF 128 operates on each subframe. The gain applied by the PPF system is provided by delay processor 125. Processor 125 receives the ACB gain, $g_p$, and stores it for one subframe (a one-subframe delay). The stored gain value is then compared with upper and lower limits of 0.8 and 0.2, respectively. If the stored gain value is either greater than the upper limit or less than the lower limit, the gain is set to the respective limit. In other words, the PPF gain is limited to the range of values greater than or equal to 0.2 and less than or equal to 0.8. Within that range, the gain takes the value of the delayed adaptive codebook gain. The upper and lower limits are placed on the value of the adaptive PPF gain so that the synthesized signal is neither overperiodic nor aperiodic, both of which are perceptually undesirable. As such, extremely small or large values of the ACB gain should be avoided. It will be apparent to those of ordinary skill in the art that the ACB gain could be limited to the specified range before being stored for a subframe. As such, the processor stores a signal reflecting the ACB gain, whether limited to the specified range before or after storage.
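The gain selection and switching logic described for PPF system 128 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the G.729 reference code: the class and function names are hypothetical, and the initial stored gain (used before any ACB gain has been received) is an assumption, since the text does not specify it.

```python
G_MIN, G_MAX = 0.2, 0.8  # lower and upper limits on the PPF gain


class DelayedGainProcessor:
    """Holds the ACB gain for one subframe and limits it to [G_MIN, G_MAX]."""

    def __init__(self, initial_gain=G_MIN):  # initial value is an assumption
        self._stored = initial_gain

    def ppf_gain(self):
        """PPF gain for the current subframe: the previous subframe's
        ACB gain, limited to the allowed range."""
        return min(max(self._stored, G_MIN), G_MAX)

    def store(self, acb_gain):
        """Called once the current subframe's ACB gain is known."""
        self._stored = acb_gain


def pitch_prefilter(c, pitch_period, gain):
    """Apply y(n) = c(n) + gain * y(n - M) to one subframe, where M is the
    integer part of the pitch period and the filter memory starts cleared."""
    subframe_len = len(c)
    delay = int(pitch_period)
    if delay < 1:
        raise ValueError("pitch period must be at least one sample")
    if delay >= subframe_len:
        return list(c)  # switch open: the PPF does not contribute
    y = list(c)
    for n in range(delay, subframe_len):
        y[n] += gain * y[n - delay]
    return y


# Example: ACB gains arriving over three subframes (illustrative values).
proc = DelayedGainProcessor()
used = []
for acb_gain in [1.1, 0.05, 0.5]:
    used.append(proc.ppf_gain())  # gain available before acb_gain is known
    proc.store(acb_gain)
used.append(proc.ppf_gain())
print(used)
```

Splitting `ppf_gain()` from `store()` mirrors the reason the delay exists: the PPF gain is needed while the fixed codebook is being processed, before the current subframe's ACB gain is available.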
Also, the exact values of the upper and lower limits are a matter of choice that may be varied to achieve desired results in any specific realization of the present invention.

C. The Decoder

The encoder described above (and in the referenced sections of the G.729 Draft) provides a frame of data representing compressed speech every 10 ms. The frame comprises 80 bits and is detailed in Tables 1 and 9 of the G.729 Draft. Each 80-bit frame of compressed speech is sent over a communication channel to a decoder, which synthesizes a speech signal (representing two subframes) based on the frame produced by the encoder. The channel over which the frames are communicated (not shown) may be of any type (such as conventional telephone networks, cellular or wireless networks, ATM networks, etc.) and/or may comprise a storage medium (such as magnetic storage, semiconductor RAM or ROM, or optical storage such as CD-ROM). An illustrative decoder in accordance with the present invention is presented in Figure 4. The decoder is much like the encoder of Figure 3 in that it includes both an adaptive codebook portion 240 and a fixed codebook portion 200. The decoder decodes the transmitted parameters (see G.729 Draft section 4.1) and performs synthesis to obtain reconstructed speech. The FCB portion includes an FCB 205 responsive to an FCB index, I, communicated to the decoder from the encoder. FCB 205 generates a vector, c(n), of length equal to a subframe. See G.729 Draft section 4.1.3. This vector is applied to PPF 210 of the decoder. PPF 210 operates as described above (based on a value of the ACB gain, $g_p$, delayed in delay processor 225, and the ACB pitch period, M, both received from the encoder via the channel) to yield a vector for application to FCB gain amplifier 235. Amplifier 235, which applies a gain $g_c$ received from the channel, generates a scaled version of the vector produced by PPF 210. See G.729 Draft section 4.1.4.
The output signal of amplifier 235 is supplied to summer 255, which generates an excitation signal, u(n). Also provided to summer 255 is the output signal generated by the ACB portion 240 of the decoder. The ACB portion 240 comprises ACB 245, which generates an adaptive codebook contribution, v(n), of length equal to a subframe, based on past excitation signals and the ACB pitch period, M, received from the encoder via the channel. See G.729 Draft section 4.1.2. This vector is scaled by amplifier 250 based on the gain factor $g_p$ received over the channel. This scaled vector is the output of ACB portion 240. The excitation signal, u(n), produced by summer 255 is applied to LPC synthesis filter 260, which synthesizes a speech signal based on LPC coefficients, $a_i$, received over the channel. See G.729 Draft section 4.1.6. Finally, the output of LPC synthesis filter 260 is supplied to post-processor 265, which performs adaptive post-filtering (see G.729 Draft sections 4.2.1-4.2.4), high-pass filtering (see G.729 Draft section 4.2.5), and upscaling (see G.729 Draft section 4.2.5).

III. Discussion

Although a number of specific embodiments of this invention have been shown and described, it is to be understood that these embodiments are merely illustrative of the many possible specific arrangements that can be devised in application of the principles of the invention. Numerous and varied other arrangements can be devised in accordance with these principles by those of ordinary skill in the art without departing from the spirit and scope of the invention. For example, should scalar gain quantization be employed, the gain of the PPF may be adapted based on the current, rather than the previous, ACB gain. Also, the values of the limits on the PPF gain (0.2, 0.8) are merely illustrative. Other limits, such as 0.1 and 0.7, may suffice.
In addition, although the illustrative embodiment of the present invention refers to codebook "amplifiers," it will be understood by those of ordinary skill in the art that this term encompasses the scaling of digital signals. Moreover, such scaling may be accomplished with scale factors (or gains) that are less than or equal to one (including negative values), as well as greater than one.
INTERNATIONAL TELECOMMUNICATION UNION
TELECOMMUNICATIONS STANDARDIZATION SECTOR
Date: June 1995
Original: E
STUDY GROUP 15 CONTRIBUTION - Q. 12/15

Draft Recommendation G.729
Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) Coding
June 7, 1995, version 4.0

Note: Until this recommendation is approved by the ITU, neither the C code nor the test vectors will be available from the ITU. To obtain the C source code, contact: Mr. Gerhard Shroeder, Rapporteur SG15 / Q.12, Deutsche Telekom AG, Postfach 10003, 64276 Darmstadt, Germany. Tel.: +49 6151 83 3973, Fax: +49 6151 83 7828, Email: gerhard . [email protected] .dbp.de

Contents

1. Introduction
2. General description of the coder
2.1 Encoder
2.2 Decoder
2.3 Delay
2.4 Speech coder description
2.5 Notational conventions
3. Functional description of the encoder
3.1 Pre-processing
3.2 Linear prediction analysis and quantization
3.2.1 Windowing and autocorrelation computation
3.2.2 Levinson-Durbin algorithm
3.2.3 LP to LSP conversion
3.2.4 Quantization of the LSP coefficients
3.2.5 Interpolation of the LSP coefficients
3.2.6 LSP to LP conversion
3.3 Perceptual weighting
3.4 Open-loop pitch analysis
3.5 Computation of the impulse response
3.6 Computation of the target signal
3.7 Adaptive codebook search
3.7.1 Generation of the adaptive codebook vector
3.7.2 Codeword computation for adaptive codebook delays
3.7.3 Computation of the adaptive codebook gain
3.8 Fixed codebook: structure and search
3.8.1 Fixed codebook search procedure
3.8.2 Codeword computation of the fixed codebook
3.9 Quantization of the gains
3.9.1 Gain prediction
3.9.2 Codebook search for gain quantization
3.9.3 Codeword computation for the gain quantizer
3.10 Memory update
3.11 Encoder and decoder initialization
4. Functional description of the decoder
4.1 Parameter decoding procedure
4.1.1 Decoding of LP filter parameters
4.1.2 Decoding of the adaptive codebook vector
4.1.3 Decoding of the fixed codebook vector
4.1.4 Decoding of the adaptive and fixed codebook gains
4.1.5 Computation of the parity bit
4.1.6 Computation of the reconstructed speech
4.2 Post-processing
4.2.1 Pitch postfilter
4.2.2 Short-term postfilter
4.2.3 Tilt compensation
4.2.4 Adaptive gain control
4.2.5 High-pass filtering and upscaling
4.3 Concealment of frame erasures and parity errors
4.3.1 Repetition of LP filter parameters
4.3.2 Attenuation of adaptive and fixed codebook gains
4.3.3 Attenuation of the memory of the gain predictor
4.3.4 Generation of the replacement excitation
5. Bit-exact description of the CS-ACELP coder
5.1 Use of the simulation software
5.2 Organization of the simulation software

1. Introduction

This recommendation contains the description of an algorithm for the coding of speech signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) coding. This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (ITU Rec. G.710) of the analog input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit linear PCM for input to the encoder. The output of the decoder should be converted back to an analog signal by similar means. Other input/output characteristics, such as those specified by ITU Rec. G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after decoding. The bit stream from the encoder to the decoder is defined within this standard. This recommendation is organized as follows: Section 2 gives a general outline of the CS-ACELP algorithm. In Sections 3 and 4, the CS-ACELP encoder and decoder principles are discussed, respectively.
Section 5 describes the software that defines this coder in 16-bit fixed-point arithmetic.

2. General description of the coder

The CS-ACELP coder is based on the code-excited linear predictive (CELP) coding model. The coder operates on speech frames of 10 ms, corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis filter parameters.

Table 1: Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame).

Parameter                   Codeword          Subframe 1   Subframe 2   Total per frame
LSP                         L0, L1, L2, L3    -            -            18
Adaptive codebook delay     P1, P2            8            5            13
Delay parity                P0                1            -            1
Fixed codebook index        C1, C2            13           13           26
Fixed codebook sign         S1, S2            4            4            8
Codebook gains (stage 1)    GA1, GA2          3            3            6
Codebook gains (stage 2)    GB1, GB2          4            4            8
Total                                                                   80

The speech is reconstructed by filtering this excitation through the LP synthesis filter, as shown in Figure 1. The short-term synthesis filter is based on a 10th-order linear prediction (LP) filter.
Figure 1: Block diagram of the conceptual CELP synthesis model.
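The frame parameters and the bit allocation stated for Table 1 can be cross-checked with a little arithmetic. The short Python sketch below is only an illustration; the dictionary layout and variable names are assumptions, and the fixed codebook sign entry is taken as 4 bits per subframe, which is the value implied by the stated 80-bit total.

```python
# Bits per 10 ms frame for each parameter group, following Table 1.
# Assumption: 4 sign bits per subframe (8 per frame), as implied by
# the stated total of 80 bits.
BITS_PER_FRAME = {
    "LSP (L0-L3)": 18,
    "adaptive codebook delay (P1, P2)": 8 + 5,
    "delay parity (P0)": 1,
    "fixed codebook index (C1, C2)": 13 + 13,
    "fixed codebook sign (S1, S2)": 4 + 4,
    "codebook gains, stage 1 (GA1, GA2)": 3 + 3,
    "codebook gains, stage 2 (GB1, GB2)": 4 + 4,
}

SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
SUBFRAME_MS = 5

total_bits = sum(BITS_PER_FRAME.values())
bit_rate = total_bits * 1000 // FRAME_MS               # bits per second
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000

print(total_bits, bit_rate, samples_per_frame, samples_per_subframe)
```

The totals agree with the figures given in the text: 80 bits every 10 ms is 8 kbit/s, and a 10 ms frame at 8000 samples per second holds 80 samples, or 40 per 5 ms subframe.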
The pitch synthesis filter, or long-term filter, is implemented using the so-called adaptive codebook approach for delays less than the subframe length. After the reconstructed speech is computed, it is further enhanced by a postfilter.

2.1 Encoder

The signal flow at the encoder is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing block.
Figure 2: Signal flow in the CS-ACELP encoder.
The pre-processed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10 ms frame to compute the LP filter coefficients. These coefficients are converted to line spectrum pairs (LSP) and quantized using predictive two-stage vector quantization (VQ) with 18 bits. The excitation sequence is chosen by an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter, whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to improve performance for input signals with a flat frequency response. The excitation parameters (fixed and adaptive codebook parameters) are determined per subframe of 5 ms (40 samples) each. The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once per 10 ms frame, based on the perceptually weighted speech signal. Then the following operations are repeated for each subframe. The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter $W(z)/\hat{A}(z)$. The initial states of these filters are updated by filtering the error between the LP residual and the excitation. This is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response, h(n), of the weighted synthesis filter is computed. Closed-loop pitch analysis is then done (to find the adaptive codebook delay and gain), using the target x(n) and impulse response h(n), by searching around the value of the open-loop pitch delay. A fractional pitch delay with 1/3 resolution is used.
The pitch delay is encoded with 8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal x(n) is updated by removing the adaptive codebook contribution (the filtered adaptive codebook vector), and this new target, x2(n), is used in the fixed algebraic codebook search (to find the optimum excitation). An algebraic codebook with 17 bits is used for the fixed codebook excitation. The gains of the adaptive and fixed codebooks are vector quantized with 7 bits (with MA prediction applied to the fixed codebook gain). Finally, the filter memories are updated using the determined excitation signal.

2.2 Decoder

The signal flow at the decoder is shown in Figure 3. First, the parameter indices are extracted from the received bit stream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the 2 fractional pitch delays, the 2 fixed codebook vectors, and the 2 sets of adaptive and fixed codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 40-sample subframe, the following steps are done:

• The excitation is constructed by adding the adaptive and fixed codebook vectors scaled by their respective gains.
• The speech is reconstructed by filtering the excitation through the LP synthesis filter.
• The reconstructed speech signal is passed through a post-processing stage, which comprises an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and a scaling operation.

Figure 3: Signal flow in the CS-ACELP decoder.

2.3 Delay

This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms.
All additional delays in a practical implementation of this coder are due to:
• Processing time needed for the encoding and decoding operations.
• Transmission time on the communication link.
• Multiplexing delay when combining audio data with other data.

2.4 Speech coder description

The description of the speech coding algorithm of this recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI C code indicated in Section 5, which is an integral part of this recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder (Section 3) and decoder (Section 4) can be implemented in several other ways, possibly leading to a codec implementation that does not comply with this recommendation. Therefore, the algorithm description of the C code in Section 5 takes precedence over the mathematical descriptions of Sections 3 and 4 whenever discrepancies are found. A non-exhaustive set of test sequences that can be used in conjunction with the C code is available from the ITU.

2.5 Notational conventions

This document attempts to maintain the following notational conventions.
• Codebooks are denoted by calligraphic characters (e.g. $\mathcal{C}$).
• Time signals are denoted by the symbol and the sample time index between parentheses (e.g. s(n)). The symbol n is used as the sample time index.
• Superscript time indices (e.g. $g^{(m)}$) refer to that variable corresponding to sub-frame m.
• Subscripts identify a particular element in a set of coefficients.
• A ^ identifies a quantized version of a parameter.
• Range notations are made using square brackets, where the boundaries are included (e.g. [0.6, 0.9]).
• log denotes a logarithm with base 10.

Table 2 lists the most relevant symbols used throughout this document. A glossary of the most relevant signals is given in Table 3. Table 4 summarizes relevant variables and their dimensions. Constant parameters are listed in Table 5. The acronyms used in this recommendation are summarized in Table 6.

Table 2: Glossary of symbols.
Name            Reference   Description
$1/\hat{A}(z)$  Eq. (2)     LP synthesis filter
$H_{h1}(z)$     Eq. (1)     input high-pass filter
$H_p(z)$        Eq. (77)    pitch post-filter
$H_f(z)$        Eq. (83)    short-term post-filter
$H_t(z)$        Eq. (85)    tilt-compensation filter
$H_{h2}(z)$     Eq. (90)    output high-pass filter
$P(z)$          Eq. (46)    pitch filter
$W(z)$          Eq. (27)    weighting filter
Table 3: Glossary of signals.
Name          Description
h(n)          impulse response of the weighting and synthesis filters
r(k)          auto-correlation sequence
r'(k)         modified auto-correlation sequence
R(k)          correlation sequence
sw(n)         weighted speech signal
s(n)          speech signal
s'(n)         windowed speech signal
sf(n)         post-filtered output
sf'(n)        gain-scaled post-filtered output
$\hat{s}(n)$  reconstructed speech signal
r(n)          residual signal
x(n)          target signal
x2(n)         second target signal
v(n)          adaptive codebook contribution
c(n)          fixed codebook contribution
y(n)          v(n) * h(n)
z(n)          c(n) * h(n)
u(n)          excitation to the LP synthesis filter
d(n)          correlation between the target signal and h(n)
ew(n)         error signal

Table 4: Glossary of variables.
Name          Size   Description
$g_p$         1      adaptive codebook gain
$g_c$         1      fixed codebook gain
$g_0$         1      modified gain for pitch post-filter
$g_{pit}$     1      pitch gain for pitch post-filter
$g_f$         1      gain term, short-term post-filter
$g_t$         1      gain term, tilt post-filter
$T_{op}$      1      open-loop pitch delay
$a_i$         10     LP coefficients
$k_i$         10     reflection coefficients
$o_i$         2      LAR coefficients
$\omega_i$    10     LSF normalized frequencies
$q_i$         10     LSP coefficients
r(k)          11     correlation coefficients
$w_i$         10     LSP weighting coefficients
$l_i$         10     LSP quantizer output

Table 5: Glossary of constants.
Name          Value            Description
$f_s$         8000             sampling frequency
$f_0$         60               bandwidth expansion
$\gamma_1$    0.94/0.98        weight factor, perceptual weighting filter
$\gamma_2$    0.60/[0.4-0.7]   weight factor, perceptual weighting filter
$\gamma_n$    0.55             weight factor, post-filter
$\gamma_d$    0.70             weight factor, post-filter
$\gamma_p$    0.50             weight factor, pitch post-filter
$\gamma_t$    0.90/0.2         weight factor, tilt post-filter
C             Table 7          fixed (algebraic) codebook
L0            Section 3.2.4    moving average predictor codebook
L1            Section 3.2.4    first-stage LSP codebook
L2            Section 3.2.4    second-stage LSP codebook (lower part)
L3            Section 3.2.4    second-stage LSP codebook (upper part)
GA            Section 3.9      first-stage gain codebook
GB            Section 3.9      second-stage gain codebook
$w_{lag}$     Eq. (6)          correlation lag window
$w_{lp}$      Eq. (3)          LPC analysis window

Table 6: Glossary of acronyms.
Acronym   Description
CELP      code-excited linear prediction
MA        moving average
MSB       most significant bit
LP        linear prediction
LSP       line spectral pair
LSF       line spectral frequency
VQ        vector quantization

3. Functional description of the encoder

This section describes the different functions of the encoder represented in the blocks of Figure 1.

3.1 Pre-processing

As stated in Section 2, the input to the speech encoder is assumed to be a 16-bit PCM signal. Two pre-processing functions are applied before the encoding process: 1) signal scaling, and 2) high-pass filtering.
The scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflow in the fixed-point implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second-order pole/zero filter with a cut-off frequency of 140 Hz is used. Scaling and high-pass filtering are combined by dividing the numerator coefficients of this filter by 2. The resulting filter is given by

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \qquad (1)$$

The input signal filtered through $H_{h1}(z)$ is referred to as s(n), and is used in all subsequent encoder operations.

3.2 Linear prediction analysis and quantization

The short-term analysis and synthesis filters are based on 10th-order linear prediction (LP) filters. The LP synthesis filter is defined as

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \qquad (2)$$

where $\hat{a}_i$, i = 1, ..., 10, are the (quantized) linear prediction (LP) coefficients. Short-term prediction, or linear prediction analysis, is performed once per speech frame using the auto-correlation approach with a 30 ms asymmetric window. Every 80 samples (10 ms), the auto-correlation coefficients of the windowed speech are computed and converted to LP coefficients using the Levinson algorithm. Then the LP coefficients are transformed to the LSP domain for quantization and interpolation purposes. The interpolated quantized and unquantized filters are converted back to LP filter coefficients (to construct the synthesis and weighting filters in each sub-frame).

3.2.1 Windowing and auto-correlation computation

The LP analysis window consists of two parts: the first part is half of a Hamming window and the second part is a quarter of a cosine function cycle. The window is given by

$$w_{lp}(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\dfrac{2\pi n}{399}\right), & n = 0, \ldots, 199, \\[2mm] \cos\!\left(\dfrac{2\pi (n - 200)}{159}\right), & n = 200, \ldots, 239. \end{cases} \qquad (3)$$

There is a 5 ms look-ahead in the LP analysis, which means that 40 samples of the future speech frame are needed.
This translates into an extra delay of 5 ms at the encoder stage. The LP analysis window applies to 120 samples from past speech frames, 80 samples from the present speech frame, and 40 samples from the future frame. The windowing in LP analysis is illustrated in Figure 4.
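The window of Eq. (3) can be generated directly from its two-part definition. The following is a minimal Python sketch in floating point, not the bit-exact fixed-point form that the recommendation's C code would use; the function name `lp_analysis_window` is illustrative and does not come from the source.

```python
import math

def lp_analysis_window():
    """Return the 240-sample LP analysis window w_lp(n) of Eq. (3)."""
    # First part: rising half of a Hamming window, n = 0..199.
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / 399.0) for n in range(200)]
    # Second part: quarter cycle of a cosine, n = 200..239.
    w += [math.cos(2.0 * math.pi * (n - 200) / 159.0) for n in range(200, 240)]
    return w

w_lp = lp_analysis_window()
# 120 past + 80 present + 40 look-ahead samples are covered by the window.
assert len(w_lp) == 120 + 80 + 40
```

The window peaks at the boundary between the two parts (sample 200), so the most recent samples of the current frame carry the largest weight.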
Figure 4: Windowing in LP analysis. The different shading patterns identify the corresponding excitation and LP analysis frames.

The auto-correlation coefficients of the windowed speech

$$s'(n) = w_{lp}(n)\, s(n), \quad n = 0, \ldots, 239, \qquad (4)$$

are computed by

$$r(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \quad k = 0, \ldots, 10. \qquad (5)$$

To avoid arithmetic problems for low-level input signals, the value of r(0) has a lower bound of r(0) = 1.0. A 60 Hz bandwidth expansion is applied by multiplying the auto-correlation coefficients with

$$w_{lag}(k) = \exp\!\left[-\frac{1}{2}\left(\frac{2\pi f_0 k}{f_s}\right)^2\right], \quad k = 1, \ldots, 10, \qquad (6)$$

where $f_0$ = 60 Hz is the bandwidth expansion and $f_s$ = 8000 Hz is the sampling frequency. In addition, r(0) is multiplied by the white-noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB.

3.2.2 Levinson-Durbin algorithm

The modified auto-correlation coefficients

$$r'(0) = 1.0001\, r(0), \qquad r'(k) = w_{lag}(k)\, r(k), \quad k = 1, \ldots, 10, \qquad (7)$$

are used to obtain the LP filter coefficients $a_i$, i = 1, ..., 10, by solving the set of equations

$$\sum_{i=1}^{10} a_i\, r'(|i - k|) = -r'(k), \quad k = 1, \ldots, 10. \qquad (8)$$

The set of equations in (8) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:

    E(0) = r'(0)
    for i = 1 to 10
        a_0^(i-1) = 1
        k_i = -[ sum_{j=0}^{i-1} a_j^(i-1) r'(i-j) ] / E(i-1)
        a_i^(i) = k_i
        for j = 1 to i-1
            a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1)
        end
        E(i) = (1 - k_i^2) E(i-1);  if E(i) < 0 then E(i) = 0.01
    end

The final solution is given as $a_j = a_j^{(10)}$, j = 1, ..., 10.

3.2.3 LP to LSP conversion

The LP filter coefficients $a_i$, i = 1, ..., 10, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. For a 10th-order LP filter, the LSP coefficients are defined as the roots of the sum and difference polynomials

$$F'_1(z) = A(z) + z^{-11} A(z^{-1}), \qquad (9)$$

and

$$F'_2(z) = A(z) - z^{-11} A(z^{-1}), \qquad (10)$$

respectively. The polynomial $F'_1(z)$ is symmetric and the polynomial $F'_2(z)$ is antisymmetric. It can be shown that all roots of these polynomials lie on the unit circle and alternate with each other. $F'_1(z)$ has a root at z = -1 ($\omega = \pi$) and $F'_2(z)$ has a root at z = 1 ($\omega = 0$).
To eliminate these two roots, we define the new polynomials

$$F_1(z) = F'_1(z) / (1 + z^{-1}), \qquad (11)$$

and

$$F_2(z) = F'_2(z) / (1 - z^{-1}). \qquad (12)$$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), so the polynomials can be written as

$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \qquad (13)$$

$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \qquad (14)$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the line spectral frequencies (LSF), which satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSP coefficients in the cosine domain. Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 5 coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations

$$f_1(i+1) = a_{i+1} + a_{10-i} - f_1(i), \quad i = 0, \ldots, 4,$$
$$f_2(i+1) = a_{i+1} - a_{10-i} + f_2(i), \quad i = 0, \ldots, 4, \qquad (15)$$

where $f_1(0) = f_2(0) = 1.0$. The LSP coefficients are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 equally spaced points between 0 and $\pi$ and checking for sign changes. A sign change indicates the existence of a root, and the sign-change interval is then divided 4 times to track the root more closely. Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ and $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as

$$F(\omega) = 2 e^{-j5\omega}\, C(x), \qquad (16)$$

with

$$C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \qquad (17)$$

where $T_m(x) = \cos(m\omega)$ is the m-th order Chebyshev polynomial and f(i), i = 1, ..., 5, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (15). The polynomial C(x) is evaluated at a given value of $x = \cos(\omega)$ using the recursive relation:

    for k = 4 downto 1
        b_k = 2x b_{k+1} - b_{k+2} + f(5-k)
    end
    C(x) = x b_1 - b_2 + f(5)/2

with initial values $b_5 = 1$ and $b_6 = 0$.
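The conversion just described (Eq. (15), the recursion for C(x), and the grid search with bisection) can be sketched in Python as follows. This is a floating-point illustration under stated assumptions: the LP coefficients are passed as a plain list a_1..a_10, the function names are illustrative, the bisection is carried out in the $\omega$ domain, and each root is taken as the midpoint of the final bisection interval, a detail the text above does not specify.

```python
import math

def lsp_poly_coeffs(a):
    """Eq. (15): coefficients f1(0..5), f2(0..5) from LP coefficients a_1..a_10."""
    f1, f2 = [1.0], [1.0]
    for i in range(5):
        f1.append(a[i] + a[9 - i] - f1[i])   # a[i] is a_{i+1}, a[9-i] is a_{10-i}
        f2.append(a[i] - a[9 - i] + f2[i])
    return f1, f2

def cheb_eval(f, x):
    """Evaluate C(x) of Eq. (17) with the b_k recursion (b5 = 1, b6 = 0)."""
    b_next, b_next2 = 1.0, 0.0               # b_{k+1}, b_{k+2}
    for k in range(4, 0, -1):
        b_next, b_next2 = 2.0 * x * b_next - b_next2 + f[5 - k], b_next
    return x * b_next - b_next2 + f[5] / 2.0

def lsf_from_lp(a, n_grid=60, n_bisect=4):
    """Locate the 10 line spectral frequencies (radians) by sign-change search."""
    roots = []
    for f in lsp_poly_coeffs(a):
        w_prev, c_prev = 0.0, cheb_eval(f, 1.0)
        for j in range(1, n_grid + 1):
            w = math.pi * j / n_grid
            c = cheb_eval(f, math.cos(w))
            if c_prev * c < 0.0:             # sign change: a root lies in (w_prev, w)
                lo, hi, c_lo = w_prev, w, c_prev
                for _ in range(n_bisect):
                    mid = 0.5 * (lo + hi)
                    c_mid = cheb_eval(f, math.cos(mid))
                    if c_lo * c_mid <= 0.0:
                        hi = mid
                    else:
                        lo, c_lo = mid, c_mid
                roots.append(0.5 * (lo + hi))
            w_prev, c_prev = w, c
    return sorted(roots)
```

For the trivial filter A(z) = 1 (all coefficients zero) the roots are known in closed form, $\omega_i = i\pi/11$, which gives a convenient sanity check for the search.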
3.2.4 Quantization of the LSP coefficients

The LP filter coefficients are quantized using the LSP representation in the frequency domain; that is,

$$\omega_i = \arccos(q_i), \quad i = 1, \ldots, 10, \qquad (18)$$

where $\omega_i$ are the line spectral frequencies (LSF) in the normalized frequency domain $[0, \pi]$. A switched 4th-order MA predictor is used to predict the current set of LSF coefficients. The difference between the computed and predicted set of coefficients is quantized using a two-stage vector quantizer. The first stage is a 10-dimensional VQ using codebook L1 with 128 entries (7 bits). The second stage is a 10-bit VQ implemented as a split VQ using two 5-dimensional codebooks, L2 and L3, containing 32 entries (5 bits) each.

To explain the quantization process, it is convenient to first describe the decoding process. Each coefficient is obtained from the sum of 2 codebooks:

$$l_i = \begin{cases} \mathcal{L}1_i(L1) + \mathcal{L}2_i(L2), & i = 1, \ldots, 5, \\ \mathcal{L}1_i(L1) + \mathcal{L}3_{i-5}(L3), & i = 6, \ldots, 10, \end{cases} \qquad (19)$$

where L1, L2 and L3 are the codebook indices. To avoid sharp resonances in the quantized LP synthesis filters, the coefficients $l_i$ are arranged such that adjacent coefficients have a minimum distance of J. The rearrangement routine is shown below:

    for i = 2, ..., 10
        if (l_{i-1} > l_i - J)
            l_{i-1} = (l_i + l_{i-1} - J) / 2
            l_i     = (l_i + l_{i-1} + J) / 2
        end
    end

This rearrangement process is executed twice: first with a value of J = 0.0001, then with a value of J = 0.000095. After this rearrangement process, the quantized LSF coefficients $\hat{\omega}_i^{(m)}$ for the current frame m are obtained from the weighted sum of previous quantizer outputs $l_i^{(m-k)}$ and the current quantizer output $l_i^{(m)}$:

$$\hat{\omega}_i^{(m)} = \left(1 - \sum_{k=1}^{4} m_i^k\right) l_i^{(m)} + \sum_{k=1}^{4} m_i^k\, l_i^{(m-k)}, \quad i = 1, \ldots, 10, \qquad (20)$$

where $m_i^k$ are the coefficients of the switched MA predictor. Which MA predictor is used is defined by a separate bit L0. At start-up the initial values of $l_i^{(k)}$ are given by $l_i = i\pi/11$ for all k < 0. After computing $\hat{\omega}_i$, the corresponding filter is checked for stability. This is done as follows:
1. Order the coefficients $\hat{\omega}_i$ in increasing value.
2. If $\hat{\omega}_1 < 0.005$ then $\hat{\omega}_1 = 0.005$.
3. If $\hat{\omega}_{i+1} - \hat{\omega}_i < 0.0001$ then $\hat{\omega}_{i+1} = \hat{\omega}_i + 0.0001$, i = 1, ..., 9.
4. If $\hat{\omega}_{10} > 3.135$ then $\hat{\omega}_{10} = 3.135$.

The procedure for encoding the LSF parameters can be outlined as follows. For each of the two MA predictors, the best approximation to the current LSF vector has to be found. The best approximation is defined as the one that minimizes the weighted mean-squared error

$$E_{LSF} = \sum_{i=1}^{10} w_i \left(\omega_i - \hat{\omega}_i\right)^2. \qquad (21)$$

The weights $w_i$ are made adaptive as a function of the unquantized LSF coefficients,

$$w_1 = \begin{cases} 1.0 & \text{if } \omega_2 - 0.04\pi - 1 > 0, \\ 10\,(\omega_2 - 0.04\pi - 1)^2 + 1 & \text{otherwise,} \end{cases}$$

$$w_i,\ 2 \le i \le 9 = \begin{cases} 1.0 & \text{if } \omega_{i+1} - \omega_{i-1} - 1 > 0, \\ 10\,(\omega_{i+1} - \omega_{i-1} - 1)^2 + 1 & \text{otherwise,} \end{cases} \qquad (22)$$

$$w_{10} = \begin{cases} 1.0 & \text{if } -\omega_9 + 0.92\pi - 1 > 0, \\ 10\,(-\omega_9 + 0.92\pi - 1)^2 + 1 & \text{otherwise.} \end{cases}$$

In addition, the weights $w_5$ and $w_6$ are multiplied by 1.2 each. The vector to be quantized for the current frame is obtained from

$$l_i = \left[\omega_i^{(m)} - \sum_{k=1}^{4} m_i^k\, l_i^{(m-k)}\right] \Big/ \left(1 - \sum_{k=1}^{4} m_i^k\right), \quad i = 1, \ldots, 10. \qquad (23)$$

The first codebook L1 is searched and the entry L1 that minimizes the (unweighted) mean-squared error is selected. This is followed by a search of the second codebook L2, which defines the lower part of the second stage.
For each possible candidate, the partial vector $\hat{\omega}_i$, i = 1, ..., 5, is reconstructed using Eq. (20) and rearranged to guarantee a minimum distance of 0.0001. The vector with index L2 that, after addition to the first-stage candidate and rearrangement, best approximates the lower part of the corresponding target in the weighted MSE sense is selected. Using the selected first-stage vector L1 and the lower part of the second stage (L2), the upper part of the second stage is searched in codebook L3. Again, the rearrangement procedure is used to guarantee a minimum distance of 0.0001. The vector L3 that minimizes the overall weighted MSE is selected. This process is carried out for each of the two MA predictors defined by L0, and the MA predictor L0 that produces the lowest weighted MSE is selected.

3.2.5 Interpolation of the LSP coefficients

The quantized (and unquantized) LP coefficients are used for the second sub-frame. For the first sub-frame, the quantized (and unquantized) LP coefficients are obtained by linear interpolation of the corresponding parameters in the adjacent sub-frames. The interpolation is performed on the LSP coefficients in the q domain. Let $q_i^{(m)}$ be the LSP coefficients at the second sub-frame of frame m, and $q_i^{(m-1)}$ the LSP coefficients at the second sub-frame of the past frame (m-1). The (unquantized) interpolated LSP coefficients in each of the two sub-frames are given by

$$\text{Sub-frame 1:} \quad q1_i = 0.5\, q_i^{(m-1)} + 0.5\, q_i^{(m)}, \quad i = 1, \ldots, 10,$$
$$\text{Sub-frame 2:} \quad q2_i = q_i^{(m)}, \quad i = 1, \ldots, 10. \qquad (24)$$

The same interpolation procedure is used for the quantized LSP coefficients, substituting $\hat{q}_i$ for $q_i$ in Eq. (24).

3.2.6 LSP to LP conversion

Once the LSP coefficients are quantized and interpolated, they are converted back to the LP coefficients $\{a_i\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ and $F_2(z)$ are found by expanding Eqs. (13) and (14), knowing the quantized and interpolated LSP coefficients.
The following recursive relation is used to compute $f_1(i)$, i = 1, ..., 5, from $q_i$:

    for i = 1 to 5
        f1(i) = -2 q_{2i-1} f1(i-1) + 2 f1(i-2)
        for j = i-1 downto 1
            f1(j) = f1(j) - 2 q_{2i-1} f1(j-1) + f1(j-2)
        end
    end

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$.
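A minimal Python sketch of this recursion is shown below. It is a floating-point illustration only; the function name `lsp_to_poly` and the index offset used to hold the initial value f(-1) = 0 are choices made here, not part of the source.

```python
import math

def lsp_to_poly(q_sub):
    """Expand prod(1 - 2*q*z^-1 + z^-2) over the 5 LSPs in q_sub.

    Pass q_1, q_3, ..., q_9 to obtain f1(0..5), or q_2, q_4, ..., q_10 for f2(0..5).
    """
    # f[k + 1] stores f(k), so f[0] holds the initial value f(-1) = 0.
    f = [0.0] * 7
    f[1] = 1.0                                   # f(0) = 1
    for i in range(1, 6):
        q = q_sub[i - 1]
        f[i + 1] = -2.0 * q * f[i] + 2.0 * f[i - 1]
        # Descending j keeps f(j-1) and f(j-2) at their previous-stage values.
        for j in range(i - 1, 0, -1):
            f[j + 1] += -2.0 * q * f[j] + f[j - 1]
    return f[1:]

# Example: LSFs equally spaced at i*pi/11, the start-up values of Section 3.2.4.
q = [math.cos(i * math.pi / 11.0) for i in range(1, 11)]
f1 = lsp_to_poly(q[0::2])   # odd-indexed LSPs q_1, q_3, ...
f2 = lsp_to_poly(q[1::2])   # even-indexed LSPs q_2, q_4, ...
```

Only the first six coefficients are produced because the expanded polynomials are symmetric, which is also what the recursion exploits through the factor 2 in front of f(i-2).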
Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F'_1(z)$ and $F'_2(z)$; that is,

$$f'_1(i) = f_1(i) + f_1(i-1), \quad i = 1, \ldots, 5,$$
$$f'_2(i) = f_2(i) - f_2(i-1), \quad i = 1, \ldots, 5. \qquad (25)$$

Finally, the LP coefficients are found by

$$a_i = \begin{cases} 0.5\, f'_1(i) + 0.5\, f'_2(i), & i = 1, \ldots, 5, \\ 0.5\, f'_1(11-i) - 0.5\, f'_2(11-i), & i = 6, \ldots, 10. \end{cases} \qquad (26)$$

This is derived directly from the relation $A(z) = (F'_1(z) + F'_2(z))/2$, and from the fact that $F'_1(z)$ and $F'_2(z)$ are symmetric and antisymmetric polynomials, respectively.

3.3 Perceptual weighting

The perceptual weighting filter is based on the unquantized LP filter coefficients and is given by

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} \gamma_1^i a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma_2^i a_i z^{-i}}. \qquad (27)$$

The values of $\gamma_1$ and $\gamma_2$ determine the frequency response of the filter W(z). By proper adjustment of these variables it is possible to make the weighting more effective. This is accomplished by making $\gamma_1$ and $\gamma_2$ a function of the spectral shape of the input signal. This adaptation is performed once per 10 ms frame, but an interpolation procedure for each first sub-frame is used to smooth the adaptation. The spectral shape is obtained from a 2nd-order linear prediction filter, obtained as a by-product of the Levinson-Durbin recursion (Section 3.2.2). The reflection coefficients $k_i$ are converted to log area ratio (LAR) coefficients $o_i$ by

$$o_i = \log\frac{1.0 + k_i}{1.0 - k_i}, \quad i = 1, 2. \qquad (28)$$

These LAR coefficients are used for the second sub-frame. The LAR coefficients for the first sub-frame are obtained through linear interpolation with the LAR parameters from the previous frame, and are given by

$$\text{Sub-frame 1:} \quad o1_i = 0.5\, o_i^{(m-1)} + 0.5\, o_i^{(m)}, \quad i = 1, 2,$$
$$\text{Sub-frame 2:} \quad o2_i = o_i^{(m)}, \quad i = 1, 2. \qquad (29)$$

The spectral envelope is characterized as being either flat (flat = 1) or tilted (flat = 0). For each sub-frame, this characterization is obtained by applying a threshold function to the LAR coefficients. To avoid rapid changes, hysteresis is used by taking into account the value of flat in the previous sub-frame (m-1):

$$flat^{(m)} = \begin{cases} 0 & \text{if } o_1 < -1.74 \text{ and } o_2 > 0.65 \text{ and } flat^{(m-1)} = 1, \\ 1 & \text{if } o_1 > -1.52 \text{ and } o_2 < 0.43 \text{ and } flat^{(m-1)} = 0, \\ flat^{(m-1)} & \text{otherwise.} \end{cases} \qquad (30)$$
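The hysteresis rule of Eq. (30) can be sketched in a few lines of Python. The function names `lar` and `classify_flat` are illustrative; the LAR conversion uses a base-10 logarithm, following the notational convention of Section 2.5.

```python
import math

def lar(k):
    """Log area ratio of a reflection coefficient k, with |k| < 1 (base-10 log)."""
    return math.log10((1.0 + k) / (1.0 - k))

def classify_flat(o1, o2, prev_flat):
    """Eq. (30): return 1 for a flat spectrum, 0 for a tilted one.

    o1, o2 are the (interpolated) LAR coefficients of the sub-frame and
    prev_flat is the decision taken in the previous sub-frame.
    """
    if o1 < -1.74 and o2 > 0.65 and prev_flat == 1:
        return 0                      # flat -> tilted
    if o1 > -1.52 and o2 < 0.43 and prev_flat == 0:
        return 1                      # tilted -> flat
    return prev_flat                  # inside the hysteresis band: keep the state
```

Because the two sets of thresholds differ, LAR values that fall between them leave the decision unchanged, which is what prevents the classification from toggling between consecutive sub-frames.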
If the interpolated spectrum for a sub-frame is classified as flat ($flat^{(m)} = 1$), the weight factors are set to $\gamma_1 = 0.94$ and $\gamma_2 = 0.6$. If the spectrum is classified as tilted ($flat^{(m)} = 0$), the value of $\gamma_1$ is set to 0.98, and the value of $\gamma_2$ is adapted to the strength of the resonances in the LP synthesis filter, but is bounded between 0.4 and 0.7. If a strong resonance is present, the value of $\gamma_2$ is set closer to the upper bound. This adaptation is achieved by a criterion based on the minimum distance between 2 successive LSP coefficients for the current sub-frame. The minimum distance is given by

$$d_{min} = \min\left[\omega_{i+1} - \omega_i\right], \quad i = 1, \ldots, 9. \qquad (31)$$

The following linear relationship is used to compute $\gamma_2$:

$$\gamma_2 = -6.0 \cdot d_{min} + 1.0, \quad \text{and } 0.4 \le \gamma_2 \le 0.7. \qquad (32)$$

The weighted speech signal in a sub-frame is given by

$$sw(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, sw(n-i), \quad n = 0, \ldots, 39. \qquad (33)$$

The weighted speech signal sw(n) is used to find an estimate of the pitch delay in the speech frame.

3.4 Open-loop pitch analysis

To reduce the complexity of the search for the best adaptive codebook delay, the search range is limited around a candidate delay $T_{op}$, obtained from an open-loop pitch analysis. This open-loop pitch analysis is performed once per frame (10 ms). The open-loop pitch estimation uses the weighted speech signal sw(n) of Eq. (33) and is carried out as follows. In the first step, 3 maxima of the correlation

$$R(k) = \sum_{n=0}^{79} sw(n)\, sw(n-k) \qquad (34)$$

are found in the following three ranges:

    i = 1:  80, ..., 143,
    i = 2:  40, ..., 79,
    i = 3:  20, ..., 39.

The retained maxima $R(t_i)$, i = 1, ..., 3, are normalized through

$$R'(t_i) = \frac{R(t_i)}{\sqrt{\sum_n sw^2(n - t_i)}}, \quad i = 1, \ldots, 3. \qquad (35)$$

The winner among the three normalized correlations is selected by favoring the delays with values in the lower range. This is done by weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined as follows:

    T_op = t_1
    R'(T_op) = R'(t_1)
    if R'(t_2) >= 0.85 R'(T_op)
        R'(T_op) = R'(t_2)
        T_op = t_2
    end
    if R'(t_3) >= 0.85 R'(T_op)
        R'(T_op) = R'(t_3)
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favoring the lower sections is used to avoid choosing pitch multiples.

3.5 Computation of the impulse response

The impulse response h(n) of the weighted synthesis filter $W(z)/\hat{A}(z)$ is computed for each sub-frame. This impulse response is needed for the search of the adaptive and fixed codebooks. The impulse response h(n) is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.

3.6 Computation of the target signal

The target signal x(n) for the adaptive codebook search is usually computed by subtracting the zero-input response of the weighted synthesis filter $W(z)/\hat{A}(z) = A(z/\gamma_1)/[\hat{A}(z) A(z/\gamma_2)]$ from the weighted speech signal sw(n) of Eq. (33). This is done on a sub-frame basis. An equivalent procedure for computing the target signal, which is the one used in this recommendation, is to filter the LP residual signal r(n) through the combination of the synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the sub-frame, the initial states of these filters are updated by filtering the difference between the LP residual and the excitation. The memory update of these filters is explained in Section 3.10. The residual signal r(n), which is needed to find the target vector, is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the sub-frame size of 40, as will be explained in the next section. The LP residual is given by

$$r(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n-i), \quad n = 0, \ldots, 39. \qquad (36)$$

3.7 Adaptive codebook search

The adaptive codebook parameters (or pitch parameters) are the delay and the gain.
In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the sub-frame length. In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search. The adaptive codebook search is performed every sub-frame (5 ms). In the first sub-frame, a fractional pitch delay $T_1$ is used with a resolution of 1/3 in the range [19 1/3, 84 2/3] and integers only in the range [85, 143]. For the second sub-frame, a delay $T_2$ with a resolution of 1/3 is always used in the range [(int)$T_1$ - 5 2/3, (int)$T_1$ + 4 2/3], where (int)$T_1$ is the nearest integer to the fractional pitch delay $T_1$ of the first sub-frame. This range is adapted for the cases where $T_1$ straddles the boundaries of the delay range. For each sub-frame, the optimal delay is determined using closed-loop analysis that minimizes the weighted mean-squared error. In the first sub-frame, the delay $T_1$ is found by searching a small range (6 samples) of delay values around the open-loop delay $T_{op}$ (see Section 3.4). The search boundaries $t_{min}$ and $t_{max}$ are defined by

    t_min = T_op - 3
    if t_min < 20 then t_min = 20
    t_max = t_min + 6
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 6
    end

For the second sub-frame, the closed-loop pitch analysis is performed around the pitch selected in the first sub-frame to find the optimal delay $T_2$. The search boundaries are between $t_{min}$ - 2/3 and $t_{max}$ + 2/3, where $t_{min}$ and $t_{max}$ are derived from $T_1$ as follows:

    t_min = (int)T_1 - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end

The closed-loop pitch search minimizes the mean-squared weighted error between the original and synthesized speech. This is achieved by maximizing the term
$$R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \qquad (37)$$

where x(n) is the target signal and $y_k(n)$ is the past filtered excitation at delay k (past excitation convolved with h(n)). Note that the search range is limited around a pre-selected value, which is the open-loop pitch $T_{op}$ for the first sub-frame and $T_1$ for the second sub-frame. The convolution $y_k(n)$ is computed for the delay $t_{min}$, and for the other integer delays in the search range k = $t_{min}$ + 1, ..., $t_{max}$ it is updated using the recursive relation

$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \quad n = 39, \ldots, 0, \qquad (38)$$

where u(n), n = -143, ..., 39, is the excitation buffer and $y_{k-1}(-1) = 0$. Note that in the search stage the samples u(n), n = 0, ..., 39, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to u(n) to make the relation in Eq. (38) valid for all delays. For the determination of $T_2$, and of $T_1$ if the optimal integer closed-loop delay is less than 84, the fractions around the optimal integer delay have to be tested. The fractional pitch search is done by interpolating the normalized correlation in Eq. (37) and searching for its maximum. The interpolation is done using an FIR filter $b_{12}$ based on a Hamming-windowed sinc function, with the sinc truncated at ±11 and padded with zeros at ±12 ($b_{12}(12) = 0$). The filter has its cut-off frequency (-3 dB) at 3600 Hz in the oversampled domain. The interpolated values of R(k) for the fractions -2/3, -1/3, 0, 1/3 and 2/3 are obtained using the interpolation formula

$$R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{12}(t + 3i) + \sum_{i=0}^{3} R(k+1+i)\, b_{12}(3 - t + 3i), \quad t = 0, 1, 2, \qquad (39)$$

where t = 0, 1, 2 corresponds to the fractions 0, 1/3 and 2/3, respectively. Note that it is necessary to compute the correlation terms in Eq. (37) using a range $t_{min}$ - 4, $t_{max}$ + 4, to allow for proper interpolation.

3.7.1 Generation of the adaptive codebook vector

Once the non-integer pitch delay has been determined, the adaptive codebook vector v(n) is computed by interpolating the past excitation signal u(n) at the given integer delay k and fraction t:

$$v(n) = \sum_{i=0}^{9} u(n-k+i)\, b_{30}(t + 3i) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{30}(3 - t + 3i), \quad n = 0, \ldots, 39, \ t = 0, 1, 2. \qquad (40)$$

The interpolation filter $b_{30}$ is based on a Hamming-windowed sinc function, with the sinc truncated at ±29 and padded with zeros at ±30 ($b_{30}(30) = 0$). The filter has a cut-off frequency (-3 dB) at 3600 Hz in the oversampled domain.

3.7.2 Codeword computation for adaptive codebook delays

The pitch delay $T_1$ is encoded with 8 bits in the first sub-frame, and the relative delay in the second sub-frame is encoded with 5 bits. A fractional delay T is represented by its integer part (int)T and a fractional part frac/3, frac = -1, 0, 1. The pitch index P1 is now encoded as

$$P1 = \begin{cases} ((int)T_1 - 19) \cdot 3 + frac - 1, & \text{if } T_1 = [19, \ldots, 85],\ frac = [-1, 0, 1], \\ ((int)T_1 - 85) + 197, & \text{if } T_1 = [86, \ldots, 143],\ frac = 0. \end{cases} \qquad (41)$$

The value of the pitch delay $T_2$ is encoded relative to the value of $T_1$. Using the same interpretation as before, the fractional delay $T_2$, represented by its integer part (int)$T_2$ and a fractional part frac/3, frac = -1, 0, 1, is encoded as

$$P2 = ((int)T_2 - t_{min}) \cdot 3 + frac + 2, \qquad (42)$$

where $t_{min}$ is derived from $T_1$ as before. To make the coder more robust against random bit errors, a parity bit P0 is computed on the delay index of the first sub-frame. The parity bit is generated through an XOR operation on the 6 most significant bits of P1. At the decoder this parity bit is recomputed, and if the recomputed value does not agree with the transmitted value, an error concealment procedure is applied.

3.7.3 Computation of the adaptive codebook gain

Once the adaptive codebook delay is determined, the adaptive codebook gain $g_p$ is computed as

$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \qquad (43)$$

where y(n) is the filtered adaptive codebook vector (zero-state response of $W(z)/\hat{A}(z)$ to v(n)). This vector is obtained by convolving v(n) with h(n):

$$y(n) = \sum_{i=0}^{n} v(i)\, h(n-i), \quad n = 0, \ldots, 39. \qquad (44)$$

Note that by maximizing the term in Eq. (37), in most cases $g_p > 0$. In case the signal contains only negative correlations, the value of $g_p$ is set to 0.

3.8 Fixed codebook: structure and search

The fixed codebook is based on an algebraic codebook structure using an interleaved single-pulse permutation (ISPP) design. In this codebook, each codebook vector contains 4 non-zero pulses. Each pulse can have either the amplitude +1 or -1, and can assume the positions given in Table 7. The codebook vector c(n) is constructed by taking a zero vector and placing the 4 unit pulses at the found locations, multiplied by their corresponding signs:

$$c(n) = s0\,\delta(n - i0) + s1\,\delta(n - i1) + s2\,\delta(n - i2) + s3\,\delta(n - i3), \quad n = 0, \ldots, 39, \qquad (45)$$

where $\delta(0)$ is a unit pulse. A special feature incorporated in the codebook is that the selected codebook vector is filtered through an adaptive pre-filter P(z), which enhances harmonic components to improve the synthesized speech quality. Here the filter

$$P(z) = 1 / (1 - \beta z^{-T}) \qquad (46)$$

is used, where T is the integer component of the pitch delay of the current sub-frame and $\beta$ is a pitch gain.

Table 7: Structure of the fixed codebook C.
Pulse   Sign   Positions
i0      s0     0, 5, 10, 15, 20, 25, 30, 35
i1      s1     1, 6, 11, 16, 21, 26, 31, 36
i2      s2     2, 7, 12, 17, 22, 27, 32, 37
i3      s3     3, 8, 13, 18, 23, 28, 33, 38
               4, 9, 14, 19, 24, 29, 34, 39

The value of $\beta$ is made adaptive by using the quantized adaptive codebook gain from the previous sub-frame, bounded by 0.2 and 0.8:

$$\beta = \hat{g}_p^{(m-1)}, \quad 0.2 \le \beta \le 0.8. \qquad (47)$$

This filter enhances the harmonic structure for delays less than the sub-frame size of 40.
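The construction of a codebook vector from Eq. (45), Table 7 and the pre-filter of Eqs. (46) and (47) can be sketched as follows. This is a floating-point illustration: the names `TRACKS` and `build_codevector` are invented here, and the pre-filter is assumed to start from a zero state at the beginning of the sub-frame.

```python
# Allowed positions per pulse (Table 7); the fourth pulse uses two interleaved tracks.
TRACKS = [
    list(range(0, 40, 5)),
    list(range(1, 40, 5)),
    list(range(2, 40, 5)),
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),
]

def build_codevector(positions, signs, T, beta):
    """Build c(n) of Eq. (45) and filter it through P(z) = 1/(1 - beta*z^-T).

    positions: (i0, i1, i2, i3), signs: four values in {+1, -1},
    T: integer pitch delay of the sub-frame, beta: previous quantized pitch gain.
    """
    c = [0.0] * 40
    for track, (pos, sgn) in enumerate(zip(positions, signs)):
        if pos not in TRACKS[track]:
            raise ValueError("position %d not allowed for pulse i%d" % (pos, track))
        c[pos] += float(sgn)
    beta = min(max(beta, 0.2), 0.8)          # bounds of Eq. (47)
    # Recursive pre-filter: c(n) = c(n) + beta * c(n - T), only active for T < 40.
    for n in range(T, 40):
        c[n] += beta * c[n - T]
    return c

c = build_codevector((0, 1, 2, 3), (1, -1, 1, -1), T=20, beta=0.5)
```

For delays of 40 or more the loop body never runs, so the vector keeps its four pulses unchanged, consistent with the remark that the filter only affects delays shorter than the sub-frame.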
This modification is incorporated in the fixed codebook search by modifying the impulse response h(n) according to

$$h(n) = h(n) + \beta\, h(n - T), \quad n = T, \ldots, 39. \qquad (48)$$

3.8.1 Fixed codebook search procedure

The fixed codebook is searched by minimizing the mean-squared error between the weighted input speech sw(n) of Eq. (33) and the weighted reconstructed speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is,

$$x_2(n) = x(n) - g_p\, y(n), \quad n = 0, \ldots, 39, \qquad (49)$$

where y(n) is the filtered adaptive codebook vector of Eq. (44). The matrix H is defined as the lower triangular Toeplitz convolution matrix with diagonal h(0) and lower diagonals h(1), ..., h(39). If $c_k$ is the algebraic code vector at index k, then the codebook is searched by maximizing the term

$$\frac{C_k^2}{E_k} = \frac{\left(\sum_{n=0}^{39} d(n)\, c_k(n)\right)^2}{c_k^t\, \Phi\, c_k}, \qquad (50)$$

where d(n) is the correlation between the target signal $x_2(n)$ and the impulse response h(n), and $\Phi = H^t H$ is the matrix of correlations of h(n). The signal d(n) and the matrix $\Phi$ are computed before the codebook search. The elements of d(n) are computed from

$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i - n), \quad n = 0, \ldots, 39, \qquad (51)$$

and the elements of the symmetric matrix $\Phi$ are computed by

$$\phi(i, j) = \sum_{n=j}^{39} h(n - i)\, h(n - j), \quad (j \ge i). \qquad (52)$$

Note that only the elements actually needed are computed, and an efficient storage procedure has been designed to speed up the search. The algebraic structure of the codebook C allows a fast search procedure, since the codebook vector $c_k$ contains only four non-zero pulses. The correlation in the numerator of Eq. (50) for a given vector $c_k$ is given by

$$C = \sum_{i=0}^{3} a_i\, d(m_i), \qquad (53)$$

where $m_i$ is the position of the i-th pulse and $a_i$ is its amplitude. The energy in the denominator of Eq. (50) is given by

$$E = \sum_{i=0}^{3} \phi(m_i, m_i) + 2 \sum_{i=0}^{2} \sum_{j=i+1}^{3} a_i a_j\, \phi(m_i, m_j). \qquad (54)$$

To simplify the search procedure, the pulse amplitudes are predetermined by quantizing the signal d(n).
This is done by setting the amplitude of a pulse at a given position equal to the sign of $d(n)$ at that position. Before the codebook search, the following steps are carried out. First, the signal $d(n)$ is decomposed into two signals: the absolute signal $d'(n) = |d(n)|$ and the sign signal $\mathrm{sign}[d(n)]$. Second, the matrix $\Phi$ is modified by including the sign information, that is,

$$\phi'(i,j) = \mathrm{sign}[d(i)]\,\mathrm{sign}[d(j)]\,\phi(i,j), \qquad i = 0,\dots,39, \quad j = i,\dots,39. \tag{55}$$

To remove the factor 2 in Equation (54),

$$\phi'(i,i) = 0.5\,\phi(i,i), \qquad i = 0,\dots,39. \tag{56}$$

The correlation in Equation (53) is now given by

$$C = d'(m_0) + d'(m_1) + d'(m_2) + d'(m_3), \tag{57}$$

and the energy in Equation (54) is given by

$$\begin{aligned} E ={}& \phi'(m_0,m_0) \\ &+ \phi'(m_1,m_1) + \phi'(m_0,m_1) \\ &+ \phi'(m_2,m_2) + \phi'(m_0,m_2) + \phi'(m_1,m_2) \\ &+ \phi'(m_3,m_3) + \phi'(m_0,m_3) + \phi'(m_1,m_3) + \phi'(m_2,m_3). \end{aligned} \tag{58}$$

A focused search approach is used to further simplify the search procedure. In this approach, a precomputed threshold is tested before entering the last loop, and the loop is entered only if this threshold is exceeded. The maximum number of times the loop can be entered is fixed, so that only a small percentage of the codebook is searched. The threshold is computed from the correlation $C$. The maximum absolute correlation and the average correlation due to the contribution of the first three pulses, $max_3$ and $av_3$, are found before the codebook search. The threshold is given by

$$thr_3 = av_3 + K_3\,(max_3 - av_3). \tag{59}$$

The fourth loop is entered only if the absolute correlation (due to three pulses) exceeds $thr_3$, where $0 \le K_3 < 1$. The value of $K_3$ controls the percentage of the codebook that is searched and is set here to 0.4. Note that this results in a variable search time. To further control the search, the number of times the last loop is entered (for the two subframes) cannot exceed a certain maximum, which is set here to 180 (an average worst case of 90 times per subframe).
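
As an illustration of the sign pre-processing described above, the following Python sketch computes d(n) and φ(i, j) as in Equations (51) and (52), folds the signs into them as in Equations (55) and (56), and evaluates the correlation C and the energy E of Equations (57) and (58) for one combination of four pulse positions. The function names, the toy impulse response and the toy target signal are assumptions made for illustration only; this is floating-point code, not the bit-exact fixed-point search.

```python
def correlations(x2, h):
    """Return d(n) (Eq. 51) and the symmetric matrix phi(i, j) (Eq. 52)."""
    size = len(h)
    d = [sum(x2[i] * h[i - n] for i in range(n, size)) for n in range(size)]
    phi = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            value = sum(h[n - i] * h[n - j] for n in range(j, size))
            phi[i][j] = phi[j][i] = value
    return d, phi


def fold_signs(d, phi):
    """Eqs. (55)-(56): absorb sign[d(n)] into phi and halve its diagonal."""
    size = len(d)
    sign = [1.0 if value >= 0.0 else -1.0 for value in d]
    d_abs = [abs(value) for value in d]
    phi_s = [[sign[i] * sign[j] * phi[i][j] for j in range(size)]
             for i in range(size)]
    for i in range(size):
        phi_s[i][i] = 0.5 * phi[i][i]
    return d_abs, sign, phi_s


def pulse_score(positions, d_abs, phi_s):
    """Correlation C (Eq. 57) and energy E (Eq. 58) for four pulse positions."""
    corr = sum(d_abs[p] for p in positions)
    energy = sum(phi_s[positions[i]][positions[j]]
                 for i in range(4) for j in range(i, 4))
    return corr, energy


def focused_threshold(max3, av3, k3=0.4):
    """Eq. (59): threshold tested before entering the fourth pulse loop."""
    return av3 + k3 * (max3 - av3)


# Toy data (hypothetical): decaying impulse response and arbitrary target.
h = [0.9 ** n for n in range(40)]
x2 = [((7 * n) % 11) - 5.0 for n in range(40)]

d, phi = correlations(x2, h)
d_abs, sign, phi_s = fold_signs(d, phi)
positions = (5, 11, 22, 38)          # one position from each track of Table 7
C, E = pulse_score(positions, d_abs, phi_s)
```

Because the diagonal is halved in Equation (56), the value E computed this way is half of the energy obtained with the signed code vector and the unmodified matrix. The constant factor does not change which pulse combination maximizes C²/E.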
3.8.2 Codeword computation of the fixed codebook

The positions of pulses i0, i1 and i2 are encoded with 3 bits each, while the position of i3 is encoded with 4 bits. Each pulse amplitude is encoded with 1 bit. This gives a total of 17 bits for the 4 pulses. Defining $s = 1$ if the sign is positive and $s = 0$ if the sign is negative, the sign codeword is obtained from

$$S = s_0 + 2\,s_1 + 4\,s_2 + 8\,s_3, \tag{60}$$

and the fixed codebook codeword is obtained from

$$C = (i_0/5) + 8\,(i_1/5) + 64\,(i_2/5) + 512\,\bigl(2\,(i_3/5) + jx\bigr), \tag{61}$$

where $jx = 0$ if $i_3 = 3, 8, \dots$, and $jx = 1$ if $i_3 = 4, 9, \dots$

3.9 Quantization of the gains

The adaptive codebook gain (pitch gain) and the fixed (algebraic) codebook gain are vector quantized using 7 bits. The gain codebook search is done by minimizing the mean-squared weighted error between the original and the reconstructed speech, which is given by

$$E = x^t x + g_p^2\,y^t y + g_c^2\,z^t z - 2 g_p\,x^t y - 2 g_c\,x^t z + 2 g_p g_c\,y^t z, \tag{62}$$

where $x$ is the target vector (see Section 3.6), $y$ is the filtered adaptive codebook vector of Equation (44), and $z$ is the fixed codebook vector convolved with $h(n)$:

$$z(n) = \sum_{i=0}^{n} c(i)\,h(n-i), \qquad n = 0,\dots,39. \tag{63}$$

3.9.1 Gain prediction

The fixed codebook gain $g_c$ can be expressed as

$$g_c = \gamma\,g'_c, \tag{64}$$

where $g'_c$ is a predicted gain based on previous fixed codebook energies and $\gamma$ is a correction factor.

The mean energy of the fixed codebook contribution is given by

$$E = 10 \log\left(\frac{1}{40}\sum_{i=0}^{39} c_i^2\right). \tag{65}$$

After scaling the vector $c_i$ with the fixed codebook gain $g_c$, the energy of the scaled fixed codebook is given by $20 \log g_c + E$. Let $E^{(m)}$ be the mean-removed energy (in dB) of the (scaled) fixed codebook contribution at subframe $m$, given by

$$E^{(m)} = 20 \log g_c + E - \bar E, \tag{66}$$

where $\bar E = 30$ dB is the mean energy of the fixed codebook excitation. The gain $g_c$ can be expressed as a function of $E^{(m)}$, $E$ and $\bar E$ by

$$g_c = 10^{(E^{(m)} + \bar E - E)/20}. \tag{67}$$

The predicted gain $g'_c$ is found by predicting the log-energy of the current fixed codebook contribution from the log-energy of previous fixed codebook contributions. The fourth-order MA prediction is done as follows. The predicted energy is given by

$$\tilde E^{(m)} = \sum_{i=1}^{4} b_i\,\hat R^{(m-i)}, \tag{68}$$

where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA prediction coefficients and $\hat R^{(m)}$ is the quantized version of the prediction error $R^{(m)}$ at subframe $m$, defined by

$$R^{(m)} = E^{(m)} - \tilde E^{(m)}. \tag{69}$$

The predicted gain $g'_c$ is found by replacing $E^{(m)}$ by its predicted value in Equation (67):

$$g'_c = 10^{(\tilde E^{(m)} + \bar E - E)/20}. \tag{70}$$

The correction factor $\gamma$ is related to the gain prediction error by

$$R^{(m)} = E^{(m)} - \tilde E^{(m)} = 20 \log(\gamma). \tag{71}$$

3.9.2 Codebook search for gain quantization

The adaptive codebook gain $g_p$ and the factor $\gamma$ are vector quantized using a two-stage conjugate-structured codebook. The first stage consists of a 3-bit two-dimensional codebook GA, and the second stage consists of a 4-bit two-dimensional codebook GB. The first element in each codebook represents the quantized adaptive codebook gain $\hat g_p$, and the second element represents the quantized fixed codebook gain correction factor $\hat\gamma$. Given codebook indices $m$ and $n$ for GA and GB respectively, the quantized adaptive codebook gain is given by

$$\hat g_p = GA_1(m) + GB_1(n), \tag{72}$$

and the quantized fixed codebook gain by

$$\hat g_c = g'_c\,\hat\gamma = g'_c\,\bigl(GA_2(m) + GB_2(n)\bigr). \tag{73}$$

This conjugate structure simplifies the codebook search by applying a pre-selection process. The optimum pitch gain $g_p$ and fixed codebook gain $g_c$ are derived from Equation (62) and are used for the pre-selection. The codebook GA contains 8 entries in which the second element (corresponding to $g_c$) has in general larger values than the first element (corresponding to $g_p$). This bias allows a pre-selection using the value of $g_c$.
In this pre-selection process, a cluster of 4 vectors whose second elements are close to $gx_c$ is selected, where $gx_c$ is derived from $g_c$ and $g_p$. Similarly, the codebook GB contains 16 entries that have a bias towards the first element (corresponding to $g_p$). A cluster of 8 vectors whose first elements are close to $g_p$ is selected. Hence, for each codebook, the best 50% of the candidate vectors are selected. This is followed by an exhaustive search over the remaining 4 * 8 = 32 possibilities, such that the combination of the two indices minimizes the weighted mean-squared error of Equation (62).

3.9.3 Codeword computation for the gain quantizer

The codewords GA and GB for the gain quantizer are obtained from the indices corresponding to the best choice. To reduce the impact of single bit errors, the codebook indices are mapped.

3.10 Memory update

An update of the states of the synthesis and weighting filters is needed to compute the target signal in the next subframe. After the two gains are quantized, the excitation signal $u(n)$ in the present subframe is found by

$$u(n) = \hat g_p\,v(n) + \hat g_c\,c(n), \qquad n = 0,\dots,39, \tag{74}$$

where $\hat g_p$ and $\hat g_c$ are the quantized adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code vector including pitch sharpening). The states of the filters can be updated by filtering the signal $r(n) - u(n)$ (the difference between residual and excitation) through the filters $1/\hat A(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. This would require 3 filter operations. A simpler approach, which requires only one filtering, is as follows. The local synthesis speech $\hat s(n)$ is computed by filtering the excitation signal through $1/\hat A(z)$. The output of the filter due to the input $r(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat s(n)$. The states of the synthesis filter $1/\hat A(z)$ are therefore given by $e(n)$, $n = 30,\dots,39$.
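
The single-filter update of the synthesis filter state relies on the linearity of the synthesis filter. The Python sketch below illustrates this under stated assumptions: a hypothetical stable 10th-order filter, zero initial filter memory, and arbitrary toy signals. It forms the excitation of Equation (74), synthesizes the local speech, and takes the last ten samples of e(n) = s(n) - ŝ(n) as the new synthesis filter state. All function and variable names are illustrative.

```python
def build_excitation(gp, v, gc, c):
    """Eq. (74): u(n) = gp * v(n) + gc * c(n)."""
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]


def synthesis_filter(x, a, memory):
    """Filter x through 1/A(z), where A(z) = 1 + sum_i a[i-1] * z^-i.

    memory holds the previous filter outputs, oldest first.
    """
    history = list(memory)
    out = []
    for sample in x:
        y = sample - sum(a[i] * history[-1 - i] for i in range(len(a)))
        out.append(y)
        history.append(y)
    return out


# Hypothetical 10th-order filter (only two non-zero taps, poles inside
# the unit circle) and toy signals for one 40-sample subframe.
a = [-1.2, 0.5] + [0.0] * 8
zero_state = [0.0] * 10
r = [((5 * n) % 7) - 3.0 for n in range(40)]   # toy LP residual
v = [((3 * n) % 5) - 2.0 for n in range(40)]   # toy adaptive codebook vector
c = [0.0] * 40                                 # toy fixed codebook vector
for pos, s in ((5, 1.0), (11, -1.0), (22, 1.0), (38, -1.0)):
    c[pos] = s

u = build_excitation(0.8, v, 1.5, c)
speech = synthesis_filter(r, a, zero_state)    # s(n), driven by the residual
local = synthesis_filter(u, a, zero_state)     # local synthesis speech
e = [sn - ln for sn, ln in zip(speech, local)]
new_state = e[30:40]                           # synthesis filter state

# Reference: filter r(n) - u(n) directly, as in the three-filter approach.
e_direct = synthesis_filter([rn - un for rn, un in zip(r, u)], a, zero_state)
```

In this sketch `e` and `e_direct` agree to within floating-point rounding, which is why the explicit filtering of r(n) - u(n) through the synthesis filter can be skipped.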
Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, the signal $e_w(n)$ can be found equivalently by

$$e_w(n) = x(n) - \hat g_p\,y(n) - \hat g_c\,z(n). \tag{75}$$

Since the signals $x(n)$, $y(n)$ and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ as in Equation (75) for $n = 30,\dots,39$. This saves two filter operations.

3.11 Encoder and decoder initialization

All static encoder variables should be initialized to zero, except for the variables listed in Table 8. These variables need to be initialized for the decoder as well.

Table 8: Description of parameters with non-zero initialization.

Variable          Reference      Initial value
$\beta$           Section 3.8    0.8
$l_i$             Section 3.2.4  $i\pi/11$
$q_i$             Section 3.2.4  0.9595, ...
$\hat R^{(k)}$    Section 3.9.1  -14

4. Functional description of the decoder

The signal flow in the decoder is illustrated in Section 2 (Figure 3). First, the parameters are decoded (LP coefficients, adaptive codebook vector, fixed codebook vector and gains). These decoded parameters are used to compute the reconstructed speech signal. This process is described in Section 4.1. The reconstructed signal is enhanced by a post-processing operation consisting of a postfilter and a high-pass filter (Section 4.2). Section 4.3 describes the error concealment procedure used when either a parity error has occurred or the frame erasure flag has been set.

4.1 Parameter decoding procedure

The transmitted parameters are listed in Table 9.

Table 9: Description of transmitted parameter indices. The bitstream ordering is reflected by the order in the table. For each parameter, the most significant bit (MSB) is transmitted first.
Symbol  Description                                   Bits
L0      Switched predictor index of LSP quantizer     1
L1      First stage vector of LSP quantizer           7
L2      Second stage lower vector of LSP quantizer    5
L3      Second stage higher vector of LSP quantizer   5
P1      Pitch delay, first subframe                   8
P0      Parity bit for pitch                          1
S1      Signs of pulses, first subframe               4
C1      Fixed codebook, first subframe                13
GA1     Gain codebook (stage 1), first subframe       3
GB1     Gain codebook (stage 2), first subframe       4
P2      Pitch delay, second subframe                  5
S2      Signs of pulses, second subframe              4
C2      Fixed codebook, second subframe               13
GA2     Gain codebook (stage 1), second subframe      3
GB2     Gain codebook (stage 2), second subframe      4

At startup, all static encoder variables should be initialized to zero, except for the variables listed in Table 8. The decoding procedure is performed in the following order.

4.1.1 Decoding of LP filter parameters

The received indices L0, L1, L2 and L3 of the LSP quantizer are used to reconstruct the quantized LSP coefficients using the procedure described in Section 3.2.4. The interpolation procedure described in Section 3.2.5 is used to obtain 2 interpolated LSP vectors (corresponding to the 2 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficients $a_i$, which are used to synthesize the reconstructed speech in the subframe.

The following steps are repeated for each subframe:

1. Decoding of the adaptive codebook vector.
2. Decoding of the fixed codebook vector.
3. Decoding of the adaptive and fixed codebook gains.
4. Computation of the reconstructed speech.

4.1.2 Decoding of the adaptive codebook vector

The received adaptive codebook index is used to find the integer and fractional parts of the pitch delay. The integer part $(int)T_1$ and the fractional part $frac$ of $T_1$ are obtained from P1 as follows:

    if P1 < 197
        (int)T1 = (P1 + 2)/3 + 19
        frac = P1 - (int)T1 * 3 + 58
    else
        (int)T1 = P1 - 112
        frac = 0
    end

The integer and fractional parts of $T_2$ are obtained from P2 and $t_{min}$, where $t_{min}$ is derived from P1 as follows:

    t_min = (int)T1 - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end

Now $T_2$ is obtained from

    (int)T2 = (P2 + 2)/3 - 1 + t_min
    frac = P2 - 2 - ((P2 + 2)/3 - 1) * 3

The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using Equation (40).

4.1.3 Decoding of the fixed codebook vector

The received fixed codebook index C is used to extract the positions of the excitation pulses. The pulse signs are obtained from S. Once the pulse positions and signs are decoded, the fixed codebook vector $c(n)$ can be constructed. If the integer part of the pitch delay, $T$, is less than the subframe size of 40, the pitch enhancement procedure is applied, which modifies $c(n)$ according to Equation (48).

4.1.4 Decoding of the adaptive and fixed codebook gains

The received gain codebook index gives the adaptive codebook gain $\hat g_p$ and the fixed codebook gain correction factor $\hat\gamma$. This procedure is described in detail in Section 3.9. The estimated fixed codebook gain $g'_c$ is found using Equation (70). The fixed codebook gain is obtained as the product of the quantized gain correction factor and this predicted gain (Equation (64)). The adaptive codebook gain is reconstructed using Equation (72).

4.1.5 Computation of the parity bit

Before the speech is reconstructed, the parity bit is recomputed from the adaptive codebook delay (Section 3.7.2). If this bit is not identical to the transmitted parity bit P0, it is likely that bit errors occurred during transmission, and the error concealment procedure of Section 4.3 is used.

4.1.6 Computation of the reconstructed speech

The excitation $u(n)$ (see Equation (74)) is input to the LP synthesis filter. The reconstructed speech for the subframe is given by

$$\hat s(n) = u(n) - \sum_{i=1}^{10} \hat a_i\,\hat s(n-i), \qquad n = 0,\dots,39, \tag{76}$$
where $\hat a_i$ are the interpolated LP filter coefficients. The reconstructed speech $\hat s(n)$ is then processed by a post-processor, which is described in the next section.

4.2 Post-processing

Post-processing consists of three functions: adaptive postfiltering, high-pass filtering, and signal up-scaling. The adaptive postfilter is the cascade of three filters: a pitch postfilter $H_p(z)$, a short-term postfilter $H_f(z)$ and a tilt compensation filter $H_t(z)$, followed by an adaptive gain control procedure. The postfilter is updated every subframe of 5 ms. The postfiltering process is organized as follows. First, the synthesized speech $\hat s(n)$ is inverse filtered through $\hat A(z/\gamma_n)$ to produce the residual signal $\hat r(n)$. The signal $\hat r(n)$ is used to compute the pitch delay $T$ and gain $g_{pit}$. The signal $\hat r(n)$ is filtered through the pitch postfilter $H_p(z)$ to produce the signal $r'(n)$, which in turn is filtered by the synthesis filter $1/[g_f \hat A(z/\gamma_d)]$. Finally, the signal at the output of the synthesis filter $1/[g_f \hat A(z/\gamma_d)]$ is passed to the tilt compensation filter $H_t(z)$, resulting in the postfiltered synthesis speech signal $sf(n)$. Adaptive gain control is then applied between $sf(n)$ and $\hat s(n)$, resulting in the signal $sf'(n)$. The high-pass filtering and scaling operations act on the postfiltered signal $sf'(n)$.

4.2.1 Pitch postfilter

The pitch, or harmonic, postfilter is given by

$$H_p(z) = \frac{1}{1 + g_0}\left(1 + g_0\,z^{-T}\right), \tag{77}$$

where $T$ is the pitch delay and $g_0$ is a gain factor given by

$$g_0 = \gamma_p\,g_{pit}, \tag{78}$$

where $g_{pit}$ is the pitch gain. Both the pitch delay and the gain are determined from the decoder output signal. Note that $g_{pit}$ is bounded by 1, and that it is set to zero if the pitch prediction gain is less than 3 dB. The factor $\gamma_p$ controls the amount of harmonic postfiltering and has the value $\gamma_p = 0.5$.
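
A minimal Python sketch of the pitch postfilter follows. It assumes the form H_p(z) = (1 + g_0 z^-T) / (1 + g_0) with g_0 = γ_p g_pit and, as a simplification, an integer delay T, whereas the procedure in this section selects a fractional delay. The function name and the `past` buffer argument are hypothetical.

```python
def pitch_postfilter(residual, past, delay, g_pit, gamma_p=0.5):
    """Apply H_p(z) = (1 + g0 * z^-T) / (1 + g0) with an integer delay T.

    residual: current subframe of the residual signal
    past:     residual samples preceding the subframe (at least `delay` of them)
    g_pit:    pitch gain, clipped here to the range [0, 1]
    """
    if len(past) < delay:
        raise ValueError("past buffer shorter than the pitch delay")
    g_pit = min(max(g_pit, 0.0), 1.0)
    g0 = gamma_p * g_pit
    buf = list(past) + list(residual)
    start = len(past)
    return [(buf[start + n] + g0 * buf[start + n - delay]) / (1.0 + g0)
            for n in range(len(residual))]


# Toy check: a signal that is exactly periodic with period T passes unchanged,
# because the filter has unit gain at the pitch harmonics.
T = 4
period = [1.0, -2.0, 0.5, 3.0]
past = period * 2
frame = period * 10                  # 40-sample subframe
same = pitch_postfilter(frame, past, T, g_pit=1.0)
bypass = pitch_postfilter(frame, past, T, g_pit=0.0)
```

Setting `g_pit` to zero disables the filter, which corresponds to the case where the pitch prediction gain is below 3 dB.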
The pitch delay and gain are computed from the residual signal $\hat r(n)$, obtained by filtering the speech $\hat s(n)$ through $\hat A(z/\gamma_n)$, which is the numerator of the short-term postfilter (see Section 4.2.2):

$$\hat r(n) = \hat s(n) + \sum_{i=1}^{10} \gamma_n^i\,\hat a_i\,\hat s(n-i). \tag{79}$$

The pitch delay is computed using a two-pass procedure. The first pass selects the best integer $T_0$ in the range $[T_1 - 1, T_1 + 1]$, where $T_1$ is the integer part of the (transmitted) pitch delay in the first subframe. The best integer delay is the one that maximizes the correlation

$$R(k) = \sum_{n=0}^{39} \hat r(n)\,\hat r(n-k). \tag{80}$$

The second pass chooses the best fractional delay $T$ with resolution 1/8 around $T_0$. This is done by finding the delay with the highest normalized correlation

$$R'(k) = \frac{\sum_{n=0}^{39} \hat r(n)\,\hat r_k(n)}{\sqrt{\sum_{n=0}^{39} \hat r_k(n)\,\hat r_k(n)}}, \tag{81}$$

where $\hat r_k(n)$ is the residual signal at delay $k$. Once the optimal delay $T$ is found, the corresponding correlation value is compared against a threshold. If $R'(T) < 0.5$, the harmonic postfilter is disabled by setting $g_{pit} = 0$. Otherwise the value of $g_{pit}$ is computed from

$$g_{pit} = \frac{\sum_{n=0}^{39} \hat r(n)\,\hat r_k(n)}{\sum_{n=0}^{39} \hat r_k(n)\,\hat r_k(n)}, \qquad \text{bounded by } 0 \le g_{pit} \le 1.0. \tag{82}$$

The non-integer delayed signal $\hat r_k(n)$ is first computed using an interpolation filter of length 33. After the selection of $T$, $\hat r_k(n)$ is recomputed with a longer interpolation filter of length 129. The new signal replaces the previous one only if the longer filter increases the value of $R'(T)$.

4.2.2 Short-term postfilter

The short-term postfilter is given by

$$H_f(z) = \frac{1}{g_f}\,\frac{\hat A(z/\gamma_n)}{\hat A(z/\gamma_d)} = \frac{1}{g_f}\,\frac{1 + \sum_{i=1}^{10} \gamma_n^i\,\hat a_i\,z^{-i}}{1 + \sum_{i=1}^{10} \gamma_d^i\,\hat a_i\,z^{-i}}, \tag{83}$$

where $\hat A(z)$ is the received quantized LP inverse filter (LP analysis is not performed in the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of short-term postfiltering and are set to $\gamma_n = 0.55$ and $\gamma_d = 0.7$. The gain term $g_f$ is computed on the truncated impulse response $h_f(n)$ of the filter $\hat A(z/\gamma_n)/\hat A(z/\gamma_d)$ and is given by

$$g_f = \sum_{n=0}^{19} |h_f(n)|. \tag{84}$$

4.2.3 Tilt compensation

Finally, the filter $H_t(z)$ compensates for the tilt in the short-term postfilter $H_f(z)$ and is given by

$$H_t(z) = \frac{1}{g_t}\left(1 + \gamma_t\,k_1\,z^{-1}\right), \tag{85}$$

where $\gamma_t k_1$ is a tilt factor, $k_1$ being the first reflection coefficient computed on $h_f(n)$ with

$$k_1 = -\frac{r_h(1)}{r_h(0)}, \qquad r_h(i) = \sum_{j=0}^{19-i} h_f(j)\,h_f(j+i). \tag{86}$$

The gain term $g_t = 1 - |\gamma_t k_1|$ compensates for the decreasing effect of $g_f$ in $H_f(z)$. In addition, it has been shown that the product filter $H_f(z) H_t(z)$ generally has no gain. Two values of $\gamma_t$ are used, depending on the sign of $k_1$: if $k_1$ is negative, $\gamma_t = 0.9$, and if $k_1$ is positive, $\gamma_t = 0.2$.

4.2.4 Adaptive gain control

Adaptive gain control is used to compensate for gain differences between the reconstructed speech signal $\hat s(n)$ and the postfiltered signal $sf(n)$. The gain scaling factor $G$ for the present subframe is computed by

$$G = \frac{\sum_{n=0}^{39} |\hat s(n)|}{\sum_{n=0}^{39} |sf(n)|}. \tag{87}$$

The gain-scaled postfiltered signal $sf'(n)$ is given by

$$sf'(n) = g(n)\,sf(n), \qquad n = 0,\dots,39, \tag{88}$$

where $g(n)$ is updated on a sample-by-sample basis and given by

$$g(n) = 0.85\,g(n-1) + 0.15\,G, \qquad n = 0,\dots,39. \tag{89}$$

The initial value is $g(-1) = 1.0$.

4.2.5 High-pass filtering and up-scaling

A high-pass filter with a cutoff frequency of 100 Hz is applied to the reconstructed and postfiltered speech $sf'(n)$. The filter is given by

$$H_{h2}(z) = \frac{0.93980581 - 1.8795834\,z^{-1} + 0.93980581\,z^{-2}}{1 - 1.9330735\,z^{-1} + 0.93589199\,z^{-2}}. \tag{90}$$

Up-scaling consists of multiplying the high-pass filtered output by a factor of 2 to recover the input signal level.

4.3 Concealment of frame erasures and parity errors

An error concealment procedure has been incorporated in the decoder to reduce the degradation in the reconstructed speech caused by frame erasures or random errors in the bitstream. This error concealment process is functional when either i) the frame of coder parameters (corresponding to a 10 ms frame) has been identified as erased, or ii) a checksum error occurs on the parity bit for the pitch delay index P1.
The latter can occur when the bitstream has been corrupted by random bit errors.
If a parity error occurs on P1, the delay value $T_1$ is set to the value of the delay of the previous frame. The value of $T_2$ is derived with the procedure outlined in Section 4.1.2, using this new value of $T_1$. If consecutive parity errors occur, the previous value of $T_1$, incremented by 1, is used.

The mechanism for detecting frame erasures is not defined in the Recommendation and depends on the application. The concealment strategy has to reconstruct the current frame based on previously received information. The method used replaces the missing excitation signal with one of similar characteristics, while gradually decaying its energy. This is done by using a voicing classifier based on the long-term prediction gain, which is computed as part of the long-term postfilter analysis. The pitch postfilter (see Section 4.2.1) finds the long-term predictor for which the prediction gain is greater than 3 dB. This is done by setting a threshold of 0.5 on the normalized correlation $R'(k)$ (Equation (81)). For the error concealment process, these frames are classified as periodic. Otherwise the frame is declared non-periodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. Hence, for many consecutive erased frames, the classification may change. Typically, this only happens if the original classification was periodic.

The specific steps taken for an erased frame are:

1. Repetition of the LP filter parameters.
2. Attenuation of the adaptive and fixed codebook gains.
3. Attenuation of the memory of the gain predictor.
4. Generation of the replacement excitation.

4.3.1 Repetition of LP filter parameters

The LP parameters of the last good frame are used.
The states of the LSF predictor contain the values of the received codewords $l_i$. Since the current codeword is not available, it is computed from the repeated LSF parameters $\hat\omega_i$ and the predictor memory from

$$l_i = \left[\hat\omega_i^{(m)} - \sum_{k=1}^{4} m_i^k\,l_i^{(m-k)}\right] \Big/ \left(1 - \sum_{k=1}^{4} m_i^k\right), \qquad i = 1,\dots,10. \tag{91}$$

4.3.2 Attenuation of the adaptive and fixed codebook gains

An attenuated version of the previous fixed codebook gain is used:

$$g_c^{(m)} = 0.99\,g_c^{(m-1)}. \tag{92}$$

The same is done for the adaptive codebook gain. In addition, a clipping operation is used to keep its value below 0.9:

$$g_p^{(m)} = 0.9\,g_p^{(m-1)} \quad \text{and} \quad g_p^{(m)} < 0.9. \tag{93}$$

4.3.3 Attenuation of the memory of the gain predictor

The gain predictor uses the energy of previously selected codebook vectors. To allow for a smooth continuation of the coder once good frames are received, the memory of the gain predictor is updated with an attenuated version of the codebook energy. The value of $\hat R^{(m)}$ for the current subframe $m$ is set to the averaged quantized gain prediction error, attenuated by 4 dB:

$$\hat R^{(m)} = \left(0.25 \sum_{i=1}^{4} \hat R^{(m-i)}\right) - 4.0 \quad \text{and} \quad \hat R^{(m)} \ge -14. \tag{94}$$

4.3.4 Generation of the replacement excitation

The excitation used depends on the periodicity classification. If the last correctly received frame was classified as periodic, the current frame is considered periodic as well. In that case only the adaptive codebook is used, and the fixed codebook contribution is set to zero. The pitch delay is based on the last correctly received pitch delay and is repeated for each successive frame. To avoid excessive periodicity, the delay is increased by one for each subsequent subframe, but is bounded by 143. The adaptive codebook gain is based on an attenuated value according to Equation (93).

If the last correctly received frame was classified as non-periodic, the current frame is considered non-periodic as well, and the adaptive codebook contribution is set to zero. The fixed codebook contribution is generated by randomly selecting a codebook index and a sign index.
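
The attenuation rules of Equations (92) to (94) and the delay increment for periodic frames can be sketched in Python as follows. The function and argument names are hypothetical, and the attenuation factor of the fixed codebook gain is passed in as an argument rather than hard-coded.

```python
def conceal_gains(prev_gc, prev_gp, r_hist, gc_atten):
    """Return (g_c, g_p, R_hat) for one erased subframe.

    prev_gc, prev_gp: gains of the previous subframe
    r_hist:           past quantized gain prediction errors, most recent last
    gc_atten:         attenuation factor applied to the fixed codebook gain
    """
    gc = gc_atten * prev_gc                              # Eq. (92)
    gp = min(0.9 * prev_gp, 0.9)                         # Eq. (93), clipped at 0.9
    r_hat = max(0.25 * sum(r_hist[-4:]) - 4.0, -14.0)    # Eq. (94), floor of -14
    return gc, gp, r_hat


def next_pitch_delay(prev_delay, max_delay=143):
    """Increase the repeated delay by one per subframe, bounded by max_delay."""
    return min(prev_delay + 1, max_delay)


# Example: gains decay over consecutive erased subframes.
gc, gp, r_hat = conceal_gains(1.0, 1.2, [0.0, 0.0, 0.0, 0.0], gc_atten=0.99)
floor_case = conceal_gains(1.0, 0.5, [-14.0] * 4, gc_atten=0.99)
```

Calling `conceal_gains` once per erased subframe and appending the returned `r_hat` to `r_hist` keeps the gain predictor memory consistent with what the text describes.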
The random generator is based on the function

$$seed = seed \cdot 31821 + 13849, \tag{95}$$

with an initial seed value of 21845. The random codebook index is derived from the 13 least significant bits of the next random number. The random sign is derived from the 4 least significant bits of the next random number. The fixed codebook gain is attenuated according to Equation (92).

5. Bit-exact description of the CS-ACELP coder

ANSI C code simulating the CS-ACELP coder in 16-bit fixed point is available from ITU-T. The following sections summarize the use of this simulation code and how the software is organized.

5.1 Use of the simulation software

The C code consists of two main programs: coder.c, which simulates the encoder, and decoder.c, which simulates the decoder. The encoder is run as follows:

    coder inputfile bstreamfile

The inputfile and outputfile are sampled data files containing 16-bit PCM signals. The bitstream file contains 81 16-bit words, where the first word can be used to indicate frame erasure, and the remaining 80 words contain one bit each. The decoder takes this bitstream file and produces a postfiltered output file containing a 16-bit PCM signal:

    decoder bstreamfile outputfile

5.2 Organization of the simulation software

In the fixed-point ANSI C simulation, only two types of fixed-point data are used, as shown in Table 10.
Table 10: Data types used in the ANSI C simulation.

Type    Maximum value  Minimum value  Description
Word16  0x7fff         0x8000         Signed 2's complement 16-bit word
Word32  0x7fffffffL    0x80000000L    Signed 2's complement 32-bit word

To facilitate the implementation of the simulation code, loop indices, Boolean values and flags use the type Flag, which is either 16 bits or 32 bits depending on the target platform.

All computations are done using a predefined set of basic operators. The description of these operators is given in Table 11. The tables used by the simulation coder are summarized in Table 12. The main programs use a library of routines that are summarized in Tables 13, 14 and 15.

Table 11: Basic operations used in the ANSI C simulation.

Operation                                                Description
Word16 sature(Word32 L_var1)                             Limit to 16 bits
Word16 add(Word16 var1, Word16 var2)                     Short addition
Word16 sub(Word16 var1, Word16 var2)                     Short subtraction
Word16 abs_s(Word16 var1)                                Short abs
Word16 shl(Word16 var1, Word16 var2)                     Short shift left
Word16 shr(Word16 var1, Word16 var2)                     Short shift right
Word16 mult(Word16 var1, Word16 var2)                    Short multiplication
Word32 L_mult(Word16 var1, Word16 var2)                  Long multiplication
Word16 negate(Word16 var1)                               Short negate
Word16 extract_h(Word32 L_var1)                          Extract high
Word16 extract_l(Word32 L_var1)                          Extract low
Word16 round(Word32 L_var1)                              Round
Word32 L_mac(Word32 L_var3, Word16 var1, Word16 var2)    Mac
Word32 L_msu(Word32 L_var3, Word16 var1, Word16 var2)    Msu
Word32 L_macNs(Word32 L_var3, Word16 var1, Word16 var2)  Mac without saturation
Word32 L_msuNs(Word32 L_var3, Word16 var1, Word16 var2)  Msu without saturation
Word32 L_add(Word32 L_var1, Word32 L_var2)               Long addition
Word32 L_sub(Word32 L_var1, Word32 L_var2)               Long subtraction
Word32 L_add_c(Word32 L_var1, Word32 L_var2)             Long addition with c
Word32 L_sub_c(Word32 L_var1, Word32 L_var2)             Long subtraction with c
Word32 L_negate(Word32 L_var1)                           Long negate
Word16 mult_r(Word16 var1, Word16 var2)                  Multiplication with rounding
Word32 L_shl(Word32 L_var1, Word16 var2)                 Long shift left
Word32 L_shr(Word32 L_var1, Word16 var2)                 Long shift right
Word16 shr_r(Word16 var1, Word16 var2)                   Shift right with rounding
Word16 mac_r(Word32 L_var3, Word16 var1, Word16 var2)    Mac with rounding
Word16 msu_r(Word32 L_var3, Word16 var1, Word16 var2)    Msu with rounding
Word32 L_deposit_h(Word16 var1)                          16-bit var1 -> MSB
Word32 L_deposit_l(Word16 var1)                          16-bit var1 -> LSB
Word32 L_shr_r(Word32 L_var1, Word16 var2)               Long shift right with rounding
Word32 L_abs(Word32 L_var1)                              Long abs
Word32 L_sat(Word32 L_var1)                              Long saturation
Word16 norm_s(Word16 var1)                               Short norm
Word16 div_s(Word16 var1, Word16 var2)                   Short division
Word16 norm_l(Word32 L_var1)                             Long norm

Table 12: Summary of tables.

File          Table name  Size        Description
tab_hup.c     tab_hup_s   28          Upsampling filter for postfilter
tab_hup.c     tab_hup_l   112         Upsampling filter for postfilter
inter_3.c     inter_3     13          FIR filter for interpolating the correlation
pred_lt3.c    inter_3     31          FIR filter for interpolating the past excitation
lspcb.tab     lspcb1      128 x 10    LSP quantizer (first stage)
lspcb.tab     lspcb2      32 x 10     LSP quantizer (second stage)
lspcb.tab     fg          2 x 4 x 10  MA predictors in LSP VQ
lspcb.tab     fg_sum      2 x 10      Used in LSP VQ
lspcb.tab     fg_sum_inv  2 x 10      Used in LSP VQ
qua_gain.tab  gbk1        8 x 2       Codebook GA in gain VQ
qua_gain.tab  gbk2        16 x 2      Codebook GB in gain VQ
qua_gain.tab  map1        8           Used in gain VQ
qua_gain.tab  imap1       8           Used in gain VQ
qua_gain.tab  map2        16          Used in gain VQ
qua_gain.tab  imap2       16          Used in gain VQ
window.tab    window      240         LP analysis window
lag_wind.tab  lag_h       10          Lag window for bandwidth expansion (high part)
lag_wind.tab  lag_l       10          Lag window for bandwidth expansion (low part)
grid.tab      grid        61          Grid points in LP to LSP conversion
inv_sqrt.tab  table       49          Lookup table in inverse square root computation
log2.tab      table       33          Lookup table in base 2 logarithm computation
lsp_lsf.tab   table       65          Lookup table in LSF to LSP conversion and vice versa
lsp_lsf.tab   slope       64          Line slopes in LSP to LSF conversion
pow2.tab      table       33          Lookup table in 2^x computation
acelp.h                               Prototypes for fixed codebook search
ld8k.h                                Function prototypes and constants
typedef.h                             Type definitions

Table 13: Summary of encoder-specific routines.

File name   Description
acelp_co.c  Search fixed codebook
autocorr.c  Compute autocorrelation for LP analysis
az_lsp.c    Compute LSPs from LP coefficients
cod_ld8k.c  Encoder routine
convolve.c  Convolution operation
corr_xy2.c  Compute correlation terms for gain quantization
enc_lag3.c  Encode adaptive codebook index
g_pitch.c   Compute adaptive codebook gain
gainpred.c  Gain predictor
int_lpc.c   Interpolation of LSP
inter_3.c   Fractional delay interpolation
lag_wind.c  Lag windowing
levinson.c  Levinson recursion
lspenc.c    LSP encoding routine
lspgetq.c   LSP quantizer
lspgett.c   Compute LSP quantizer distortion
lspgetw.c   Compute LSP weights
lsplast.c   Select LSP MA predictor
lsppre.c    Pre-selection of first LSP codebook
lspprev.c   LSP predictor routines
lspsel1.c   First stage LSP quantizer
lspsel2.c   Second stage LSP quantizer
lspstab.c   Stability test for LSP quantizer
pitch_fr.c  Closed-loop pitch search
pitch_ol.c  Open-loop pitch search
pre_proc.c  Pre-processing (HP filtering and scaling)
pwf.c       Computation of perceptual weighting coefficients
qua_gain.c  Gain quantizer
qua_lsp.c   LSP quantizer
relspwe.c   LSP quantizer

Table 14: Summary of decoder-specific routines.

File name   Description
d_lsp.c     Decode LP information
de_acelp.c  Decode algebraic codebook
dec_gain.c  Decode gains
dec_lag3.c  Decode adaptive codebook index
dec_ld8k.c  Decoder routine
lspdec.c    LSP decoding routine
post_pro.c  Post-processing (HP filtering and scaling)
pred_lt3.c  Generation of adaptive codebook
pst.c       Postfilter routines

Table 15: Summary of general routines.

File name   Description
basicop2.c  Basic operators
bits.c      Bit manipulation routines
gainpred.c  Gain predictor
int_lpc.c   Interpolation of LSP
inter_3.c   Fractional delay interpolation
lsp_az.c    Compute LP from LSP coefficients
lsp_lsf.c   Conversion between LSP and LSF
lsp_lsf2.c  High-precision conversion between LSP and LSF
lspexp.c    Expansion of LSP coefficients
lspstab.c   Stability test for LSP quantizer
p_parity.c  Compute pitch parity
pred_lt3.c  Generation of adaptive codebook
random.c    Random generator
residu.c    Compute residual signal
syn_filt.c  Synthesis filter
weight_a.c  Bandwidth expansion of LP coefficients

It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention. Having described the invention as above, the content of the following claims is claimed as property:

Claims (19)

  1. Method for use in a speech processing system that includes a first portion comprising an adaptive codebook and a corresponding adaptive codebook amplifier, and a second portion comprising a fixed codebook coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier, the method characterized in that it comprises: determining the pitch filter gain based on a periodicity measurement of a speech signal; and amplifying samples of a signal in the pitch filter based on the determined pitch filter gain.
  2. Method according to claim 1, characterized in that the adaptive codebook gain is delayed by one sub-frame.
  3. Method according to claim 1, characterized in that the signal that reflects the adaptive codebook gain is delayed in time.
  4. Method according to claim 1, characterized in that the signal that reflects the adaptive codebook gain includes values that are greater than or equal to a lower limit and less than or equal to an upper limit.
  5. Method according to claim 1, characterized in that the speech signal comprises a coded speech signal.
  6. Method according to claim 1, characterized in that the speech signal comprises a synthesized speech signal.
  7. A speech processing system characterized in that it comprises: a first portion that includes an adaptive codebook and a means for applying an adaptive codebook gain, and a second portion that includes a fixed codebook and a pitch filter, wherein the pitch filter includes a means for applying a pitch filter gain, and wherein the improvement is characterized in that it comprises: means for determining the pitch filter gain based on a periodicity measurement of a speech signal.
  8. The speech processing system according to claim 7, characterized in that the signal that reflects the adaptive codebook gain is delayed by one sub-frame.
  9. The speech processing system according to claim 7, characterized in that the pitch filter gain is equal to the delayed adaptive codebook gain.
  10. The speech processing system according to claim 7, characterized in that the pitch filter gain is limited to a range of values greater than or equal to 0.2 and less than 0.8 and, within that range, comprises a delayed adaptive codebook gain.
  11. The speech processing system according to claim 7, characterized in that the signal that reflects the adaptive codebook gain is limited to a range of values greater than or equal to 0.2 and less than 0.8 and, within that range, comprises an adaptive codebook gain.
  12. The speech processing system according to claim 7, characterized in that the first and second portions generate first and second output signals, and wherein the system further comprises: means for adding the first and second output signals; and a linear prediction filter, coupled to the summing means, for generating a speech signal in response to the summed first and second output signals.
  13. The speech processing system according to claim 12, characterized in that it further comprises a post-filter for filtering the speech signal generated by the linear prediction filter.
  14. The speech processing system according to claim 7, characterized in that the speech processing system is used in a speech coder.
  15. The speech processing system according to claim 7, characterized in that the speech processing system is used in a speech decoder.
  16. The speech processing system according to claim 5, characterized in that the means for determining further comprises a memory for delaying a signal that reflects the adaptive codebook gain used in the first portion.
  17. A method for determining a gain of a pitch filter for use in a speech processing system, the system including a first portion comprising an adaptive codebook and a corresponding adaptive codebook amplifier, and a second portion comprising a fixed codebook coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier for applying the determined gain, the speech processing system being for processing a speech signal, the method characterized in that it comprises: determining the pitch filter gain based on the periodicity of the speech signal.
  18. A method for use in a speech processing system that includes a first portion comprising an adaptive codebook and a corresponding adaptive codebook amplifier, and a second portion comprising a fixed codebook coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier, the method characterized in that it comprises: delaying the adaptive codebook gain; determining the pitch filter gain to be equal to the delayed adaptive codebook gain, except when the adaptive codebook gain is either less than 0.2 or greater than 0.8, in which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively; and amplifying samples of a signal in the pitch filter based on the determined pitch filter gain.
  19. A speech processing system characterized in that it comprises: a first portion including an adaptive codebook and means for applying an adaptive codebook gain, and a second portion including a fixed codebook, a pitch filter, and means for applying a second gain, wherein the pitch filter includes a means for applying a pitch filter gain, and wherein the improvement is characterized in that it comprises: means for determining the pitch filter gain, the means for determining including means for setting the pitch filter gain equal to an adaptive codebook gain, except when that gain is either less than 0.2 or greater than 0.8, in which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively.
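
The gain rule recited in claims 9, 10 and 18 can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the reference implementation: the function names, the recursive form of the pitch filter (zero filter state at the start of each sub-frame), and the `initial_gain` used for the first sub-frame are all hypothetical choices made here. Only the one-sub-frame delay and the 0.2 and 0.8 bounds are taken from the claims.

```python
from typing import List, Sequence

GAIN_MIN = 0.2  # lower bound recited in claims 10 and 18
GAIN_MAX = 0.8  # upper bound recited in claims 10 and 18


def pitch_filter_gain(delayed_adaptive_gain: float) -> float:
    """Bound the delayed adaptive codebook gain to [GAIN_MIN, GAIN_MAX]."""
    return min(max(delayed_adaptive_gain, GAIN_MIN), GAIN_MAX)


def apply_pitch_filter(fixed_vector: Sequence[float],
                       pitch_delay: int,
                       gain: float) -> List[float]:
    """Add a scaled, delayed copy of the filter output to the fixed-codebook
    vector. Assumption: recursive form, zero state at sub-frame start."""
    if pitch_delay <= 0:
        raise ValueError("pitch_delay must be a positive number of samples")
    out = list(fixed_vector)
    for n in range(pitch_delay, len(out)):
        out[n] += gain * out[n - pitch_delay]
    return out


def process_subframes(fixed_vectors: Sequence[Sequence[float]],
                      pitch_delays: Sequence[int],
                      adaptive_gains: Sequence[float],
                      initial_gain: float = GAIN_MIN) -> List[List[float]]:
    """Filter each sub-frame with the bounded adaptive codebook gain of the
    previous sub-frame. initial_gain is a hypothetical start-up value."""
    previous_gain = initial_gain
    outputs = []
    for vector, delay, gain in zip(fixed_vectors, pitch_delays, adaptive_gains):
        beta = pitch_filter_gain(previous_gain)  # gain delayed by one sub-frame
        outputs.append(apply_pitch_filter(vector, delay, beta))
        previous_gain = gain
    return outputs
```

In this sketch the filter only changes samples when the pitch delay is shorter than the sub-frame, which is the case the invention addresses. With longer delays the loop body never runs and the fixed-codebook vector passes through unchanged.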
MXPA/A/1996/002143A 1995-06-07 1996-06-04 System for speech compression based on adaptable codigocifrado, better MXPA96002143A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/482,715 US5664055A (en) 1995-06-07 1995-06-07 CS-ACELP speech compression system with adaptive pitch prediction filter gain based on a measure of periodicity
US08482715 1995-06-07

Publications (2)

Publication Number Publication Date
MX9602143A MX9602143A (en) 1997-09-30
MXPA96002143A true MXPA96002143A (en) 1998-07-03

