MXPA96002142A

MXPA96002142A - Voiced/unvoiced classification of speech for use in speech decoding during frame erasures

Info

Publication number: MXPA96002142A
Application number: MXPA/A/1996/002142A
Authority: MX
Inventors: Shoham Yair; Kroon Peter
Original assignee: Lucent Technologies Inc
Priority date: 1995-06-07
Filing date: 1996-06-04
Publication date: 1998-07-03

Abstract

The present invention relates to a method for use in a speech decoder that includes a first portion comprising an adaptive codebook and a second portion comprising a fixed codebook. The decoder generates a speech excitation signal selectively based on output signals from the first and second portions when it fails to reliably receive at least a portion of a current frame of compressed speech information. The method comprises classifying a speech signal to be generated by the decoder as periodic or non-periodic and, based on that classification, either generating the excitation signal from the output signal of the first portion and not from the output signal of the second portion, if the speech signal is classified as periodic, or generating the excitation signal from the output signal of the second portion and not from the output signal of the first portion, if the speech signal is classified as non-periodic.

Description

VOICED/UNVOICED CLASSIFICATION OF SPEECH FOR USE IN SPEECH DECODING DURING FRAME ERASURES

FIELD OF THE INVENTION

The present invention relates generally to speech coding arrangements for use in communication systems, and more particularly to the ways in which such speech coders function in the event of burst-like transmission errors.

BACKGROUND OF THE INVENTION

Many communication systems, such as personal communication systems and cellular telephony, rely on wireless channels to communicate information. In the course of communicating this information, wireless channels may suffer from several sources of error, such as multipath fading. These sources of error can cause, among other things, the problem of frame erasure. Erasure refers to the total loss, or the total or partial corruption, of a set of bits communicated to a receiver. A frame is a predetermined, fixed number of bits that can be communicated as a block through a communication channel. A frame can therefore represent a time segment of a speech signal.

If a frame of bits is totally lost, the receiver has no bits to interpret. Under these circumstances, the receiver may produce a meaningless result. If a received frame of bits is corrupted and therefore unreliable, the receiver may produce a severely distorted result. In either case, the frame of bits can be considered "erased", since the frame is unavailable to, or unusable by, the receiver.

As the demand for wireless system capacity has increased, a need has arisen to make the best use of the available wireless system bandwidth. One way to improve the efficient use of system bandwidth is to use a signal compression technique. For wireless systems that carry speech signals, speech compression (or speech coding) techniques can be used for this purpose. These speech coding techniques include analysis-by-synthesis speech coders, such as the well-known Code-Excited Linear Prediction (CELP) speech coder.
The problem of packet loss in packet-switched networks employing speech coding arrangements is very similar to frame erasure in the wireless context. That is, due to packet loss, a speech decoder may either fail to receive a frame or receive a frame that has a significant number of bits missing. In either case, the speech decoder is presented with the same essential problem: the need to synthesize speech despite the loss of compressed speech information. Both "frame erasure" and "packet loss" refer to a communication channel (or network) problem that causes the loss of transmitted bits. For purposes of this description, the term "frame erasure" can be considered to include "packet loss".

Among other things, CELP speech coders employ a codebook of excitation signals to encode an original speech signal. These excitation signals, scaled by an excitation gain, are used to "excite" filters that synthesize a speech signal (or some precursor to a speech signal) in response to the excitation. The synthesized speech signal is compared to the original speech signal, and the codebook excitation signal that produces the synthesized speech signal most closely matching the original is identified. The codebook index and gain representation of the identified excitation signal (the gain representation is often itself a gain codebook index) are then communicated to a CELP decoder. Depending on the type of CELP system, other types of information, such as linear prediction (LPC) filter coefficients, may be communicated as well. The decoder contains codebooks identical to those of the CELP encoder. The decoder uses the transmitted indices to select an excitation signal and gain value. This selected, scaled excitation signal is used to excite the LPC filter of the decoder.
Thus excited, the LPC filter of the decoder generates a decoded (or quantized) speech signal: the same speech signal that was previously determined to be closest to the original speech signal.

Some CELP systems also use other components, such as a periodicity model (for example, a pitch-predictive filter or an adaptive codebook). This model simulates the periodicity of voiced speech. In these CELP systems, parameters relating to these components must also be sent to the decoder. In the case of an adaptive codebook, signals representing a pitch period (delay) and an adaptive codebook gain must also be sent to the decoder, so that the decoder can reproduce the operation of the adaptive codebook in the speech synthesis process.

Wireless and other systems that employ speech coders may be more sensitive to the problem of frame erasure than systems that do not compress speech. This sensitivity is due to the reduced redundancy of coded speech (compared to uncoded speech), which makes the possible loss of each transmitted bit more significant. In the context of CELP speech coders experiencing frame erasure, excitation codebook indices and other signals that represent speech in the frame may be lost or substantially corrupted, preventing proper speech synthesis at the decoder. For example, because of the erased frame(s), the CELP decoder will not be able to reliably identify which entry in its codebook should be used to synthesize speech. As a result, the performance of the speech coding system can degrade significantly.

Because frame erasure causes the loss of the excitation signal codebook indices, the LPC coefficients, the adaptive codebook delay information, and the adaptive and fixed codebook gain information, the normal techniques for synthesizing an excitation signal in a speech decoder are ineffective. These normal techniques must therefore be replaced by alternative measures.
BRIEF DESCRIPTION OF THE INVENTION

According to the present invention, a speech decoder includes a first portion comprising an adaptive codebook and a second portion comprising a fixed codebook. The decoder generates a speech excitation signal selectively based on the output signals from the first and second portions when the decoder fails to reliably receive at least a portion of a current frame of compressed speech information. The decoder does this by classifying the speech signal to be generated as periodic or non-periodic and then generating an excitation signal based on this classification. If the speech signal is classified as periodic, the excitation signal is generated based on the output signal from the first portion and not on the output signal from the second portion. If the speech signal is classified as non-periodic, the excitation signal is generated based on the output signal from the second portion and not on the output signal from the first portion. See sections II.B.1 and II.B.2 of the Detailed Description for a discussion of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 presents a block diagram of a G.729 Draft decoder modified in accordance with the present invention.

Figure 2 presents an illustrative wireless communication system employing the embodiment of the present invention presented in Figure 1.
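The selection rule summarized in the Brief Description of the Invention can be sketched as follows. This is a minimal illustration only: the function name `substitute_excitation` and the list-based signal representation are assumptions made for the example, not part of the patent text.

```python
from typing import List, Sequence


def substitute_excitation(acb_out: Sequence[float],
                          fcb_out: Sequence[float],
                          periodic: bool) -> List[float]:
    """Build the excitation for an erased frame from exactly one portion.

    acb_out:  scaled output of the adaptive codebook portion (first portion)
    fcb_out:  scaled output of the fixed codebook portion (second portion)
    periodic: classification of the speech signal to be generated
    """
    if len(acb_out) != len(fcb_out):
        raise ValueError("both portions must cover the same subframe length")
    # Periodic speech: use the adaptive codebook only.
    # Non-periodic speech: use the fixed codebook only.
    return list(acb_out) if periodic else list(fcb_out)


if __name__ == "__main__":
    print(substitute_excitation([1.0, 2.0], [3.0, 4.0], periodic=True))
```

In normal (non-erased) operation the two contributions are summed instead; the sketch covers only the erased-frame case.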

DETAILED DESCRIPTION

I. Introduction

The present invention relates to the operation of a speech coding system that undergoes frame erasure, that is, the loss of a group of consecutive bits in the compressed bitstream, a group that is ordinarily used to synthesize speech. The description that follows relates to features of the present invention applied, by way of illustration, to an 8 kbit/s CELP speech coding system proposed to the ITU for adoption as its international standard G.729. For the convenience of the reader, a preliminary draft recommendation for the G.729 standard is attached hereto as an Appendix (the draft is referred to herein as the "G.729 Draft"). The G.729 Draft includes detailed descriptions of the speech encoder and decoder (see G.729 Draft, sections 3 and 4, respectively). The illustrative embodiment of the present invention is directed to modifications of normal G.729 decoder operation, as detailed in G.729 Draft section 4.3. No modifications to the encoder are required to implement the present invention.

The applicability of the present invention to the proposed G.729 standard notwithstanding, those of ordinary skill in the art will appreciate that features of the present invention are applicable to other speech coding systems.

Knowledge of the erasure of one or more frames is an input signal, e, to the illustrative embodiment of the present invention. This knowledge can be obtained in any of the conventional ways well known in the art. For example, totally or partially corrupted frames can be detected through the use of a conventional error detection code. When a frame is determined to have been erased, e = 1 and special procedures are initiated as described below. Otherwise, if the frame is not erased (e = 0), normal procedures are used. Conventional error protection codes can be implemented as part of a conventional radio transmission/reception subsystem of a wireless communication system.
In addition to applying the complete set of remedial measures as a result of an erasure (e = 1), the decoder employs a subset of these measures when a parity error is detected. A parity bit is computed based on the pitch delay index of the first of the two subframes of a coded speech frame. See G.729 Draft Section 3.7.1. This parity bit is computed by the decoder and checked against the parity bit received from the encoder. If the two parity bits are not the same, the delay index is said to be corrupted (PE = 1, in Figure 1) and special processing of the pitch delay is invoked.

For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks. The functions these blocks represent can be provided through the use of either shared or dedicated hardware, including but not limited to hardware capable of executing software. For example, the blocks presented in Figure 1 can be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)

Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software that performs the operations discussed below, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

II. An Illustrative Embodiment

Figure 1 presents a block diagram of a G.729 Draft decoder modified in accordance with the present invention (Figure 1 is a version of Figure 3 of the G.729 Draft, augmented to more clearly illustrate features of the claimed invention). In normal operation (that is, without experiencing frame erasure), the decoder operates in accordance with the G.729 Draft as described in sections 4.1 - 4.2. During frame erasure, the operation of the embodiment of Figure 1 is augmented by special processing to make up for the erasure of information from the encoder.

A. Normal Decoder Operation

The encoder described in the G.729 Draft provides a frame of data representing compressed speech every 10 ms. The frame comprises 80 bits and is detailed in Tables 1 and 9 of the G.729 Draft. Each 80-bit compressed speech frame is sent over a communication channel to a decoder, which synthesizes a speech signal (representing two subframes) based on the frame produced by the encoder. The channel over which the frames are communicated (not shown) may be of any type (such as conventional telephone networks, packet-based networks, cellular or wireless networks, ATM networks, etc.) and/or may comprise a storage medium (such as magnetic storage, semiconductor RAM or ROM, optical storage such as CD-ROM, etc.).

The illustrative decoder of Figure 1 includes both an adaptive codebook (ACB) portion and a fixed codebook (FCB) portion. The ACB portion includes ACB 50 and a gain amplifier 55. The FCB portion includes an FCB 10, a pitch predictive filter (PPF) 20, and a gain amplifier 30. The decoder decodes the transmitted parameters (see G.729 Draft Section 4.1) and performs synthesis to obtain reconstructed speech.

The FCB 10 operates in response to an index, I, sent by the encoder. The index I is received through the switch 40. The FCB 10 generates a vector, c(n), of length equal to a subframe. See G.729 Draft Section 4.1.2.
This vector is applied to the PPF 20. PPF 20 operates to yield a vector for application to the FCB gain amplifier 30. See G.729 Draft Sections 3.8 and 4.1.3. The amplifier, which applies a gain $g_c$ from the channel, generates a scaled version of the vector produced by PPF 20. See G.729 Draft Section 4.1.3. The output signal of the amplifier 30 is supplied to the adder 85 (via the switch 42).

The gain applied to the vector produced by PPF 20 is determined based on information provided by the encoder. This information is communicated as codebook indices. The decoder receives these indices and synthesizes a gain correction factor, $\hat{\gamma}$. See G.729 Draft Section 4.1.4. This gain correction factor, $\hat{\gamma}$, is supplied to the code vector prediction energy (E-) processor 120. The E-processor 120 determines a value of the code vector predicted error energy, $\hat{R}$, according to the following expression:

$$\hat{R}^{(n)} = 20 \log_{10} \hat{\gamma} \quad [\mathrm{dB}]$$

The value of $\hat{R}$ is stored in a processor buffer that holds the five most recent (successive) values of $\hat{R}$. $\hat{R}^{(n)}$ represents the predicted error energy of the fixed code vector at subframe n. The predicted mean-removed energy of the code vector is formed as a weighted sum of past values of $\hat{R}$:

$$\tilde{E}^{(n)} = \sum_{i=1}^{4} b_i \hat{R}^{(n-i)}$$

where b = [0.68 0.58 0.34 0.19] and where the past values of $\hat{R}$ are obtained from the buffer. This predicted energy is then output from the processor 120 to a predicted gain processor 125. The processor 125 determines the actual energy of the code vector supplied by the codebook 10. This is done according to the following expression:

$$E = 10 \log_{10}\left(\frac{1}{40} \sum_{i=0}^{39} c_i^2\right)$$

where i indexes the samples of the vector. The predicted gain is then computed as follows:

$$g'_c = 10^{(\tilde{E}^{(n)} + \bar{E} - E)/20}$$

where $\bar{E}$ is the mean energy of the FCB (for example, 30 dB). Finally, the actual scale factor (or gain) is computed by multiplying the received gain correction factor, $\hat{\gamma}$, by the predicted gain $g'_c$ in the multiplier 130.
This value is then supplied to the amplifier 30 to scale the fixed codebook contribution provided by PPF 20.

The output signal generated by the ACB portion of the decoder is also provided to the adder 85. The ACB portion comprises the ACB 50, which generates an excitation signal, v(n), of length equal to a subframe, based on past excitation signals and the ACB pitch period, M, received (via the switch 43) from the encoder through the channel. See G.729 Draft Section 4.1.1. This vector is scaled by the amplifier 55 based on the gain factor, $g_p$, received over the channel. This scaled vector is the output of the ACB portion.
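The fixed codebook gain computation performed by processors 120 and 125 and multiplier 130 can be sketched as below. The two energy expressions are not legible in the text above, so the forms used here (mean-square energy in dB over the subframe, and a predicted gain of 10 raised to (predicted energy + mean energy - actual energy)/20) are assumptions that follow the gain predictor as commonly documented for G.729. All function names are hypothetical.

```python
import math
from typing import Sequence

B = (0.68, 0.58, 0.34, 0.19)   # prediction weights given in the text
MEAN_ENERGY_DB = 30.0          # example mean FCB energy given in the text


def prediction_error_db(gamma: float) -> float:
    """R(n) = 20 log10(gamma), the value stored in the E-processor buffer."""
    return 20.0 * math.log10(gamma)


def predicted_energy_db(past_r: Sequence[float]) -> float:
    """Weighted sum of past R values; past_r[0] is the most recent one."""
    return sum(b * r for b, r in zip(B, past_r))


def codevector_energy_db(c: Sequence[float]) -> float:
    """Assumed form: mean-square energy of the code vector in dB."""
    return 10.0 * math.log10(sum(x * x for x in c) / len(c))


def fixed_codebook_gain(gamma: float, c: Sequence[float],
                        past_r: Sequence[float]) -> float:
    """Actual gain = received correction factor times predicted gain."""
    exponent = (predicted_energy_db(past_r) + MEAN_ENERGY_DB
                - codevector_energy_db(c)) / 20.0
    predicted_gain = 10.0 ** exponent
    return gamma * predicted_gain


if __name__ == "__main__":
    unit_vector = [1.0] * 40   # 40-sample subframe with 0 dB energy
    print(fixed_codebook_gain(1.0, unit_vector, [0.0, 0.0, 0.0, 0.0]))
```

With an all-zero history and a 0 dB code vector, the predicted gain reduces to the mean-energy term alone, which makes the sketch easy to check by hand.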

The adder 85 generates an excitation signal, u(n), in response to signals from the FCB and ACB portions of the decoder. The excitation signal u(n) is applied to an LPC synthesis filter 90, which synthesizes a speech signal based on LPC coefficients, $a_i$, received over the channel. See G.729 Draft Section 4.1.6.

Finally, the output of the LPC synthesis filter 90 is supplied to a post processor 100, which performs adaptive postfiltering (see G.729 Draft Sections 4.2.1 - 4.2.2), high-pass filtering (see G.729 Draft Section 4.2.5), and up-scaling (see G.729 Draft Section 4.2.5).

B. Excitation Signal Synthesis During Frame Erasure

In the presence of frame erasures, the decoder of Figure 1 does not receive reliable information (if it receives anything at all) from which an excitation signal, u(n), can be synthesized. As such, the decoder will not know which vector of signal samples should be extracted from the codebook 10, or what delay value to use for the adaptive codebook 50. In this case, the decoder must obtain a substitute excitation signal for use in synthesizing a speech signal. The generation of a substitute excitation signal during periods of frame erasure depends on whether the erased frame is classified as voiced (periodic) or unvoiced (aperiodic). An indication of periodicity for the erased frame is obtained from the post processor 100, which classifies each properly received frame as periodic or aperiodic. See G.729 Draft Section 4.2.1. The erased frame is taken to have the same periodicity classification as the previous frame processed by the postfilter. The binary signal representing periodicity, v, is determined according to the postfilter variable $g_{pit}$: v = 1 if $g_{pit} > 0$; otherwise v = 0. Thus, for example, if the last good frame was classified as periodic, v = 1; otherwise v = 0.

1. Erased Frames Representing Periodic Speech

For an erased frame (e = 1) that is considered to have represented periodic speech (v = 1), the contribution of the fixed codebook is set to zero. This is achieved by the switch 42, which switches states (in the direction of the arrow) from its normal (biased) operating position, which couples the amplifier 30 to the adder 85, to a position that decouples the fixed codebook contribution from the excitation signal u(n). This switching of state is accomplished according to the control signal developed by AND-gate 110 (which tests for the condition that the frame is erased, e = 1, and that it was a periodic frame, v = 1). The adaptive codebook contribution, on the other hand, is maintained in its normal operating position by the switch 45 (since e = 1 but not_v = 0).

The pitch delay, M, used by the adaptive codebook during an erased frame is determined by the delay processor 60. The delay processor 60 stores the most recently received pitch delay from the encoder. This value is overwritten with each successive pitch delay received. For the first erased frame following a "good" (correctly received) frame, the delay processor 60 generates a value for M that is equal to the pitch delay of the last good frame (that is, the previous frame). To avoid excessive periodicity, for each successive erased frame the delay processor 60 increments the value of M by one (1). The processor 60 restricts the value of M to be less than or equal to 143 samples. The switch 43 effects the application of the pitch delay from the processor 60 to the adaptive codebook 50 by changing state from its normal operating position to its "voiced frame erasure" position in response to an indication of an erasure of a voiced frame (since e = 1 and v = 1).

The adaptive codebook gain is also synthesized in the event of an erasure of a voiced frame, according to the procedure discussed below in Section D.
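The behavior of the delay processor 60 described above can be sketched as a small state holder. The class and method names are hypothetical, and the sketch assumes that at least one good frame has been received before the first erasure.

```python
MAX_PITCH_DELAY = 143  # upper bound on M given in the text, in samples


class DelayProcessor:
    """Supplies the pitch delay M for erased voiced frames (sketch)."""

    def __init__(self) -> None:
        self._delay = None          # last delay used or received
        self._erased_in_a_row = 0   # count of consecutive erased frames

    def good_frame(self, pitch_delay: int) -> None:
        """Store the delay received from the encoder, overwriting the old one."""
        self._delay = pitch_delay
        self._erased_in_a_row = 0

    def erased_frame(self) -> int:
        """Return the substitute delay for the current erased frame."""
        if self._delay is None:
            raise ValueError("no good frame received yet")  # assumption of this sketch
        if self._erased_in_a_row > 0:
            # Successive erasures: lengthen the delay by one sample, capped at 143.
            self._delay = min(self._delay + 1, MAX_PITCH_DELAY)
        self._erased_in_a_row += 1
        return self._delay


if __name__ == "__main__":
    dp = DelayProcessor()
    dp.good_frame(141)
    print([dp.erased_frame() for _ in range(4)])
```

The first erased frame reuses the last good delay unchanged; only later consecutive erasures are incremented, which matches the stated aim of avoiding excessive periodicity.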
It should be noted that the switch 44 operates identically to the switch 43, in that it effects the application of a synthesized adaptive codebook gain by changing state from its normal operating position to its "voiced frame erasure" position.

2. Erased Frames Representing Aperiodic Speech

For an erased frame (e = 1) that is considered to have represented aperiodic speech (v = 0), the contribution of the adaptive codebook is set to zero. This is achieved by the switch 45, which changes states (in the direction of the arrow) from its normal (biased) operating position, which couples the amplifier 55 to the adder 85, to a position that decouples the adaptive codebook contribution from the excitation signal u(n). This switching of state is accomplished according to the control signal developed by AND-gate 75 (which tests for the condition that the frame is erased, e = 1, and that it was an aperiodic frame, not_v = 1). The fixed codebook contribution, on the other hand, is maintained in its normal operating position by the switch 42 (since e = 1 but v = 0).

The fixed codebook index, I, and the codebook vector sign are not available due to the erasure. In order to synthesize a fixed codebook index and a sign index from which a codebook vector, c(n), can be determined, a random number generator 45 is employed. The output of the random number generator 45 is coupled to the fixed codebook 10 through the switch 40. The switch 40 is normally in a state that couples the index I and the sign information to the fixed codebook. However, gate 47 applies a control signal to the switch, which causes the switch to change state when an erasure of an aperiodic frame occurs (e = 1 and not_v = 1).

The random number generator 45 uses the function: seed = seed * 31821 + 13849 to generate the fixed codebook index and sign. The initial seed value for the generator 45 is equal to 21845.
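The recurrence above, together with the index and sign extraction the text describes next (13 least significant bits for the index, 4 least significant bits of the following number for the sign), can be sketched as follows. The text does not state the word length of the seed; wrapping the seed to 16 bits is an assumption of this sketch, as are the class and function names.

```python
from typing import Tuple


class ErasureRandom:
    """Linear congruential generator: seed = seed * 31821 + 13849."""

    def __init__(self, seed: int = 21845) -> None:
        self.seed = seed

    def next_value(self) -> int:
        # Assumption: the seed is kept as an unsigned 16-bit quantity.
        self.seed = (self.seed * 31821 + 13849) & 0xFFFF
        return self.seed


def random_index_and_sign(rng: ErasureRandom) -> Tuple[int, int]:
    """Run the generator twice: once for the index, once for the signs."""
    index = rng.next_value() & 0x1FFF   # 13 least significant bits
    sign = rng.next_value() & 0xF       # 4 least significant bits
    return index, sign


if __name__ == "__main__":
    rng = ErasureRandom()
    for _ in range(2):                  # two subframes of one erased frame
        print(random_index_and_sign(rng))
```

Because the generator state persists across calls, each subframe of an erased frame draws a fresh index and sign pair.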
For a given encoder subframe, the codebook index is the 13 least significant bits of the random number. The random sign is the 4 least significant bits of the next random number. In this way, the random number generator is run twice for each fixed codebook vector needed. It should be noted that a noise vector could have been generated on a sample-by-sample basis instead of using the random number generator in combination with the FCB.

The fixed codebook gain is also synthesized in the event of an erasure of an aperiodic frame, according to the procedure discussed below in Section D. It should be noted that the switch 41 operates identically to the switch 40, in that it effects the application of a synthesized fixed codebook gain by changing state from its normal operating position to its "voiced frame erasure" position.

Since PPF 20 adds periodicity (when the delay is less than a subframe), PPF 20 should not be used in the event of an erasure of an aperiodic frame. Therefore, the switch 21 selects either the output of FCB 10 when e = 1 or the output of PPF 20 when e = 0.

C. LPC Filter Coefficients for Erased Frames

The excitation signal, u(n), synthesized during an erased frame is applied to the LPC synthesis filter 90. As with other decoder components that depend on data from the encoder, the LPC synthesis filter 90 must have substitute LPC coefficients, $a_i$, during erased frames. This is achieved by repeating the LPC coefficients of the last good frame. The LPC coefficients received from the encoder in a non-erased frame are stored by the memory 95. Newly received LPC coefficients overwrite previously received coefficients in the memory 95. Upon the occurrence of a frame erasure, the coefficients stored in the memory 95 are supplied to the LPC synthesis filter via the switch 46. The switch 46 is normally biased to couple the LPC coefficients received in a good frame to the filter 90.
However, in the event of an erased frame (e = 1), the switch changes state (in the direction of the arrow), coupling the memory 95 to the filter 90.

D. Attenuation of Adaptive and Fixed Codebook Gains

As discussed above, both the adaptive and fixed codebooks 50, 10 have a corresponding gain amplifier 55, 30 that applies a scale factor to the codebook output signal. Ordinarily, the values of the scale factors for these amplifiers are supplied by the encoder. In the event of a frame erasure, however, scale factor information is not available from the encoder. Therefore, the scale factor information must be synthesized.

For both the fixed and adaptive codebooks, the synthesis of the scale factor is accomplished by the attenuation processors 65 and 115, which scale (or attenuate) the value of the scale factor used in the previous subframe. Thus, in the case of a frame erasure following a good frame, the scale factor used by the amplifier for the first subframe of the erased frame is the second scale factor from the good frame multiplied by an attenuation factor. In the case of successive erased subframes, the later erased subframe (subframe n) uses the value of the scale factor from the previous erased subframe (subframe n-1) multiplied by the attenuation factor. This technique is used no matter how many successive erased frames (and subframes) occur. The attenuation processors 65, 115 store each new scale factor, whether received in a good frame or synthesized for an erased frame, in case the next subframe is an erased subframe.

Specifically, the attenuation processor 115 synthesizes the fixed codebook gain, $g_c$, for the erased subframe n according to:

$$g_c^{(n)} = 0.98\, g_c^{(n-1)}$$

The attenuation processor 65 synthesizes the adaptive codebook gain, $g_p$, for the erased subframe n according to:

$$g_p^{(n)} = 0.9\, g_p^{(n-1)}$$
In addition, the attenuation processor 65 limits (or clips) the value of the synthesized gain to be less than 0.9. The gains are attenuated in this way to avoid undesired perceptual effects.

E. Attenuation of Gain Predictor Memory

As discussed above, there is a buffer that forms part of the E-processor 120, which stores the five most recent values of the prediction error energy. This buffer is used to predict a value for the predicted energy of the code vector from the fixed codebook.
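The gain attenuation of Section D can be sketched per erased subframe as below. The function names are hypothetical, and the clip is implemented as an upper bound of 0.9 on the synthesized adaptive gain, which is one reading of "less than 0.9".

```python
from typing import List, Tuple

FIXED_ATTENUATION = 0.98     # factor applied by attenuation processor 115
ADAPTIVE_ATTENUATION = 0.9   # factor applied by attenuation processor 65
ADAPTIVE_GAIN_LIMIT = 0.9    # clip applied by processor 65 (assumed inclusive)


def attenuate_fixed_gain(previous_gain: float) -> float:
    """g_c(n) = 0.98 * g_c(n-1) for an erased subframe."""
    return FIXED_ATTENUATION * previous_gain


def attenuate_adaptive_gain(previous_gain: float) -> float:
    """g_p(n) = 0.9 * g_p(n-1), clipped so that it does not exceed 0.9."""
    return min(ADAPTIVE_ATTENUATION * previous_gain, ADAPTIVE_GAIN_LIMIT)


def erased_subframe_gains(g_c: float, g_p: float,
                          subframes: int) -> List[Tuple[float, float]]:
    """Gains for a run of consecutive erased subframes, starting from the
    last gains used (received in a good frame or synthesized earlier)."""
    gains = []
    for _ in range(subframes):
        g_c = attenuate_fixed_gain(g_c)
        g_p = attenuate_adaptive_gain(g_p)
        gains.append((g_c, g_p))
    return gains


if __name__ == "__main__":
    for pair in erased_subframe_gains(g_c=1.0, g_p=1.2, subframes=4):
        print(pair)
```

Each synthesized gain feeds the next subframe, so both gains decay geometrically over a long run of erasures.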

However, due to frame erasure, no information will be communicated to the decoder from the encoder from which new values of the prediction error energy can be obtained. Therefore, these values will have to be synthesized. This synthesis is accomplished by the E-processor 120 according to the following expression:

$$\hat{R}^{(n)} = \left(0.25 \sum_{i=1}^{4} \hat{R}^{(n-i)}\right) - 4.0$$
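The synthesis of a replacement prediction error energy can be sketched as follows. The sketch also applies the -14 dB floor that the description states for the synthesized value; the function names and the newest-first ordering of the history list are assumptions.

```python
from typing import List, Sequence

ATTENUATION_DB = 4.0   # amount subtracted from the average, per the expression
FLOOR_DB = -14.0       # lower limit on the synthesized value, per the text


def synthesize_prediction_error(past_r: Sequence[float]) -> float:
    """Average of the four previous values minus 4 dB, floored at -14 dB."""
    if len(past_r) < 4:
        raise ValueError("four previous values are required")
    value = 0.25 * sum(past_r[:4]) - ATTENUATION_DB
    return max(value, FLOOR_DB)


def update_history(past_r: Sequence[float], new_value: float) -> List[float]:
    """Insert the newest value at the front and drop the oldest one."""
    return [new_value] + list(past_r[:-1])


if __name__ == "__main__":
    history = [2.0, 1.0, 0.0, -1.0]     # newest first, in dB
    for _ in range(3):                  # three consecutive erased subframes
        r = synthesize_prediction_error(history)
        history = update_history(history, r)
        print(r)
```

Feeding each synthesized value back into the history makes the stored energies decay steadily during a long erasure.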

Thus, a new value for $\hat{R}^{(n)}$ is computed as the average of the four previous values of $\hat{R}$ minus 4 dB. The value of $\hat{R}$ is attenuated to ensure that, once a good frame is received, no undesirable speech distortion is created. The value of the synthesized $\hat{R}$ is limited so that it does not fall below -14 dB.

F. An Illustrative Wireless System

As stated above, the present invention has application to wireless speech communication systems. Figure 2 presents an illustrative wireless communication system employing one embodiment of the present invention. Figure 2 includes a transmitter 600 and a receiver 700. An illustrative embodiment of the transmitter 600 is a wireless base station. An illustrative embodiment of the receiver 700 is a mobile user terminal, such as a cellular or wireless telephone, or other personal communication system device.

(Naturally, a wireless base station and user terminal may also include receiver and transmitter circuitry, respectively.) The transmitter 600 includes a speech encoder 610, which may be, for example, an encoder according to the G.729 Draft. The transmitter further includes a conventional channel encoder 620 to provide error detection (or detection and correction) capability; a conventional modulator 630; and conventional radio transmission circuitry; all well known in the art. Radio signals transmitted by the transmitter 600 are received by the receiver 700 through a transmission channel. Due, for example, to possible destructive interference of various multipath components of the transmitted signal, the receiver 700 may be in a deep fade, preventing clear reception of the transmitted bits. Under these circumstances, frame erasure may occur.

The receiver 700 includes conventional radio receiver circuitry 710, a conventional demodulator 720, a channel decoder 730, and a speech decoder 740 in accordance with the present invention. It should be noted that the channel decoder generates a frame erasure signal whenever it determines the presence of a substantial number of bit errors (or unreceived bits). Alternatively (or in addition to a frame erasure signal from the channel decoder), the demodulator 720 can provide a frame erasure signal to the decoder 740.

G. Discussion

Although specific embodiments of this invention have been illustrated and described herein, it will be understood that these embodiments are merely illustrative of the many possible specific arrangements that can be devised in application of the principles of the invention. Numerous and varied other arrangements can be devised in accordance with these principles by those of ordinary skill in the art without departing from the spirit and scope of the invention.
In addition, although the illustrative embodiment of the present invention refers to codebook "amplifiers", those of ordinary skill in the art will understand that this term encompasses the scaling of digital signals. Furthermore, such scaling may be accomplished with scale factors (or gains) that are less than or equal to one (including negative values), as well as greater than one.

INTERNATIONAL TELECOMMUNICATION UNION
TELECOMMUNICATIONS STANDARDIZATION SECTOR
Date: June 1995
Original: E
STUDY GROUP 15 CONTRIBUTION - Q.12/15

DRAFT RECOMMENDATION G.729
Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) Coding
June 7, 1995, version 4.0

Note: Until this Recommendation is approved by the ITU, neither the C code nor the test vectors will be available from the ITU. To obtain the C source code, contact: Mr. Gerhard Shroeder, Rapporteur SG15/Q.12, Deutsche Telekom AG, Postfach 10003, 64276 Darmstadt, Germany. Tel.: +49 6151 83 3973, Fax: +49 6151 83 7828, Email: gerhard.shroeder@fz13.fz.dbp.de

Contents

1. Introduction
2. General description of the coder
2.1 Encoder
2.2 Decoder
2.3 Delay
2.4 Speech coder description
2.5 Notational conventions
3. Functional description of the encoder
3.1 Pre-processing
3.2 Linear prediction analysis and quantization
3.2.1 Windowing and autocorrelation computation
3.2.2 Levinson-Durbin algorithm
3.2.3 LP to LSP conversion
3.2.4 Quantization of the LSP coefficients
3.2.5 Interpolation of the LSP coefficients
3.2.6 LSP to LP conversion
3.3 Perceptual weighting
3.4 Open-loop pitch analysis
3.5 Computation of the impulse response
3.6 Computation of the target signal
3.7 Adaptive codebook search
3.7.1 Generation of the adaptive codebook vector
3.7.2 Codeword computation for adaptive codebook delays
3.7.3 Computation of the adaptive codebook gain
3.8 Fixed codebook: structure and search
3.8.1 Fixed codebook search procedure
3.8.2 Codeword computation of the fixed codebook
3.9 Quantization of the gains
3.9.1 Gain prediction
3.9.2 Codebook search for gain quantization
3.9.3 Codeword computation for the gain quantizer
3.10 Memory update
3.11 Encoder and decoder initialization
4. Functional description of the decoder
4.1 Parameter decoding procedure
4.1.1 Decoding of LP filter parameters
4.1.2 Decoding of the adaptive codebook vector
4.1.3 Decoding of the fixed codebook vector
4.1.4 Decoding of the adaptive and fixed codebook gains
4.1.5 Computation of the parity bit
4.1.6 Computation of the reconstructed speech
4.2 Post-processing
4.2.1 Pitch postfilter
4.2.2 Short-term postfilter
4.2.3 Compe...
4.3.1 Repetition of LP filter parameters
4.3.2 Attenuation of adaptive and fixed codebook gains
4.3.3 Attenuation of the memory of the gain predictor
4.3.4 Generation of the replacement excitation
5. Bit-exact description of the CS-ACELP coder
5.1 Use of the simulation software
5.2 Organization of the simulation software

1. Introduction

This Recommendation contains the description of an algorithm for the coding of speech signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) coding. This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (ITU Rec. G.710) of the analog input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit linear PCM for the input to the encoder. The output of the decoder should be converted back to an analog signal by similar means. Other input/output characteristics, such as those specified by ITU Rec. G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after decoding. The bit stream from the encoder to the decoder is defined within this standard.

This Recommendation is organized as follows: Section 2 gives a general outline of the CS-ACELP algorithm. In Sections 3 and 4, the principles of the CS-ACELP encoder and decoder are discussed, respectively. Section 5 describes the software that defines this coder in 16-bit fixed-point arithmetic.

2. General description of the coder

The CS-ACELP coder is based on the code-excited linear-predictive (CELP) coding model. The coder operates on speech frames of 10 ms, corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis filter parameters.

Table 1: Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame).

Parameter                  Codeword          Subframe 1   Subframe 2   Total per frame
LSP                        L0, L1, L2, L3                              18
Adaptive codebook delay    P1, P2            8            5            13
Delay parity               P0                1                         1
Fixed codebook index       C1, C2            13           13           26
Fixed codebook sign        S1, S2            4            4            8
Codebook gains (stage 1)   GA1, GA2          3            3            6
Codebook gains (stage 2)   GB1, GB2          4            4            8
Total                                                                  80

The speech is reconstructed by filtering this excitation through the LP synthesis filter, as shown in Figure 1. The short-term synthesis filter is based on a 10th-order linear prediction (LP) filter.
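As a quick consistency check on Table 1, the per-frame bit counts can be summed and converted to a bit rate. The sketch below is only illustrative: the dictionary keys and function names are assumptions introduced here, while the bit counts are those listed in Table 1.

```python
# Bits per 10 ms frame, transcribed from Table 1 (subframe 1 + subframe 2).
# The names used here are illustrative and not part of the Recommendation.
BITS_PER_FRAME = {
    "LSP (L0, L1, L2, L3)": 18,
    "Adaptive codebook delay (P1, P2)": 8 + 5,
    "Delay parity (P0)": 1,
    "Fixed codebook index (C1, C2)": 13 + 13,
    "Fixed codebook sign (S1, S2)": 4 + 4,
    "Codebook gains, stage 1 (GA1, GA2)": 3 + 3,
    "Codebook gains, stage 2 (GB1, GB2)": 4 + 4,
}

FRAMES_PER_SECOND = 100  # one frame every 10 ms


def total_bits(allocation):
    """Total number of bits transmitted for one frame."""
    return sum(allocation.values())


def bit_rate(allocation, frames_per_second=FRAMES_PER_SECOND):
    """Bit rate in bit/s implied by the per-frame allocation."""
    return total_bits(allocation) * frames_per_second
```

With the values of Table 1, the total comes to 80 bits per frame, which at 100 frames per second gives the 8 kbit/s rate stated in the text.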

Figure 1: Block diagram of the conceptual CELP synthesis model.

The long-term, or pitch, synthesis filter is implemented using the so-called adaptive codebook approach for delays less than the subframe length. After the reconstructed speech is computed, it is further enhanced by a postfilter.

2.1 Encoder

The signal flow at the encoder is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing block.

Figure 2: Signal flow in the CS-ACELP encoder.

The pre-processed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10 ms frame to compute the LP filter coefficients. These coefficients are converted to line spectrum pairs (LSP) and quantized using predictive two-stage vector quantization (VQ) with 18 bits. The excitation sequence is chosen by using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter, whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to improve the performance for input signals with a flat frequency response.

The excitation parameters (fixed and adaptive codebook parameters) are determined per subframe of 5 ms (40 samples) each. The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once per 10 ms frame, based on the perceptually weighted speech signal. Then the following operations are repeated for each subframe. The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter $W(z)/\hat{A}(z)$. The initial states of these filters are updated by filtering the error between LP residual and excitation. This is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse response $h(n)$ of the weighted synthesis filter is computed. Closed-loop pitch analysis is then done (to find the adaptive codebook delay and gain), using the target $x(n)$ and impulse response $h(n)$, by searching around the value of the open-loop pitch delay. A fractional pitch delay with 1/3 resolution is used. The pitch delay is encoded with 8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum excitation). An algebraic codebook with 17 bits is used for the fixed codebook excitation. The gains of the adaptive and fixed codebook are vector quantized with 7 bits (with MA prediction applied to the fixed codebook gain). Finally, the filter memories are updated using the determined excitation signal.

2.2 Decoder

The signal flow at the decoder is shown in Figure 3. First, the parameter indices are extracted from the received bit stream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the 2 fractional pitch delays, the 2 fixed codebook vectors, and the 2 sets of adaptive and fixed codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for each 40-sample subframe, the following steps are done:

• The excitation is constructed by adding the adaptive and fixed codebook vectors scaled by their respective gains.

Figure 3: Signal flow in the CS-ACELP decoder. • Speech is reconstructed by filtering the excitation through the LP synthesis filter.

• The reconstructed speech signal is passed through a post-processing stage, which comprises an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and scaling operation.

2.3 Delay

This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms.

All additional delays in a practical implementation of this coder are due to:

• processing time needed for encoding and decoding operations,
• transmission time on the communication link,
• multiplexing delay when combining audio data with other data.

2.4 Speech coder description

The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI C code indicated in Section 5, which constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder (Section 3) and decoder (Section 4) can be implemented in several other fashions, possibly leading to a codec implementation that does not comply with this Recommendation. Therefore, the algorithm description of the C code of Section 5 shall take precedence over the mathematical descriptions of Sections 3 and 4 whenever discrepancies are found. A non-exhaustive set of test sequences that can be used in conjunction with the C code is available from the ITU.

2.5 Notational conventions

Throughout this document, an attempt is made to maintain the following notational conventions.

• Codebooks are denoted by calligraphic characters (e.g. $\mathcal{C}$).
• Time signals are denoted by the symbol and the sample time index between parentheses (e.g. $s(n)$). The symbol $n$ is used as the sample instant index.
• Superscript time indices (e.g. $g^{(m)}$) refer to that variable corresponding to subframe $m$.
• Subscripts identify a particular element in a coefficient array.
• A hat (^) identifies a quantized version of a parameter.
• Range notations are done using square brackets, where the boundaries are included (e.g. [0.6, 0.9]).
• log denotes a logarithm with base 10.

Table 2 lists the most relevant symbols used throughout this document. A glossary of the most relevant signals is given in Table 3.

Table 2: Glossary of symbols.

Name               Reference   Description
$1/\hat{A}(z)$     Eq. (2)     LP synthesis filter
$H_{h1}(z)$        Eq. (1)     input high-pass filter
$H_p(z)$           Eq. (77)    pitch postfilter
$H_f(z)$           Eq. (83)    short-term postfilter
$H_t(z)$           Eq. (85)    tilt-compensation filter
$H_{h2}(z)$        Eq. (90)    output high-pass filter
$P(z)$             Eq. (46)    pitch filter
$W(z)$             Eq. (27)    weighting filter

Table 4 summarizes the relevant variables and their dimensions. Constant parameters are listed in Table 5. The acronyms used in this Recommendation are summarized in Table 6.

Table 3: Glossary of signals.

Name          Description
$h(n)$        impulse response of the weighting and synthesis filters
$r(k)$        autocorrelation sequence
$r'(k)$       modified autocorrelation sequence
$R(k)$        correlation sequence
$sw(n)$       weighted speech signal
$s(n)$        speech signal
$s'(n)$       windowed speech signal
$sf(n)$       postfiltered output
$sf'(n)$      gain-scaled postfiltered output
$\hat{s}(n)$  reconstructed speech signal
$r(n)$        residual signal
$x(n)$        target signal
$x_2(n)$      second target signal
$v(n)$        adaptive codebook contribution
$c(n)$        fixed codebook contribution
$y(n)$        $v(n) * h(n)$
$z(n)$        $c(n) * h(n)$
$u(n)$        excitation to the LP synthesis filter
$d(n)$        correlation between target signal and $h(n)$
$ew(n)$       error signal

Table 4: Glossary of variables.

Name          Size   Description
$g_p$         1      adaptive codebook gain
$g_c$         1      fixed codebook gain
$g_0$         1      modified gain for pitch postfilter
$g_{pit}$     1      pitch gain for pitch postfilter
$g_f$         1      gain term of short-term postfilter
$g_t$         1      gain term of tilt postfilter
$T_{op}$      1      open-loop pitch delay
$a_i$         10     LP coefficients
$k_i$         10     reflection coefficients
$o_i$         2      LAR coefficients
$\omega_i$    10     LSF normalized frequencies
$q_i$         10     LSP coefficients
$r(k)$        11     correlation coefficients
$w_i$         10     LSP weighting coefficients
$l_i$         10     LSP quantizer output

Table 5: Glossary of constants.

Name          Value            Description
$f_s$         8000             sampling frequency
$f_0$         60               bandwidth expansion
$\gamma_1$    0.94/0.98        weight factor of perceptual weighting filter
$\gamma_2$    0.60/[0.4-0.7]   weight factor of perceptual weighting filter
$\gamma_n$    0.55             weight factor of postfilter
$\gamma_d$    0.70             weight factor of postfilter
$\gamma_p$    0.50             weight factor of pitch postfilter
$\gamma_t$    0.90/0.2         weight factor of tilt postfilter
C             Table 7          fixed (algebraic) codebook
L0            Section 3.2.4    moving-average predictor codebook
L1            Section 3.2.4    first-stage LSP codebook
L2            Section 3.2.4    second-stage LSP codebook (lower part)
L3            Section 3.2.4    second-stage LSP codebook (upper part)
GA            Section 3.9      first-stage gain codebook
GB            Section 3.9      second-stage gain codebook
$w_{lag}$     Eq. (6)          correlation lag window
$w_{lp}$      Eq. (3)          LPC analysis window

Table 6: Glossary of acronyms.

Acronym   Description
CELP      code-excited linear prediction
MA        moving average
MSB       most significant bit
LP        linear prediction
LSP       line spectral pair
LSF       line spectral frequency
VQ        vector quantization

3. Functional description of the encoder

This section describes the different functions of the encoder represented in the blocks of Figure 1.

3.1 Pre-processing

As stated in Section 2, the input to the speech encoder is assumed to be a 16-bit PCM signal. Two pre-processing functions are applied before the encoding process: 1) signal scaling, and 2) high-pass filtering.

The scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflow in the fixed-point implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second-order pole/zero filter with a cut-off frequency of 140 Hz is used. Both the scaling and the high-pass filtering are combined by dividing the numerator coefficients of this filter by 2. The resulting filter is given by

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \tag{1}$$

The input signal filtered through $H_{h1}(z)$ is referred to as $s(n)$, and is used in all subsequent coder operations.

3.2 Linear prediction analysis and quantization

The short-term analysis and synthesis filters are based on 10th-order linear prediction (LP) filters. The LP synthesis filter is defined as

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \tag{2}$$

where $\hat{a}_i$, $i = 1, \ldots, 10$, are the (quantized) linear prediction (LP) coefficients. Short-term prediction, or linear prediction analysis, is performed once per speech frame using the autocorrelation approach with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients of the windowed speech are computed and converted to the LP coefficients using the Levinson algorithm. Then the LP coefficients are transformed to the LSP domain for quantization and interpolation purposes. The interpolated quantized and unquantized filters are converted back to LP filter coefficients (to construct the synthesis and weighting filters at each subframe).

3.2.1 Windowing and autocorrelation computation

The LP analysis window consists of two parts: the first part is half a Hamming window and the second part is a quarter of a cosine function cycle. The window is given by

$$w_{lp}(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\dfrac{2\pi n}{399}\right), & n = 0, \ldots, 199, \\[2mm] \cos\!\left(\dfrac{2\pi (n - 200)}{159}\right), & n = 200, \ldots, 239. \end{cases} \tag{3}$$

There is a 5 ms look-ahead in the LP analysis, which means that 40 samples are needed from the future speech frame. This translates into an extra delay of 5 ms at the encoder stage. The LP analysis window applies to 120 samples from past speech frames, 80 samples from the present speech frame, and 40 samples from the future frame. The windowing in LP analysis is illustrated in Figure 4.
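The window described above can be sketched as follows. The sample counts (120 + 80 + 40 = 240) come from the text; the split into a 200-sample half-Hamming part and a 40-sample quarter-cosine part, and the exact cosine arguments, are assumptions chosen to match the verbal description, so this is an illustration rather than the bit-exact reference. Function and constant names are hypothetical.

```python
import math

# Sample counts stated in the text: past, present and look-ahead samples.
L_PAST, L_FRAME, L_NEXT = 120, 80, 40
L_WINDOW = L_PAST + L_FRAME + L_NEXT  # 240 samples = 30 ms at 8 kHz


def lp_analysis_window(n_hamming=200, n_cosine=40):
    """Half a Hamming window followed by a quarter of a cosine cycle.

    The 200/40 split and the cosine arguments are assumptions that are
    consistent with a 240-sample window and a 40-sample look-ahead.
    """
    rising = [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * n_hamming - 1))
        for n in range(n_hamming)
    ]
    falling = [
        math.cos(2.0 * math.pi * n / (4 * n_cosine - 1))
        for n in range(n_cosine)
    ]
    return rising + falling


def apply_window(samples, window):
    """Windowed speech s'(n) = w_lp(n) * s(n)."""
    if len(samples) != len(window):
        raise ValueError("signal and window must have the same length")
    return [w * s for w, s in zip(window, samples)]
```

The asymmetric shape puts the peak of the window near the end of the current frame, which is why only 40 samples of look-ahead are needed.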

Figure 4: Windowing in LP analysis. The different shading patterns identify corresponding excitation and LP analysis frames.

The autocorrelation coefficients of the windowed speech

$$s'(n) = w_{lp}(n)\,s(n), \qquad n = 0, \ldots, 239, \tag{4}$$

are computed by

$$r(k) = \sum_{n=k}^{239} s'(n)\,s'(n-k), \qquad k = 0, \ldots, 10. \tag{5}$$

To avoid arithmetic problems for low-level input signals, the value of $r(0)$ has a lower boundary of $r(0) = 1.0$. A 60 Hz bandwidth expansion is applied by multiplying the autocorrelation coefficients with

$$w_{lag}(k) = \exp\!\left[-\frac{1}{2}\left(\frac{2\pi f_0 k}{f_s}\right)^2\right], \qquad k = 1, \ldots, 10, \tag{6}$$

where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. In addition, $r(0)$ is multiplied by the white-noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB.

3.2.2 Levinson-Durbin algorithm

The modified autocorrelation coefficients

$$r'(0) = 1.0001\,r(0), \qquad r'(k) = w_{lag}(k)\,r(k), \quad k = 1, \ldots, 10, \tag{7}$$

are used to obtain the LP filter coefficients $a_i$, $i = 1, \ldots, 10$, by solving the set of equations

$$\sum_{i=1}^{10} a_i\,r'(|i-k|) = -r'(k), \qquad k = 1, \ldots, 10. \tag{8}$$

The set of equations in (8) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:

    E(0) = r'(0)
    for i = 1 to 10
        a_0^(i-1) = 1
        k_i = -[ sum_{j=0}^{i-1} a_j^(i-1) r'(i-j) ] / E(i-1)
        a_i^(i) = k_i
        for j = 1 to i-1
            a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1)
        end
        E(i) = (1 - k_i^2) E(i-1)
        if E(i) < 0 then E(i) = 0.01
    end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.

3.2.3 LP to LSP conversion

The LP filter coefficients $a_i$, $i = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. For a 10th-order LP filter, the LSP coefficients are defined as the roots of the sum and difference polynomials

$$F_1'(z) = A(z) + z^{-11} A(z^{-1}), \tag{9}$$

and

$$F_2'(z) = A(z) - z^{-11} A(z^{-1}), \tag{10}$$

respectively. The polynomial $F_1'(z)$ is symmetric and the polynomial $F_2'(z)$ is antisymmetric. It can be shown that all roots of these polynomials lie on the unit circle and alternate with each other. $F_1'(z)$ has a root $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials

$$F_1(z) = F_1'(z)/(1 + z^{-1}), \tag{11}$$

and

$$F_2(z) = F_2'(z)/(1 - z^{-1}). \tag{12}$$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), so the polynomials can be written as

$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \tag{13}$$

and

$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \tag{14}$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the line spectral frequencies (LSF), which satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSP coefficients in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 5 coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations

$$\begin{aligned} f_1(i+1) &= a_{i+1} + a_{10-i} - f_1(i), & i = 0, \ldots, 4, \\ f_2(i+1) &= a_{i+1} - a_{10-i} + f_2(i), & i = 0, \ldots, 4, \end{aligned} \tag{15}$$

where $f_1(0) = f_2(0) = 1.0$. The LSP coefficients are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root, and the sign-change interval is then divided 4 times to track the root more closely. Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$, evaluated at $z = e^{j\omega}$, can be written as

$$F(\omega) = 2 e^{-j5\omega}\,C(x), \tag{16}$$

with

$$C(x) = T_5(x) + f(1)T_4(x) + f(2)T_3(x) + f(3)T_2(x) + f(4)T_1(x) + f(5)/2, \tag{17}$$

where $T_m(x) = \cos(m\omega)$ is the $m$th-order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (15). The polynomial $C(x)$ is evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:

    for k = 4 downto 1
        b_k = 2x b_{k+1} - b_{k+2} + f(5-k)
    end
    C(x) = x b_1 - b_2 + f(5)/2

with initial values $b_5 = 1$ and $b_6 = 0$.

3.2.4 Quantization of the LSP coefficients

The LP filter coefficients are quantized using the LSP representation in the frequency domain; that is,

$$\omega_i = \arccos(q_i), \qquad i = 1, \ldots, 10, \tag{18}$$

where $\omega_i$ are the line spectral frequencies (LSF) in the normalized frequency domain $[0, \pi]$. A switched 4th-order MA prediction is used to predict the current set of LSF coefficients. The difference between the computed and predicted sets of coefficients is quantized using a two-stage vector quantizer. The first stage is a 10-dimensional VQ using codebook L1 with 128 entries (7 bits). The second stage is a 10-bit VQ that is implemented as a split VQ using two 5-dimensional codebooks, L2 and L3, containing 32 entries (5 bits) each.

To explain the quantization process, it is convenient to first describe the decoding process. Each coefficient is obtained from the sum of 2 codebooks:

$$l_i = \begin{cases} \mathcal{L}1_i(L1) + \mathcal{L}2_i(L2), & i = 1, \ldots, 5, \\ \mathcal{L}1_i(L1) + \mathcal{L}3_{i-5}(L3), & i = 6, \ldots, 10, \end{cases} \tag{19}$$

where L1, L2 and L3 are the codebook indices. To avoid sharp resonances in the quantized LP synthesis filters, the coefficients $l_i$ are arranged such that adjacent coefficients have a minimum distance of $J$. The rearrangement routine is shown below:

    for i = 2, ..., 10
        if (l_{i-1} > l_i - J)
            l_{i-1} = (l_i + l_{i-1} - J) / 2
            l_i     = (l_i + l_{i-1} + J) / 2
        end
    end

This rearrangement process is executed twice: first with a value of $J = 0.0001$, then with a value of $J = 0.000095$. After this rearrangement process, the quantized LSF coefficients $\hat{\omega}_i^{(m)}$ for the current frame $m$ are obtained from the weighted sum of previous quantizer outputs $l_i^{(m-k)}$ and the current quantizer output $l_i^{(m)}$:

$$\hat{\omega}_i^{(m)} = \left(1 - \sum_{k=1}^{4} m_i^k\right) l_i^{(m)} + \sum_{k=1}^{4} m_i^k\, l_i^{(m-k)}, \qquad i = 1, \ldots, 10, \tag{20}$$

where $m_i^k$ are the coefficients of the switched MA predictor. Which MA predictor to use is defined by a separate bit L0. At start-up, the initial values of $l_i^{(k)}$ are given by $l_i = i\pi/11$ for all $k < 0$.

After computing $\hat{\omega}_i$, the corresponding filter is checked for stability. This is done as follows:

1. Order the coefficients $\hat{\omega}_i$ in increasing value.
2. If $\hat{\omega}_1 < 0.005$ then $\hat{\omega}_1 = 0.005$.
3. If $\hat{\omega}_{i+1} - \hat{\omega}_i < 0.0001$ then $\hat{\omega}_{i+1} = \hat{\omega}_i + 0.0001$, $i = 1, \ldots, 9$.
4. If $\hat{\omega}_{10} > 3.135$ then $\hat{\omega}_{10} = 3.135$.

The procedure for encoding the LSF parameters can be outlined as follows.
For each of the two MA predictors, the best approximation to the current LSF vector has to be found. The best approximation is defined as the one that minimizes a weighted mean-squared error

$$E_{LSF} = \sum_{i=1}^{10} w_i \left(\omega_i - \hat{\omega}_i\right)^2. \tag{21}$$

The weights $w_i$ are made adaptive as a function of the unquantized LSF coefficients (Eq. (22)). In addition, the weights $w_5$ and $w_6$ are multiplied by 1.2 each.

The vector to be quantized for the current frame is obtained from

$$l_i = \left[\omega_i^{(m)} - \sum_{k=1}^{4} m_i^k\, \hat{l}_i^{(m-k)}\right] \Big/ \left(1 - \sum_{k=1}^{4} m_i^k\right), \qquad i = 1, \ldots, 10. \tag{23}$$

The first codebook L1 is searched and the entry L1 that minimizes the (unweighted) mean-squared error is selected. This is followed by a search of the second codebook L2, which defines the lower part of the second stage. For each possible candidate, the partial vector $\hat{\omega}_i$, $i = 1, \ldots, 5$, is reconstructed using Eq. (20) and rearranged to guarantee a minimum distance of 0.0001. The vector with index L2 that, after addition to the first-stage candidate and rearranging, best approximates the lower part of the corresponding target in the weighted MSE sense is selected. Using the selected first-stage vector L1 and the lower part of the second stage (L2), the upper part of the second stage is searched in codebook L3. Again, the rearrangement procedure is used to guarantee a minimum distance of 0.0001. The vector L3 that minimizes the overall weighted MSE is selected.

This process is carried out for each of the two MA predictors defined by L0, and the MA predictor L0 that produces the lowest weighted MSE is selected.

3.2.5 Interpolation of the LSP coefficients

The quantized (and unquantized) LP coefficients are used for the second subframe. For the first subframe, the quantized (and unquantized) LP coefficients are obtained from linear interpolation of the corresponding parameters in the adjacent subframes. The interpolation is done on the LSP coefficients in the $q$ domain.
Let $q_i^{(m)}$ be the LSP coefficients at the second subframe of frame $m$, and $q_i^{(m-1)}$ the LSP coefficients at the second subframe of the past frame $(m-1)$. The (unquantized) interpolated LSP coefficients in each of the two subframes are given by

$$\begin{aligned} \text{Subframe 1:} \quad q1_i &= 0.5\,q_i^{(m-1)} + 0.5\,q_i^{(m)}, & i = 1, \ldots, 10, \\ \text{Subframe 2:} \quad q2_i &= q_i^{(m)}, & i = 1, \ldots, 10. \end{aligned} \tag{24}$$

The same interpolation procedure is used for the interpolation of the quantized LSP coefficients, by substituting $q_i$ by $\hat{q}_i$ in Eq. (24).

3.2.6 LSP to LP conversion

Once the LSP coefficients are quantized and interpolated, they are converted back to the LP coefficients $\{a_i\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ and $F_2(z)$ are found by expanding Eqs. (13) and (14), knowing the quantized and interpolated LSP coefficients. The following recursive relation is used to compute $f_1(i)$, $i = 1, \ldots, 5$, from $q_i$:

    for i = 1 to 5
        f1(i) = -2 q_{2i-1} f1(i-1) + 2 f1(i-2)
        for j = i-1 downto 1
            f1(j) = f1(j) - 2 q_{2i-1} f1(j-1) + f1(j-2)
        end
    end

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$.

Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is,

$$\begin{aligned} f_1'(i) &= f_1(i) + f_1(i-1), & i = 1, \ldots, 5, \\ f_2'(i) &= f_2(i) - f_2(i-1), & i = 1, \ldots, 5. \end{aligned} \tag{25}$$

Finally, the LP coefficients are found by

$$a_i = \begin{cases} 0.5\,f_1'(i) + 0.5\,f_2'(i), & i = 1, \ldots, 5, \\ 0.5\,f_1'(11-i) - 0.5\,f_2'(11-i), & i = 6, \ldots, 10. \end{cases} \tag{26}$$

This is derived directly from the relation $A(z) = (F_1'(z) + F_2'(z))/2$, and from the fact that $F_1'(z)$ and $F_2'(z)$ are symmetric and antisymmetric polynomials, respectively.

3.3 Perceptual weighting

The perceptual weighting filter is based on the unquantized LP filter coefficients and is given by

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} \gamma_1^i a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma_2^i a_i z^{-i}}. \tag{27}$$

The values of $\gamma_1$ and $\gamma_2$ determine the frequency response of the filter $W(z)$. By proper adjustment of these variables, it is possible to make the weighting more effective.
This is accomplished by making $\gamma_1$ and $\gamma_2$ a function of the spectral shape of the input signal. This adaptation is done once per 10 ms frame, but an interpolation procedure for each first subframe is used to smooth the adaptation process. The spectral shape is obtained from a second-order linear prediction filter, obtained as a by-product of the Levinson-Durbin recursion (Section 3.2.2). The reflection coefficients $k_i$ are converted to log area ratio (LAR) coefficients $o_i$ by

$$o_i = \log\frac{1.0 + k_i}{1.0 - k_i}, \qquad i = 1, 2. \tag{28}$$

These LAR coefficients are used for the second subframe. The LAR coefficients for the first subframe are obtained through linear interpolation with the LAR parameters from the previous frame, and are given by

$$\begin{aligned} \text{Subframe 1:} \quad o1_i &= 0.5\,o_i^{(m-1)} + 0.5\,o_i^{(m)}, & i = 1, 2, \\ \text{Subframe 2:} \quad o2_i &= o_i^{(m)}, & i = 1, 2. \end{aligned} \tag{29}$$

The spectral envelope is characterized as being either flat (flat = 1) or tilted (flat = 0). For each subframe, this characterization is obtained by applying a threshold function to the LAR coefficients. To avoid rapid changes, a hysteresis is used by taking into account the value of flat in the previous subframe $(m-1)$:

$$flat^{(m)} = \begin{cases} 0 & \text{if } o_1 < -1.74 \text{ and } o_2 > 0.66 \text{ and } flat^{(m-1)} = 1, \\ 1 & \text{if } o_1 > -1.52 \text{ and } o_2 < 0.43 \text{ and } flat^{(m-1)} = 0, \\ flat^{(m-1)} & \text{otherwise.} \end{cases} \tag{30}$$
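The hysteresis in Eq. (30) can be sketched as a small state-update function. The thresholds are taken as printed in the text above; the function and argument names are illustrative assumptions.

```python
def classify_flat(o1, o2, prev_flat):
    """Return 1 (flat) or 0 (tilted) for the current subframe.

    o1, o2    : interpolated LAR coefficients of the subframe
    prev_flat : value of `flat` in the previous subframe (0 or 1)

    The decision changes only when both LAR thresholds are crossed,
    which is what provides the hysteresis.
    """
    if o1 < -1.74 and o2 > 0.66 and prev_flat == 1:
        return 0  # flat -> tilted
    if o1 > -1.52 and o2 < 0.43 and prev_flat == 0:
        return 1  # tilted -> flat
    return prev_flat  # otherwise keep the previous decision
```

Because the two pairs of thresholds do not overlap, LAR values lying between them leave the classification unchanged, which avoids rapid toggling between the flat and tilted settings.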

If the interpolated spectrum for a subframe is classified as flat ($flat^{(m)} = 1$), the weight factors are set to $\gamma_1 = 0.94$ and $\gamma_2 = 0.6$. If the spectrum is classified as tilted ($flat^{(m)} = 0$), the value of $\gamma_1$ is set to 0.98, and the value of $\gamma_2$ is adapted to the strength of the resonances in the LP synthesis filter, but is bounded between 0.4 and 0.7. If a strong resonance is present, the value of $\gamma_2$ is set closer to the upper bound. This adaptation is achieved by a criterion based on the minimum distance between 2 successive LSP coefficients for the current subframe. The minimum distance is given by

$$d_{min} = \min\left[\omega_{i+1} - \omega_i\right], \qquad i = 1, \ldots, 9. \tag{31}$$

The following linear relation is used to compute $\gamma_2$:

$$\gamma_2 = -6.0 \cdot d_{min} + 1.0, \qquad \text{and } 0.4 \le \gamma_2 \le 0.7. \tag{32}$$

The weighted speech signal in a subframe is given by

$$sw(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, sw(n-i), \qquad n = 0, \ldots, 39. \tag{33}$$

The weighted speech signal $sw(n)$ is used to find an estimate of the pitch delay in the speech frame.

3.4 Open-loop pitch analysis

To reduce the complexity of the search for the best adaptive codebook delay, the search range is limited around a candidate delay $T_{op}$, obtained from an open-loop pitch analysis. This open-loop pitch analysis is done once per frame (10 ms). The open-loop pitch estimation uses the weighted speech signal $sw(n)$ of Eq. (33), and is done as follows. In the first step, 3 maxima of the correlation

$$R(k) = \sum_{n=0}^{79} sw(n)\,sw(n-k) \tag{34}$$

are found in the following three ranges:

    i = 1:  80, ..., 143,
    i = 2:  40, ..., 79,
    i = 3:  20, ..., 39.

The retained maxima $R(t_i)$, $i = 1, \ldots, 3$, are normalized through

$$R'(t_i) = \frac{R(t_i)}{\sqrt{\sum_n sw^2(n - t_i)}}, \qquad i = 1, \ldots, 3. \tag{35}$$

The winner among the three normalized correlations is selected by favoring the delays with the values in the lower range. This is done by weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined as follows:

    T_op = t_1
    R'(T_op) = R'(t_1)
    if R'(t_2) >= 0.85 R'(T_op)
        R'(T_op) = R'(t_2)
        T_op = t_2
    end
    if R'(t_3) >= 0.85 R'(T_op)
        R'(T_op) = R'(t_3)
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favoring the lower sections is used to avoid choosing pitch multiples.

3.5 Computation of the impulse response

The impulse response $h(n)$ of the weighted synthesis filter $W(z)/\hat{A}(z)$ is computed for each subframe. This impulse response is needed for the search of the adaptive and fixed codebooks. The impulse response $h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.

3.6 Computation of the target signal

The target signal $x(n)$ for the adaptive codebook search is usually computed by subtracting the zero-input response of the weighted synthesis filter $W(z)/\hat{A}(z) = A(z/\gamma_1)/[\hat{A}(z)A(z/\gamma_2)]$ from the weighted speech signal $sw(n)$ of Eq. (33). This is done on a subframe basis.
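The open-loop pitch selection of Section 3.4 (correlation maxima in three delay ranges, normalization, and the 0.85 preference for shorter delays) can be sketched in floating point as follows. This is an illustration under stated assumptions and not the bit-exact fixed-point procedure: the function name, the `start` argument and the buffer layout (at least 143 past samples of weighted speech before the current frame) are choices made here.

```python
import math

FRAME = 80                                 # samples per 10 ms frame
RANGES = [(80, 143), (40, 79), (20, 39)]   # delay ranges i = 1, 2, 3


def open_loop_pitch(sw, start):
    """Estimate the open-loop delay T_op for the frame beginning at sw[start].

    sw    : list of weighted speech samples, with at least 143 samples
            of history before index `start`
    start : index of the first sample of the current frame
    """
    def corr(k):
        # R(k) = sum_n sw(n) sw(n - k), n = 0..79
        return sum(sw[start + n] * sw[start + n - k] for n in range(FRAME))

    def normalized(k):
        # R'(k) = R(k) / sqrt(sum_n sw^2(n - k))
        energy = sum(sw[start + n - k] ** 2 for n in range(FRAME))
        return corr(k) / math.sqrt(energy) if energy > 0.0 else 0.0

    # One correlation maximum per delay range, then normalize it.
    candidates = []
    for lo, hi in RANGES:
        t = max(range(lo, hi + 1), key=corr)
        candidates.append((t, normalized(t)))

    # Start from the longest-delay range and move to a shorter delay
    # whenever its normalized correlation reaches 85% of the current best.
    t_op, r_best = candidates[0]
    for t, r in candidates[1:]:
        if r >= 0.85 * r_best:
            t_op, r_best = t, r
    return t_op
```

For a signal with a 50-sample period, the ranges 80-143 and 40-79 yield equally strong candidates at delays 100 and 50, and the preference for lower delays selects 50 instead of its multiple.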

An equivalent procedure for computing the target signal, which is used in this Recommendation, is the filtering of the LP residual signal $r(n)$ through the combination of the synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the LP residual and the excitation. The memory update of these filters is explained in Section 3.10.

The residual signal $r(n)$, which is needed for finding the target vector, is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the subframe size of 40, as will be explained in the next section. The LP residual is given by

$$r(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n-i), \qquad n = 0, \ldots, 39. \tag{36}$$

3.7 Adaptive codebook search

The adaptive codebook parameters (or pitch parameters) are the delay and the gain. In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search. The adaptive codebook search is done every (5 ms) subframe. In the first subframe, a fractional pitch delay $T_1$ is used with a resolution of 1/3 in the range [19 1/3, 84 2/3], and integers only in the range [85, 143]. For the second subframe, a delay $T_2$ with a resolution of 1/3 is always used in the range [(int)$T_1$ - 5 2/3, (int)$T_1$ + 4 2/3], where (int)$T_1$ is the nearest integer to the fractional pitch delay $T_1$ of the first subframe. This range is adapted for the cases where $T_1$ straddles the boundaries of the delay range.

For each subframe, the optimal delay is determined using closed-loop analysis that minimizes the weighted mean-squared error.
In the first subframe, the delay $T_1$ is found by searching a small range (6 samples) of delay values around the open-loop delay $T_{op}$ (see Section 3.4). The search boundaries $t_{min}$ and $t_{max}$ are defined by

    t_min = T_op - 3
    if t_min < 20 then t_min = 20
    t_max = t_min + 6
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 6
    end

For the second subframe, closed-loop pitch analysis is done around the pitch selected in the first subframe to find the optimal delay $T_2$. The search boundaries are between $t_{min} - 2/3$ and $t_{max} + 2/3$, where $t_{min}$ and $t_{max}$ are derived from $T_1$ as follows:

    t_min = (int)T_1 - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end

The closed-loop pitch search minimizes the mean-squared weighted error between the original and synthesized speech. This is achieved by maximizing the term

$$R(k) = \frac{\sum_{n=0}^{39} x(n)\,y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\,y_k(n)}}, \tag{37}$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around a preselected value, which is the open-loop pitch $T_{op}$ for the first subframe, and $T_1$ for the second subframe.

The convolution $y_k(n)$ is computed for the delay $t_{min}$, and for the other integer delays in the search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation

$$y_k(n) = y_{k-1}(n-1) + u(-k)\,h(n), \qquad n = 39, \ldots, 0, \tag{38}$$

where $u(n)$, $n = -143, \ldots, 39$, is the excitation buffer, and $y_{k-1}(-1) = 0$. Note that in the search stage the samples $u(n)$, $n = 0, \ldots, 39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ to make the relation in Eq. (38) valid for all delays.

For the determination of $T_2$, and of $T_1$ if the optimum integer closed-loop delay is less than 84, the fractions around the optimum integer delay have to be tested. The fractional pitch search is done by interpolating the normalized correlation in Eq. (37) and searching for its maximum.
The interpolation is done using an FIR filter $b_{12}$ based on a Hamming-windowed sinc function, with the sinc truncated at ±11 and padded with zeros at ±12 ($b_{12}(12) = 0$). The filter has its cut-off frequency (-3 dB) at 3600 Hz in the oversampled domain. The interpolated values of $R(k)$ for the fractions -2/3, -1/3, 0, 1/3 and 2/3 are obtained using the interpolation formula

$$R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{12}(t+3i) + \sum_{i=0}^{3} R(k+1+i)\, b_{12}(3-t+3i), \quad t = 0, 1, 2, \tag{39}$$

where $t = 0, 1, 2$ corresponds to the fractions 0, 1/3 and 2/3, respectively. Note that it is necessary to compute the correlation terms in Equation (37) over the range $t_{min} - 4, \ldots, t_{max} + 4$ to allow for proper interpolation.

3.7.1 Generation of the adaptive codebook vector

Once the non-integer pitch delay has been determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and fraction $t$:

$$v(n) = \sum_{i=0}^{9} u(n-k+i)\, b_{30}(t+3i) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{30}(3-t+3i), \quad n = 0, \ldots, 39, \; t = 0, 1, 2. \tag{40}$$

The interpolation filter $b_{30}$ is based on a Hamming-windowed sinc function, with the sinc truncated at ±29 and padded with zeros at ±30 ($b_{30}(30) = 0$). The filter has a cut-off frequency (-3 dB) at 3600 Hz in the oversampled domain.

3.7.2 Codeword computation for adaptive codebook delays

The pitch delay $T_1$ is encoded with 8 bits in the first subframe, and the relative delay in the second subframe is encoded with 5 bits. A fractional delay $T$ is represented by its integer part $(int)T$ and a fractional part $frac/3$, $frac = -1, 0, 1$. The pitch index $P1$ is encoded as

$$P1 = \begin{cases} ((int)T_1 - 19) \cdot 3 + frac - 1, & \text{if } T_1 = [19, \ldots, 85],\; frac = [-1, 0, 1] \\ ((int)T_1 - 85) + 197, & \text{if } T_1 = [86, \ldots, 143],\; frac = 0 \end{cases} \tag{41}$$

The value of the pitch delay $T_2$ is encoded relative to the value of $T_1$. Using the same interpretation as before, the fractional delay $T_2$, represented by its integer part $(int)T_2$ and a fractional part $frac/3$, $frac = -1, 0, 1$, is encoded as

$$P2 = ((int)T_2 - t_{min}) \cdot 3 + frac + 2, \tag{42}$$

where $t_{min}$ is derived from $T_1$ as before.

To make the coder more robust against random bit errors, a parity bit $P0$ is computed on the delay index of the first subframe. The parity bit is generated through an XOR operation on the 6 most significant bits of $P1$. At the decoder this parity bit is recomputed, and if the recomputed value does not agree with the transmitted value, an error concealment procedure is applied.

3.7.3 Computation of the adaptive codebook gain

Once the adaptive codebook delay is determined, the adaptive codebook gain $g_p$ is computed as

$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \tag{43}$$

where $y(n)$ is the filtered adaptive codebook vector (zero-state response of $W(z)/\hat{A}(z)$ to $v(n)$). This vector is obtained by convolving $v(n)$ with $h(n)$:

$$y(n) = \sum_{i=0}^{n} v(i)\, h(n-i), \quad n = 0, \ldots, 39. \tag{44}$$

Note that by maximizing the term in Equation (37), in most cases $g_p > 0$. In case the signal contains only negative correlations, the value of $g_p$ is set to zero.

3.8 Fixed codebook: structure and search

The fixed codebook is based on an algebraic codebook structure using an interleaved single-pulse permutation (ISPP) design. In this codebook, each codebook vector contains 4 non-zero pulses. Each pulse can have either of the amplitudes +1 or -1, and can assume the positions given in Table 7.

The codebook vector $c(n)$ is constructed by taking a zero vector and putting the 4 unit pulses at the found locations, multiplied by their corresponding sign:

$$c(n) = s_0\, \delta(n - i_0) + s_1\, \delta(n - i_1) + s_2\, \delta(n - i_2) + s_3\, \delta(n - i_3), \quad n = 0, \ldots, 39, \tag{45}$$

where $\delta(0)$ is a unit pulse. A special feature incorporated in the codebook is that the selected codebook vector is filtered through an adaptive pre-filter $P(z)$, which enhances harmonic components to improve the quality of the synthesized speech. Here the filter

$$P(z) = 1 / (1 - \beta z^{-T}) \tag{46}$$

is used, where $T$ is the integer component of the pitch delay of the current subframe and $\beta$ is a pitch gain. The value of $\beta$ is made adaptive by using the quantized adaptive codebook gain from the previous subframe, bounded by 0.2 and 0.8:

$$\beta = \hat{g}_p^{(m-1)}, \quad 0.2 \le \beta \le 0.8. \tag{47}$$

Table 7: Structure of the fixed codebook C.

Pulse  Sign  Positions
i0     s0    0, 5, 10, 15, 20, 25, 30, 35
i1     s1    1, 6, 11, 16, 21, 26, 31, 36
i2     s2    2, 7, 12, 17, 22, 27, 32, 37
i3     s3    3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39

This filter enhances the harmonic structure for delays less than the subframe size of 40. This modification is incorporated in the fixed codebook search by modifying the impulse response $h(n)$ according to

$$h(n) = h(n) + \beta\, h(n - T), \quad n = T, \ldots, 39. \tag{48}$$

3.8.1 Fixed codebook search procedure

The fixed codebook is searched by minimizing the mean-squared error between the weighted input speech $sw(n)$ of Equation (33) and the weighted reconstructed speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is,

$$x_2(n) = x(n) - g_p\, y(n), \quad n = 0, \ldots, 39, \tag{49}$$

where $y(n)$ is the filtered adaptive codebook vector of Equation (44).

The matrix $H$ is defined as the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$. If $c_k$ is the algebraic codevector at index $k$, then the codebook is searched by maximizing the term

$$\frac{C_k^2}{E_k} = \frac{\left( \sum_{n=0}^{39} d(n)\, c_k(n) \right)^2}{c_k^t\, \Phi\, c_k}, \tag{50}$$

where $d(n)$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, and $\Phi = H^t H$ is the matrix of correlations of $h(n)$. The signal $d(n)$ and the matrix $\Phi$ are computed before the codebook search. The elements of $d(n)$ are computed from

$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i - n), \quad n = 0, \ldots, 39, \tag{51}$$

and the elements of the symmetric matrix $\Phi$ are computed by

$$\phi(i, j) = \sum_{n=j}^{39} h(n - i)\, h(n - j), \quad (j \ge i). \tag{52}$$

Note that only the elements actually needed are computed, and an efficient storage procedure has been designed to speed up the search.

The algebraic structure of the codebook C allows for a fast search procedure, since the codebook vector $c_k$ contains only four non-zero pulses. The correlation in the numerator of Equation (50) for a given vector $c_k$ is given by

$$C = \sum_{i=0}^{3} a_i\, d(m_i), \tag{53}$$

where $m_i$ is the position of the $i$-th pulse and $a_i$ is its amplitude. The energy in the denominator of Equation (50) is given by

$$E = \sum_{i=0}^{3} \phi(m_i, m_i) + 2 \sum_{i=0}^{2} \sum_{j=i+1}^{3} a_i a_j\, \phi(m_i, m_j). \tag{54}$$

To simplify the search procedure, the pulse amplitudes are predetermined by quantizing the signal $d(n)$. This is done by setting the amplitude of a pulse at a certain position equal to the sign of $d(n)$ at that position. Before the codebook search, the following steps are carried out. First, the signal $d(n)$ is decomposed into two signals: the absolute signal $d'(n) = |d(n)|$ and the sign signal $\mathrm{sign}[d(n)]$. Second, the matrix $\Phi$ is modified by including the sign information; that is,

$$\phi'(i, j) = \mathrm{sign}[d(i)]\, \mathrm{sign}[d(j)]\, \phi(i, j), \quad i = 0, \ldots, 39, \; j = i, \ldots, 39. \tag{55}$$

To remove the factor 2 in Equation (54),

$$\phi'(i, i) = 0.5\, \phi(i, i), \quad i = 0, \ldots, 39. \tag{56}$$

The correlation in Equation (53) is now given by

$$C = d'(m_0) + d'(m_1) + d'(m_2) + d'(m_3), \tag{57}$$

and the energy in Equation (54) is given by

$$\begin{aligned} E/2 = {} & \phi'(m_0, m_0) \\ & + \phi'(m_1, m_1) + \phi'(m_0, m_1) \\ & + \phi'(m_2, m_2) + \phi'(m_0, m_2) + \phi'(m_1, m_2) \\ & + \phi'(m_3, m_3) + \phi'(m_0, m_3) + \phi'(m_1, m_3) + \phi'(m_2, m_3). \end{aligned} \tag{58}$$

A focused search approach is used to further simplify the search procedure. In this approach a precomputed threshold is tested before entering the last loop, and the loop is entered only if this threshold is exceeded. The maximum number of times the loop can be entered is fixed, so that only a low percentage of the codebook is searched. The threshold is computed based on the correlation $C$.
The maximum absolute correlation and the average correlation due to the contribution of the first three pulses, $max_3$ and $av_3$, are found before the codebook search. The threshold is given by

$$thr_3 = av_3 + K_3\,(max_3 - av_3). \tag{59}$$

The fourth loop is entered only if the absolute correlation (due to three pulses) exceeds $thr_3$, where $0 \le K_3 < 1$. The value of $K_3$ controls the percentage of the codebook that is searched, and it is set here to 0.4. Note that this results in a variable search time. To further control the search, the number of times the last loop is entered (for the two subframes) cannot exceed a certain maximum, which is set here to 180 (the average worst case per subframe is 90 times).

3.8.2 Codeword computation of the fixed codebook

The pulse positions of the pulses i0, i1 and i2 are encoded with 3 bits each, while the position of i3 is encoded with 4 bits. Each pulse amplitude is encoded with 1 bit. This gives a total of 17 bits for the 4 pulses. By defining $s = 1$ if the sign is positive and $s = 0$ if the sign is negative, the sign codeword is obtained from

$$S = s_0 + 2 \cdot s_1 + 4 \cdot s_2 + 8 \cdot s_3, \tag{60}$$

and the fixed codebook codeword is obtained from

$$C = (i_0/5) + 8 \cdot (i_1/5) + 64 \cdot (i_2/5) + 512 \cdot (2 \cdot (i_3/5) + jx), \tag{61}$$

where $jx = 0$ if $i_3 = 3, 8, \ldots$, and $jx = 1$ if $i_3 = 4, 9, \ldots$

3.9 Quantization of the gains

The adaptive codebook gain (pitch gain) and the fixed (algebraic) codebook gain are vector quantized using 7 bits. The gain codebook search is done by minimizing the mean-squared weighted error between the original and the reconstructed speech, which is given by

$$E = x^t x + g_p^2\, y^t y + g_c^2\, z^t z - 2 g_p\, x^t y - 2 g_c\, x^t z + 2 g_p g_c\, y^t z, \tag{62}$$

where $x$ is the target vector (see Section 3.6), $y$ is the filtered adaptive codebook vector of Equation (44), and $z$ is the fixed codebook vector convolved with $h(n)$:

$$z(n) = \sum_{i=0}^{n} c(i)\, h(n - i), \quad n = 0, \ldots, 39. \tag{63}$$

3.9.1 Gain prediction

The fixed codebook gain $g_c$ can be expressed as

$$g_c = \gamma\, g'_c, \tag{64}$$

where $g'_c$ is a predicted gain based on previous fixed codebook energies, and $\gamma$ is a correction factor.

The mean energy of the fixed codebook contribution is given by

$$E = 10 \log \left( \frac{1}{40} \sum_{i=0}^{39} c_i^2 \right). \tag{65}$$

After scaling the vector $c_i$ with the fixed codebook gain $g_c$, the energy of the scaled fixed codebook is given by $20 \log g_c + E$. Let $E^{(m)}$ be the mean-removed energy (in dB) of the (scaled) fixed codebook contribution at subframe $m$, given by

$$E^{(m)} = 20 \log g_c + E - \bar{E}, \tag{66}$$

where $\bar{E} = 30$ dB is the mean energy of the fixed codebook excitation. The gain $g_c$ can be expressed as a function of $E^{(m)}$, $E$ and $\bar{E}$ by

$$g_c = 10^{(E^{(m)} + \bar{E} - E)/20}. \tag{67}$$

The predicted gain $g'_c$ is found by predicting the log-energy of the current fixed codebook contribution from the log-energy of previous fixed codebook contributions. The 4th-order MA prediction is done as follows. The predicted energy is given by

$$\tilde{E}^{(m)} = \sum_{i=1}^{4} b_i\, \hat{R}^{(m-i)}, \tag{68}$$

where $[b_1\; b_2\; b_3\; b_4] = [0.68\; 0.58\; 0.34\; 0.19]$ are the MA prediction coefficients, and $\hat{R}^{(m)}$ is the quantized version of the prediction error $R^{(m)}$ at subframe $m$, defined by

$$R^{(m)} = E^{(m)} - \tilde{E}^{(m)}. \tag{69}$$

The predicted gain $g'_c$ is found by replacing $E^{(m)}$ by its predicted value in Equation (67):

$$g'_c = 10^{(\tilde{E}^{(m)} + \bar{E} - E)/20}. \tag{70}$$

The correction factor $\gamma$ is related to the gain prediction error by

$$R^{(m)} = E^{(m)} - \tilde{E}^{(m)} = 20 \log(\gamma). \tag{71}$$

3.9.2 Codebook search for gain quantization

The adaptive codebook gain $g_p$ and the factor $\gamma$ are vector quantized using a two-stage conjugate-structured codebook. The first stage consists of a 3-bit two-dimensional codebook QA, and the second stage consists of a 4-bit two-dimensional codebook QB. The first element in each codebook represents the quantized adaptive codebook gain $\hat{g}_p$, and the second element represents the quantized fixed codebook gain correction factor $\hat{\gamma}$.
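As a worked illustration of the gain prediction in Section 3.9.1, the following evaluates the mean codevector energy, the MA-predicted energy and the predicted gain for one case. The inputs are assumptions chosen for the example: a codevector with four unit pulses (the structure of Section 3.8) and a predictor memory still holding the initial value of -14 dB in all four taps; logarithms are base 10.

```latex
\begin{align*}
E &= 10 \log_{10}\!\left(\tfrac{1}{40}\sum_{i=0}^{39} c_i^2\right)
   = 10 \log_{10}\!\left(\tfrac{4}{40}\right) = -10 \text{ dB} \\
\tilde{E}^{(m)} &= \sum_{i=1}^{4} b_i\,\hat{R}^{(m-i)}
   = (0.68 + 0.58 + 0.34 + 0.19)\,(-14) = 1.79 \times (-14) = -25.06 \text{ dB} \\
g'_c &= 10^{(\tilde{E}^{(m)} + \bar{E} - E)/20}
   = 10^{(-25.06 + 30 + 10)/20} = 10^{0.747} \approx 5.58
\end{align*}
```

The transmitted correction factor then scales this prediction, so a quantized factor of 1 would reproduce a fixed codebook gain of about 5.58 in this example.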
Given codebook indices $m$ and $n$ for QA and QB, respectively, the quantized adaptive codebook gain is given by

$$\hat{g}_p = QA_1(m) + QB_1(n), \tag{72}$$

and the quantized fixed codebook gain by

$$\hat{g}_c = g'_c\, \hat{\gamma} = g'_c\, (QA_2(m) + QB_2(n)). \tag{73}$$

This conjugate structure simplifies the codebook search by applying a pre-selection process. The optimum pitch gain $g_p$ and fixed codebook gain $g_c$ are derived from Equation (62) and are used for the pre-selection. The codebook QA contains 8 entries in which the second element (corresponding to $g_c$) has, in general, larger values than the first element (corresponding to $g_p$). This bias allows a pre-selection using the value of $g_c$. In this pre-selection process, a cluster of 4 vectors whose second elements are close to $gx_c$ is selected, where $gx_c$ is derived from $g_c$ and $g_p$. Similarly, the codebook QB contains 16 entries which have a bias towards the first element (corresponding to $g_p$). A cluster of 8 vectors whose first elements are close to $g_p$ is selected. Hence, for each codebook, the best 50% of the candidate vectors are selected. This is followed by an exhaustive search over the remaining 4 × 8 = 32 possibilities, such that the combination of the two indices minimizes the weighted mean-squared error of Equation (62).

3.9.3 Codeword computation for the gain quantizer

The codewords QA and QB for the gain quantizer are obtained from the indices corresponding to the best choice. To reduce the impact of single bit errors, the codebook indices are mapped.

3.10 Memory update

An update of the states of the synthesis and weighting filters is needed to compute the target signal in the next subframe.
After the two gains are quantized, the excitation signal $u(n)$ in the present subframe is found by

$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \quad n = 0, \ldots, 39, \tag{74}$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantized adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic codevector including pitch sharpening).

The states of the filters can be updated by filtering the signal $r(n) - u(n)$ (difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. This would require 3 filter operations. A simpler approach, which requires only one filtering, is as follows. The local synthesis speech $\hat{s}(n)$ is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input $r(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 30, \ldots, 39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $ew(n)$. However, the signal $ew(n)$ can be equivalently found by

$$ew(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n). \tag{75}$$

Since the signals $x(n)$, $y(n)$ and $z(n)$ are available, the states of the weighting filter are updated by computing $ew(n)$ as in Equation (75) for $n = 30, \ldots, 39$. This saves two filter operations.

3.11 Initialization of encoder and decoder

All static encoder variables should be initialized to zero, except for the variables listed in Table 8. These variables need to be initialized for the decoder as well.

Table 8: Description of parameters with non-zero initialization.

Variable         Reference      Initial value
$\beta$          Section 3.8    0.8
$l_i$            Section 3.2.4  $i\pi/11$
$q_i$            Section 3.2.4  0.9595, ...
$\hat{R}^{(k)}$  Section 3.9.1  -14

4. Functional description of the decoder

The signal flow at the decoder is illustrated in Section 2 (Figure 3). First the parameters are decoded (LP coefficients, adaptive codebook vector, fixed codebook vector and gains). These decoded parameters are used to compute the reconstructed speech signal. This process is described in Section 4.1. The reconstructed signal is enhanced by a post-processing operation consisting of a postfilter and a high-pass filter (Section 4.2). Section 4.3 describes the error concealment procedure used when either a parity error has occurred or the frame erasure flag has been set.

4.1 Parameter decoding procedure

The transmitted parameters are listed in Table 9.

Table 9: Description of transmitted parameter indices. The bitstream ordering is reflected by the order in the table. For each parameter, the most significant bit (MSB) is transmitted first.

Symbol  Description                                   Bits
L0      Switched predictor index of LSP quantizer     1
L1      First stage vector of LSP quantizer           7
L2      Second stage lower vector of LSP quantizer    5
L3      Second stage upper vector of LSP quantizer    5
P1      Pitch delay, first subframe                   8
P0      Parity bit for pitch                          1
S1      Signs of pulses, first subframe               4
C1      Fixed codebook, first subframe                13
GA1     Gain codebook (stage 1), first subframe       3
GB1     Gain codebook (stage 2), first subframe       4
P2      Pitch delay, second subframe                  5
S2      Signs of pulses, second subframe              4
C2      Fixed codebook, second subframe               13
GA2     Gain codebook (stage 1), second subframe      3
GB2     Gain codebook (stage 2), second subframe      4

At the beginning, all the encoder variables must be initialized to zero, except for the variables listed in Table 8.
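The bit allocation of Table 9 can be read back into parameter indices with a simple MSB-first unpacking loop. The following is a minimal C sketch under stated assumptions: the function name `unpack_frame` and the array layout are hypothetical, and each input element is assumed to hold a plain 0 or 1, which need not match the word values used in the reference bitstream file.

```c
#include <assert.h>

#define NUM_PARAMS 15
#define FRAME_BITS 80

/* Bit widths in transmission order, copied from Table 9:
 * L0 L1 L2 L3 P1 P0 S1 C1 GA1 GB1 P2 S2 C2 GA2 GB2 */
static const int param_bits[NUM_PARAMS] = {
    1, 7, 5, 5, 8, 1, 4, 13, 3, 4, 5, 4, 13, 3, 4
};

/*
 * Hypothetical helper: bits[] holds one bit (0 or 1) per element in
 * transmission order; params[] receives the 15 indices in Table 9 order.
 */
static void unpack_frame(const int bits[FRAME_BITS], int params[NUM_PARAMS])
{
    int pos = 0;

    for (int p = 0; p < NUM_PARAMS; p++) {
        int value = 0;
        for (int b = 0; b < param_bits[p]; b++) {
            assert(bits[pos] == 0 || bits[pos] == 1);
            value = (value << 1) | bits[pos++];   /* MSB arrives first */
        }
        params[p] = value;
    }
    assert(pos == FRAME_BITS);   /* the widths must add up to 80 bits */
}
```

The final assertion doubles as a check that the widths in Table 9 sum to the 80 bits of a 10 ms frame.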
The decoding procedure is performed in the following order:

4.1.1 Decoding of LP filter parameters

The received indices L0, L1, L2 and L3 of the LSP quantizer are used to reconstruct the quantized LSP coefficients using the procedure described in Section 3.2.4. The interpolation procedure described in Section 3.2.5 is used to obtain 2 interpolated LSP vectors (corresponding to the 2 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficients $a_i$, which are used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe:

1. Decoding of the adaptive codebook vector.
2. Decoding of the fixed codebook vector.
3. Decoding of the adaptive and fixed codebook gains.
4. Computation of the reconstructed speech.

4.1.2 Decoding of the adaptive codebook vector

The received adaptive codebook index is used to find the integer and fractional parts of the pitch delay. The integer part $(int)T_1$ and the fractional part $frac$ of $T_1$ are obtained from P1 as follows:

    if P1 < 197
        (int)T_1 = (P1 + 2)/3 + 19
        frac = P1 - (int)T_1 * 3 + 58
    else
        (int)T_1 = P1 - 112
        frac = 0
    end

The integer and fractional parts of $T_2$ are obtained from P2 and $t_{min}$, where $t_{min}$ is derived from $T_1$ as follows:

    t_min = (int)T_1 - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end

Now $T_2$ is obtained from

    (int)T_2 = (P2 + 2)/3 - 1 + t_min
    frac = P2 - 2 - ((P2 + 2)/3 - 1) * 3

The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using Equation (40).

4.1.3 Decoding of the fixed codebook vector

The received fixed codebook index C is used to extract the positions of the excitation pulses. The pulse signs are obtained from S. Once the pulse positions and signs are decoded, the fixed codebook vector $c(n)$ can be constructed.
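The index decoding of Sections 4.1.2 and 4.1.3 can be sketched in C as follows. This is an illustration under stated assumptions rather than the reference code: the function names are hypothetical, integer division truncates as in the pseudocode, and the fixed codebook fields are unpacked by inverting the codeword layout of Equation (61) and the sign convention of Equation (60).

```c
#include <assert.h>

#define L_SUBFR 40

/* First-subframe pitch index P1 (8 bits) to integer delay and a
 * fraction in thirds (frac = -1, 0 or 1). */
static void decode_p1(int p1, int *t_int, int *frac)
{
    assert(p1 >= 0 && p1 <= 255);
    if (p1 < 197) {
        *t_int = (p1 + 2) / 3 + 19;
        *frac = p1 - (*t_int) * 3 + 58;
    } else {
        *t_int = p1 - 112;
        *frac = 0;
    }
}

/* Second-subframe pitch index P2 (5 bits), relative to t_min. */
static void decode_p2(int p2, int t_min, int *t_int, int *frac)
{
    int k;

    assert(p2 >= 0 && p2 <= 31);
    k = (p2 + 2) / 3 - 1;
    *t_int = k + t_min;
    *frac = p2 - 2 - k * 3;
}

/* Fixed codebook index C (13 bits) and sign index S (4 bits) to the
 * 40-sample vector c(n) holding four pulses of amplitude +1 or -1. */
static void decode_fixed_codebook(int c, int s, int code[L_SUBFR])
{
    int pos[4];

    assert(c >= 0 && c < (1 << 13) && s >= 0 && s < 16);

    pos[0] = (c & 7) * 5;                               /* track 0 */
    pos[1] = ((c >> 3) & 7) * 5 + 1;                    /* track 1 */
    pos[2] = ((c >> 6) & 7) * 5 + 2;                    /* track 2 */
    pos[3] = ((c >> 10) & 7) * 5 + 3 + ((c >> 9) & 1);  /* jx picks 3.. or 4.. */

    for (int n = 0; n < L_SUBFR; n++)
        code[n] = 0;
    for (int i = 0; i < 4; i++)
        code[pos[i]] = ((s >> i) & 1) ? 1 : -1;         /* s = 1 means positive */
}
```

Because the four pulses live on disjoint position tracks (Table 7), the positions never collide and each pulse can be written directly into the zero vector.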
If the integer part of the pitch delay, $T$, is less than the subframe size of 40, the pitch enhancement procedure is applied, which modifies $c(n)$ according to Equation (48).

4.1.4 Decoding of the adaptive and fixed codebook gains

The received gain codebook index gives the adaptive codebook gain $\hat{g}_p$ and the fixed codebook gain correction factor $\hat{\gamma}$. This procedure is described in detail in Section 3.9. The estimated fixed codebook gain $g'_c$ is found using Equation (70). The fixed codebook gain is obtained from the product of the quantized gain correction factor with this predicted gain (Equation (64)). The adaptive codebook gain is reconstructed using Equation (72).

4.1.5 Computation of the parity bit

Before the speech is reconstructed, the parity bit is recomputed from the adaptive codebook delay (Section 3.7.2). If this bit is not identical to the transmitted parity bit P0, it is likely that bit errors occurred during transmission, and the error concealment procedure of Section 4.3 is used.

4.1.6 Computation of the reconstructed speech

The excitation $u(n)$ (see Equation (74)) is input to the LP synthesis filter. The reconstructed speech for the subframe is given by

$$\hat{s}(n) = u(n) - \sum_{i=1}^{10} \hat{a}_i\, \hat{s}(n - i), \quad n = 0, \ldots, 39, \tag{76}$$

where $\hat{a}_i$ are the interpolated LP filter coefficients. The reconstructed speech $\hat{s}(n)$ is then processed by a post-processor, which is described in the next section.

4.2 Post-processing

Post-processing consists of three functions: adaptive postfiltering, high-pass filtering, and signal up-scaling. The adaptive postfilter is the cascade of three filters: a pitch postfilter $H_p(z)$, a short-term postfilter $H_f(z)$ and a tilt compensation filter $H_t(z)$, followed by an adaptive gain control procedure. The postfilter is updated every subframe of 5 ms. The postfiltering process is organized as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $r(n)$.
The signal $r(n)$ is used to compute the pitch delay $T$ and the gain $g_{pit}$. The signal $r(n)$ is filtered through the pitch postfilter $H_p(z)$ to produce the signal $r'(n)$ which, in turn, is filtered by the synthesis filter $1/[g_f \hat{A}(z/\gamma_d)]$. Finally, the signal at the output of the synthesis filter $1/[g_f \hat{A}(z/\gamma_d)]$ is passed to the tilt compensation filter $H_t(z)$, resulting in the postfiltered synthesis speech signal $sf(n)$. Adaptive gain control is then applied between $sf(n)$ and $\hat{s}(n)$, resulting in the signal $sf'(n)$. The high-pass filtering and scaling operations act on the postfiltered signal $sf'(n)$.

4.2.1 Pitch postfilter

The pitch, or harmonic, postfilter is given by

$$H_p(z) = \frac{1}{1 + g_0}\,(1 + g_0 z^{-T}), \tag{77}$$

where $T$ is the pitch delay and $g_0$ is a gain factor given by

$$g_0 = \gamma_p\, g_{pit}, \tag{78}$$

where $g_{pit}$ is the pitch gain. Both the pitch delay and the gain are determined from the decoder output signal. Note that $g_{pit}$ is bounded by 1, and it is set to zero if the pitch prediction gain is less than 3 dB. The factor $\gamma_p$ controls the amount of harmonic postfiltering and has the value $\gamma_p = 0.5$. The pitch delay and gain are computed from the residual signal $r(n)$ obtained by filtering the speech $\hat{s}(n)$ through $\hat{A}(z/\gamma_n)$, which is the numerator of the short-term postfilter (see Section 4.2.2):

$$r(n) = \hat{s}(n) + \sum_{i=1}^{10} \gamma_n^i\, \hat{a}_i\, \hat{s}(n - i). \tag{79}$$

The pitch delay is computed using a two-pass procedure. The first pass selects the best integer $T_0$ in the range $[T_1 - 1, T_1 + 1]$, where $T_1$ is the integer part of the (transmitted) pitch delay in the first subframe. The best integer delay is the one that maximizes the correlation

$$R(k) = \sum_{n=0}^{39} r(n)\, r(n - k). \tag{80}$$

The second pass chooses the best fractional delay $T$ with resolution 1/8 around $T_0$. This is done by finding the delay with the highest normalized correlation $R'(k)$ (Equation (81)), where $r_k(n)$ is the residual signal at delay $k$. Once the optimal delay $T$ is found, the corresponding correlation value is compared against a threshold.
If $R'(T) < 0.5$, then the harmonic postfilter is disabled by setting $g_{pit} = 0$. Otherwise the value of $g_{pit}$ is computed from

$$g_{pit} = \frac{\sum_{n=0}^{39} r(n)\, r_k(n)}{\sum_{n=0}^{39} r_k(n)\, r_k(n)}, \quad \text{bounded by } 0 \le g_{pit} \le 1.0. \tag{82}$$

The non-integer delayed signal $r_k(n)$ is first computed using an interpolation filter of length 33. After the selection of $T$, $r_k(n)$ is recomputed with a longer interpolation filter of length 129. The new signal replaces the previous one only if the longer filter increases the value of $R'(T)$.

4.2.2 Short-term postfilter

The short-term postfilter is given by

$$H_f(z) = \frac{1}{g_f}\, \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}, \tag{83}$$

where $\hat{A}(z)$ is the received quantized LP inverse filter (LP analysis is not done at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of short-term postfiltering and are set to $\gamma_n = 0.55$ and $\gamma_d = 0.7$. The gain term $g_f$ is calculated on the truncated impulse response $h_f(n)$ of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$ and is given by

$$g_f = \sum_{n=0}^{19} |h_f(n)|. \tag{84}$$

4.2.3 Tilt compensation

Finally, the filter $H_t(z)$ compensates for the tilt in the short-term postfilter $H_f(z)$ and is given by

$$H_t(z) = \frac{1}{g_t}\,(1 + \gamma_t k_1 z^{-1}), \tag{85}$$

where $\gamma_t k_1$ is a tilt factor, $k_1$ being the first reflection coefficient calculated on $h_f(n)$ with

$$k_1 = -\frac{r_h(1)}{r_h(0)}, \qquad r_h(i) = \sum_{j=0}^{19-i} h_f(j)\, h_f(j + i). \tag{86}$$

The gain term $g_t = 1 - |\gamma_t k_1|$ compensates for the decreasing effect of $g_f$ in $H_f(z)$. Furthermore, it has been shown that the product filter $H_f(z) H_t(z)$ generally has no gain. Two values of $\gamma_t$ are used depending on the sign of $k_1$: if $k_1$ is negative, $\gamma_t = 0.9$, and if $k_1$ is positive, $\gamma_t = 0.2$.

4.2.4 Adaptive gain control

Adaptive gain control is used to compensate for gain differences between the reconstructed speech signal $\hat{s}(n)$ and the postfiltered signal $sf(n)$. The gain scaling factor $G$ for the present subframe is computed by

$$G = \frac{\sum_{n=0}^{39} |\hat{s}(n)|}{\sum_{n=0}^{39} |sf(n)|}. \tag{87}$$

The gain-scaled postfiltered signal $sf'(n)$ is given by

$$sf'(n) = g(n)\, sf(n), \quad n = 0, \ldots, 39, \tag{88}$$

where $g(n)$ is updated on a sample-by-sample basis and given by

$$g(n) = 0.85\, g(n - 1) + 0.15\, G, \quad n = 0, \ldots, 39. \tag{89}$$

The initial value is $g(-1) = 1.0$.

4.2.5 High-pass filtering and up-scaling

A high-pass filter with a cut-off frequency of 100 Hz is applied to the reconstructed and postfiltered speech $sf'(n)$. The filter is given by

$$H_{h2}(z) = \frac{0.93980581 - 1.8795834\, z^{-1} + 0.93980581\, z^{-2}}{1 - 1.9330735\, z^{-1} + 0.93589199\, z^{-2}}. \tag{90}$$

Up-scaling consists of multiplying the high-pass filtered output by a factor of 2 to retrieve the input signal level.

4.3 Concealment of frame erasures and parity errors

An error concealment procedure has been incorporated in the decoder to reduce the degradations in the reconstructed speech caused by frame erasures or random errors in the bitstream. This error concealment process is functional when either i) the frame of coder parameters (corresponding to a 10 ms frame) has been identified as being erased, or ii) a checksum error occurs on the parity bit for the pitch delay index P1. The latter can occur when the bitstream has been corrupted by random bit errors.
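The adaptive gain control of Section 4.2.4 can be sketched in C as follows: the subframe gain ratio is formed from the absolute sums of the two signals, and the smoothed gain is then applied sample by sample with the 0.85 / 0.15 recursion. This is a floating-point illustration, not the fixed-point reference routine; the function name and the handling of an all-zero postfiltered signal are assumptions.

```c
#define L_SUBFR 40

/* Local absolute value, to keep the sketch free of library dependencies. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/*
 * Hypothetical helper.
 *   s_hat : reconstructed speech for the subframe
 *   sf    : postfiltered signal, scaled in place
 *   g     : smoothed gain state; holds g(-1) on entry (1.0 initially)
 *           and g(39) on exit
 */
static void adaptive_gain_control(const double s_hat[L_SUBFR],
                                  double sf[L_SUBFR], double *g)
{
    double num = 0.0, den = 0.0, G;

    for (int n = 0; n < L_SUBFR; n++) {
        num += abs_d(s_hat[n]);
        den += abs_d(sf[n]);
    }

    /* Assumption: with an all-zero postfiltered signal the ratio is
     * undefined, so the subframe and the gain state are left untouched. */
    if (den == 0.0)
        return;

    G = num / den;                          /* subframe gain ratio  */
    for (int n = 0; n < L_SUBFR; n++) {
        *g = 0.85 * (*g) + 0.15 * G;        /* per-sample smoothing */
        sf[n] *= *g;                        /* gain-scaled output   */
    }
}
```

Because the gain state is carried from one subframe to the next, a sudden level change in the postfilter output is corrected gradually rather than as a step at the subframe boundary.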

If a parity error occurs on P1, the delay value $T_1$ is set to the delay value of the previous frame. The value of $T_2$ is derived with the procedure outlined in Section 4.1.2, using this new value of $T_1$. If consecutive parity errors occur, the previous value of $T_1$, incremented by 1, is used.

The mechanism for detecting frame erasures is not defined in the Recommendation and will depend on the application. The concealment strategy has to reconstruct the current frame based on previously received information. The method used replaces the missing excitation signal with one of similar characteristics, while gradually decaying its energy. This is done by using a voicing classifier based on the long-term prediction gain, which is computed as part of the long-term postfilter analysis. The pitch postfilter (see Section 4.2.1) finds the long-term predictor for which the prediction gain is more than 3 dB. This is done by setting a threshold of 0.5 on the normalized correlation $R'(k)$ (Equation (81)). For the error concealment process, these frames will be classified as periodic. Otherwise the frame is declared non-periodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. Hence, for many consecutive erased frames the classification might change. Typically, this only happens if the original classification was periodic.

The specific steps taken for an erased frame are:

1. Repetition of the LP filter parameters.
2. Attenuation of the adaptive and fixed codebook gains.
3. Attenuation of the memory of the gain predictor.
4. Generation of the replacement excitation.

4.3.1 Repetition of LP filter parameters

The LP parameters of the last good frame are used.
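The parity check on P1 and the delay fallback described at the start of this section can be sketched in C as below. The sketch follows the wording of the text only: the parity is a plain XOR over the six most significant bits of the 8-bit index, and the polarity used by the reference code may differ. The function names, the error counter and the clamp to the maximum delay of 143 are assumptions made for illustration.

```c
#include <assert.h>

/* XOR of the six most significant bits (bits 7..2) of the 8-bit index P1.
 * Assumption: plain XOR as stated in the text. */
static int pitch_parity(int p1)
{
    int parity = 0;

    assert(p1 >= 0 && p1 <= 255);
    for (int bit = 2; bit <= 7; bit++)
        parity ^= (p1 >> bit) & 1;
    return parity;
}

/*
 * Hypothetical fallback for the integer part of T_1 after a parity error.
 *   prev_t1            : delay value of the previous frame
 *   consecutive_errors : number of parity errors in a row, counting this one
 */
static int conceal_t1(int prev_t1, int consecutive_errors)
{
    int t1 = prev_t1;

    if (consecutive_errors > 1)
        t1 = prev_t1 + 1;   /* consecutive errors: previous value plus one */
    if (t1 > 143)
        t1 = 143;           /* assumption: stay within the delay range     */
    return t1;
}
```

A decoder loop would compare `pitch_parity(P1)` with the received P0 and call `conceal_t1` on a mismatch, then derive $T_2$ from the substituted value as in Section 4.1.2.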

The states of the LSF predictor contain the values of the received codewords $l_i$. Since the current codeword is not available, it is computed from the repeated LSF parameters $\hat{\omega}_i$ and the predictor memory according to Equation (91), for $i = 1, \ldots, 10$.

4.3.2 Attenuation of adaptive and fixed codebook gains

An attenuated version of the previous fixed codebook gain is used:

$$g_c^{(m)} = 0.98\, g_c^{(m-1)}. \tag{92}$$

The same is done for the adaptive codebook gain. In addition, a clipping operation is used to keep its value below 0.9:

$$g_p^{(m)} = 0.9\, g_p^{(m-1)} \quad \text{and} \quad g_p^{(m)} < 0.9. \tag{93}$$

4.3.3 Attenuation of the memory of the gain predictor

The gain predictor uses the energy of previously selected codebook vectors. To allow for a smooth continuation of the coder once good frames are received, the memory of the gain predictor is updated with an attenuated version of the codebook energy. The value of $\hat{R}^{(m)}$ for the current subframe $m$ is set to the averaged quantized gain prediction error, attenuated by 4 dB:

$$\hat{R}^{(m)} = \left( 0.25 \sum_{i=1}^{4} \hat{R}^{(m-i)} \right) - 4.0 \quad \text{and} \quad \hat{R}^{(m)} \ge -14. \tag{94}$$

4.3.4 Generation of the replacement excitation

The excitation used depends on the periodicity classification. If the last correctly received frame was classified as periodic, the current frame is considered to be periodic as well. In that case only the adaptive codebook is used, and the fixed codebook contribution is set to zero. The pitch delay is based on the last correctly received pitch delay and is repeated for each successive frame. To avoid excessive periodicity, the delay is increased by one for each subsequent subframe, but bounded by 143. The adaptive codebook gain is based on an attenuated value according to Equation (93).

If the last correctly received frame was classified as non-periodic, the current frame is considered to be non-periodic as well, and the adaptive codebook contribution is set to zero. The fixed codebook contribution is generated by randomly selecting a codebook index and a sign index.
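The gain handling for an erased frame, together with the pseudo-random recursion of Equation (95) used for the non-periodic case, can be sketched in C as follows. This is a floating-point illustration under stated assumptions: all function names are hypothetical, the predictor memory is stored newest-first, and the random recursion is assumed to wrap at 16 bits, in line with the 16-bit fixed-point simulation.

```c
#include <stdint.h>

/* Equation (92): attenuate the previous fixed codebook gain. */
static double attenuate_fixed_gain(double gc_prev)
{
    return 0.98 * gc_prev;
}

/* Equation (93): attenuate the adaptive codebook gain and clip it at 0.9. */
static double attenuate_adaptive_gain(double gp_prev)
{
    double gp = 0.9 * gp_prev;

    if (gp > 0.9)
        gp = 0.9;
    return gp;
}

/* Equation (94): push the attenuated average into the gain predictor
 * memory. r_hat[0] is the most recent entry (storage order is an
 * assumption of this sketch). */
static void update_gain_memory(double r_hat[4])
{
    double r = 0.25 * (r_hat[0] + r_hat[1] + r_hat[2] + r_hat[3]) - 4.0;

    if (r < -14.0)
        r = -14.0;                 /* lower bound of the memory */
    r_hat[3] = r_hat[2];
    r_hat[2] = r_hat[1];
    r_hat[1] = r_hat[0];
    r_hat[0] = r;
}

/* Equation (95): seed = seed * 31821 + 13849, assumed to wrap modulo 2^16. */
static uint16_t next_random(uint16_t *seed)
{
    *seed = (uint16_t)(*seed * 31821u + 13849u);
    return *seed;
}
```

Starting from the seed 21845, the first value produced by this recursion is 3242; masking a value with `0x1FFF` or `0xF` would give a 13-bit codebook index or a 4-bit sign index, respectively.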
The random generator is based on the function

$$seed = seed \cdot 31821 + 13849, \tag{95}$$

with the initial seed value of 21845. The random codebook index is derived from the 13 least significant bits of the next random number. The random sign is derived from the 4 least significant bits of the next random number. The fixed codebook gain is attenuated according to Equation (92).

5. Bit-exact description of the CS-ACELP coder

ANSI C code simulating the CS-ACELP coder in 16-bit fixed point is available from ITU-T. The following sections summarize the use of this simulation code and how the software is organized.

5.1 Use of the simulation software

The C code consists of two main programs: coder.c, which simulates the encoder, and decoder.c, which simulates the decoder. The encoder is run as follows:

    coder inputfile bstreamfile

The inputfile and outputfile are sampled data files containing 16-bit PCM signals. The bitstream file contains 81 16-bit words, where the first word can be used to indicate frame erasure and the remaining 80 words contain one bit each. The decoder takes this bitstream file and produces a postfiltered output file containing a 16-bit PCM signal:

    decoder bstreamfile outputfile

5.2 Organization of the simulation software

In the fixed-point ANSI C simulation, only two types of fixed-point data are used, as illustrated in Table 10.

Table 10: Data types used in the ANSI C simulation.

Type    Maximum value  Minimum value  Description
Word16  0x7fff         0x8000         Signed 2's complement 16-bit word
Word32  0x7fffffffL    0x80000000L    Signed 2's complement 32-bit word

To facilitate the implementation of the simulation code, loop indices, Boolean values and flags use the type Flag, which would be either 16 bits or 32 bits depending on the target platform.

All the computations are done using a predefined set of basic operators. The description of these operators is given in Table 11. The tables used by the simulation coder are summarized in Table 12. The main programs use a library of routines that are summarized in Tables 13, 14 and 15.

Table 11: Basic operations used in the ANSI C simulation.

Operation                                                 Description
Word16 sature(Word32 L_var1)                              Limit to 16 bits
Word16 add(Word16 var1, Word16 var2)                      Short addition
Word16 sub(Word16 var1, Word16 var2)                      Short subtraction
Word16 abs_s(Word16 var1)                                 Short abs
Word16 shl(Word16 var1, Word16 var2)                      Short shift left
Word16 shr(Word16 var1, Word16 var2)                      Short shift right
Word16 mult(Word16 var1, Word16 var2)                     Short multiplication
Word32 L_mult(Word16 var1, Word16 var2)                   Long multiplication
Word16 negate(Word16 var1)                                Short negate
Word16 extract_h(Word32 L_var1)                           Extract high
Word16 extract_l(Word32 L_var1)                           Extract low
Word16 round(Word32 L_var1)                               Round
Word32 L_mac(Word32 L_var3, Word16 var1, Word16 var2)     Mac
Word32 L_msu(Word32 L_var3, Word16 var1, Word16 var2)     Msu
Word32 L_macNs(Word32 L_var3, Word16 var1, Word16 var2)   Mac without sat
Word32 L_msuNs(Word32 L_var3, Word16 var1, Word16 var2)   Msu without sat
Word32 L_add(Word32 L_var1, Word32 L_var2)                Long addition
Word32 L_sub(Word32 L_var1, Word32 L_var2)                Long subtraction
Word32 L_add_c(Word32 L_var1, Word32 L_var2)              Long addition with c
Word32 L_sub_c(Word32 L_var1, Word32 L_var2)              Long subtraction with c
Word32 L_negate(Word32 L_var1)                            Long negate
Word16 mult_r(Word16 var1, Word16 var2)                   Multiplication with rounding
Word32 L_shl(Word32 L_var1, Word16 var2)                  Long shift left
Word32 L_shr(Word32 L_var1, Word16 var2)                  Long shift right
Word16 shr_r(Word16 var1, Word16 var2)                    Shift right with rounding
Word16 mac_r(Word32 L_var3, Word16 var1, Word16 var2)     Mac with rounding
Word16 msu_r(Word32 L_var3, Word16 var1, Word16 var2)     Msu with rounding
Word32 L_deposit_h(Word16 var1)                           16-bit var1 -> MSB
Word32 L_deposit_l(Word16 var1)                           16-bit var1 -> LSB
Word32 L_shr_r(Word32 L_var1, Word16 var2)                Long shift right with rounding
Word32 L_abs(Word32 L_var1)                               Long abs
Word32 L_sat(Word32 L_var1)                               Long saturation
Word16 norm_s(Word16 var1)                                Short norm
Word16 div_s(Word16 var1, Word16 var2)                    Short division
Word16 norm_l(Word32 L_var1)                              Long norm

Table 12: Summary of tables.

File          Table name  Size        Description
tab_hup.c     tab_hup_s   28          Upsampling filter for postfilter
tab_hup.c     tab_hup_l   112         Upsampling filter for postfilter
inter_3.c     inter_3     13          FIR filter for interpolating the correlation
pred_lt3.c    inter_3     31          FIR filter for interpolating the past excitation
lspcb.tab     lspcb1      128 x 10    LSP quantizer (first stage)
lspcb.tab     lspcb2      32 x 10     LSP quantizer (second stage)
lspcb.tab     fg          2 x 4 x 10  MA predictors in LSP VQ
lspcb.tab     fg_sum      2 x 10      Used in LSP VQ
lspcb.tab     fg_sum_inv  2 x 10      Used in LSP VQ
qua_gain.tab  gbk1        8 x 2       Codebook GA in gain VQ
qua_gain.tab  gbk2        16 x 2      Codebook GB in gain VQ
qua_gain.tab  map1        8           Used in gain VQ
qua_gain.tab  imap1       8           Used in gain VQ
qua_gain.tab  map2        16          Used in gain VQ
qua_gain.tab  imap2       16          Used in gain VQ
window.tab    window      240         LP analysis window
lag_wind.tab  lag_h       10          Lag window for bandwidth expansion (high part)
lag_wind.tab  lag_l       10          Lag window for bandwidth expansion (low part)
grid.tab      grid        61          Grid points in LP to LSP conversion
inv_sqrt.tab  table       49          Lookup table in inverse square root computation
log2.tab      table       33          Lookup table in base 2 logarithm computation
lsp_lsf.tab   table       65          Lookup table in LSF to LSP conversion and vice versa
lsp_lsf.tab   slope       64          Line slopes in LSP to LSF conversion
pow2.tab      table       33          Lookup table in 2^x computation
acelp.h                               Prototypes for fixed codebook search
ld8k.h                                Prototypes and constants
typedef.h                             Type definitions

Table 13: Summary of encoder-specific routines.

File name    Description
acelp_co.c   Search fixed codebook
autocorr.c   Compute autocorrelation for LP analysis
az_lsp.c     Compute LSPs from LP coefficients
cod_ld8k.c   Encoder routine
convolve.c   Convolution operation
corr_xy2.c   Compute correlation terms for gain quantization
enc_lag3.c   Encode adaptive codebook index
g_pitch.c    Compute adaptive codebook gain
gainpred.c   Gain predictor
int_lpc.c    Interpolation of LSP
inter_3.c    Fractional delay interpolation
lag_wind.c   Lag windowing
levinson.c   Levinson recursion
lspenc.c     LSP encoding routine
lspgetq.c    LSP quantizer
lspgett.c    Compute LSP quantizer distortion
lspgetw.c    Compute LSP weights
lsplast.c    Select LSP MA predictor
lsppre.c     Pre-selection of first LSP codebook
lspprev.c    LSP predictor routines
lspsel1.c    First stage LSP quantizer
lspsel2.c    Second stage LSP quantizer
lspstab.c    Stability test for LSP quantizer
pitch_fr.c   Closed-loop pitch search
pitch_ol.c   Open-loop pitch search
pre_proc.c   Pre-processing (HP filtering and scaling)
pwf.c        Computation of perceptual weighting coefficients
qua_gain.c   Gain quantizer
qua_lsp.c    LSP quantizer
relspwe.c    LSP quantizer

Table 14: Summary of decoder-specific routines.

File name    Description
d_lsp.c      Decode LP information
de_acelp.c   Decode algebraic codebook
dec_gain.c   Decode gains
dec_lag3.c   Decode adaptive codebook index
dec_ld8k.c   Decoder routine
lspdec.c     LSP decoding routine
post_pro.c   Post-processing (HP filtering and scaling)
pred_lt3.c   Generation of adaptive codebook
pst.c        Postfilter routines

Table 15: Summary of general routines.

File name    Description
basicop2.c   Basic operators
bits.c       Bit manipulation routines
gainpred.c   Gain predictor
int_lpc.c    Interpolation of LSP
inter_3.c    Fractional delay interpolation
lsp_az.c     Compute LP from LSP coefficients
lsp_lsf.c    Conversion between LSP and LSF
lsp_lsf2.c   High-precision conversion between LSP and LSF
lspexp.c     Expansion of LSP coefficients
lspstab.c    Stability test for LSP quantizer
p_parity.c   Compute pitch parity
pred_lt3.c   Generation of adaptive codebook
random.c     Random generator
residu.c     Compute residual signal
syn_filt.c   Synthesis filter
weight_a.c   Bandwidth expansion of LP coefficients

It is noted that, as of this date, the best method known to the applicant for carrying out the invention is that which is clear from the present description of the invention.

Having described the invention as above, property is claimed as contained in the following:

Claims

1. A method for use in a speech decoder that includes a first portion comprising an adaptive codebook and a second portion comprising a fixed codebook, the decoder generating a speech excitation signal selectively based on output signals from the first and second portions when the decoder fails to reliably receive at least a portion of a current frame of compressed speech information, the method characterized in that it comprises: classifying a speech signal to be generated by the decoder as periodic or non-periodic; and, based on the classification of the speech signal, either: generating the excitation signal based on the output signal from the first portion and not on the output signal from the second portion, if the speech signal is classified as periodic; or generating the excitation signal based on the output signal from the second portion and not on the output signal from the first portion, if the speech signal is classified as non-periodic.
2. The method according to claim 1, characterized in that the step of classifying is carried out based on the information that is provided by an adaptive post-filter.
3. The method according to claim 1, characterized in that the speech signal classification is based on compressed speech information that is received in a previous frame.
4. The method according to claim 1, characterized in that the output signal of the first portion is generated based on a vector signal from the adaptive codebook, the method further comprising: determining an adaptive codebook delay signal based on a measurement of a pitch period of a speech signal received by the decoder in a previous frame; and selecting the vector signal using the adaptive codebook delay signal.
5. The method according to claim 4, characterized in that the step of determining the adaptive codebook delay signal comprises incrementing the measurement of the speech signal pitch period by one or more speech signal sample intervals.
6. The method according to claim 1, characterized in that the first portion further comprises an amplifier for generating an amplified signal based on a vector signal from the adaptive codebook and a scaling factor, the method further comprising determining the scaling factor based on scaling factor information received by the decoder in a previous frame.
7. The method according to claim 6, characterized in that the step of determining the scaling factor comprises attenuating a scaling factor that corresponds to the scaling factor information of the previous frame.
8. The method according to claim 1, characterized in that the output signal of the second portion is based on a vector signal from the fixed codebook, the method further comprising: determining a fixed codebook index signal using a random number generator; and selecting the vector signal using the fixed codebook index signal.
9. The method according to claim 1, characterized in that the second portion further comprises an amplifier for generating an amplified signal based on a vector signal from the fixed codebook and a scaling factor, the method further comprising determining the scaling factor based on scaling factor information received by the decoder in a previous frame.
10. The method according to claim 9, characterized in that the step of determining the scaling factor comprises attenuating a scaling factor that corresponds to the scaling factor information of the previous frame.
11. A speech decoder for generating a speech signal based on compressed speech information received from a communication channel, the decoder characterized in that it comprises: an adaptive codebook memory; a fixed codebook memory; means for classifying the speech signal to be generated by the decoder as periodic or non-periodic; means for forming an excitation signal, comprising first means for forming the excitation signal when the decoder fails to reliably receive at least a portion of a current frame of compressed speech information, the first means forming the excitation signal based on a vector signal from the adaptive codebook memory and not on a vector signal from the fixed codebook memory when the speech signal to be generated is classified as periodic, and based on a vector signal from the fixed codebook memory and not on a vector signal from the adaptive codebook memory when the speech signal to be generated is classified as non-periodic; and a linear predictive filter for synthesizing a speech signal based on the excitation signal.
12. The decoder according to claim 11, characterized in that the means for classifying comprises a portion of an adaptive postfilter.
13. The decoder according to claim 11, characterized in that the means for classifying classify the speech signal based on compressed speech information that is received in a previous frame.
14. The decoder according to claim 11, characterized in that it further comprises: means for determining an adaptive codebook delay signal based on a measurement of a pitch period of a speech signal received by the decoder in a previous frame; and means for selecting the vector signal from the adaptive codebook memory using the adaptive codebook delay signal.
15. The decoder according to claim 14, characterized in that the means for determining the adaptive codebook delay signal comprise means for incrementing the measurement of the pitch period of the speech signal by one or more speech signal sample intervals.
16. The decoder according to claim 11, characterized in that it further comprises: an amplifier for generating an amplified signal based on a vector signal from the adaptive codebook and a scaling factor; and means for determining the scaling factor based on scaling factor information received by the decoder in a previous frame.
17. The decoder according to claim 16, characterized in that the means for determining the scaling factor comprise means for attenuating a scaling factor corresponding to the previous frame.
18. The decoder according to claim 11, characterized in that it further comprises a random number generator, the generator determining a fixed codebook index signal to be used in selecting the fixed codebook vector signal.
19. The decoder according to claim 11, characterized in that it further comprises: an amplifier for generating an amplified signal based on the vector signal from the fixed codebook and a scaling factor; and means for determining the scaling factor based on scaling factor information received by the decoder in a previous frame.
20. The decoder according to claim 19, characterized in that the means for determining the scaling factor comprise means for attenuating a scaling factor that corresponds to the previous frame.
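
By way of illustration only, the excitation selection recited in the method claims above (periodic or non-periodic classification, delay incremented by one sample, attenuated scaling factors, randomly drawn fixed codebook index) might be sketched in C as follows. This is a minimal floating-point sketch under stated assumptions, not the ANSI C simulation listed in the tables: the names `ConcealState`, `begin_erased_frame`, `conceal_subframe`, `adaptive_vector`, `fixed_vector` and `next_random` are hypothetical, and the subframe length, history length, delay bound, attenuation factors, pulse layout and generator constants are assumed values chosen for the example. The periodic or non-periodic decision is taken as an input, since the classifier itself is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All constants are illustrative assumptions, not values recited in the claims. */
#define SUBFRAME_LEN   40     /* samples of excitation produced per call */
#define HIST_LEN       160    /* past excitation kept as adaptive codebook */
#define MAX_DELAY      143    /* upper bound on the pitch delay */
#define ADAPTIVE_ATTEN 0.9f   /* attenuation applied to the adaptive gain */
#define FIXED_ATTEN    0.98f  /* attenuation applied to the fixed gain */

typedef struct {
    float    past_exc[HIST_LEN]; /* adaptive codebook memory, newest last */
    int      delay;              /* pitch delay from the last good frame */
    float    adaptive_gain;      /* adaptive codebook scaling factor */
    float    fixed_gain;         /* fixed codebook scaling factor */
    uint16_t seed;               /* random number generator state */
} ConcealState;

/* Hypothetical linear congruential generator used to draw a codebook index. */
static uint16_t next_random(uint16_t *seed)
{
    *seed = (uint16_t)(*seed * 31821u + 13849u);
    return *seed;
}

/* Adaptive codebook vector: the past excitation repeated at the given delay. */
static void adaptive_vector(const float *hist, int delay, float *v)
{
    assert(delay >= 1 && delay <= HIST_LEN);
    for (int n = 0; n < SUBFRAME_LEN; n++)
        v[n] = (n < delay) ? hist[HIST_LEN - delay + n] : v[n - delay];
}

/* Hypothetical fixed codebook: four signed unit pulses, one per track.
   Bits 3t..3t+2 of the index give the position on track t, bit 12+t the sign. */
static void fixed_vector(uint16_t index, float *v)
{
    memset(v, 0, SUBFRAME_LEN * sizeof *v);
    for (int t = 0; t < 4; t++) {
        int pos = t + 5 * ((index >> (3 * t)) & 7);
        v[pos] = ((index >> (12 + t)) & 1) ? -1.0f : 1.0f;
    }
}

/* Call once per erased frame: lengthen the delay by one sample, up to a bound. */
void begin_erased_frame(ConcealState *s)
{
    if (s->delay < MAX_DELAY)
        s->delay++;
}

/* Generate one subframe of excitation for an erased frame. */
void conceal_subframe(ConcealState *s, int periodic, float *exc)
{
    float v[SUBFRAME_LEN];
    float gain;

    if (periodic) {
        /* Periodic speech: adaptive codebook only, fixed codebook unused. */
        adaptive_vector(s->past_exc, s->delay, v);
        s->adaptive_gain *= ADAPTIVE_ATTEN;
        gain = s->adaptive_gain;
    } else {
        /* Non-periodic speech: randomly indexed fixed codebook only. */
        fixed_vector(next_random(&s->seed), v);
        s->fixed_gain *= FIXED_ATTEN;
        gain = s->fixed_gain;
    }
    for (int n = 0; n < SUBFRAME_LEN; n++)
        exc[n] = gain * v[n];

    /* Append the new excitation to the adaptive codebook memory. */
    memmove(s->past_exc, s->past_exc + SUBFRAME_LEN,
            (HIST_LEN - SUBFRAME_LEN) * sizeof s->past_exc[0]);
    memcpy(s->past_exc + HIST_LEN - SUBFRAME_LEN, exc,
           SUBFRAME_LEN * sizeof exc[0]);
}
```

In this sketch a caller would invoke `begin_erased_frame` once when a frame is declared erased and then `conceal_subframe` once per subframe, passing the resulting excitation to the linear predictive synthesis filter. Only one of the two scaling factors is attenuated per call, which reflects the either/or selection of the claims; whether the unused factor should also decay is a design choice left open here.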