CN1701353A

CN1701353A - A transcoding scheme between CELP-based speech codes

Info

Publication number: CN1701353A
Application number: CNA038055198A
Authority: CN
Inventors: M. A. Jabri; J. Wang; S. Gould
Original assignee: Dilithium Networks Inc
Current assignee: Di Lee Sim (for the benefit of creditors) Ltd.; Di Lee Sim Network Inc.; Dilithium Networks Inc
Priority date: 2002-01-08
Filing date: 2003-01-08
Publication date: 2005-11-23
Anticipated expiration: 2023-01-08
Also published as: WO2003058407A2; WO2003058407A3; EP1464047A4; EP1464047A2; AU2003207498A1; AU2003207498A8; KR20040095205A; JP2005515486A; CN100527225C

Abstract

Embodiments of a system and method relate to transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input bitstream to unpack CELP parameters from the input CELP bitstream, and may interpolate the unpacked CELP parameters if a difference exists between the destination codec parameters and the source codec parameters. When the method maps CELP parameters from the source codec format to the destination codec format, the parameter mapping strategy may be a single preset strategy or may be selected. The method includes encoding the CELP parameters for the destination codec and processing a destination CELP bitstream by packing the CELP parameters for the destination codec.

Description

A transcoding scheme between CELP-based speech codes

Cross-reference to related applications

This application claims priority to the following commonly owned U.S. provisional applications: No. 60/347,270, filed January 8, 2002; No. 60/364,403, filed March 12, 2002; No. 60/421,446, filed October 25, 2002; No. 60/421,449, filed October 25, 2002; and No. 60/421,270, filed October 25, 2002, which are incorporated herein by reference for all purposes.

Statement as to rights to inventions made under federally sponsored research or development

Not applicable

Reference to a "sequence listing," a table, or a computer program listing appendix submitted on a compact disc

Not applicable

Background of the invention

The present invention relates generally to techniques for processing information. More particularly, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further details of the invention are provided throughout the present specification and more particularly below.

Coding is the process of converting a raw signal (voice, image, video, etc.) into a format amenable to transmission or storage. Coding usually results in a large amount of compression, but generally involves significant signal processing to achieve. The result of coding is a bitstream (a sequence of frames) of encoded parameters according to a given compression format. Compression is achieved by removing statistically and perceptually redundant information using various techniques for modeling the signal. The encoded format is therefore referred to as the "compression format" or "parameter space." A decoder takes the compressed bitstream and regenerates the original signal. In the case of speech coding, compression generally results in a loss of information.

The process of converting between different compression formats and/or reducing the bit rate of a previously encoded signal is known as transcoding. Transcoding can be used to save bandwidth or to connect incompatible client and/or server devices. Transcoding differs from the direct compression process in that a transcoder has access only to the compressed signal and not to the original signal.

Transcoding can be accomplished using brute-force techniques such as "tandem," in which a full decompression process is followed by a recompression process. Because decompressing a signal and then recompressing it usually requires a large amount of processing and may introduce delay, transcoding in the compression space or parameter space can be considered instead. Such transcoding aims at mapping between compression formats while remaining in the parameter space wherever possible. This is where sophisticated "smart" transcoding algorithms come into play. Although there has been progress in transcoding, further improvement of transcoding techniques is desirable. Limitations of conventional techniques are described more fully throughout this specification, and more particularly below.

Brief summary of the invention

According to the present invention, techniques for processing information are provided. More particularly, the invention provides a method and apparatus for converting frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further description of the invention is provided throughout the present specification and more particularly below.

In a specific embodiment, the invention provides an apparatus for converting frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. The apparatus has a bitstream unpacking module for obtaining one or more CELP parameters from a source codec. The apparatus also has an interpolator module coupled to the bitstream unpacking module. The interpolator module is adapted to interpolate between different frame sizes, subframe sizes, and/or sampling rates of the source codec and the destination codec. A mapping module is coupled to the interpolator module. The mapping module is adapted to map the one or more CELP parameters of the source codec to one or more CELP parameters of the destination codec. The apparatus has a destination bitstream packing module coupled to the mapping module. The destination bitstream packing module is adapted to construct at least one destination output CELP frame from at least the one or more CELP parameters of the destination codec. A controller is coupled to at least the destination bitstream packing module, the mapping module, the interpolator module, and the bitstream unpacking module. Preferably, the controller is adapted to oversee operation of one or more of the modules and to receive instructions from one or more external applications. The controller is also adapted to provide status information to the one or more external applications.

In another specific embodiment, the invention provides a method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input CELP bitstream to unpack at least one or more CELP parameters from the input CELP bitstream. If a difference exists between one or more of a plurality of destination codec parameters (including a frame size, a subframe size, and/or a sampling rate of the destination codec format) and one or more of a plurality of source codec parameters (including a frame size, a subframe size, and/or a sampling rate of the source codec format), the method interpolates one or more of the unpacked CELP parameters from the source codec format to the destination codec format. The method includes encoding the one or more CELP parameters for the destination codec, and processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

In yet another specific embodiment, the invention provides a method for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format. The method includes transferring a control signal from a plurality of control signals from an application process, and selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based at least on the control signal from the application. The method also includes performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format.

Still further, the invention provides a system for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format. The system includes one or more memories. The memories may include one or more codes for receiving a control signal from a plurality of control signals from an application process. Also included are one or more codes for selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based at least on the control signal from the application. The one or more memories also include one or more codes for performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format. Depending on the embodiment, there may also be other computer code for carrying out the functionality described herein, as well as functionality outside this specification that may be combined with the present invention.

Many benefits are achieved by way of the present invention. Depending on the embodiment, one or more of these benefits may be achieved.

● Reduce the computational complexity of the transcoding process.

● Reduce the delay through the transcoding process.

● Reduce the amount of memory required for transcoding.

● Introduce dynamic rate control.

● Support silence frames through an embedded voice activity detector.

● Provide a framework in which various parameter mapping strategies can be applied.

● Provide a generic transcoding infrastructure that accommodates a variety of current and future CELP-based codecs.

The transcoding invention may achieve one or more of these benefits. In a specific embodiment, the transcoding apparatus includes:

● a source CELP parameter unpacking module, which obtains the CELP parameters from the input encoded CELP bitstream;

● a CELP parameter interpolator, which converts the input source CELP parameters into destination CELP parameters corresponding to the subframe size difference between the source and destination codecs; parameter interpolation is used if the subframe sizes of the source and destination codecs differ;

● a destination CELP parameter mapping and tuning engine, which converts the CELP parameters from the interpolator module into destination CELP codec parameters;

● a destination CELP code packer, which packs the mapped CELP parameters into a destination CELP code frame;

● an advanced features manager, which manages optional features and functions of the CELP-to-CELP transcoding;

● a controller, which oversees the overall transcoding process;

● a status reporting function, which provides the status of the transcoding process.

The source CELP parameter unpacking module is a simplified CELP decoder without the formant filter and the post-filter.

The CELP parameter interpolator comprises a set of interpolators, each associated with one or more CELP parameters.

The destination CELP parameter mapping and tuning module includes a parameter mapping strategy switching module and one or more of the following parameter mapping strategies: a module for CELP parameter direct space mapping, a module for analysis in the excitation space domain, and a module for analysis in the filtered excitation space domain.

The invention performs transcoding on a subframe-by-subframe basis. That is, as a frame (of source compressed information) is received by the transcoding system, the transcoder can begin operating on it and producing output subframes. Once a sufficient number of subframes has been produced, a frame (of compressed information in the destination format) can be generated and, if the purpose is communication, sent to the communication channel. If the purpose is storage, the generated frame can be stored as desired. If the frame durations defined by the source and destination format standards are the same, a single input frame produces a single output frame; otherwise, additional input frames must be buffered or multiple output frames must be generated. If the subframe durations differ, interpolation between subframe parameters is required. The transcoding operation therefore comprises four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of the source CELP parameters, (3) mapping and tuning to the destination CELP parameters, and (4) packing of the codes to produce the output frame.

Thus, on receipt of a frame, the transcoder unpacks the bitstream to produce the CELP parameters of each subframe contained in the frame (Figure 10, block (1)). The parameters of interest are the LPC coefficients, the excitation (produced from the adaptive and fixed codewords), and the pitch lag. Note that for a low-complexity solution producing good quality, only decoding to the excitation is required, rather than full synthesis of the speech waveform. If subframe interpolation is needed, it is performed at this point by the smart interpolation engine.

The subframes are now in a form amenable to processing by the destination parameter mapping and tuning module (Figure 10, block (5)). The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. The excitation CELP parameters can be mapped in a number of ways, yielding correspondingly better output quality at the cost of computational complexity. Three such mapping strategies are described in this document and form part of the mapping and tuning strategy module (Figure 10, block (4)):

● CELP parameter direct space mapping (DSM);

● analysis in the excitation space domain;

● analysis in the filtered excitation space domain.

The mapping and tuning strategy is selected by the mapping and tuning strategy switching module (Figure 10, block (3)).

Because the three methods trade quality against computational load, they can be used to provide graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can thereby adapt to the available resources. Alternatively, a transcoding system can be built using a single strategy that yields only the required quality and performance. In that case, the mapping and tuning strategy switching module (Figure 10, block (3)) would not be incorporated.

If applicable to the destination standard, a voice activity detector (operating in the parameter space) can also be employed at this point to reduce the output bandwidth. The mapped parameters can then be packed into a frame in the destination bitstream format (Figure 10, block (7)) and generated for transmission or storage.

The invention covers algorithms and methods for performing smart transcoding between CELP-based speech coding standards. The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector).

The overall transcoding process is managed by a control module (Figure 10, block (8)), which issues commands according to the transcoding status and external instructions.

To accommodate differing transcoding requirements, the apparatus of the invention provides the capability of adding optional features and functions (Figure 10, block (6)).

Other features and advantages of the invention will become more apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters identify corresponding elements throughout.

Brief description of the drawings

The objects, features, and advantages of the present invention that are believed to be novel are set forth with particularity in the appended claims. The invention, both as to its organization and manner of operation, together with further objects and advantages, may best be understood by reference to the following description taken in connection with the accompanying drawings.

Fig. 1 is a simplified block diagram of the decoder stage of a generic CELP coder;

Fig. 2 is a simplified block diagram of the encoder stage of a generic CELP coder;

Fig. 3 is a simplified block diagram showing a mathematical model of a codec;

Fig. 4 is a simplified block diagram showing a mathematical model of a tandem transcodec;

Fig. 5 is a simplified block diagram showing a mathematical model of a smart transcodec;

Fig. 6 is an illustration of one conventional apparatus for CELP-based transcoding;

Fig. 7 is an illustration of another conventional apparatus for CELP-based transcoding;

Fig. 8 is a simplified block diagram showing generic transcoding between CELP codecs;

Fig. 9 is a simplified block diagram showing subframe interpolation for GSM-AMR and G.723.1;

Figure 10 is a simplified block diagram of a system constructed in accordance with an embodiment of the present invention to transcode an input CELP bitstream of a source CELP codec into an output CELP bitstream of a destination codec;

Figure 11 is a more detailed simplified block diagram of the source codec CELP parameter unpacking module;

Figure 12 is a simplified block diagram showing interpolation of subframe parameters and sample-by-sample parameters for G.723.1 to GSM-AMR;

Figure 13 is a simplified block diagram showing the excitation being calibrated by the source codec LPC coefficients and the encoded LPC coefficients of the destination codec;

Figure 14 is a simplified block diagram showing the parameter mapping and tuning module for CELP parameter mapping in more detail;

Figure 15 is a more detailed simplified block diagram of the destination CELP parameter tuning module;

Figure 16 is a simplified block diagram showing an embodiment of destination CELP codes packed in a GSM-AMR frame;

Figure 17 depicts an embodiment of a G.723.1 to GSM-AMR transcoder; and

Figure 18 depicts an embodiment of a GSM-AMR to G.723.1 transcoder.

Detailed description of the invention

According to the present invention, techniques for processing information are provided. More particularly, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. Further details of the invention are provided throughout the present specification and more particularly below.

The invention covers algorithms and methods for performing transcoding between coding methods and standards based on CELP (Code Excited Linear Prediction). Of most interest are the CELP coding methods standardized by bodies such as the International Telecommunication Union (ITU) or the European Telecommunications Standards Institute (ETSI). The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector).

Speech coding techniques can generally be classified as waveform coders (for example, the G.711, G.726, and G.722 standards from the ITU) and analysis-by-synthesis (AbS) type coders (for example, the G.723.1 and G.729 standards from the ITU, the GSM-AMR standard from ETSI, and the Enhanced Variable Rate Codec (EVRC) and Selectable Mode Vocoder (SMV) standards from the Telecommunications Industry Association (TIA)). Waveform coders operate in the time domain and are based on sample-by-sample approaches that exploit the correlation between speech samples. Analysis-by-synthesis coders attempt to imitate the human speech production system with a simplified model of a source (the glottis) and a filter (the vocal tract) that shapes the output speech spectrum on a frame basis (typically with frame sizes of 10-30 ms).

Analysis-by-synthesis type coders were introduced to provide high-quality speech at low bit rates, at the cost of increased computational requirements. Compression techniques are an effective means of conserving resources on a communication interface.

Mathematically, all speech codecs start with a one-dimensional analog speech signal x₀(t), which is uniformly sampled and quantized to obtain a digital-domain representation, x(n) = Q(x₀(nT)). The sampling rate f = 1/T of the speech signal is typically 8 kHz or 16 kHz, and the sampled signal is typically quantized to at most 16 bits.
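The sampling and quantization step x(n) = Q(x₀(nT)) can be illustrated with a minimal Python sketch. The function name `sample_and_quantize`, the clipping to [-1, 1], and the 1 kHz test tone are illustrative assumptions and are not taken from the specification; only the 8 kHz rate and 16-bit depth come from the text.

```python
import math

def sample_and_quantize(x0, fs=8000, n_samples=8, bits=16):
    """Return x(n) = Q(x0(n*T)) for n = 0..n_samples-1, with T = 1/fs.

    x0 is a callable standing in for the analog signal (assumed to lie
    in [-1, 1]); Q is a uniform quantizer to signed `bits`-bit integers.
    """
    T = 1.0 / fs
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    out = []
    for n in range(n_samples):
        v = max(-1.0, min(1.0, x0(n * T)))    # clip to the quantizer range
        out.append(int(round(v * full_scale)))
    return out

# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
samples = sample_and_quantize(tone)
```

With an 8 kHz rate the 1 kHz tone repeats every eight samples, so the third sample (n = 2) sits at the positive peak of the quantizer range.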

A CELP-based codec can then be regarded as an algorithm that, using a model of speech production, maps between the sampled speech x(n) and some parameter space θ; that is, it encodes and decodes the digital speech. All CELP-based algorithms operate on frames of speech (a frame may be further divided into several subframes). In some codecs the speech frames overlap one another. A speech frame can be defined as the vector of speech samples beginning at some time n, that is,

\tilde{x}_i = \begin{bmatrix} x(n) & x(n+1) & \cdots & x(n+L-1) \end{bmatrix}^{T}

where L is the length (number of samples) of the speech frame. Note that the frame index i is linearly related to the first sample index n of the frame:

n = \begin{cases} iL & \text{for non-overlapping frames} \\ i(L-K) & \text{for overlapping frames} \end{cases}

where K is the number of overlapping samples between frames.
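The relation between the frame index i and the first sample index n can be sketched in a few lines of Python. The helper names `frame_start` and `get_frame` are hypothetical; setting K = 0 gives the non-overlapping case.

```python
def frame_start(i, L, K=0):
    """First sample index n of frame i: n = i*L when K == 0 (no overlap),
    n = i*(L - K) when consecutive frames share K samples."""
    return i * (L - K)

def get_frame(x, i, L, K=0):
    """Return frame i as the list [x(n), x(n+1), ..., x(n+L-1)]."""
    n = frame_start(i, L, K)
    return x[n:n + L]

# Illustrative signal: the sample value equals its own index.
signal = list(range(1000))
frame_no_overlap = get_frame(signal, 1, 160)        # starts at sample 160
frame_overlap = get_frame(signal, 2, 240, K=40)     # starts at sample 400
```

In the overlapping case each frame advances by L - K samples, so the last K samples of one frame are the first K samples of the next.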

Now, the compression (lossy coding) process is a function that maps the speech frame \tilde{x}_i to parameters θ_i, and the decoding process maps from the parameters θ_i back to an approximation of the original speech frame. The speech frame produced by the decoder is not equal to the originally encoded speech frame. The codec is designed to produce output speech that is perceptually as similar as possible to the input speech; that is, when deriving the parameters, the encoder must produce parameters that maximize some perceptual measure between the input speech frame and the speech frame generated by the decoder.

In general, the mappings from input to parameters and from parameters to output require knowledge of all previous inputs or parameters. This can be achieved by keeping a state S within the codec, for example in the structure of the adaptive codebook used by CELP-based methods. The encoder state and the decoder state must be kept synchronized. This is achieved by updating the state only from data known to both sides (encoder and decoder), namely the parameters. Fig. 3 shows a generic model of the encoder, channel, and decoder.

The frame parameters θ_i used in CELP-based models comprise linear prediction coefficients (LPC), used for short-term prediction of the speech signal (and physically related to the vocal tract, the oral and nasal cavities, and the lips), and an excitation signal composed of adaptive and fixed codes. The adaptive code is used to model the long-term pitch information in the speech. The codes (adaptive and fixed) have associated codebooks that are predefined for a particular CELP codec. Fig. 1 shows a typical CELP decoder, in which the adaptive and fixed codebook vectors are scaled independently by gain factors, then combined and filtered to produce synthesized speech. This speech is usually passed through a post-filter to remove artifacts introduced by the model.
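The decoder structure described above (gain-scaled codebook vectors summed into an excitation, then passed through the short-term LPC synthesis filter) can be sketched as follows. This is a minimal illustration, not the decoder of any particular standard: the function name, the filter sign convention A(z) = 1 + Σ a_k z^(-k), and the omission of the post-filter are assumptions made for the example.

```python
def celp_decode_subframe(adaptive_vec, fixed_vec, g_a, g_f, lpc, state):
    """Sketch of one CELP decoder subframe (post-filter omitted).

    excitation: e(n) = g_a * v(n) + g_f * c(n)
    synthesis : s(n) = e(n) - sum_k lpc[k-1] * s(n-k)   (filter 1/A(z))
    `state` holds past outputs, most recent first: state[0] = s(n-1).
    Returns (excitation, speech, updated_state).
    """
    excitation = [g_a * v + g_f * c for v, c in zip(adaptive_vec, fixed_vec)]
    hist = list(state)
    speech = []
    for e in excitation:
        s = e - sum(a * h for a, h in zip(lpc, hist))
        speech.append(s)
        hist = [s] + hist[:-1]      # shift the filter memory
    return excitation, speech, hist

# Toy example: first-order filter with a1 = -0.5, unit pulse from the
# adaptive codebook, silent fixed codebook.
exc, out, mem = celp_decode_subframe(
    adaptive_vec=[1.0, 0.0, 0.0], fixed_vec=[0.0, 0.0, 0.0],
    g_a=1.0, g_f=0.0, lpc=[-0.5], state=[0.0])
```

A transcoder that decodes only to the excitation would stop after the first line of the function body and skip the synthesis loop, which is where most of the decoder cost lies.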

The CELP encoding (analysis) process shown in Fig. 2 comprises preprocessing the speech signal to remove unwanted frequency components and applying a window function, followed by extraction of the short-term LPC parameters. This is typically done using the Levinson-Durbin algorithm. The LPC parameters are converted into Line Spectral Pairs (LSP) to facilitate quantization and subframe interpolation. The speech is then inverse-filtered by the short-term LPC filter to produce a residual excitation signal. This residual is perceptually weighted to improve quality and analyzed to find an estimate of the speech pitch. A closed-loop analysis-by-synthesis method is used to determine the optimal pitch. Once the pitch is found, the adaptive codebook component of the excitation is subtracted from the residual and the optimal fixed codeword is found. The internal memory of the encoder is updated to reflect changes in the codec state (such as the adaptive codebook).
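For readers unfamiliar with the Levinson-Durbin recursion named above, a textbook form is sketched below. It assumes the autocorrelation values r[0..order] of the windowed speech have already been computed and uses the convention A(z) = 1 + Σ a_k z^(-k); both the function name and that convention are choices made for this example, and actual codecs add details (lag windowing, bandwidth expansion) that are not shown.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r[0..order].

    Returns (a, err) where a[0] = 1.0 and a[1..order] are the predictor
    polynomial coefficients of A(z) = 1 + sum_k a[k] z^-k, and err is the
    final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a, err

# Autocorrelation of a first-order process with correlation 0.5 per lag.
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion recovers a first-order predictor (a[1] = -0.5) and finds that the second coefficient is zero, as expected for a first-order process.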

The simplest method of transcoding is a brute-force approach called tandem transcoding; see Fig. 4. This method performs a full decode of the incoming compressed bits to produce synthesized speech. The synthesized speech is then encoded for the target standard. The method suffers from the large amount of computation required to re-encode the signal, from quality degradation introduced by pre- and post-filtering of the speech waveform, and from potential delay introduced by the look-ahead requirements of the encoder.

Methods of "smart" transcoding similar to that illustrated in Fig. 5 have appeared in the literature. However, these methods still essentially reconstruct the speech signal and then perform significant work to extract the various CELP parameters, such as LPC and pitch. That is, these methods still operate in the speech signal space. In particular, the excitation signal, which was optimally matched to the original speech by the far-end encoder (the encoder at the far end that produced the compressed speech according to a compression format), is used only for generating the synthesized speech. The synthesized speech is then used to compute a new optimal excitation. Because of the impulse response filtering operations required in closed-loop searches, this becomes a very computationally intensive operation.

Fig. 6 illustrates the method used in US 6,260,009 B1. A reconstructed signal, produced from the input excitation parameters and the output quantized formant filter coefficients, is used as the target signal for the searcher. Because of the difference between the quantized formant filter coefficients of the source and destination codecs, this degrades the target signal for the searcher and, ultimately, significantly reduces the output speech quality of the transcoding. See Fig. 6. Other limitations can be found throughout this specification, and more particularly below.

Fig. 7 illustrates another "smart" transcoding method, published as US 2002/0077812 A1. This method performs transcoding by mapping each CELP parameter directly, ignoring the interactions between the CELP parameters. The method is applicable only in special cases that impose very restrictive conditions on the source and destination CELP codecs. For example, it requires Algebraic CELP (ACELP) and the same subframe size in both the source and destination codecs. For most CELP-based transcoding it does not produce good-quality speech. The method is suited to only one of the GSM-AMR modes and does not cover all modes of GSM-AMR.

A method and apparatus of the invention are discussed in detail below. In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the invention. The case of GSM-AMR and G.723.1 is used for purposes of illustration and example. The methods described herein are generic and apply to transcoding between any pair of CELP codecs. A person skilled in the relevant art will recognize that other steps, configurations, and arrangements can be used without departing from the spirit and scope of the invention.

The invention covers algorithms and methods for performing smart transcoding between CELP-based speech coding standards. The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to lower modes or by introducing silence frames through an embedded voice activity detector). The following sections discuss the details of the invention.

The invention performs transcoding on a subframe-by-subframe basis. That is, as a frame is received by the transcoding system, the transcoder can begin operating on its subframes and producing output subframes. Once a sufficient number of subframes has been produced, a frame can be generated. If the frame durations defined by the source and destination standards are the same, one input frame produces one output frame; otherwise, input frames must be buffered or multiple output frames must be generated. If the subframes have different durations, interpolation between the subframe parameters is required. The transcoding operation therefore comprises four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of the source CELP parameters, (3) mapping and tuning to the destination CELP parameters, and (4) packing of the codes to produce the output frame. (See Fig. 8.)
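The buffering behaviour described above, where input frames are accumulated until enough material exists for an output frame, can be sketched with a small Python class. The class name `FrameBuffer` is hypothetical, and the frame lengths of 240 and 160 samples assume 30 ms and 20 ms frames at an 8 kHz sampling rate; the sketch only counts and slices samples and does not perform any parameter mapping.

```python
class FrameBuffer:
    """Accumulates decoded sample-by-sample data (for example excitation)
    and releases it in chunks of the destination frame length."""

    def __init__(self, out_frame_len):
        self.out_frame_len = out_frame_len
        self.samples = []

    def push(self, in_frame):
        """Add one input frame; return the list of complete output frames."""
        self.samples.extend(in_frame)
        frames = []
        while len(self.samples) >= self.out_frame_len:
            frames.append(self.samples[:self.out_frame_len])
            self.samples = self.samples[self.out_frame_len:]
        return frames

# Assumed sizes: 30 ms source frames (240 samples) into 20 ms output
# frames (160 samples) at 8 kHz.
buf = FrameBuffer(out_frame_len=160)
first = buf.push([0.0] * 240)    # one output frame, 80 samples held back
second = buf.push([0.0] * 240)   # two more output frames, buffer empty
```

Two 30 ms input frames therefore yield three 20 ms output frames, which matches the statement that unequal frame durations require either buffering of input frames or generation of multiple output frames.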

Figure 10 is a block diagram illustrating the principles of a CELP-based codec transcoding apparatus according to the invention. The apparatus comprises a source bitstream unpacking module, a smart interpolation engine, a parameter mapping and tuning module, an optional advanced features module, a control module, and a destination bitstream packing module.

The parameter mapping and tuning module comprises a mapping and tuning strategy switching module and a parameter mapping and tuning strategy module.

The transcoding operation is overseen by the control module.

On receipt of a frame, the transcoder unpacks the bitstream to produce the CELP parameters of each subframe contained in the frame. The parameters of interest are the LPC coefficients, the excitation (produced from the adaptive and fixed codewords), and the pitch lag.

Note that only decoding to the excitation is required, rather than full synthesis of the speech waveform. This greatly reduces the complexity of unpacking the source codec bitstream. For the CELP parameter direct space mapping (DSM) transcoding strategy, the codebook gains and fixed codewords are also of interest. If subframe interpolation is needed, it is performed at this point.

The subframes are now in a form amenable to processing by the destination parameter mapping and tuning module shown in Figure 14. The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. More sophisticated non-linear interpolation can also be used. The excitation CELP parameters can be mapped in a number of ways, yielding correspondingly better output quality at the cost of computational complexity. Three such mapping strategies are described in this document and form part of the parameter mapping and tuning strategy module (Figure 10, block (4)):

● CELP parameter direct space mapping (DSM);

● analysis in the excitation space domain;

● analysis in the filtered excitation space domain.

These three methods are discussed in detail in the following sections. Because the three methods trade quality for reduced computational load, they can be used to provide graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can thus be adapted to the available resources. Alternatively, a transcoding system can be built using only the one strategy that yields the required quality and performance. In that case, the mapping and tuning strategy switching module (Figure 10, block (3)) is not incorporated.

If applicable to the destination standard, a voice activity detector (operating in the parameter space) can also be used at this point to reduce the output bandwidth.

The output of the parameter mapping and tuning module is the set of destination CELP codec codes. They are packed into destination bitstream frames according to the codec's CELP frame format. Packing is needed to put the output bits into a form that the destination CELP decoder can understand. If the application is storage, the destination CELP parameters can be packed or stored using a specific format. If the frames are transmitted according to a multimedia protocol, the packing process can also be varied, for example by implementing bit scrambling during packing.

In addition, the apparatus of the present invention provides the ability to add optional signal processing functions or modules in the future.

Subframe interpolation

Subframe interpolation may be needed when the subframes of different standards represent different durations in the signal domain, or when different sampling rates are used. For example, G.723.1 uses frames of 30 ms duration (7.5 ms per subframe), whereas GSM-AMR uses frames of 20 ms duration (5 ms per subframe). This is illustrated in Fig. 9. Subframe interpolation is performed on two different types of parameters: (1) sample-by-sample parameters (such as the excitation and codeword vectors), and (2) subframe parameters (such as the LSP coefficients and pitch lag estimates). Sample-by-sample parameters are mapped by considering their discrete time index and copying them to the correct position in the target subframe. If the CELP standards use different sampling rates, upsampling or downsampling may be needed. Subframe parameters are interpolated by some interpolation function to produce a smooth estimate of the parameter in the target subframe. An intelligent interpolation algorithm can improve the speech transcoding, not only in computational performance but, more importantly, in speech quality. A simple interpolation function is linear interpolation.

As an example, Fig. 9 shows that three GSM-AMR frames are needed to describe the same speech signal duration as two G.723.1 frames. Likewise, three GSM-AMR subframes are needed for every two G.723.1 subframes. As mentioned above, there are two classes of parameters: subframe-wide parameters (for example, LSP coefficients) and sample-by-sample parameters (for example, adaptive and fixed codewords). Subframe parameters, denoted θ, are converted linearly by computing a weighted sum of the overlapping subframes, and sample-by-sample parameters, denoted v[], are formed by copying the appropriate samples. For interpolation from G.723.1 subframes to GSM-AMR subframes, the analytical formulas are as follows:

θ_{i}^{gsm} = θ_{\lfloor 2 i / 3 \rfloor}^{g.723.1}, \quad i \mod 3 = 0, 2

θ_{i}^{gsm} = \frac{1}{2} (θ_{\lfloor 2 i / 3 \rfloor}^{g.723.1} + θ_{\lceil 2 i / 3 \rceil}^{g.723.1}), \quad i \mod 3 = 1

v_{i}^{gsm} [n] = v_{\lfloor (40 i + n) / 60 \rfloor}^{g.723.1} [(40 i + n) \mod 60], \quad \forall i, n

where i = 0 is the first subframe of the first GSM-AMR frame, i = 4 is the first subframe of the second GSM-AMR frame, and so on. Figure 12 depicts this process.
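The subframe interpolation formulas can be sketched in Python as follows. This is a minimal illustration, not the patented implementation; the function names are hypothetical, the subframe-wide parameter is treated as a scalar per subframe, and an even number of G.723.1 subframes is assumed so that every pair maps to three GSM-AMR subframes.

```python
def interpolate_subframe_params(theta_g723):
    """Map one value per G.723.1 subframe (7.5 ms) to GSM-AMR subframes (5 ms).

    Assumes len(theta_g723) is even, so every two input subframes
    produce three output subframes.
    """
    n_out = len(theta_g723) * 3 // 2
    out = []
    for i in range(n_out):
        lo = (2 * i) // 3                 # floor(2i/3)
        if i % 3 == 1:
            hi = lo + 1                   # ceil(2i/3), since 2i is not a multiple of 3
            out.append(0.5 * (theta_g723[lo] + theta_g723[hi]))
        else:
            out.append(theta_g723[lo])
    return out


def copy_sample_params(v_g723, i):
    """Return the 40 samples of GSM-AMR subframe i.

    v_g723 is a list of 60-sample G.723.1 subframes covering the same signal.
    """
    return [v_g723[(40 * i + n) // 60][(40 * i + n) % 60] for n in range(40)]
```

For example, `interpolate_subframe_params([1.0, 3.0])` yields three values in which the middle GSM-AMR subframe, which straddles both G.723.1 subframes, receives their average.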

The LSP parameters (which are subframe-wide parameters) should be interpolated in the pseudo-frequency domain, i.e. f = \cos^{-1}(q). This results in better output quality. The other subframe parameters do not need to be transformed before interpolation.
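A minimal sketch of interpolating LSP vectors in the pseudo-frequency domain is given below. It assumes the LSPs are supplied in the cosine domain (values q in [-1, 1]); the function name and the `weight` parameter are illustrative only.

```python
import math


def interpolate_lsp(q_a, q_b, weight=0.5):
    """Interpolate two cosine-domain LSP vectors via f = arccos(q).

    weight = 0 returns q_a, weight = 1 returns q_b.
    """
    f_a = [math.acos(q) for q in q_a]
    f_b = [math.acos(q) for q in q_b]
    f = [(1.0 - weight) * a + weight * b for a, b in zip(f_a, f_b)]
    # Convert back to the cosine domain expected by the caller.
    return [math.cos(x) for x in f]
```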

Note that the analytical formulas above are derived from simple linear interpolation. Any suitable interpolation scheme (such as spline, sinusoidal, and so on) can replace these formulas. In addition, each CELP parameter (LSP coefficients, lag, pitch gain, codeword gain, and so on) can use a different interpolation scheme to obtain the best perceptual quality.

LSP parameter mapping and excitation vector correction by LSP coefficients

Although nearly all CELP-based audio codecs use the same method to obtain the LPC coefficients, there are some minor differences. These differences are due to different window sizes and shapes, different LPC interpolation for each subframe, different subframe sizes, different LPC quantization schemes, and different lookup tables.

To further improve the quality of the audio transcoding produced by the subframe interpolation method above, the excitation vector that serves as the target signal in transcoding is corrected using the LPC data from the source and destination codecs.

The following two methods can be used to improve perceptual quality.

Method 1: Linear transformation of the LSP coefficients

The general method of converting between LSP coefficients is through a linear transformation,

q' = A q + b, where q' is the destination LSP vector (in the pseudo-frequency domain), q is the source (original) LSP vector, A is a linear transformation matrix, and b is a bias term. In the simplest case, A reduces to the identity matrix and b reduces to zero. For the embodiment of the G.723.1 to GSM-AMR transcoder, the DC bias term used in the GSM-AMR codec differs from the DC bias term used by the G.723.1 codec, and the b term in the formula above is used to compensate for this difference.
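The linear transformation q' = A q + b can be sketched as below. The helper names are hypothetical, and the matrix and bias values used in any real transcoder would have to be derived from the two codecs; none are given here.

```python
def map_lsp(q, A, b):
    """Return q' = A q + b for an LSP vector q in the pseudo-frequency domain.

    A is a list of rows, b a list with one bias value per output coefficient.
    """
    return [sum(a_ij * q_j for a_ij, q_j in zip(row, q)) + b_i
            for row, b_i in zip(A, b)]


def identity(n):
    """n x n identity matrix, the simplest choice of A."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

With `A = identity(10)` and a zero bias, the destination LSP vector equals the source vector, which corresponds to the simplest case described above.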

Method 2: Excitation vector correction by the LSP coefficients

In each subframe, the decoded source excitation vector is synthesized with the source LSP coefficients to transform it into the speech domain, and is then filtered using the quantized LP parameters of the destination codec to form the target signal in transcoding. This correction is optional, and it can greatly improve perceptual speech quality when the LSP parameters differ significantly. Figure 13 depicts the excitation correction method.
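The excitation correction can be sketched as a synthesis filter followed by an analysis (inverse) filter. The sketch below makes several assumptions not stated in the text: the LP filters are given as direct-form coefficients with A(z) = 1 + Σ a_k z^{-k} rather than as LSPs, the filter memory starts at zero, and all function names are illustrative. A practical transcoder would carry filter state across subframes.

```python
def synthesize(excitation, a):
    """All-pole synthesis 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k, zero initial state."""
    s = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * s[n - k]
        s.append(acc)
    return s


def analyze(speech, a):
    """All-zero analysis filter A(z) applied to speech, zero initial state."""
    out = []
    for n, s_n in enumerate(speech):
        acc = s_n
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * speech[n - k]
        out.append(acc)
    return out


def correct_excitation(excitation, a_src, a_dst):
    """Source excitation -> speech domain (source LP) -> target excitation (destination LP)."""
    return analyze(synthesize(excitation, a_src), a_dst)
```

When the source and destination LP coefficients are identical, the two filters cancel and the excitation passes through unchanged, which is consistent with the correction being optional when the LSP parameters do not differ significantly.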

Parameter mapping and tuning module

This section discusses three strategies for mapping the CELP excitation parameters. They are presented in order of increasing computational complexity and output quality. At the core of the present invention is the fact that the excitation can be mapped directly without reconstructing the speech signal. This means that a large amount of computation is saved during the closed-loop codebook search, because the signal does not need to be filtered by the short-term impulse response as conventional techniques require. This mapping works because the input bitstream already contains the optimal excitation according to the source CELP codec that produced the speech. The present invention uses this fact to perform a fast search in the excitation domain instead of the speech domain.

As mentioned above, there are three methods of mapping the excitation, each with successively better performance, which allow the transcoder to adapt to the available computational resources.

CELP parameter direct space mapping

This strategy is the simplest transcoding scheme. The mapping is based on the similarity of physical meaning between the source and destination parameters, and the transcoding is performed directly using analytical formulas without any iteration or search. The advantage of this scheme is that it does not need a large amount of memory and consumes almost zero MIPS, yet it can still produce intelligible speech, albeit at reduced quality. Note that the CELP parameter direct space mapping method of the present invention differs from the prior-art apparatus shown in Fig. 7. The method is general and applies to all types of CELP-based transcoding, regardless of differences in frame or subframe size.

Analysis in the excitation space domain

This strategy is more advanced than the previous one in that both the adaptive and fixed codebooks are searched and the gains are estimated in the usual manner defined by the destination CELP standard, except that this is done in the excitation domain rather than the speech domain. First, the pitch contribution is determined by a local search that uses the pitch from the input CELP subframe as the initial estimate. Once found, the pitch contribution is subtracted from the excitation, and the fixed codebook is determined by optimally matching the residual. The advantage of this cascaded method is that the autocorrelation method from the CELP standard need not be used to compute the open-loop pitch estimate; instead, the estimate can be determined from the pitch lag of the decoded CELP subframe. The search is also performed in the excitation domain rather than the speech domain, so no impulse response filtering is needed during the pitch and codebook searches. This saves a large amount of computation without compromising output quality.
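The cascaded search described above can be sketched as follows. This is an illustration under simplifying assumptions, not the search of any particular standard: the adaptive codeword is single-tap with an integer lag, the search window `delta` is a hypothetical parameter, and the fixed codebook search itself is left out; only its target is computed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_codeword(past, lag, length):
    """Single-tap adaptive codeword: repeat the excitation from `lag` samples back."""
    buf = list(past)
    for _ in range(length):
        buf.append(buf[-lag])
    return buf[len(past):]


def search_pitch(target, past, decoded_lag, delta=2):
    """Local search around the decoded lag, performed in the excitation domain."""
    best = None
    lo = max(1, decoded_lag - delta)
    hi = min(len(past), decoded_lag + delta)
    for lag in range(lo, hi + 1):
        v = adaptive_codeword(past, lag, len(target))
        energy = dot(v, v)
        if energy == 0.0:
            continue
        corr = dot(target, v)
        score = corr * corr / energy      # normalized correlation criterion
        if best is None or score > best[0]:
            best = (score, lag, corr / energy, v)
    if best is None:
        return decoded_lag, 0.0, [0.0] * len(target)
    return best[1], best[2], best[3]


def fixed_codebook_target(target, gain, v):
    """Remove the pitch contribution; the remainder is matched by the fixed codebook."""
    return [x - gain * y for x, y in zip(target, v)]
```

No impulse response appears anywhere in the sketch, which reflects the point made above that the search operates on the excitation directly.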

Analysis in the filtered excitation space domain

In this case, the LP parameters are still mapped directly from the source codec to the destination codec, and the decoded pitch lag is used as the open-loop pitch estimate for the destination codec. The closed-loop pitch search is still performed in the excitation domain. The fixed codebook search, however, is performed in a filtered excitation space domain. The choice of filter type, and whether the target vector of one or both searches is transformed into this domain, depends on the desired quality and complexity requirements.

Various filters can be used, including a low-pass filter to smooth irregularities, a filter that compensates for differences between the excitation characteristics of the source and destination codecs, and a filter that enhances perceptually important signal features. The advantage is that, in contrast to the target signal computation in standard encoding, which uses a weighted LP synthesis filter, the parameters of this filter (order, frequency emphasis/de-emphasis, phase) are all tunable. This strategy therefore allows the transcoding quality between a specific pair of codecs to be tuned and improved, while still providing a quality trade-off for reduced complexity.

Silence frame transcoding and generation

Some CELP-based standards implement a voice activity detector (VAD), which allows discontinuous transmission (DTX) and comfort noise generation (CNG) during periods without speech. There is a significant bit-rate advantage in using VAD. Transcoding between these frames is needed, and where the source codec does not produce silence frames, silence frames are generated for the destination codec. The frames usually contain a few parameters used to generate suitable comfort noise at the decoder. These parameters can be transcoded using simple algebraic methods.

Example embodiments of the invention

The following sections show embodiments of the invention for the G.723.1 and GSM-AMR speech coding standards. The invention is not limited to these standards. It covers all CELP-based audio coding standards. Those skilled in the art will appreciate how to use these methods to transcode between other CELP-based coding standards. Before the preferred embodiments are described, a brief description of the GSM-AMR and G.723.1 codecs is given.

The GSM-AMR codec

The GSM-AMR codec uses eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s.

The codec is based on the code-excited linear prediction (CELP) coding model. A 10th-order linear prediction (LP), or short-term, synthesis filter is used. The long-term, or pitch, synthesis filter is implemented using the so-called adaptive codebook approach.

In the CELP speech synthesis model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors from the adaptive and fixed (innovative) codebooks. Speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimal excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. The perceptual weighting filter used in the analysis-by-synthesis search technique uses the unquantized LP parameters.

The codec operates on speech frames of 20 ms (corresponding to 160 samples at a sampling frequency of 8000 samples per second). At each 160 speech samples, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. At the decoder, the parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.

For the 12.2 kbit/s mode, LP analysis is performed twice per frame; for the other modes, it is performed once. For the 12.2 kbit/s mode, the two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantized using split matrix quantization (SMQ) with 38 bits. For the other modes, the single set of LP parameters is converted to line spectrum pairs (LSP) and quantized using split vector quantization (SVQ).

The speech frame is divided into four subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters, or their interpolated versions, are used depending on the subframe. An open-loop pitch lag is estimated in every other subframe (except in the 5.15 and 4.75 kbit/s modes, where it is done once per frame) based on the perceptually weighted speech signal.

The following operations are then repeated for each subframe:

● The target signal is computed by filtering the LP residual through the weighted synthesis filter, with the initial states of the filters having been updated by filtering the error between the LP residual and the excitation (this is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal).

● The impulse response of the weighted synthesis filter is computed.

● Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target and the impulse response, by searching around the open-loop pitch lag. Fractional pitch with a resolution of 1/6 or 1/3 of a sample (depending on the mode) is used.

● The target signal is updated by removing the adaptive codebook contribution (the filtered adaptive codevector), and this new target is used in the fixed algebraic codebook search (to find the optimum innovation codeword).

● The gains of the adaptive and fixed codebooks are scalar quantized with 4 and 5 bits respectively, or vector quantized with 6 to 7 bits (with moving average (MA) prediction applied to the fixed codebook gain).

● Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.

In each 20 ms speech frame, a bit allocation of 95, 103, 118, 134, 148, 159, 204 or 244 bits is produced, corresponding to bit rates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s.
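The relationship between bit rate and bits per frame stated above is simple arithmetic (kbit/s multiplied by the frame duration in ms), and can be checked with a short sketch. The constant and function names are illustrative.

```python
AMR_RATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s times milliseconds gives bits; round to absorb floating-point error."""
    return round(rate_kbps * frame_ms)


AMR_BITS = [bits_per_frame(r) for r in AMR_RATES_KBPS]
```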

G.723.1 codec

The G.723.1 codec has two bit rates associated with it, namely 5.3 and 6.3 kbps. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30 ms frame boundary.

The codec is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each. At an 8 kHz sampling rate, this equals 30 ms. Each block is first high-pass filtered to remove the DC component and is then divided into four subframes of 60 samples each. For each subframe, a 10th-order linear prediction coder (LPC) filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a predictive split vector quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and obtain the perceptually weighted speech signal.

For every two subframes (120 samples), the open-loop pitch period L_{OL} is computed using the weighted speech signal. This pitch estimation is performed on blocks of 120 samples. The pitch period is searched in the range from 18 to 142 samples.

From this point on, the speech is processed on a basis of 60 samples per subframe.

Using the previously computed pitch period estimate, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter, and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.

Using the pitch period estimate L_{OL} and the impulse response, a closed-loop pitch predictor is computed. A fifth-order pitch predictor is used. The pitch period is computed as a small differential value around the open-loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.

Finally, the non-periodic component of the excitation is approximated. For the high bit rate, multipulse maximum likelihood quantization (MP-MLQ) excitation is used, and for the low bit rate, an algebraic codebook excitation is used.

First embodiment: GSM-AMR to G.723.1

Figure 17 is a block diagram according to the first embodiment of the present invention, illustrating a GSM-AMR to G.723.1 transcoder. GSM-AMR bitstream frames range in length from 244 bits (31 bytes) for the highest-rate mode, 12.2 kbps, to 95 bits (12 bytes) for the lowest-rate mode, 4.75 kbps. There are eight modes in total, and each of the eight GSM-AMR operating modes produces a different bitstream. Because a G.723.1 frame of 30 ms duration spans one and a half GSM-AMR frames, two GSM-AMR frames are needed to produce a single G.723.1 frame. The next G.723.1 frame can then be produced when the third GSM-AMR frame arrives. Thus, for every three GSM-AMR frames processed, two G.723.1 frames are produced.
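The frame timing described above can be illustrated by counting buffered samples. The sketch below is a simplified model with hypothetical names: it tracks only how many output frames become available as each input frame arrives, assuming 160-sample GSM-AMR frames and 240-sample G.723.1 frames at the same sampling rate.

```python
def schedule(num_input_frames, in_samples=160, out_samples=240):
    """For each arriving input frame, return how many output frames can be emitted."""
    buffered = 0
    emitted = []
    for _ in range(num_input_frames):
        buffered += in_samples
        count = buffered // out_samples   # complete output frames now available
        buffered -= count * out_samples
        emitted.append(count)
    return emitted
```

Calling `schedule(6)` shows that no output frame is available after the first GSM-AMR frame, one after the second, and one after the third, and the pattern then repeats. Swapping the two sizes models the opposite direction, where one input frame can yield more than one output frame.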

The 10 LSP parameters used for the short-term filter in the GSM-AMR speech production model are encoded using the same technique in all modes, but in different bitstream formats for the different operating modes. The algorithm for reconstructing the LSP parameters is given in the GSM-AMR standard documentation.

Once the short-term filter parameters for each subframe have been produced, the excitation vector needs to be formed by combining the adaptive codeword and the fixed (algebraic) codeword. The adaptive codeword is constructed using a 60-tap interpolation filter, according to the 1/6 or 1/3 resolution pitch lag parameter. The fixed codeword is then constructed as defined by the standard, and the excitation is formed as:

x [n] = {\tilde{g}}_{p} v [n] + {\tilde{g}}_{c} c [n]

where x is the excitation, v is the interpolated adaptive codeword, c is the fixed codevector, and {\tilde{g}}_{p} and {\tilde{g}}_{c} are the adaptive and fixed codebook gains, respectively. This excitation is then used to update the memory state of the GSM-AMR unpacker, and is mapped by the G.723.1 bitstream packer.

The adaptive codeword for each subframe is found by forming a linear combination of past excitation vectors and seeking the best match to the target excitation signal x[] constructed by the GSM-AMR unpacker. The combination is a weighted sum of the past excitation at five successive lags. This is best illustrated by the formula:

v [n] = Σ_{j = - 2}^{2} β_{j} u [n - L + j], \quad 0 \leq n \leq 59

where v[] is the reconstructed adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag between 18 and 143 inclusive (determined from the GSM-AMR unpacking module), and β_{j} are the lag weights, which determine the gain and the lag phase. The table of β_{j} vectors is searched to optimize the match between the adaptive codeword v[] and the excitation vector x[].
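The five-tap adaptive codeword and the search over the table of β vectors can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: when the index n - L + j reaches into the current subframe, the past excitation is extended periodically with period L, and the match criterion is plain squared error. The β table in any real system comes from the standard; the function names are hypothetical.

```python
def five_tap_codeword(u, L, beta, length=60):
    """v[n] = sum_{j=-2..2} beta[j+2] * u[n - L + j].

    u is a list of past excitation whose last element is the newest sample;
    len(u) must be at least L + 2.
    """
    def past(k):
        while k >= 0:            # assumed periodic extension into the current subframe
            k -= L
        return u[len(u) + k]

    return [sum(beta[j + 2] * past(n - L + j) for j in range(-2, 3))
            for n in range(length)]


def search_beta(u, L, x, beta_table):
    """Return (index, codeword) of the beta vector whose codeword best matches x."""
    best_idx, best_v, best_err = None, None, None
    for idx, beta in enumerate(beta_table):
        v = five_tap_codeword(u, L, beta, len(x))
        err = sum((a - b) ** 2 for a, b in zip(x, v))
        if best_err is None or err < best_err:
            best_idx, best_v, best_err = idx, v, err
    return best_idx, best_v
```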

Once the adaptive codeword component of the excitation is found, this component is subtracted from the excitation, leaving a residual ready to be coded by the fixed codebook. The residual signal for each subframe is computed as,

x_{2} [n] = x [n] - v [n], \quad n = 0, \ldots, 59

where x_{2}[] is the target for the fixed codebook search, x[] is the excitation derived from GSM-AMR unpacking, and v[] is the (interpolated and scaled) adaptive codeword.

The fixed codebook differs between the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses the MP-MLQ codebook, which allows six pulses per subframe in even subframes and five pulses per subframe in odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP), which allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. These codebooks are searched by the methods defined in the standard, except that no impulse response filter is used, because the search is performed in the excitation domain rather than the speech domain.

When processing of each subframe is complete, the (persistent) memory of the codec needs to be updated. This is done by first shifting the past excitation buffer u[] by 60 samples (that is, one subframe), so that the oldest samples are discarded, and then copying the excitation of the current subframe into the top 60 samples of the buffer,

u [n] = \{\begin{matrix} u [n + 60], & - 85 \leq n < 0 \\ {\tilde{g}}_{p} v [n] + {\tilde{g}}_{c} c [n], & 0 \leq n \leq 59 \end{matrix}

where the index n is set relative to the first sample of the current subframe, and the other parameters are as defined previously.
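The buffer update can be sketched with a Python list whose last element is the newest sample. This is an illustration only; the function name is hypothetical, and the buffer length of 145 samples follows from the index range -85 to 59 in the formula above.

```python
def update_excitation_buffer(u, v, c, g_p, g_c):
    """Shift the past-excitation buffer by one subframe and append the new excitation.

    u: past excitation (oldest sample first); v, c: adaptive and fixed codewords
    of the current subframe; g_p, g_c: their quantized gains.
    """
    n = len(v)
    new_exc = [g_p * v_n + g_c * c_n for v_n, c_n in zip(v, c)]
    # Dropping the first n samples discards the oldest subframe.
    return u[n:] + new_exc
```

The same function applies to the GSM-AMR case by passing 40-sample codewords, since the shift length is taken from the codeword length.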

All the mapped parameters are encoded into the output G.723.1 bitstream, and the system is then ready to process the next frame.

Second embodiment: G.723.1 to GSM-AMR

Figure 18 is a block diagram according to the second embodiment of the present invention, illustrating a G.723.1 to GSM-AMR transcoder. The G.723.1 bitstream comprises frames of length 192 bits (24 bytes) for the high-rate (6.3 kbps) codec, or 160 bits (20 bytes) for the low-rate (5.3 kbps) codec. The frames have a very similar structure and differ only in the representation of the fixed codebook parameters.

For both the high and low rates, the 10 LSP parameters used to model the short-term vocal tract filter are encoded in the same way and can be obtained from bits 2 to 25 of the G.723.1 frame. Only the LSPs of the fourth subframe are encoded, and interpolation between frames is used to regenerate the LSPs of the other three subframes. The coding uses three lookup tables, and the LSP vector is reconstructed by combining three subvectors obtained from these tables. Each table has 256 vector entries; the first two tables hold 3-element subvectors and the last table holds 4-element subvectors. Combining them gives the 10-element LSP vector.

The adaptive codeword for each subframe is constructed by combining past excitation vectors. The combination is a weighted sum of the past excitation at five successive lags. This is best described by the formula,

v [n] = Σ_{j = - 2}^{2} β_{j} u [n - L + j], \quad 0 \leq n \leq 59

where v[] is the reconstructed adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag between 18 and 143 inclusive, and β_{j} are the lag weights determined by the pitch gain parameter.

The lag parameter L is obtained directly from the bitstream. The first and third subframes use the full dynamic range of the lag, whereas the second and fourth subframes code the lag as an offset from the previous subframe. The lag weight parameters β_{j} are determined by table lookup. As a result of unpacking the adaptive codeword, an approximation of the fractional pitch lag and the associated gain can be determined by computing

L_{i} - \frac{Σ_{j = - 2}^{2} j β_{i, j}^{2}}{Σ_{j = - 2}^{2} β_{i, j}^{2}}
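The expression above is an energy-weighted mean of the tap offsets subtracted from the integer lag, and can be evaluated with a few lines of Python. The function name is hypothetical, and returning the integer lag when all weights are zero is a choice of this sketch, not something stated in the text.

```python
def fractional_lag(L, beta):
    """Approximate fractional lag: L - sum(j * beta_j^2) / sum(beta_j^2), j = -2..2."""
    num = sum(j * b * b for j, b in zip(range(-2, 3), beta))
    den = sum(b * b for b in beta)
    # With all-zero weights there is no pitch contribution; fall back to L.
    return L - num / den if den > 0.0 else float(L)
```

A weight vector concentrated on the centre tap leaves the lag unchanged, whereas weight on the j = 1 tap moves the estimate toward L - 1, which matches the sign of the index n - L + j in the adaptive codeword formula.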

The fixed codebook differs between the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses the MP-MLQ codebook, which allows six pulses per subframe in even subframes and five pulses per subframe in odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP), which allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. The algorithm for generating the codeword from the encoded bitstream is given in the G.723.1 standard documentation.

u [n] = \{\begin{matrix} u [n + 60], & - 85 \leq n < 0 \\ {\tilde{g}}_{p} v [n] + {\tilde{g}}_{c} c [n], & 0 \leq n \leq 59 \end{matrix}

The GSM-AMR parameter mapping part of the transcoder takes the interpolated CELP parameters described above and uses them as the basis for searching the GSM-AMR parameter space. The LSP parameters are simply encoded as received, and the other parameters, namely the excitation and the pitch lag, are used as estimates for the search in the GSM-AMR space. The following description (figure) shows the main operations that must take place on each subframe to complete the transcoding.

The adaptive codeword is formed by searching the past excitation vectors, up to a maximum lag of 143, for the best match with the target excitation. The target excitation is determined from the interpolated subframes. The past excitation can be interpolated at intervals of 1/6 or 1/3, depending on the mode. The optimal lag is found by searching a small region around the pitch lag (determined from the G.723.1 unpacking module). This region is searched to find the optimal integer lag, and the search is then refined to determine the fractional part of the lag. The procedure uses a 24-tap interpolation filter to perform the fractional search. The first and third subframes are processed differently from the second and fourth subframes. The interpolated adaptive codeword v[] is then formed as,

v [n] = Σ_{i = 0}^{9} u [n - L - i] b_{60} [t + 6 i] + u [n - L + 1 + i] b_{60} [6 - t + 6 i]

where v[] is the interpolated adaptive codeword, u[] is the past excitation buffer, L is the (integer) pitch lag, t is the fractional pitch lag at 1/6 resolution, and b_{60} is the 60-tap interpolation filter.
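The interpolation formula can be written directly as a double loop. In the sketch below, the excitation is held in a list `exc` and `origin` is the list index corresponding to n = 0; both names are hypothetical. The filter `b60` must be supplied by the caller with at least 61 entries (indices 0 to 60 are reachable when t = 0), and the caller must also supply enough samples on both sides of `origin - L` for all indices to be valid. The coefficients of the actual filter are defined in the GSM-AMR standard and are not reproduced here.

```python
def interpolated_codeword(exc, origin, L, t, b60, length=40):
    """v[n] = sum_{i=0..9} u[n-L-i] b60[t+6i] + u[n-L+1+i] b60[6-t+6i].

    exc[origin + n] plays the role of u[n]; t is the fraction in sixths (0..5).
    """
    v = []
    for n in range(length):
        acc = 0.0
        for i in range(10):
            acc += exc[origin + n - L - i] * b60[t + 6 * i]
            acc += exc[origin + n - L + 1 + i] * b60[6 - t + 6 * i]
        v.append(acc)
    return v
```

As a sanity check, a stand-in filter that is 1 at index 0 and 0 elsewhere reduces the formula, for t = 0, to a plain integer-lag copy v[n] = u[n - L]. That stand-in is used only for checking the indexing and is not the standard's filter.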

The pitch gain is computed and quantized so that it can be encoded and sent to the decoder, and so that it can be used to compute the target vector for the fixed codebook. All modes compute the pitch gain for each subframe in the same way,

g_{p} = \frac{x^{T} v}{v^{T} v}

where g_{p} is the unquantized pitch gain, x is the target for the adaptive codebook search, and v is the (interpolated) adaptive codeword vector. The 12.2 kbps and 7.95 kbps modes quantize the adaptive and fixed codebook gains independently, whereas the other modes use joint quantization of the fixed and adaptive gains.
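The unquantized pitch gain is a ratio of two inner products and can be computed as sketched below. The function name is illustrative, and returning zero for an all-zero codeword is a safeguard added in this sketch; quantization of the gain is not shown.

```python
def pitch_gain(x, v):
    """g_p = (x . v) / (v . v) for target x and adaptive codeword v."""
    num = sum(a * b for a, b in zip(x, v))
    den = sum(b * b for b in v)
    # Guard against a silent codeword to avoid division by zero.
    return num / den if den > 0.0 else 0.0
```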

Once the adaptive codebook component of the excitation is found, this component is subtracted from the excitation, leaving a residual ready to be coded by the fixed codebook. The residual signal for each subframe is computed as,

x_{2} [n] = x [n] - {\tilde{g}}_{p} v [n], \quad n = 0, \ldots, 39

where x_{2}[] is the target for the fixed codebook search, x[] is the target for the adaptive codebook search, {\tilde{g}}_{p} is the quantized pitch gain, and v[] is the (interpolated) adaptive codeword.

The fixed codebook search is designed to find the best match to the residual signal after the adaptive codebook component has been removed. This is important for unvoiced speech and for priming the adaptive codebook. Because much analysis of the original speech has already taken place, the codebook search used in transcoding can be simpler than the one used in the codec. Moreover, the signal on which the codebook search is performed is the reconstructed excitation signal rather than synthesized speech, and it therefore already has a structure more amenable to fixed codebook coding.

The fixed codebook gain is quantized using moving average prediction based on the energy of the previous four subframes. The correction factor between the actual and predicted gain is quantized (by table lookup) and sent to the decoder. Exact details are given in the GSM-AMR standard documentation.

When processing of each subframe is complete, the (persistent) memory of the codec needs to be updated. This is done by first shifting the past excitation buffer u[] by 40 samples (that is, one subframe), so that the oldest samples are discarded, and then copying the excitation of the current subframe into the top 40 samples of the buffer,

u [n] = \{\begin{matrix} u [n + 40], & - 114 \leq n < 0 \\ {\tilde{g}}_{p} v [n] + {\tilde{g}}_{c} c [n], & 0 \leq n \leq 39 \end{matrix}

where the index n is set relative to the first sample of the current subframe, and the other parameters are all as defined previously.

While what are presently considered to be example embodiments of the present invention have been illustrated and described, those skilled in the art will appreciate that various other modifications can be made, and equivalents substituted, without departing from the true scope of the invention. In addition, many modifications can be made to adapt a particular situation to the teachings of the present invention without departing from the central inventive concept described herein.

Claims

1. An apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or within a single standard but between different modes, comprising:

a bitstream unpacking module for obtaining one or more CELP parameters from a source codec;

an interpolator module coupled to the bitstream unpacking module, the interpolator module being adapted to interpolate between different frame sizes, subframe sizes, and/or sampling rates of the source codec and a destination codec;

a mapping module coupled to the interpolator module, the mapping module being adapted to map the one or more CELP parameters from the source codec to one or more CELP parameters of the destination codec;

a destination bitstream packing module coupled to the mapping module, the destination bitstream packing module being adapted to construct at least one destination output CELP frame based on at least the one or more CELP parameters of the destination codec; and

Be coupled to a controller of purpose bit stream package module, mapping block, interpolator module and bit stream removal package module at least, controller is applicable to the operation of the one or more modules of management, and be applicable to the instruction of reception from one or more external applications, controller is applicable to status information is offered one or more external applications.

2. The apparatus of claim 1, wherein the controller is a single controller or a plurality of controllers.

3. The apparatus of claim 1, wherein the mapping module and the destination bitstream packing module are in the same module.

4. The apparatus of claim 1, wherein the mapping module is a single module or a plurality of modules.

5. The apparatus of claim 1, wherein the interpolator module is a single module or a plurality of modules.

6. The apparatus of claim 1, wherein the bitstream unpacking module comprises:

a bitstream processor adapted to extract, from a source CELP codec input frame, information in a first form of one or more CELP parameters;

an LSP decoder module coupled to the bitstream processor, the LSP decoder module being adapted to use at least the information from the source CELP codec input frame to output one or more LSP coefficients;

a decoder module coupled to the bitstream processor, the decoder module being adapted to decode the information from the source CELP codec input frame to output a pitch lag parameter and a pitch gain parameter;

a fixed codebook decoder module coupled to the bitstream processor, the fixed codebook decoder module being adapted to decode the information to output a fixed codebook vector;

an adaptive codebook decoder module coupled to the bitstream processor, the adaptive codebook decoder module being adapted to decode the information to output an adaptive codebook component vector; and

an excitation generator coupled to the fixed codebook decoder module, the excitation generator being adapted to use at least the fixed codebook vector and the adaptive codebook vector to output an excitation vector.

7. The apparatus of claim 1, wherein the interpolator module comprises:

an LSP process adapted to convert one or more LSP coefficients of the source codec into one or more LSP coefficients of the destination codec when the source codec and the destination codec have different subframe sizes;

an adaptive codebook process adapted to convert a pitch lag and a pitch gain from the source codec into a pitch lag and a pitch gain of the destination codec when the source codec and the destination codec have different subframe sizes; and

a CELP parameter buffer adapted to store one or more CELP parameters that need to be buffered for interpolation when the source codec and the destination codec have different subframe sizes.

8. The apparatus of claim 1, wherein the parameter mapping and tuning module comprises:

a parameter mapping and tuning strategy switching module, the strategy switching module being adapted to select one CELP parameter mapping strategy from a plurality of strategies; and

a parameter mapping and tuning strategy module, the mapping and tuning strategy module being adapted to output one or more destination CELP parameters.

9. The apparatus of claim 8, wherein the plurality of strategies comprises:

a CELP parameter direct space mapping module;

an analysis module in the filtered excitation space domain; and

an analysis module in the excitation space domain.

10. The apparatus of claim 8, wherein the parameter mapping and tuning strategy module comprises:

an LSP coefficient converter that encodes the destination LSP coefficients; and

a CELP excitation mapping unit that takes the CELP excitation parameters, including pitch lag, gains, and excitation vectors, from the interpolation to obtain encoded CELP excitation parameters.

11. The apparatus of claim 10, wherein the CELP excitation mapping unit comprises:

a CELP parameter direct space mapping module that produces encoded destination CELP parameters using analytical formulas without any iteration;

an analysis module for mapping in the excitation space domain that produces encoded destination CELP parameters by searching in the excitation space domain; and

an analysis module for mapping in the filtered excitation space domain that produces encoded destination CELP parameters by searching the adaptive codebook in closed loop in the excitation space domain and the fixed codebook in the filtered excitation space domain.

12. The apparatus of claim 1, wherein the destination bitstream packing module comprises a plurality of frame packing facilities, each of the facilities being adaptable to a pre-selected application from a plurality of applications for a selected destination CELP coder, the selected destination CELP coder being one of a plurality of CELP coders.

13. The apparatus of claim 1, wherein the controller comprises:

a control unit that receives external instructions and controls each signal processing module; and

a status unit that, upon request, sends transcoding information such as frame, count, and error-lag information to the outside.

14. The apparatus of claim 1, wherein the interpolator module is selectable from linear interpolation or non-linear interpolation.

15. The apparatus of claim 7, wherein the CELP parameter buffer comprises:

an excitation vector buffer adapted to store reconstructed excitation vectors awaiting mapping in the next subframe or frame;

an LSP coefficient buffer that stores LSP coefficients, before or after interpolation, awaiting mapping in the next subframe or frame; and

a buffer for other CELP parameters that stores pitch lags, pitch gains, codebook gains, and indices, before or after interpolation, awaiting mapping in the next subframe or frame.

16. A method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec, the method comprising:

processing a source codec input CELP bitstream to unpack at least one or more CELP parameters from the input CELP bitstream;

interpolating one or more of the plurality of unpacked CELP parameters from a source codec format to a destination codec format if a difference exists between one or more of a plurality of destination codec parameters, including a frame size, a subframe size, and/or a sampling rate of the destination codec format, and one or more of a plurality of source codec parameters, including a frame size, a subframe size, and/or a sampling rate of the source codec format;

encoding one or more CELP parameters for the destination codec; and

processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

17. The method of claim 16, wherein the processing of the source codec input comprises:

converting an input bitstream frame into information associated with one or more CELP parameters;

decoding the information into one or more CELP parameters;

reconstructing an excitation vector based on at least the one or more CELP parameters; and

outputting the CELP parameters to an interpolator.

18. The method of claim 16, wherein the interpolating comprises:

interpolating one or more LSP coefficients from the source codec to one or more LSP coefficients of the destination codec;

interpolating other CELP parameters, other than the LSP coefficients, from the source codec to other CELP parameters of the destination codec; and

sending the source excitation vector to the encoding process if the excitation vector does not need calibration.

19. The method of claim 18, further comprising:

converting one or more LSP coefficients using a linear transform process.
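As an illustration only, one simple linear process for carrying LSP coefficients across different frame or subframe sizes is time-weighted linear interpolation. The sketch below is an assumption made for clarity, not the transform recited in the claims: the function name, the timestamp-based interface, and the clamping at the ends are all hypothetical.

```python
def interpolate_lsp(src_times, src_lsps, dst_times):
    """Linearly interpolate LSP vectors onto destination time instants.

    src_times : strictly increasing times (e.g. ms) of the source LSP vectors
    src_lsps  : one LSP vector (list of floats) per source time
    dst_times : times at which the destination codec needs LSP vectors
    Times outside the source range are clamped to the nearest source vector.
    """
    result = []
    for t in dst_times:
        if t <= src_times[0]:
            result.append(list(src_lsps[0]))
            continue
        if t >= src_times[-1]:
            result.append(list(src_lsps[-1]))
            continue
        # Find the pair of source instants that brackets t.
        for i in range(len(src_times) - 1):
            t0, t1 = src_times[i], src_times[i + 1]
            if t0 <= t <= t1:
                w = (t - t0) / (t1 - t0)
                result.append([(1.0 - w) * a + w * b
                               for a, b in zip(src_lsps[i], src_lsps[i + 1])])
                break
    return result
```

Because each output is a convex combination of two ordered LSP vectors, the interpolated vector keeps the same ordering.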

20. The method of claim 18, further comprising:

converting the source codec excitation vector into a synthesized speech vector using at least one or more source decoded LPC coefficients;

quantizing destination LPC coefficients;

converting the synthesized speech vector back into a calibrated excitation vector using at least the quantized destination LPC coefficients; and

sending the calibrated excitation vector to another process.

21. The method of claim 16, wherein the encoding comprises:

quantizing destination LPC coefficients; and

selecting, according to a control signal from a parameter mapping and tuning strategy switching module, one of the following CELP mapping strategies:

CELP parameter direct space mapping;

analysis in the excitation space domain;

analysis in the filtered excitation space domain.

22. The method of claim 21, wherein the CELP parameter direct space mapping comprises the operations of:

encoding a pitch lag from the interpolated pitch lag parameter;

encoding a pitch gain from the interpolated pitch gain parameter;

encoding a fixed codebook index from an analytical form; and

encoding a gain from the fixed codebook gain parameter.

23. The method of claim 21, wherein the analysis in the excitation space domain mapping comprises the operations of:

selecting a pitch lag from the interpolated pitch lag parameter as an initial value;

searching for the pitch lag in closed loop in the excitation space;

searching for the pitch gain in the excitation space;

constructing a target signal for the fixed codebook search;

searching for the fixed codebook index in the excitation space;

searching for the fixed codebook gain in the excitation space; and

updating the previous excitation vector.

24. The method of claim 21, wherein the analysis in the filtered excitation space domain mapping comprises the operations of:

searching for the pitch lag in closed loop in the excitation space;

searching for the pitch gain in the excitation space;

constructing a target signal for the fixed codebook search;

searching for the fixed codebook index in the filtered excitation space;

searching for the fixed codebook gain in the filtered excitation space; and

updating the previous excitation vector.

25. The method of claim 21, wherein the selection is not limited to the above three strategies, and a combination of the three strategies may be selected as a new mapping strategy.

26. The apparatus of claim 1, with the addition of a silence frame transcoding unit that performs fast conversion of silence frames from one voice coding standard to another voice coding standard.

27. The apparatus of claim 1, but wherein the parameter mapping and tuning module comprises a voice activity detector for generating silence frames, the voice activity detector making its voice/silence decision based on parameters in the CELP space.

28. The apparatus of claim 1, but with the addition of a system for changing the excitation mapping strategy used, thereby providing a mechanism for adapting to the available computational resources and allowing graceful quality degradation under load.

29. Performing the excitation mapping without the need to revert to the speech signal domain.

30. A method for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format, the method comprising:

transferring a control signal, from a plurality of control signals, from an application process;

selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based on at least the control signal from the application; and

performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format.

31. The method of claim 30, wherein the plurality of CELP mapping strategies comprises:

CELP parameter direct space mapping; or

analysis in the excitation space domain; or

analysis in the filtered excitation space domain.
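Selecting a mapping strategy from a control signal can be sketched as a simple dispatch table. Everything below is hypothetical and for illustration only: the enum values, function names, and dictionary-based parameter format are assumptions, and the three strategy functions are placeholders that only tag the parameters with the strategy used rather than performing any actual mapping.

```python
from enum import Enum


class MappingStrategy(Enum):
    DIRECT_SPACE = "direct_space"
    EXCITATION_SPACE = "excitation_space"
    FILTERED_EXCITATION_SPACE = "filtered_excitation_space"


def map_direct_space(params):
    # Placeholder: a real implementation would map parameters analytically.
    return {"strategy": MappingStrategy.DIRECT_SPACE.value, "params": dict(params)}


def map_excitation_space(params):
    # Placeholder: a real implementation would search in the excitation domain.
    return {"strategy": MappingStrategy.EXCITATION_SPACE.value, "params": dict(params)}


def map_filtered_excitation_space(params):
    # Placeholder: a real implementation would search in the filtered excitation domain.
    return {"strategy": MappingStrategy.FILTERED_EXCITATION_SPACE.value,
            "params": dict(params)}


STRATEGY_TABLE = {
    MappingStrategy.DIRECT_SPACE: map_direct_space,
    MappingStrategy.EXCITATION_SPACE: map_excitation_space,
    MappingStrategy.FILTERED_EXCITATION_SPACE: map_filtered_excitation_space,
}


def map_parameters(control_signal, src_params):
    """Pick a strategy from the control signal and apply it to the source parameters."""
    strategy = MappingStrategy(control_signal)  # raises ValueError if unknown
    return STRATEGY_TABLE[strategy](src_params)
```

A table of this kind lets an application switch strategies at run time, for example falling back to the cheaper direct mapping when computational resources are limited.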

32. The method of claim 30, wherein the selecting of the one CELP mapping strategy is predetermined for an application during a setup and construction process.

33. The method of claim 30, further comprising receiving the control signal at a switching module, the switching module being coupled to each of the plurality of mapping strategies.

34. The method of claim 30, wherein the control signal is provided based on a computing resource characteristic of the selected CELP mapping strategy.

35. The method of claim 30, wherein one or more of the plurality of mapping strategies are provided in a library in memory.

36. The method of claim 31, further comprising encoding one or more CELP parameters for the destination codec; and

processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

37. The method of claim 36, further comprising transferring the packed destination CELP bitstream to the destination codec.

38. A system for processing a CELP-based compressed voice bitstream from a source codec format to a destination codec format, the system comprising:

one or more codes for receiving a control signal, from a plurality of control signals, from an application process;

one or more codes for selecting one CELP mapping strategy from a plurality of different CELP mapping strategies based on at least the control signal from the application; and

one or more codes for performing a mapping process using the selected CELP mapping strategy to map one or more CELP parameters from the source codec format to one or more CELP parameters of the destination codec format.

39. The system of claim 38, wherein the plurality of CELP mapping strategies comprises:

one or more codes for CELP parameter direct space mapping; or

one or more codes for analysis in the excitation space domain; or

one or more codes for analysis in the filtered excitation space domain.

40. The system of claim 38, wherein the selected CELP mapping strategy is predetermined.

41. The system of claim 38, further comprising one or more codes, provided in a strategy switching module, for receiving the control signal, the strategy switching module being coupled to each of the plurality of mapping strategies.

42. The system of claim 38, wherein the control signal is provided based on a computing resource characteristic of the selected CELP mapping strategy.

43. The system of claim 38, wherein the one or more codes for the plurality of mapping strategies are provided in a library in memory.

44. The system of claim 43, further comprising one or more codes for encoding one or more CELP parameters for the destination codec; and

one or more codes for processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

45. The system of claim 44, further comprising one or more codes for transferring the destination CELP bitstream to the destination codec.

46. The system of claim 44, further comprising one or more codes for transferring the destination CELP bitstream to a storage location.