Summary of the invention
One object of the present invention is to provide a network component and a method that can satisfactorily enhance, on the network side, the quality of digitized analog signals transmitted over a digital network.
According to one aspect, this object is achieved by a network component for improving the quality of digitized analog signals that are transmitted, at least partly in parameter-coded form, over a digital network to which the network component has access. The network component comprises: a payload extraction component for extracting coded digitized analog signals from the digital network, at least some of which are parameter-coded digitized analog signals; a first processing unit for processing the extracted parameter-coded digitized analog signals in the parameter domain with functions suitable for improving the quality of the digitized analog signals; a second processing unit for processing at least some of the extracted coded digitized analog signals in the linear domain with functions suitable for improving the quality of the digitized analog signals; a payload insertion component for inserting processed coded digitized analog signals into the digital network; and an analysis and selection device for determining the quality improvement of the digitized analog signals obtained by processing the extracted coded digitized analog signals in the parameter domain and by processing them in the linear domain, and for causing at least the coded digitized analog signals processed by the processing unit that yields the greater improvement to be inserted back into the digital network by the payload insertion component.
According to another aspect, this object is achieved by a method for improving the quality of digitized analog signals transmitted, at least partly in parameter-coded form, over a digital network, the method comprising:
extracting coded digitized analog signals from the digital network, at least some of which are parameter-coded digitized analog signals;
determining the quality improvement of the extracted coded digitized analog signals that is expected from processing them in the parameter domain and from processing them in the linear domain;
at least where processing in the parameter domain is expected to yield the greater quality improvement, processing the extracted parameter-coded digitized analog signals in the parameter domain with functions suitable for improving the quality of digitized analog signals;
at least where processing in the linear domain is expected to yield the greater quality improvement, processing at least some of the extracted parameter-coded digitized analog signals in the linear domain with functions suitable for improving the quality of digitized analog signals; and
inserting into the digital network at least those processed coded digitized analog signals that were processed in the domain expected to yield the greater quality improvement.
Because they can process the transmitted coded digitized analog signals both in the linear domain and in the parameter domain, the network component and the method of the invention can optimally enhance the quality of digitized analog signals on the network side.
The analysis and selection device of the network component of the invention determines whether linear-domain processing and/or parameter-domain processing is to be used by analyzing which of the two can be expected to give the better quality improvement of the digitized analog signals. A corresponding step is provided in the method of the invention. For example, if parameter-domain processing is not technically feasible for enhancing the signal quality, linear processing is expected to give the better quality enhancement. If processing in the parameter domain is possible, the expected quality enhancement is determined for both kinds of processing, and the selection is made by comparing the expected enhancements.
Where processing the extracted signals in the parameter domain is expected to give the better quality enhancement, at least the signals processed in the parameter domain are inserted back into the network. Where processing the extracted signals in the linear domain is expected to give the better quality enhancement, only the signals processed in the linear domain are inserted back into the network.
Where processing in the parameter domain is expected to give the better result, and processing in the linear domain causes a greater processing delay because of the time necessarily consumed before and after the processing, signals processed in the linear domain should be inserted into the network only in addition to the signals processed in the parameter domain. In this way, the drawback of having to additionally decode and encode the extracted signals in order to process parameter-coded digitized analog signals in the linear domain can be avoided. Avoiding the additional decoding and encoding means both better quality of the digitized analog signals and a smaller processing delay. For example, parameter-coded digitized analog signals transmitted over a packet-based network, and coded digitized analog signals transmitted in a TFO stream in a TDM-based network, must both be decoded before and encoded after processing in the linear domain, whereas coded digitized analog signals transmitted in a PCM stream in a TDM-based network only require an A-law or μ-law to linear conversion, and the reverse conversion, for linear processing.
Where the signals to be inserted back into the network are selected according to the expected quality improvement, processing in both domains can be carried out in any case if the processed signals are evaluated to determine which processing can be expected to give the better result. When only signals processed in the parameter domain are to be inserted back into the network, this insertion can take place before the processing in the linear domain has finished. The signals processed in the linear domain are then used, as soon as they are available, for determining the expected future quality improvement of linear processing.
Preferred embodiments of the invention are apparent from the dependent claims.
The analysis and selection device of the network component of the invention can determine whether processing is to take place in the parameter domain or in the linear domain from an analysis of the incoming parameter-domain data, for example the gain parameters. Alternatively or additionally, the determination can be based on measurements made in the linear domain after decoding, for example of the speech level, the signal-to-noise ratio and the presence of echo. Preferably, measurements are made on the input data before processing and after processing in the linear domain and in the parameter domain, and the selection is based on them. The processing domain is then selected by comparing the measurement results with fixed thresholds that indicate whether linear-domain or parameter-domain processing is to be used. The threshold values can be derived, for example, from real listening tests on varied test input data that have been processed in both domains and evaluated.
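The threshold comparison described above can be sketched as follows. This is a minimal illustration only: the measurement names, the threshold values and the direction of each comparison are assumptions made for the example, since the text derives the real thresholds from listening tests.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """Linear-domain measurements taken after decoding (names are illustrative)."""
    speech_level_db: float
    snr_db: float
    echo_present: bool


# Hypothetical thresholds; real values would come from listening tests.
SNR_THRESHOLD_DB = 10.0
LEVEL_THRESHOLD_DB = -40.0


def select_domain(m: Measurements, parametric_feasible: bool = True) -> str:
    """Return 'linear' or 'parametric' by comparing measurements with fixed thresholds."""
    if not parametric_feasible:
        # The text states that linear processing is chosen when
        # parameter-domain processing is not technically feasible.
        return "linear"
    # Assumed rule: difficult conditions are handed to linear processing.
    if m.echo_present or m.snr_db < SNR_THRESHOLD_DB or m.speech_level_db < LEVEL_THRESHOLD_DB:
        return "linear"
    return "parametric"
```

With these assumed thresholds, a clean signal at a normal level stays in the parameter domain, which avoids the extra decoding and encoding step.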
Because several factors influence the choice of processing domain, it may be difficult to formulate a set of thresholds that leads to the optimal choice in all call situations. In a further preferred embodiment, a neural-network-based method is therefore used to select the processing domain expected to give the better result. The incoming parameter-domain data and the results of the measurements made after decoding can serve as inputs to a neural network of N neurons. The weights or coefficients of the neurons can be derived by training the network with suitable test data and with the results of real listening tests.
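A neural-network-based selection of this kind could look roughly like the sketch below. The single hidden layer, the sigmoid activation and the 0.5 decision point are assumptions chosen for brevity; the text only specifies a network of N neurons whose weights are obtained by training. Training itself is not shown.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def parametric_score(
    features: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """Forward pass of a small network; one row of hidden_weights per neuron."""
    hidden = [
        _sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return _sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def select_domain_nn(features, hidden_weights, hidden_biases, output_weights, output_bias) -> str:
    """Assumed convention: a score of 0.5 or more favours parameter-domain processing."""
    score = parametric_score(features, hidden_weights, hidden_biases, output_weights, output_bias)
    return "parametric" if score >= 0.5 else "linear"
```

The feature vector would hold the parameter-domain data and the post-decoding measurements mentioned above, for example gain parameters, speech level and signal-to-noise ratio.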
The processing unit for processing in the parameter domain and the processing unit for processing in the linear domain can each comprise a variety of functions. Echo cancellation, noise reduction and level control are possible both in the parameter domain and in the linear domain. In addition, transcoding and speech mixing as part of a conference bridge are possible functions at least for processing in the parameter domain.
For gain control in the parameter domain, for example, the gain parameters of the extracted parameter-coded digitized analog signals can be compared with a desired gain in order to form corresponding new gain parameters. The desired gain can be preset, entered by a user, or calculated from the received gain parameters. The new gain parameters are then inserted into the extracted parameter-coded digitized analog signals, replacing the original gain parameters.
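The replacement of the gain parameter can be illustrated with the following sketch. It assumes, purely for the example, that a coded frame carries one scalar excitation gain in linear units and that the desired gain is given in dB; real codecs quantize their gain parameters, which is not modelled here. The `CodedFrame` type is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CodedFrame:
    """Hypothetical parameter-coded frame with one scalar gain parameter."""
    gain: float
    other_params: tuple


def apply_gain(frame: CodedFrame, desired_gain_db: float) -> CodedFrame:
    """Return a copy of the frame whose gain parameter is replaced by the new value."""
    factor = 10.0 ** (desired_gain_db / 20.0)  # dB to linear amplitude factor
    return replace(frame, gain=frame.gain * factor)
```

All other parameters of the frame are left untouched, so no decoding to the linear domain and no re-encoding are needed.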
Noise suppression by processing in the parameter domain can be carried out in the time domain or in the frequency domain, and preferably in both. In the time domain, the noise portions and low-level signal portions of the extracted parameter-coded digitized analog signals are attenuated, and the corresponding gain parameters are inserted into the extracted parameter-coded digitized analog signals, replacing the original gain parameters. In the frequency domain, those frequency portions of the extracted parameter-coded digitized analog signals whose energy is approximately equal to the noise estimate are attenuated. The corresponding linear prediction parameters are then inserted into the extracted parameter-coded digitized analog signals, replacing the original linear prediction parameters.
For echo suppression in the parameter domain, parameter-coded digitized analog signals are extracted from both transmission directions. The signals can then be compared in order to detect echo in the first parameter-coded digitized analog signal. If an echo is detected in a portion of the first parameter-coded digitized analog signal, that portion is replaced with a comfort noise portion. Alternatively, the echo signal can first be attenuated and the residual echo then suppressed. If there is no activity in the opposite direction, or if the signal level of the parameter-coded digitized analog signal extracted in the opposite direction is below a threshold, the first parameter-coded digitized analog signal can be bypassed without echo cancellation.
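The decision logic of this echo suppression can be sketched per frame as shown below. The level-based echo test, the threshold and the margin are assumptions made for illustration; the text does not specify how the two directions are compared.

```python
from typing import List, Sequence


def classify_frames(
    first_levels_db: Sequence[float],
    opposite_levels_db: Sequence[float],
    activity_threshold_db: float = -45.0,  # hypothetical threshold
    echo_margin_db: float = 15.0,          # hypothetical margin
) -> List[str]:
    """Decide per frame whether to bypass, keep, or replace with comfort noise."""
    decisions = []
    for first, opposite in zip(first_levels_db, opposite_levels_db):
        if opposite < activity_threshold_db:
            # No activity in the opposite direction: bypass without echo processing.
            decisions.append("bypass")
        elif first <= opposite - echo_margin_db:
            # Assumed echo test: the first signal is much weaker than the opposite one.
            decisions.append("comfort_noise")
        else:
            decisions.append("keep")
    return decisions
```

Frames marked `comfort_noise` would have their coded parameters replaced by a comfort noise portion, as described above.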
In a preferred embodiment of the invention, the network component includes a bad frame handler component. This component can work together with the payload extraction component and the processing units, for example to detect missing frames from RTP (Real-time Transport Protocol) sequence numbers, to regenerate missing frame blocks, for example by interpolation or by copying a previous frame, and to reorder out-of-order frames within a buffer window. A suitable position for the bad frame handler component is immediately after the payload extraction component.
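A minimal sketch of such a bad frame handler follows. It reorders the frames of one buffer window by sequence number and regenerates missing frames by copying the previous frame. Sequence number wrap-around and interpolation are deliberately left out, and the `(sequence_number, payload)` tuple layout is an assumption of the example.

```python
from typing import List, Tuple

Frame = Tuple[int, bytes]  # (sequence number, coded payload)


def repair_window(frames: List[Frame]) -> List[Frame]:
    """Reorder one buffer window and fill sequence gaps with the previous frame."""
    ordered = sorted(frames, key=lambda frame: frame[0])
    repaired: List[Frame] = []
    for seq, payload in ordered:
        if repaired:
            prev_seq, prev_payload = repaired[-1]
            # Regenerate every missing frame between the previous and current one.
            for missing in range(prev_seq + 1, seq):
                repaired.append((missing, prev_payload))
        repaired.append((seq, payload))
    return repaired
```

For example, a window received as frames 3, 1, 5 is returned as frames 1 to 5, with frame 2 copied from frame 1 and frame 4 copied from frame 3.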
In another preferred embodiment of the invention, the network component comprises an analysis device for determining whether any processing is to be applied to the extracted parameter-coded digitized analog signals, and for selecting the functions to be applied to the extracted coded digitized analog signals in the parameter domain and/or in the linear domain. These functions can be included in the analysis and selection device used for determining the quality improvement expected from processing in the parameter domain and/or in the linear domain.
If no processing is considered necessary, the coded digitized analog signals can simply pass through one or both processing units without being processed.
The analysis device can make the selection autonomously by analyzing the received coded digitized analog signals and possibly by analyzing the processed signals. Alternatively or additionally, the selection can depend on an external control signal. Even when an external control signal is used and it does not request any processing, the analysis device can evaluate the quality of the received parameter-coded digitized analog signals, for example with regard to speech level, the presence of echo and the signal-to-noise ratio, and select one or several processing functions. The external control signal can enter the network component through a control component in the network component that conforms to the H.248 protocol. It can indicate, for example, that an echo canceller is already present in the connection, so that the received parameter-coded digitized analog signals can be passed through the processing unit without echo cancellation. The control component can have direct access to the processing units in order to select the processing functions they carry out.
Selecting the most suitable functions to be applied is also a preferred feature of the method of the invention.
The digital network concerned can be a packet-based network, for example an IP, UDP (User Datagram Protocol) or RTP (Real-time Transport Protocol) network, or a TDM-based network. Any other digital network that transmits parameter-coded digitized analog signals can also be accessed. Where this specification refers to an IP network, any IP, UDP or RTP network is meant.
In packet-based networks, digitized analog signals are transmitted only as parameter-coded digitized analog signals. In TDM-based networks, as used for example in GSM, digitized analog signals can be transmitted as parameter-coded digitized analog signals in a TFO stream and, at the same time, as A-law or μ-law coded G.711 PCM samples in a PCM (pulse code modulation) stream.
Accordingly, in one preferred alternative embodiment, the payload extraction component is adapted to extract parameter-coded digitized analog signals from the IP stack of a packet-based network, and the payload insertion component is adapted to insert parameter-coded digitized analog signals into said IP stack of the packet-based network.
In another preferred alternative embodiment, the payload extraction component is adapted to extract a TFO stream, and if required also a PCM stream, from the time slots of a TDM network. In the latter case, the two streams are separated in the payload extraction component for further processing, and the payload insertion component is adapted to merge the TFO stream and the PCM stream supplied to it again and to insert the merged stream into said TDM-based network. If, however, only a PCM stream is supplied to the payload insertion component, it inserts only this PCM stream back into said TDM network.
In GSM-PCM, the payload extraction component can take only the TFO stream as input, or it can take the TFO stream and the PCM stream as input and then separate the two streams.
The extracted TFO stream that is inserted back into the digital network has been processed either in the parameter domain, or in the linear domain with decoding before and encoding after the linear processing. Which TFO stream is inserted should depend on the quality improvement that has been or can be achieved for the digitized analog signals it contains. In addition, the TFO stream that has been decoded and processed in the linear domain should be converted into a PCM stream without prior encoding, and this PCM stream is merged with the selected coded TFO stream for insertion into the digital network. However, if no TFO stream is available at the payload extraction component, or if the TFO stream is terminated, the PCM stream can be extracted and processed in the linear domain and output to the digital network by the payload insertion component on its own.
Alternatively, the TFO stream can be processed in the parameter domain while the PCM stream, once converted for linear processing, is processed in the linear domain at the same time. If the TFO stream is processed only when processing it is expected to give a better result than processing the PCM stream, the TFO stream need not be included in the data inserted back into the network when it has not been processed.
The network component of the invention can be located beside or inside any other network component, as desired. In a packet-based network, it is preferably co-located with a broadband IP node, which gives the smallest processing delay.
The network component and the method of the invention can be used to enhance the quality of any digitized analog signals transmitted in parameter-coded form over a digital network. They are particularly suited to transmitted speech, but can also be used for video, for example.
Specific embodiments
Fig. 1 shows the environment of a network component 1 according to the invention.
A first terminal 2 is connected to a second terminal 3 over an IP network. The two terminals 2, 3 can be IP telephones. Somewhere in the IP network there is an IP router forming a broadband IP node 4. The network component 1 according to the invention is co-located with this network node 4 and connected to it.
The network component 1 operates in the speech parameter domain and can apply signal processing functions to parameter-coded speech. Possible functions are echo cancellation, noise reduction, gain control, conference bridging and bad frame handling. Some possible ways of implementing these functions are described below with reference to Figs. 5 to 8.
Parameter-coded speech travels from the first terminal 2 to the network node 4. From there it is forwarded to the network component 1, which applies the appropriate functions in the speech parameter domain. The processed parameter-coded speech is then sent back to the network node 4, which forwards it to its destination, the second terminal 3.
Fig. 2 shows the elements comprised in an embodiment of the network component of Fig. 1.
The payload extraction component 20 and the payload insertion component 21 together form the interface between the network component 1 and the network node 4. Within the network component 1, the payload extraction component 20 is connected to an analyzer and selector component 23 via a bad frame handler component 22. Of the two outputs of the analyzer and selector component 23, one is connected to a first processing unit 24 and the other is connected to a second processing unit 26 via a speech decoding component 25. Each of the processing units 24, 26 comprises echo cancellation, noise reduction and level control functions. The output of the first processing unit 24 is connected to an input of a selector 27. The output of the second processing unit 26 is likewise connected to an input of the selector 27, but via a speech encoding component 28. The output signal of the selector 27 is input to the payload insertion component 21. Finally, there is a control component 29, for example an H.248 protocol control component, which receives as input control signals generated outside the network component 1 and whose output is connected to the analyzer and selector component 23.
The network component 1 operates as follows:
The payload extraction component 20 extracts the payload, that is, the parameter-coded speech, from the IP stack of the network node 4 of Fig. 1. The speech parameters are checked by the bad frame handler component 22. Here, missing frames are detected and regenerated using interpolation techniques, and out-of-order frames are reordered within a buffer window. The processed signals are then forwarded to the analyzer and selector component 23.
The analyzer and selector component 23 analyzes the speech parameters and determines whether processing in the linear domain or processing in the parameter domain would give the better result, and which of the available functions should be applied. If parameter-domain processing is not technically feasible for the speech enhancement, linear processing is selected. The analyzer and selector component 23 can also determine that no processing is needed. Through the control component 29 it receives additional external information, for example whether an echo canceller is already present in the connection, so that no further echo cancellation is needed.
If it is decided that no processing is to be carried out, or that processing is to take place in the parameter domain, the analyzer and selector component 23 outputs the coded speech to the first processing unit 24, which applies all selected functions to the parameter-coded speech in the parameter domain.
If processing in the linear domain is considered necessary, the analyzer and selector component 23 outputs the parameter-coded speech to the speech decoding component 25. The speech decoding component 25 decodes the coded speech, which may for example be GSM FR (full rate) coded, to form a linear signal. The linear speech signal is then input to the second processing unit 26, which applies all selected functions to it in the linear domain. After processing, the linear speech signal is input to the speech encoding component 28, which encodes it again to form parameter-coded speech suitable for GSM FR.
The selector 27 receives the output signals of the speech encoding component 28 and of the first processing unit 24, and is controlled by the analyzer and selector component 23. The selector 27 can therefore determine whether the signal from the first processing unit 24 or the signal from the speech encoding component 28 constitutes the processed coded speech, and can forward the corresponding signal to the payload insertion component 21. In addition, the selector 27 can support the work of the analyzer and selector component 23 by providing information about the processed signals.
In the payload insertion component 21, the parameter-coded speech is inserted back as payload into the IP stack of the network node 4, from where it is forwarded to its destination 3.
In summary, speech quality can be enhanced while additional decoding and encoding are carried out only where necessary. Processing in the parameter domain thus avoids an excessive degradation of speech quality and keeps the processing delay small. Because the network component 1 is co-located with the broadband IP node, the processing delay can be minimized.
Fig. 3 schematically shows another embodiment of the network component of the invention. It is similar to the first embodiment, but is used for processing coded speech parameters for GSM TFO received from a node in a TDM-based network.
Like the network component of Fig. 2, the network component of Fig. 3 comprises a payload extraction component 30, a bad frame handler 32, an analyzer and selector component 33, a decoding component 35, first and second processing units 34, 36, an encoding component 38, a payload insertion component 31 and an H.248 control component 39. The two processing units 34, 36 again provide echo cancellation, noise reduction and level control. These elements are interconnected in the same way as in Fig. 2. Unlike the network component of Fig. 2, however, a second analyzer and selector component 37 is arranged between the encoding component 38 and the payload insertion component 31 in place of the selector 27. Moreover, the output of the second processing unit 36 is connected not only to the encoding component 38 but also directly to the payload insertion component 31.
The network component of this second embodiment operates as follows:
The signal entering the payload extraction component 30 from the network node contains a 48 or 56 kbps G.711 PCM stream in its most significant bits and GSM TFO coded speech parameters at 16 or 8 kbps in its least significant bits. In the payload extraction component 30, the TFO stream is separated from the PCM stream. Only the TFO stream is sent to the bad frame handler component 32, where it is processed as described for the parameter-coded speech in the embodiment of Fig. 2.
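The separation of the two streams can be sketched per octet as follows. The sketch assumes a 64 kbps time slot carrying 8000 octets per second, so that 2 least significant bits per octet give 16 kbps for TFO and 6 most significant bits give 48 kbps for PCM, and 1 bit gives 8 kbps with 56 kbps remaining. Function names are illustrative, and TFO framing and synchronization are not modelled.

```python
from typing import Tuple


def _check(tfo_bits: int) -> int:
    if tfo_bits not in (1, 2):
        raise ValueError("tfo_bits must be 1 (8 kbps) or 2 (16 kbps)")
    return (1 << tfo_bits) - 1  # mask covering the TFO bits


def split_octet(octet: int, tfo_bits: int = 2) -> Tuple[int, int]:
    """Split one time-slot octet into (PCM most significant bits, TFO least significant bits)."""
    mask = _check(tfo_bits)
    return (octet & 0xFF) >> tfo_bits, octet & mask


def merge_octet(pcm_msbs: int, tfo: int, tfo_bits: int = 2) -> int:
    """Recombine PCM bits and TFO bits into one octet for re-insertion."""
    mask = _check(tfo_bits)
    return ((pcm_msbs << tfo_bits) | (tfo & mask)) & 0xFF
```

The same pair of functions covers the merging step in the payload insertion component, where the processed TFO bits are written back under the PCM bits.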
After bad frame handling, the TFO stream is input to the analyzer and selector component 33. On the one hand, the analyzer and selector component 33 sends the TFO stream to the first processing unit 34, where it is processed in the parameter domain. On the other hand, it sends the TFO stream to the decoding component 35, where speech decoding is carried out, again for example from GSM FR to linear. The decoded TFO stream is then input to the second processing unit 36, where it is processed in the linear domain. For both processing units 34, 36, the functions to be applied are selected in the first analyzer and selector component 33 according to an external control signal that enters the network component through the control component 39.
The output signal of the first processing unit 34 is fed to the second analyzer and selector component 37. The output signal of the second processing unit 36 is speech-encoded again in the encoding component 38, for example from linear to GSM FR, and is likewise fed to the second analyzer and selector component 37.
The first analyzer and selector component 33 and the second analyzer and selector component 37 work together to determine which processing, in the parameter domain or in the linear domain, gives the better speech quality.
If it is determined that parametric processing of the TFO stream gives better speech quality than linear processing of the decoded TFO stream, the second analyzer and selector component 37 sends the TFO stream from the first processing unit 34 to the payload insertion component 31. If it is determined that linear processing of the decoded TFO stream gives better speech quality than parametric processing of the TFO stream, the second analyzer and selector component 37 sends the TFO stream from the encoding component 38 to the payload insertion component 31.
Both paths can operate continuously, so that switching between the different modes, that is, between pure linear processing and parallel processing, is possible without discontinuities in the internal states of the decoding component 25 and the encoding component 28.
The output of the second processing unit 36 is also sent directly to the payload insertion component 31 without any further encoding. In the payload insertion component 31, a PCM stream is formed from the decoded and linearly processed TFO stream. This PCM stream is then merged with the selected coded TFO stream and inserted back into the TDM-based network for further transmission.
Consequently, the speech quality of the digitized analog signals in the outgoing PCM stream is improved by linear processing, and the speech quality of the digitized analog signals in the outgoing TFO stream is improved by processing in the parameter domain or in the linear domain, depending on which processing gives the better result.
If no TFO stream is available in the signal extracted by the payload extraction component 30, or if the TFO stream is terminated, a path is provided that routes the PCM stream through the bad frame handler 32, for frame-related processing, and through the second processing unit 36, for processing in the linear domain. The PCM stream does not need to pass through the decoding component, because it contains no parametric data. It should be noted, however, that linear processing of a G.711 PCM stream requires A-law or μ-law to linear conversion and the reverse conversion afterwards. The processed PCM stream is then inserted back into the digital network by the payload insertion component 31.
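The μ-law to linear conversion and its reverse can be sketched as below. The code follows the widely used bit-manipulation form of G.711 μ-law companding with a 16-bit linear range; it is an illustration rather than a conformance-tested implementation, and the A-law variant is not shown.

```python
BIAS = 0x84   # bias added before segment encoding
CLIP = 32635  # largest magnitude that can be encoded


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code word to a 16-bit range linear sample."""
    code = ~code & 0xFF                     # code words are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def linear_to_ulaw(sample: int) -> int:
    """Compress one linear sample to an 8-bit mu-law code word."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):  # find the segment
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

A linear-domain function such as level control would operate on the expanded samples and then compress the result again before the stream is re-inserted.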
Fig. 4 schematically shows a third embodiment of the network component of the invention, which constitutes a second alternative for enhancing speech quality in a TDM-based network used for GSM TFO.
In this embodiment, the payload extraction component 40 is connected directly, via a bad frame handler component 42, to first and second processing units 44, 46. The two processing units 44, 46 again comprise echo cancellation, noise reduction and level control functions. The outputs of the first and second processing units 44, 46 are connected directly to the inputs of the payload insertion component 41. An H.248 protocol control component 49 is also provided.
The network component of the third embodiment operates as follows:
The PCM stream and the TFO stream entering the payload extraction component 40 from the network node are separated by the payload extraction component 40 as in the embodiment of Fig. 3. In this embodiment, however, both the TFO stream and the PCM stream are sent to the bad frame handler component 42, where they are processed as described with reference to Fig. 2.
After bad frame handling, the TFO stream is sent to the first processing unit 44, where it is processed in the parameter domain. At the same time, the PCM samples are sent to the second processing unit 46. Since in this embodiment only the PCM samples are processed by the processing unit 46 operating in the linear domain, no decoding component is needed; as explained for the embodiment of Fig. 3, the PCM stream contains no parametric data. In both processing units 44, 46, the functions to be applied are selected by the control component 49 of the network component according to an external control signal.
The TFO stream and the PCM stream are thus speech-enhanced separately and simultaneously. In no case is the coded speech in the TFO stream decoded, processed and encoded again.
The TFO stream and the PCM stream leaving the processing units 44, 46 are merged in the payload insertion component 41 and inserted back into the TDM-based network for further transmission. Which of these streams is used to obtain the best speech quality can be decided elsewhere in the network.
Each of the three embodiments of the network component according to the invention can improve the quality of parametric speech or video on the network side with minimal processing delay. They can be located outside or inside any existing network component, as desired.
Various possible ways of processing in the parameter domain in the first processing unit 24, 34, 44 of any of Figs. 2 to 4 are now described with reference to Figs. 5 to 8.
Fig. 5 shows a block diagram of a gain control device that can be incorporated in the first processing unit of a network component according to the invention in order to carry out gain control in the parameter domain. The input line is connected to the input of a decoder 50 on the one hand and to a first input of a gain parameter requantization component 53 on the other. The decoder 50 is connected to a linear-to-parameter domain mapping component 52 both directly and via a speech level estimation component 51. The output of the linear-to-parameter domain mapping component 52 is connected to a second input of the gain parameter requantization component 53, which in turn is connected to the output line.
The incoming encoded speech frames are sent to the decoder 50, where the encoded speech is linearized before being fed to the speech level estimation unit 51. The speech level estimation unit 51 contains an internal voice activity detector (VAD) that indicates whether the level estimate is to be updated, since the estimate must be based on speech only.
In the speech level estimation unit 51, the desired gain value is calculated from the estimated speech level and a predetermined target speech level. The desired gain is fed to a first input of the linear-to-parameter domain mapping unit 52.
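By way of illustration only, the gain calculation of this paragraph can be sketched as follows. The sketch assumes that levels are expressed in dB and that the gain is clamped to a maximum correction; the function names, the clamp and its 12 dB default are hypothetical and are not taken from the description.

```python
def desired_gain_db(estimated_level_db: float, target_level_db: float,
                    max_gain_db: float = 12.0) -> float:
    """Gain in dB that would move the estimated speech level to the target.

    The clamp to +/- max_gain_db is an illustrative assumption.
    """
    gain = target_level_db - estimated_level_db
    return max(-max_gain_db, min(max_gain_db, gain))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

The linear factor returned by `db_to_linear` would correspond to the desired linear gain value that is passed on for mapping into the parameter domain.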
The speech level estimation unit 51 is needed only for automatic level control. If a fixed gain control is to be used, possibly as a user-settable enhancement, the decoder 50 and the speech level estimation unit 51 can be omitted.
In addition, the decoded gain parameters of the current speech frame of, for example, 20 ms, or of a subframe of, for example, 5 ms, are fed to the linear-to-parameter domain mapping unit 52 directly from the decoder 50. These decoded gain parameters are typically the excitation gain parameters of a code-excited linear prediction (CELP) speech coder. They usually consist of adaptive and fixed codebook gains, which are vector quantized for transmission. The scalar values of these parameters can be obtained from intermediate values of the decoder 50.
In the linear-to-parameter domain mapping unit 52, the desired linear gain value is converted into suitable new gain parameters of the speech coder. A codebook-based mapping is used to determine the new gain parameters for the current frame or subframe so that the desired gain is achieved. The codebook is essentially a three-dimensional table whose dimensions are the adaptive codebook gain, the fixed codebook gain and the linear gain value. Once all input values for the frame or subframe are known, the new gain parameter values can be read from this table. The table is trained in advance so as to minimize the error between the gain obtained with the new gain parameter values and the gain of the coded frame scaled by each desired linear gain value. Alternatively, the mapping table can be trained by minimizing the error between the decoded re-quantized speech frame and the decoded gain-scaled speech frame. This training requires several test sequences so that all elements of the mapping table are trained thoroughly.
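A minimal sketch of such a table lookup is given below. It assumes that each dimension is sampled on a sorted grid and that the nearest grid point is used; the class and function names are hypothetical. The table contents in the usage example are a simple placeholder rule, not the trained values the description calls for.

```python
from bisect import bisect_left


def nearest_index(grid, value):
    """Index of the grid point closest to value (grid sorted ascending)."""
    i = bisect_left(grid, value)
    if i == 0:
        return 0
    if i == len(grid):
        return len(grid) - 1
    return i if grid[i] - value < value - grid[i - 1] else i - 1


class GainMappingTable:
    """Three-dimensional table: (adaptive gain, fixed gain, linear gain)
    -> (new adaptive gain, new fixed gain)."""

    def __init__(self, adaptive_grid, fixed_grid, linear_grid, table):
        self.adaptive_grid = adaptive_grid
        self.fixed_grid = fixed_grid
        self.linear_grid = linear_grid
        self.table = table  # table[a][f][g] holds a pair of new gains

    def lookup(self, adaptive_gain, fixed_gain, linear_gain):
        a = nearest_index(self.adaptive_grid, adaptive_gain)
        f = nearest_index(self.fixed_grid, fixed_gain)
        g = nearest_index(self.linear_grid, linear_gain)
        return self.table[a][f][g]


# Placeholder contents (assumption): keep the adaptive gain and scale the
# fixed gain. A real table would be trained on test sequences instead.
adaptive_grid = [0.0, 0.5, 1.0]
fixed_grid = [1.0, 2.0, 4.0]
linear_grid = [0.5, 1.0, 2.0]
table = [[[(a, f * g) for g in linear_grid] for f in fixed_grid]
         for a in adaptive_grid]
mapping = GainMappingTable(adaptive_grid, fixed_grid, linear_grid, table)
```

Enlarging the grid step sizes or limiting the range of `linear_grid` shrinks the table, at the cost of a coarser mapping.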
In practical applications, the size of this table can be reduced by exploiting redundancy in the data, by limiting the range of linear gain values, or by increasing the step size of the input values. Another option is to find a mathematical function that approximates the mapping function with subjectively acceptable performance.
Finally, the new gain values are re-quantized for transmission, and the original gain values are replaced with the new values in the gain parameter re-quantization unit 53.
Fig. 6 shows a block diagram of a noise suppression device that can be incorporated in the first processing unit of a network component of the invention in order to perform noise suppression in the parameter domain.
Here too, the input line is connected to the input of a decoder 60 on the one hand and to a first input of a gain parameter re-quantization unit 63 on the other. A first output of the decoder 60 is connected, via a speech level estimation unit 61, a VAD 66, a noise level and spectrum estimation unit 64 and a short-term signal level and spectrum calculation unit 65, to a unit 67 for determining noise attenuation parameters. In addition, the output of the VAD 66 is connected to an input of the speech level estimation unit 61 and to an input of the noise level and spectrum estimation unit 64.
A first output of the unit 67 for determining noise attenuation parameters is connected to a first input of a spectrum-to-LP (linear prediction) mapping unit 68, and its second output is connected to a first input of a linear-to-parameter domain mapping unit 62.
A second output of the decoder 60 is connected to a further input of the noise level and spectrum estimation unit 64 and also to a second input of the spectrum-to-LP mapping unit 68. A third output of the decoder 60 is connected to a second input of the linear-to-parameter domain mapping unit 62.
The output of the linear-to-parameter domain mapping unit 62 is connected to a second input of the gain parameter re-quantization unit 63, whose output is in turn connected to a first input of an LP parameter re-quantization unit 69. A second input of this unit 69 is connected to the output of the spectrum-to-LP mapping unit 68.
Finally, the output of the LP parameter re-quantization unit 69 is connected to the output line.
The decoder 60, the speech level estimation unit 61, the linear-to-parameter domain gain mapping unit 62 and the gain parameter re-quantization unit 63 can be identical or very similar to the corresponding units 50-53 of the embodiment of Fig. 5.
In the embodiment of Fig. 6, noise suppression can be achieved by time-domain or frequency-domain parameter processing. Clearly, the best performance is obtained by combining the two methods.
Time-domain processing is based on a dynamics process in which noise portions and very low-level speech portions are attenuated slightly by the gain control function, using the units 60-63 that correspond to the units 50-53 of Fig. 5. Gain control is thus carried out as described above, except that the speech level estimate received from the unit 61 reaches the linear-to-parameter domain mapping unit 62 by way of the unit 67. This can be understood as an expander function in the parameter domain.
In frequency-domain noise suppression, frequency portions in which the noise energy exceeds the speech energy are attenuated. Conventionally, the linear time-domain signal is first transformed to the frequency domain using a Fourier transform or a filter bank. Spectral subtraction can then be applied to the frequency-domain signal. The amount subtracted depends on the noise estimate, the signal-to-noise ratio and possibly other parameters. Finally, the noise-reduced signal is transformed back to the time domain. In this embodiment, however, frequency-domain processing is carried out by reshaping the linear prediction (LP) spectral envelope of the speech frame. This is explained in more detail below.
To achieve high-quality noise suppression, an accurate noise estimate must be formed. To distinguish speech from speech pauses, a voice activity detector 66 is used, which outputs a speech flag "true" when speech is detected and a speech flag "false" when a speech pause is detected. The voice activity detector 66 should be of high quality so that accurate VAD decisions are obtained even at low signal-to-noise ratios; otherwise the speech and noise estimates will diverge. Basically, the speech level estimate is updated in the speech level estimation unit 61 when the speech flag is true, and the noise level and spectrum estimates are updated in the noise level and spectrum estimation unit 64 when the speech flag is false.
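The VAD-gated update rule of this paragraph can be sketched as follows. The sketch assumes frame levels in dB and a first-order recursive average; the class name, the smoothing constant and the initial values are hypothetical.

```python
class LevelEstimators:
    """Long-term speech and noise level estimates gated by a VAD flag."""

    def __init__(self, smoothing=0.9, speech_db=-26.0, noise_db=-60.0):
        self.smoothing = smoothing  # assumed recursive-averaging constant
        self.speech_db = speech_db
        self.noise_db = noise_db

    def update(self, frame_level_db, vad_is_speech):
        s = self.smoothing
        if vad_is_speech:
            # Speech flag true: only the speech level estimate is updated.
            self.speech_db = s * self.speech_db + (1.0 - s) * frame_level_db
        else:
            # Speech flag false: only the noise level estimate is updated.
            self.noise_db = s * self.noise_db + (1.0 - s) * frame_level_db

    def snr_db(self):
        """Long-term signal-to-noise ratio derived from the two estimates."""
        return self.speech_db - self.noise_db
```

A wrong VAD decision feeds a speech frame into the noise estimate or the reverse, which is why the quality of the detector matters for both estimates.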
The long-term noise level and spectrum are estimated in the unit 64. For the long-term noise spectrum estimate, the linear prediction coefficients (LPC) must be decoded from the received speech frames in the decoder 60. The encoder usually converts the LP coefficients to line spectral pairs (LSP) for coding. In that case, the LPC values can be obtained from intermediate values of the decoder 60. Because the LP coefficients define only the spectral envelope, the noise level estimate is needed to scale the LP spectral envelope in order to form a power spectrum estimate of the noise. Alternatively, the LP spectral envelope can be scaled using the excitation gain parameters of the received frame. As stated above, the noise estimate is updated only when the VAD flag is false.
The short-term signal level and spectrum of the received frame are calculated in the short-term signal level and spectrum calculation unit 65 in the same way as described above, except that the level calculation uses either no averaging over previous frames or only fast averaging. Usually the VAD decision is not used.
The main intelligence of the algorithm lies in the unit 67 for determining noise attenuation parameters. In this unit 67, the frequency-domain noise attenuation parameters (that is, the required spectral shaping) are selected according to the long-term noise spectrum estimate received from the unit 64 and the short-term signal spectrum received from the unit 65. Correspondingly, the required time-domain gain is based on the long-term speech and noise levels and the short-term signal level. In addition, the VAD information received from the VAD 66 and the long-term signal-to-noise ratio, calculated from the speech and noise level estimates received from the units 61 and 64, are used by the algorithm of the unit 67 as additional information for determining the noise attenuation parameters.
In the spectral shaping in the unit 67, the long-term noise spectrum estimate is compared with the short-term signal spectrum. The target frame spectrum is shaped so that those parts of the short-term spectrum that lie very close to the long-term spectrum are attenuated slightly. Parts that lie clearly above the long-term spectrum, on the other hand, are left unchanged, because they very probably contain speech information. In addition, the frequency and temporal masking of the human auditory system can be exploited in the frequency shaping. This means that if some parts of the spectrum lie within the auditory frequency masking curve, those parts need no frequency shaping. With temporal masking, no frequency shaping (or time-domain processing) is needed for the current frame if one or more previous frames contained a higher speech level that introduces a temporal masking effect for the lower-level signal in the current frame. Applying these rules causes less distortion in the processed speech, because less shaping is carried out.
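The comparison of the short-term spectrum with the long-term noise spectrum can be illustrated by the following sketch. It assumes per-band power values in dB and a linear transition between full attenuation and no attenuation; the function name, the 6 dB margin and the 10 dB maximum attenuation are hypothetical, and masking effects are not modelled.

```python
def shaping_gains_db(short_term_db, long_term_noise_db,
                     margin_db=6.0, max_atten_db=10.0):
    """Per-band shaping gains in dB (0 means the band is left unchanged).

    Bands close to the long-term noise spectrum are attenuated; bands at
    least margin_db above it are assumed to carry speech and are kept.
    """
    gains = []
    for signal, noise in zip(short_term_db, long_term_noise_db):
        excess = signal - noise
        if excess >= margin_db:
            gains.append(0.0)
        else:
            # closeness is 1.0 at or below the noise spectrum and falls
            # linearly to 0.0 at the margin.
            closeness = 1.0 - max(excess, 0.0) / margin_db
            gains.append(-max_atten_db * closeness)
    return gains
```

In line with the control rules of the description, `max_atten_db` could be lowered during speech pauses, for low-level frames or at high long-term SNR, so that less shaping is applied in those cases.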
Furthermore, the spectral shaping can be controlled so that less spectral shaping is applied if a speech pause is detected by the VAD flag. Noise attenuation is then achieved mainly by gain processing in the units 60-63 during speech pauses. The short-term signal level can also control the amount of shaping: because noise attenuation is partly handled by gain processing, less shaping is applied to low-level frames. Finally, the amount of spectral shaping can depend on the long-term signal-to-noise ratio (SNR) such that less shaping is applied at high SNR, in order to preserve high quality under noise-free speech conditions.
Once the required spectral shaping of the current frame has been calculated, the original LP coefficients must be modified according to the required spectrum. This is done in the spectrum-to-LP mapping unit 68. Using the original LPC and the required spectrum as input parameters, the mapping can again be implemented as a codebook mapping. Alternatively, the new LP coefficients can be calculated directly from the required spectrum by converting that spectrum to an LP spectral envelope and from there to LP coefficients.
Finally, in the LP parameter re-quantization unit 69, the new LPC parameters are quantized or converted to LSP parameters, and the old parameters in the coded frame are replaced with the new ones.
As stated above, the signal dynamics expander function can be used together with spectral shaping, or even on its own. If it is used on its own, only slight expansion is permissible, because expansion may cause a noise modulation effect. In general, the lower the signal level, the more attenuation the expander applies. The expansion threshold is controlled by the noise level estimate so that frames or subframes exceeding the noise level estimate are not attenuated. In addition, the VAD 66 can control the expansion so that slightly less expansion is applied whenever the current frame is a speech frame. This reduces the attenuation of low-level speech.
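The expander behaviour described in this paragraph can be sketched as follows, assuming levels in dB and a downward expander with a fixed ratio. The function name, both ratios and the attenuation limit are hypothetical values chosen for illustration.

```python
def expansion_gain_db(frame_level_db, noise_level_db, vad_is_speech=False,
                      ratio=2.0, speech_ratio=1.5, max_atten_db=12.0):
    """Expander gain in dB for one frame or subframe (never positive).

    The noise level estimate acts as the expansion threshold. A smaller
    ratio is used for speech frames so that low-level speech is attenuated
    less.
    """
    if frame_level_db >= noise_level_db:
        return 0.0  # at or above the threshold: no attenuation
    r = speech_ratio if vad_is_speech else ratio
    # The further the level falls below the threshold, the more attenuation.
    attenuation = (noise_level_db - frame_level_db) * (r - 1.0)
    return -min(attenuation, max_atten_db)
```

The resulting gain would then serve as the desired linear gain (after conversion from dB) for the parameter-domain mapping and re-quantization.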
Once the desired linear gain of the current frame or subframe has been found, the linear-to-parameter domain mapping and the gain parameter re-quantization are carried out in the units 62 and 63 as described with reference to gain control. As a result, the modified gain and LPC parameters are transmitted over the transmission medium together with the other speech parameters.
Fig. 7 shows a block diagram of an echo suppressor that can be incorporated in the first processing unit of a network component of the invention in order to perform echo suppression in the parameter domain.
A first input line is connected to a first decoder 70 and a second input line to a second decoder 71, and both decoders 70, 71 are connected to an echo analysis unit 72. The output of the first decoder 70 is connected to one terminal of a switch 76 via a noise estimation unit 73, a comfort noise generation unit 74 and an encoder 75. The switch 76 can connect either the encoder 75 or the first input line to the output line. The echo analysis unit 72 has a control connection to the switch 76.
To determine whether the signal transmitted from the near end to the far end contains echo, so that this echo can be suppressed or cancelled, the signals from both transmission directions must be analyzed. Two decoders 70, 71 are therefore used to linearize, respectively, the "send in" signal from the near end (the point at which the echo is reflected back) and the "receive" signal from the far end. Echo analysis is easier and more accurate in the linear domain. In the echo analysis unit 72, the signal levels of the two linearized signals are estimated. If the ratio of the near-end level to the far-end level is below a threshold, the near-end signal is considered to be echo, and comfort noise is inserted into the "send out" signal that is sent to the far end. If acoustic echo is present, a special filter can be applied to the far-end signal estimate to improve the double-talk performance of the echo suppression, for example as described in WO9749196. To obtain a correct comparison result, the delay of the echo path must be known. If this delay is variable, delay estimation may be needed to determine the correct delay value. Cross-correlation can be used for the delay estimation.
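The level comparison and the cross-correlation delay estimate can be illustrated by the following sketch. It assumes linearized sample sequences, levels in dB and an exhaustive search over integer sample delays; the function names and the -6 dB threshold are hypothetical.

```python
def estimate_delay(far, near, max_delay):
    """Delay (in samples) at which the near-end signal correlates best
    with the far-end signal."""
    best_delay, best_score = 0, float("-inf")
    for delay in range(max_delay + 1):
        # Cross-correlation of far[n] with near[n + delay].
        score = sum(f * n for f, n in zip(far, near[delay:]))
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay


def is_echo(near_level_db, far_level_db, threshold_db=-6.0):
    """True if the near-end to far-end level ratio (a difference in dB)
    lies below the threshold, so the near-end frame is treated as echo."""
    return (near_level_db - far_level_db) < threshold_db


# Example: the near-end signal is the far-end signal delayed by three
# samples and attenuated by a factor of 0.5.
far = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, -0.5, 0.0, 1.0, 0.0]
delay = estimate_delay(far, near, max_delay=5)
```

The two levels passed to `is_echo` would be measured on frames aligned by the estimated delay, since comparing unaligned frames gives a misleading ratio.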
In the noise estimation unit 73, an accurate noise estimate is formed from the linearized near-end signal received from the first decoder 70. Preferably, the background noise is estimated both in the level domain and in the frequency domain. The estimation method can be the same as the one described for noise suppression. Other methods can equally be used, for example methods based on a filter bank or a Fourier transform.
Comfort noise is then generated in the comfort noise generation unit 74 using the noise estimate received from the noise estimation unit 73. To generate the comfort noise, level-scaled white noise is passed through a synthesis filter whose spectral envelope is essentially equal to the one estimated in the noise estimation unit 73. The synthesis filter can therefore be an LP filter or a filter bank.
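For the LP filter variant, comfort noise generation can be sketched as follows. The sketch assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p and Gaussian white noise as excitation; the function name, the sign convention and the seed parameter are assumptions made for illustration.

```python
import random


def comfort_noise(lp_coeffs, level, n_samples, seed=0):
    """Level-scaled white noise shaped by an all-pole LP synthesis filter.

    lp_coeffs holds a1..ap of A(z); level scales the white excitation.
    """
    rng = random.Random(seed)
    history = [0.0] * len(lp_coeffs)  # most recent output sample first
    out = []
    for _ in range(n_samples):
        excitation = level * rng.gauss(0.0, 1.0)
        # y[n] = e[n] - sum_k a_k * y[n - k]
        y = excitation - sum(a * h for a, h in zip(lp_coeffs, history))
        out.append(y)
        if history:
            history = [y] + history[:-1]
    return out
```

For example, `comfort_noise([-0.9], 0.01, 160)` would produce 160 samples of low-pass coloured noise; in the device described, the coefficients and the level would come from the noise estimate.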
Finally, the generated comfort noise is encoded by the encoder 75 to form a frame or subframe containing encoded comfort noise parameters.
If the echo analysis unit 72 indicates echo for the current send-in frame or subframe, the switch 76 is set by the echo analysis unit 72 to connect the encoder 75 to the output line, and the current frame or subframe is replaced with the generated encoded comfort noise parameters. If no echo is indicated, the switch 76 remains in, or is set by the echo analysis unit 72 to, the position in which the first input line is connected to the output line, so that the original frame or subframe is passed to the output line without being replaced.
With this method, tandem speech coding can be avoided in both speech frames and comfort noise frames, so that high speech quality can be provided.
Alternatively, to save processing and memory resources, the speech encoder can be bypassed by generating the comfort noise directly in the parameter domain. In parameter-domain comfort noise generation, the long-term LP spectral envelope of the background noise is averaged as described with reference to Fig. 6. In addition, the long-term excitation gain parameter is averaged using the same update principle as for the LP spectral envelope, namely updating only when the VAD flag is false. Usually only the fixed codebook gain value needs to be averaged, because the adaptive codebook gain value is close to zero when the signal is noise-like. Because comfort noise frames or subframes have to be sent to the far end, the original LPC and excitation gain parameters are replaced with the averaged LPC and gain parameters. In addition, the original excitation pulses within the frame are replaced with random pulses representing white noise in the parameter domain. If discontinuous transmission (DTX) is used in the send-in direction, the excitation pulses need not be transmitted. Instead, only the averaged LPC and gain parameters are transmitted in the silence description (SID) frames that are standardized for most speech codecs. In discontinuous transmission, the random excitation pulses are generated at the decoder end.
Fig. 8 shows a block diagram of an echo canceller that can be incorporated in the first processing unit of a network component of the invention in order to perform echo cancellation in the parameter domain.
A first input line is connected directly to a first decoder 80, and a second input line is connected to a second decoder 81 via a FIFO (first in, first out) frame memory 87; both decoders 80, 81 are connected to an adaptive filter 82. The adaptive filter 82 is connected to an NLP and comfort noise generation unit 84, and the first decoder 80 is connected to a second input of this unit 84 via a noise estimation unit 83. The output of the NLP and comfort noise generation unit 84 is connected to a switch 86 via an encoder 85. The switch 86 can connect either the encoder 85 or the first input line to the output line. The outputs of the first decoder 80, the second decoder 81 and the adaptive filter 82 are additionally connected to inputs of a control logic 88. The control logic 88 has control connections to the adaptive filter 82, the NLP and comfort noise generation unit 84 and the switch 86.
The proposed echo cancellation is very similar to the echo suppression described above. An adaptive filter 82 and a control logic 88 are included in order to reduce the echo signal before the residual echo suppression function is applied by the non-linear processor (NLP) 84. For linear adaptive filtering, the signals from both directions must be linearized by the local decoders 80, 81. Because the returning echo signal has passed through two speech codings, the accumulated non-linear distortion noticeably reduces the efficiency of linear adaptive filtering. Ideally, therefore, non-linear echo modelling is included in the echo cancellation, for example as described in WO9960720. In addition, delay introduced into the echo path by speech coding, transmission or other signal processing can be compensated by the FIFO frame memory 87. The number of taps of the adaptive filter 82 can thus be reduced, and less processing capacity is needed.
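The linear adaptive filtering step can be illustrated with a normalized LMS (NLMS) filter. NLMS is chosen here only as a common example; the description does not name a particular adaptation algorithm, and the function name, step size and tap count below are hypothetical. The sketch assumes that bulk delay has already been removed, as the FIFO frame memory would do, so that few taps suffice.

```python
import random


def nlms_echo_cancel(far, near, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far` from `near`.

    Returns the residual signal and the final filter weights.
    """
    weights = [0.0] * taps
    buffer = [0.0] * taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far, near):
        buffer = [x] + buffer[:-1]
        echo_estimate = sum(w * b for w, b in zip(weights, buffer))
        error = d - echo_estimate
        norm = sum(b * b for b in buffer) + eps
        # Normalized LMS weight update.
        weights = [w + mu * error * b / norm
                   for w, b in zip(weights, buffer)]
        residual.append(error)
    return residual, weights


# Example: the echo path is a one-sample delay with a gain of 0.5.
rng = random.Random(1)
far_signal = [rng.uniform(-1.0, 1.0) for _ in range(400)]
near_signal = [0.0] + [0.5 * x for x in far_signal[:-1]]
residual, weights = nlms_echo_cancel(far_signal, near_signal)
```

In this idealized linear case the weights converge to the echo path; with two speech codings in the echo path, the residual would remain larger, which is the motivation given for the NLP stage and for non-linear echo modelling.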
The functions of the noise estimation unit 83 and the NLP and comfort noise generation unit 84 can be similar to those described above, although the control of the NLP 84 can differ, because more parameters can be used in the NLP decision, for example the echo path model, the achieved echo attenuation, and the send-in, receive and residual echo signals. This is handled in the control logic 88. The output of the NLP and comfort noise generation unit 84 is encoded by the encoder 85.
The switch 86 is provided to switch between the speech frame received at the send-in port and the encoded output of the NLP/comfort noise unit; that is, the output of the send-out port is either the bypassed send-in frame (or subframe) or the echo-cancelled frame (or subframe). The selection criteria are as follows.
If there is no far-end speech activity, or the far-end signal level is sufficiently low, the send-in frame is bypassed. Otherwise, the output of the NLP/comfort noise unit 84, after encoding by the encoder 85, is selected as the output. Thus, if only the near end is talking, or if both directions are silent, the TFO stream is left untouched. If only the far end is talking, encoded comfort noise is inserted. In a double-talk situation, either the comfort noise or the output of the adaptive filter 82 is selected as the send-out signal. This depends on the state of the NLP 84 and usually changes during double talk. The benefit of this method is that tandem-free operation is maintained for the near-end signal in most cases. Tandem-coded frames are sent toward the far end only at those moments during double talk when the NLP unit 84 is inactive. Compared with conventional echo cancellation, however, this is subjectively not more disturbing, because NLP switching introduces some artifacts into the near-end speech anyway, and because masking by the direct sound and the far-end sidetone reduces the audibility of artifacts during double talk.
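The selection criteria above can be summarized in the following sketch. The three boolean inputs and the returned labels are hypothetical names; in the device described, the corresponding decisions would be made by the control logic 88.

```python
def select_send_out(near_active, far_active, nlp_active):
    """Choose what is placed on the send-out port for one frame.

    Returns "bypass" (original send-in frame, tandem-free),
    "comfort_noise" (encoded comfort noise) or "filtered" (encoded output
    of the adaptive filter, which is tandem coded).
    """
    if not far_active:
        # Near-end talk only, or silence in both directions.
        return "bypass"
    if not near_active:
        # Far-end talk only: anything on the send-in side is echo.
        return "comfort_noise"
    # Double talk: the outcome depends on the state of the NLP.
    return "comfort_noise" if nlp_active else "filtered"
```

Only the "filtered" branch produces tandem-coded frames, which corresponds to the statement that tandem coding occurs solely during double talk with the NLP inactive.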
Alternatively, to save processing and memory resources, the comfort noise can be generated directly in the parameter domain as described with reference to Fig. 7, so that the encoder can be omitted.