CN1327410C - Method and apparatus for transcoding between different speech encoding/decoding systems and recording medium - Google Patents


Info

Publication number
CN1327410C
Authority
CN
China
Prior art keywords
gain
code
acb
circuit
fcb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB038176750A
Other languages
Chinese (zh)
Other versions
CN1672192A (en)
Inventor
村岛淳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of CN1672192A
Application granted
Publication of CN1327410C
Anticipated expiration
Expired - Fee Related

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/083Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A code conversion device for converting a first code string based on a first method into a second code string based on a second method. A speech decoding circuit (1500) obtains a first linear prediction coefficient and excitation signal information from the first code string, and drives a filter having the first linear prediction coefficient with an excitation signal obtained from the excitation signal information, thereby generating a first speech signal. A gain code generation circuit (1400) calculates the gain (optimum gain) that minimizes the distance between the first speech signal and a second speech signal generated from information obtained from the second code string, and corrects the optimum gain. Gain information in the second code string is then obtained from the corrected optimum gain, the uncorrected optimum gain, and a gain read from the gain codebook of the second method. Here, according to a speech judgment value, the gain in a non-speech section is obtained using an evaluation function that reduces the temporal fluctuation of the gain of the second code string.

Description

Method and device for code conversion between voice coding and decoding methods
Technical Field
The present invention relates to an encoding and decoding method for transmitting or storing a speech signal at a low bit rate, and more particularly, to a code conversion method and apparatus for converting a code obtained by encoding a speech signal using a certain method into a code that can be decoded by another method with high sound quality and a low amount of computation when performing speech communication using different encoding and decoding methods, and a storage medium therefor.
Background
As a method of efficiently encoding a speech signal at a low bit rate, it is common to separate the speech signal into a Linear Prediction (LP) filter and an excitation signal that drives the filter, and to encode each. One typical method is Code Excited Linear Prediction (CELP). In CELP, a synthesized speech signal is obtained by driving an LP filter, in which LP coefficients representing the frequency characteristics of the input speech are set, with an excitation signal given by the sum of an Adaptive Codebook (ACB) component representing the pitch period of the input speech and a Fixed Codebook (FCB) component composed of random numbers or pulses. The ACB component and the FCB component are each multiplied by a gain (the "ACB gain" and the "FCB gain", respectively). For CELP, see M. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates" (Proc. of IEEE Int. Conf. on Acoust., Speech and Signal Processing, pp. 937-940, 1985) (referred to as document 1).
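The CELP synthesis described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular standard: the function name, the argument layout, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions made here.

```python
def celp_synthesize(acb_vec, fcb_vec, acb_gain, fcb_gain, lp_coeffs, state=None):
    """Form the excitation from gain-scaled ACB and FCB components and
    drive the all-pole LP synthesis filter 1/A(z) with it.

    lp_coeffs holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    (assumed sign convention).
    """
    p = len(lp_coeffs)
    # Filter memory: previous output samples, most recent first.
    memory = list(state) if state is not None else [0.0] * p
    # Excitation = ACB gain * ACB component + FCB gain * FCB component.
    excitation = [acb_gain * a + fcb_gain * f for a, f in zip(acb_vec, fcb_vec)]
    speech = []
    for e in excitation:
        # s[n] = e[n] - sum_k a_k * s[n-k]
        s = e - sum(a * m for a, m in zip(lp_coeffs, memory))
        speech.append(s)
        memory = ([s] + memory)[:p]
    return excitation, speech, memory
```

Returning the filter memory lets a caller carry the filter state from one subframe to the next.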
Further, when a 3G mobile network and a wired packet network are to be interconnected, they cannot be connected directly because the standard speech coding methods used in the respective networks differ. The simplest solution is a tandem connection. In a tandem connection, however, a code string obtained by encoding speech with one standard method is first decoded into a speech signal using that method, and the decoded speech signal is then encoded again using the other standard method. Compared with the case where encoding and decoding are performed only once with each method, this generally degrades sound quality, increases delay, and increases the amount of computation.
In this regard, a transcoding method that converts a code obtained by encoding speech with one standard method into a code decodable by the other standard method, working in the code domain or the coding parameter domain, is an effective solution. For such a code conversion method, see Hong-Goo Kang, "Improving Transcoding Capability of Speech Coders in Clean and Frame Erasured Channel Environments" (Proc. of IEEE Workshop on Speech Coding 2000, pp. 78-80, 2000) (referred to as document 2).
Fig. 12 is a schematic diagram showing an example of the configuration of a transcoding apparatus that converts a code obtained by encoding speech using a first speech encoding method (referred to as method A) into a code that can be decoded using a second method (referred to as method B). Referring to fig. 12, the code conversion apparatus includes an input terminal 10, a code separation circuit 1010, an LP coefficient code conversion circuit 100, an ACB code conversion circuit 200, an FCB code conversion circuit 300, a gain code conversion circuit 400, a code multiplexing circuit 1020, and an output terminal 20. Each component of the conventional code conversion device will be described with reference to fig. 12.
A first code string obtained by encoding speech using method A is input from the input terminal 10.
The code separation circuit 1010 separates codes corresponding to the LP coefficient, ACB, FCB, ACB gain, and FCB gain, that is, LP coefficient code, ACB code, FCB code, and gain code, from the first code string input from the input terminal 10. Here, the ACB gain and the FCB gain are coded and decoded together, and therefore they are simply referred to as a gain, and their codes are referred to as a gain code. The LP coefficient code, ACB code, FCB code, and gain code are referred to as a first LP coefficient code, a first ACB code, a first FCB code, and a first gain code, respectively. Further, the first LP coefficient code is output to the LP coefficient code conversion circuit 100, the first ACB code is output to the ACB code conversion circuit 200, the first FCB code is output to the FCB code conversion circuit 300, and the first gain code is output to the gain code conversion circuit 400.
The LP coefficient code conversion circuit 100 receives the first LP coefficient code output by the code separation circuit 1010, converts it into a code that can be decoded using method B, and outputs the converted LP coefficient code to the code multiplexing circuit 1020 as a second LP coefficient code.
The ACB code conversion circuit 200 receives the first ACB code output by the code separation circuit 1010, converts it into a code that can be decoded using method B, and outputs the converted ACB code to the code multiplexing circuit 1020 as a second ACB code.
The FCB code conversion circuit 300 receives the first FCB code output by the code separation circuit 1010, converts it into a code that can be decoded using method B, and outputs the converted FCB code to the code multiplexing circuit 1020 as a second FCB code.
The gain code conversion circuit 400 receives the first gain code output by the code separation circuit 1010, converts it into a code that can be decoded using method B, and outputs the converted gain code to the code multiplexing circuit 1020 as a second gain code.
A more specific operation of each conversion circuit is explained below.
The LP coefficient code conversion circuit 100 decodes the first LP coefficient code input from the code separation circuit 1010 using the LP coefficient decoding method of method A, thereby obtaining a first LP coefficient. Next, the LP coefficient code conversion circuit 100 quantizes and encodes the first LP coefficient using the LP coefficient quantization method and encoding method of method B, thereby obtaining a second LP coefficient code. Then, the LP coefficient code conversion circuit 100 outputs the second LP coefficient code to the code multiplexing circuit 1020 as a code that can be decoded using the LP coefficient decoding method of method B.
The ACB code conversion circuit 200 converts the first ACB code input from the code separation circuit 1010 by using the correspondence relationship between the codes of method A and the codes of method B, thereby obtaining a second ACB code. Then, the ACB code conversion circuit 200 outputs the second ACB code to the code multiplexing circuit 1020 as a code that can be decoded using the ACB decoding method of method B.
The FCB code conversion circuit 300 converts the first FCB code input from the code separation circuit 1010 by using the correspondence relationship between the codes of method A and the codes of method B, thereby obtaining a second FCB code. Then, the FCB code conversion circuit 300 outputs the second FCB code to the code multiplexing circuit 1020 as a code that can be decoded using the FCB decoding method of method B.
The gain code conversion circuit 400 decodes the first gain code input from the code separation circuit 1010 using the gain decoding method of method A, thereby obtaining a first gain. Next, the gain code conversion circuit 400 quantizes and encodes the first gain using the gain quantization method and encoding method of method B, thereby obtaining a second gain and its code (second gain code). Then, the gain code conversion circuit 400 outputs the second gain code to the code multiplexing circuit 1020 as a code that can be decoded using the gain decoding method of method B.
Code multiplexing circuit 1020 receives second LP coefficient codes output from LP coefficient code conversion circuit 100, second ACB codes output from ACB code conversion circuit 200, second FCB codes output from FCB code conversion circuit 300, and second gain codes output from gain code conversion circuit 400, and outputs the code string obtained by multiplexing these codes as a second code string via output terminal 20. This concludes the description of fig. 12.
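The conventional configuration of fig. 12 amounts to four independent per-parameter conversions between separation and multiplexing. The sketch below shows that structure only. `CodeSet`, `transcode_conventional`, and `requantize` are hypothetical names, codes are modeled as plain integers, and the scalar nearest-entry search stands in for whatever quantizer method B actually specifies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CodeSet:
    """The four codes carried by one code string (illustrative field names)."""
    lp: int
    acb: int
    fcb: int
    gain: int


Converter = Callable[[int], int]


def transcode_conventional(first: CodeSet, convert_lp: Converter,
                           convert_acb: Converter, convert_fcb: Converter,
                           convert_gain: Converter) -> CodeSet:
    """Convert each separated code on its own, then recombine the results."""
    return CodeSet(
        lp=convert_lp(first.lp),
        acb=convert_acb(first.acb),
        fcb=convert_fcb(first.fcb),
        gain=convert_gain(first.gain),
    )


def requantize(code: int, table_a: Sequence[float], table_b: Sequence[float]) -> int:
    """Decode with method A's table, then pick the nearest entry of method B's table."""
    value = table_a[code]
    return min(range(len(table_b)), key=lambda i: (table_b[i] - value) ** 2)
```

Because the gain is requantized frame by frame with no memory of earlier frames, nothing in this structure limits how much the second gain moves from one frame to the next.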
However, the conventional transcoder described with reference to fig. 12 has a problem that the sound quality of background noise in a non-speech section deteriorates.
This is because the background noise energy in the non-speech section varies greatly in time. The reason for this is that the second gain obtained by quantizing the first gain again varies greatly in time in the non-speech section.
The present invention has been made in view of the above problems, and a main object thereof is to provide an apparatus and a method capable of reducing deterioration of the sound quality of background noise in a non-speech section, and a recording medium recording a program thereof. Other objects, features, advantages, and the like of the present invention will become apparent to those skilled in the art from the following description.
Disclosure of Invention
A method according to a first aspect of the present invention for achieving the above object is a transcoding method of converting a first code string according to a first method into a second code string according to a second method, the method comprising the steps of: obtaining a first linear prediction coefficient and information of an excitation signal from the first code string, and driving a filter having the first linear prediction coefficient with the excitation signal obtained from that information, thereby generating a first speech signal; calculating an optimum gain based on the first speech signal and a second speech signal generated from information obtained from the second code string; correcting the optimum gain; and finding gain information in the second code string based on the corrected optimum gain, the optimum gain, and a gain read from the gain codebook of the second method. In this method, the optimum gain is preferably obtained as the gain that minimizes the distance between the first speech signal and the second speech signal generated from information obtained from the second code string.
A method according to a second aspect of the present invention is a transcoding method of converting a first code string according to a first method into a second code string according to a second method, the method comprising the steps of: decoding gain information from the first code string; correcting the decoded gain (decoding gain); and finding gain information in the second code string based on the corrected decoding gain, the decoding gain, and a gain read from the gain codebook of the second method.
In the invention according to the first aspect, the gain information in the second code string is preferably obtained by calculating a first squared error from the corrected optimum gain and a gain read from the gain codebook, calculating a second squared error from the optimum gain and the gain read from the gain codebook, and selecting from the gain codebook the gain that minimizes an evaluation function based on the first squared error and the second squared error.
In the invention according to the second aspect, the gain information in the second code string is preferably obtained by calculating a first squared error from the corrected decoding gain and a gain read from the gain codebook, calculating a second squared error from the decoding gain and the gain read from the gain codebook, and selecting from the gain codebook the gain that minimizes an evaluation function based on the first squared error and the second squared error.
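A codebook search of this kind can be sketched as follows. The text says only that the evaluation function is based on the two squared errors. The convex combination with a single `weight` used here is an assumption, as are the function name and the modeling of each codebook entry as an (ACB gain, FCB gain) pair.

```python
def select_gain_code(codebook, reference_gain, corrected_gain, weight):
    """Return the index of the codebook entry minimizing an assumed evaluation
    function: weight * first_error + (1 - weight) * second_error.

    reference_gain is the optimum gain (first aspect) or the decoding gain
    (second aspect); corrected_gain is its corrected version.
    """
    best_index, best_cost = None, float("inf")
    for index, entry in enumerate(codebook):
        # First squared error: corrected gain vs. codebook gain.
        first_error = sum((c - g) ** 2 for c, g in zip(corrected_gain, entry))
        # Second squared error: uncorrected gain vs. codebook gain.
        second_error = sum((r - g) ** 2 for r, g in zip(reference_gain, entry))
        cost = weight * first_error + (1.0 - weight) * second_error
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index
```

With `weight` near 1 the search follows the corrected gain, which would suit non-speech sections. With `weight` near 0 it reduces to ordinary quantization of the uncorrected gain.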
In the invention of the first aspect, the corrected optimum gain is preferably based on a long-term average of the optimum gains.
In the invention of the second aspect, the corrected decoding gain is preferably based on a long-term average of the decoding gains.
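One simple way to realize a long-term average is a first-order recursive (exponential) average. The text does not say which averaging rule is used, so the recursion, the `smoothing` parameter, and the choice to initialize the average with the first gain are all assumptions of this sketch.

```python
def correct_gains(gains, smoothing=0.9):
    """Return a corrected gain per frame as a running long-term average.

    Assumed recursion: avg[n] = smoothing * avg[n-1] + (1 - smoothing) * gain[n],
    with the average initialized to the first gain.
    """
    corrected = []
    average = None
    for gain in gains:
        if average is None:
            average = gain
        else:
            average = smoothing * average + (1.0 - smoothing) * gain
        corrected.append(average)
    return corrected
```

A `smoothing` value closer to 1 makes the corrected gain change more slowly over time.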
An apparatus according to a third aspect of the present invention is a code conversion apparatus that converts a first code string according to a first method into a second code string according to a second method, the apparatus including: a speech decoding circuit that obtains a first linear prediction coefficient and information of an excitation signal from the first code string, and drives a filter having the first linear prediction coefficient with the excitation signal obtained from that information, thereby generating a first speech signal; an optimum gain calculation circuit that calculates an optimum gain based on the first speech signal and a second speech signal generated from information obtained from the second code string; an optimum gain correction circuit that corrects the optimum gain; and a gain coding circuit that obtains gain information in the second code string based on the corrected optimum gain, the optimum gain, and a gain read from the gain codebook of the second method. In this apparatus, the optimum gain calculation circuit preferably obtains, as the optimum gain, the gain that minimizes the distance between the first speech signal and the second speech signal generated from information obtained from the second code string.
An apparatus according to a fourth aspect of the present invention is a code conversion apparatus that converts a first code string according to a first method into a second code string according to a second method, the apparatus including: a gain decoding circuit that decodes gain information from the first code string; a gain correction circuit that corrects the decoded gain (decoding gain); and a gain coding circuit that obtains gain information in the second code string based on the corrected decoding gain, the decoding gain, and a gain read from the gain codebook of the second method.
In the aforementioned third aspect of the invention, the gain coding circuit preferably calculates a first square error from the corrected optimum gain and the gain read from the gain codebook, calculates a second square error from the optimum gain and the gain read from the gain codebook, and obtains the gain information in the second code string by selecting a gain from the gain codebook that minimizes an evaluation function based on the first square error and the second square error.
In the fourth aspect of the present invention, the gain coding circuit preferably calculates a first square error from the corrected decoding gain and the gain read from the gain codebook, calculates a second square error from the decoded gain and the gain read from the gain codebook, and selects a gain from the gain codebook that minimizes an evaluation function based on the first square error and the second square error to obtain gain information in the second code string.
In the optimum gain correction circuit according to the third aspect of the present invention, it is preferable that the corrected optimum gain is based on a long-term average of the optimum gains.
In the gain correction circuit of the invention of the fourth aspect, the corrected decoding gain is preferably based on a long-term average of the decoding gains.
A fifth aspect of the present invention provides a program for causing a computer constituting a code conversion apparatus that converts a first code string according to a first method into a second code string according to a second method to execute:
(a) obtaining a first linear prediction coefficient and information of an excitation signal from the first code string, and driving a filter having the first linear prediction coefficient with the excitation signal obtained from that information, thereby generating a first speech signal;
(b) calculating a gain (optimum gain) based on the first speech signal and a second speech signal generated from information obtained from the second code string;
(c) correcting the optimum gain;
(d) finding gain information in the second code string based on the corrected optimum gain, the optimum gain, and a gain read from the gain codebook of the second method. In this program, the gain that minimizes the distance between the first speech signal and the second speech signal generated from information obtained from the second code string is preferably obtained as the optimum gain.
A sixth aspect of the present invention provides a program for causing a computer constituting a code conversion apparatus that converts a first code string according to a first method into a second code string according to a second method to execute:
(a) decoding gain information from the first code string;
(b) correcting the decoded gain (decoding gain);
(c) finding gain information in the second code string based on the corrected decoding gain, the decoding gain, and a gain read from the gain codebook of the second method.
In the program according to the fifth aspect of the present invention, the gain information in the second code string is preferably obtained by calculating a first squared error from the corrected optimum gain and a gain read from the gain codebook, calculating a second squared error from the optimum gain and the gain read from the gain codebook, and selecting from the gain codebook the gain that minimizes an evaluation function based on the first squared error and the second squared error.
In the program according to the sixth aspect of the present invention, the gain information in the second code string is preferably obtained by calculating a first squared error from the corrected decoding gain and a gain read from the gain codebook, calculating a second squared error from the decoding gain and the gain read from the gain codebook, and selecting from the gain codebook the gain that minimizes an evaluation function based on the first squared error and the second squared error.
In the program of the invention of the fifth aspect, the corrected optimum gain is preferably based on a long-term average of the optimum gains.
In the program of the invention of the sixth aspect, the corrected decoding gain is preferably based on a long-term average of the decoding gains.
A seventh aspect of the present invention provides a recording medium on which the program of the fifth or sixth aspect is recorded.
Drawings
Fig. 1 is a schematic configuration diagram of a first embodiment of a transcoding apparatus of the present invention;
Fig. 2 is a schematic diagram of the construction of an LP coefficient code conversion circuit in the code conversion apparatus of the present invention;
Fig. 3 is an explanatory diagram of a correspondence relationship between ACB codes and ACB delays and a conversion method of the ACB codes;
Fig. 4 is a schematic diagram of the structure of the speech decoding circuit of the transcoding device of the present invention;
Fig. 5 is a schematic diagram showing the configuration of a target signal calculating circuit in the code converting apparatus according to the present invention;
Fig. 6 is a schematic diagram showing the structure of an FCB code generating circuit in the code conversion apparatus according to the present invention;
Fig. 7 is an explanatory diagram of a correspondence relationship between pulse position codes and pulse positions and a conversion method of ACB codes;
Fig. 8 is a schematic diagram showing a gain code generating circuit in the code conversion apparatus according to the present invention;
Fig. 9 is a schematic configuration diagram of a second embodiment of the code conversion apparatus of the present invention;
Fig. 10 is a schematic diagram showing a gain code generating circuit in the code conversion apparatus according to the present invention;
Fig. 11 is a schematic diagram showing the construction of a third and fourth embodiment of the transcoding device of the present invention;
Fig. 12 is a schematic diagram of a configuration of a conventional code conversion device.
Detailed Description
The following describes embodiments of the present invention. The outline and principle of the apparatus and method of the present invention will be first explained, and then the embodiments will be explained in detail below.
In the code conversion device of the present invention, a speech decoding circuit (1500) obtains a first linear prediction coefficient and information of an excitation signal from a first code string according to a first method, and drives a filter having the first linear prediction coefficient with the excitation signal obtained from that information, thereby generating a first speech signal. A gain code generation circuit (1400) calculates the gain (optimum gain) that minimizes the distance between the first speech signal and a second speech signal generated from information obtained from a second code string according to a second method, corrects the optimum gain, and finds gain information in the second code string based on the corrected optimum gain, the optimum gain, and a gain read from the gain codebook of the second method.
The method of the present invention has the following steps.
Step a: deriving a first linear prediction coefficient from the first code string;
Step b: deriving information of the excitation signal from the first code string;
Step c: deriving an excitation signal from the information of the excitation signal;
Step d: generating a first speech signal by driving a filter having the first linear prediction coefficient using the excitation signal;
Step e: calculating a gain (optimum gain) that minimizes a distance between a second speech signal and the first speech signal, wherein the second speech signal is generated from information obtained from a second code string according to a second method;
Step f: correcting the optimum gain;
Step g: finding the gain information in the second code string based on the corrected optimum gain, the optimum gain, and the gain read from the gain codebook in the second method.
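Step e can be illustrated with a small least-squares computation. The sketch assumes that the second speech signal over one subframe is a gain-weighted sum of an ACB contribution and an FCB contribution, each already passed through the synthesis filter, and that the distance is the squared error. The function and argument names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def optimal_gains(target, acb_contrib, fcb_contrib):
    """Return (acb_gain, fcb_gain) minimizing
    sum_n (target[n] - ga * acb_contrib[n] - gf * fcb_contrib[n])**2
    by solving the 2x2 normal equations with Cramer's rule.
    """
    raa = dot(acb_contrib, acb_contrib)
    rff = dot(fcb_contrib, fcb_contrib)
    raf = dot(acb_contrib, fcb_contrib)
    xa = dot(target, acb_contrib)
    xf = dot(target, fcb_contrib)
    det = raa * rff - raf * raf
    if abs(det) < 1e-12:
        # The two contributions are (nearly) collinear; no unique solution.
        raise ValueError("contributions are linearly dependent")
    ga = (xa * rff - xf * raf) / det
    gf = (xf * raa - xa * raf) / det
    return ga, gf
```

The pair returned here is unquantized. Steps f and g then correct it and map it onto the gain codebook of the second method.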
In the present invention, the second gain (gain in the second code string) is obtained in the non-speech section using an evaluation function that reduces temporal variation of the second gain.
Therefore, in the non-speech section, the temporal variation of the obtained second gain is reduced, and the temporal variation of the background noise energy in the same section is reduced.
As a result, deterioration of the quality of the background noise in the non-speech section is reduced.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First embodiment
Fig. 1 is a schematic configuration diagram of a transcoding apparatus according to a first embodiment of the present invention. In fig. 1, the same reference numerals are given to the same or equivalent elements as those in fig. 12. Referring to fig. 1, the code conversion apparatus includes an input terminal 10, a code separation circuit 1010, an LP coefficient code conversion circuit 1100, an LSP-LPC conversion circuit 1110, an impulse response calculation circuit 1120, an ACB code conversion circuit 1200, a target signal calculation circuit 1700, an FCB code generation circuit 1800, a gain code generation circuit 1400, a speech decoding circuit 1500, a second excitation signal calculation circuit 1610, a second excitation signal storage circuit 1620, a code multiplexing circuit 1020, and an output terminal 20. The input terminal 10, the output terminal 20, the code separation circuit 1010, and the code multiplexing circuit 1020 are basically the same as those shown in fig. 12, except that some of their connections differ. The following description focuses on the differences from the configuration shown in fig. 12 and omits the description of elements that are the same as or equivalent to those described above.
Further, in method A, the LP coefficients are encoded every $T_{fr}^{(A)}$ msec period (frame), and the constituent elements of the excitation signal, such as the ACB, FCB, and gain, are encoded every $T_{sfr}^{(A)} = T_{fr}^{(A)} / N_{sfr}^{(A)}$ msec period (subframe).
On the other hand, in method B, the LP coefficients are encoded every $T_{fr}^{(B)}$ msec period (frame), and the constituent elements of the excitation signal are encoded every $T_{sfr}^{(B)} = T_{fr}^{(B)} / N_{sfr}^{(B)}$ msec period (subframe).
The frame length, the number of subframes, and the subframe length of method A are denoted $L_{fr}^{(A)}$, $N_{sfr}^{(A)}$, and $L_{sfr}^{(A)} = L_{fr}^{(A)} / N_{sfr}^{(A)}$, respectively.
The frame length, the number of subframes, and the subframe length of method B are denoted $L_{fr}^{(B)}$, $N_{sfr}^{(B)}$, and $L_{sfr}^{(B)} = L_{fr}^{(B)} / N_{sfr}^{(B)}$, respectively.
In the following description, for convenience, it is assumed that $L_{fr}^{(A)} = L_{fr}^{(B)}$ and $N_{sfr}^{(A)} = N_{sfr}^{(B)} = 2$.
Here, for example, if the sampling frequency is 8000 Hz and $T_{fr}^{(A)}$ and $T_{fr}^{(B)}$ are 10 msec, then $L_{fr}^{(A)}$ and $L_{fr}^{(B)}$ are 160 samples, and $L_{sfr}^{(A)}$ and $L_{sfr}^{(B)}$ are 80 samples.
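The relation between a period in msec and a length in samples is just the sampling frequency times the duration. The sketch below uses hypothetical helper names to make that arithmetic explicit. By this relation, 160 samples at 8000 Hz span 20 msec, whereas a 10 msec frame would hold 80 samples.

```python
def samples_per_period(sampling_rate_hz, period_msec):
    """Number of samples in a period: L = sampling rate * T / 1000."""
    return sampling_rate_hz * period_msec // 1000


def subframe_length(frame_length, num_subframes):
    """Subframe length in samples: L_sfr = L_fr / N_sfr (assumed to divide evenly)."""
    if frame_length % num_subframes != 0:
        raise ValueError("frame length is not a multiple of the subframe count")
    return frame_length // num_subframes
```

With two subframes per frame, as assumed in the text, `subframe_length(160, 2)` gives the 80-sample subframe used in the example.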
The LP coefficient code conversion circuit 1100 receives the first LP coefficient code from the code separation circuit 1010. In many standard methods, such as the 3GPP AMR Speech Codec (document 3) and ITU-T Recommendation G.729, the LP coefficients are represented by Line Spectrum Pairs (LSPs), and in most cases it is the LSPs that are encoded and decoded; the conversion of the LP coefficient code is therefore performed in the LSP domain. For known methods of converting LP coefficients to LSPs and LSPs to LP coefficients, see, for example, sections 5.2.3 and 6.2.4 of document 3. The LP coefficient code conversion circuit 1100 decodes the first LP coefficient code using the LSP decoding method of method A, thereby obtaining a first LSP.
Next, the LP coefficient code conversion circuit 1100 quantizes and encodes the first LSP by using the LSP quantization method and the encoding method in method B, thereby obtaining a second LSP and a code (second LP coefficient code) corresponding thereto. Then, the LP coefficient code conversion circuit 1100 outputs the second LP coefficient code to the code multiplexing circuit 1020 as a code that can be decoded using the LSP decoding method in method B, and outputs the first LSP and the second LSP to the LSP-LPC conversion circuit 1110.
Fig. 2 is a schematic diagram of the LP coefficient code conversion circuit 1100. Referring to fig. 2, the LP coefficient code conversion circuit 1100 includes an LSP decoding circuit 110, a first LSP codebook 111, an LSP encoding circuit 130, and a second LSP codebook 131. Each constituent element of the LP coefficient code conversion circuit 1100 is described with reference to fig. 2.
The LSP decoding circuit 110 decodes the corresponding LSP from the LP coefficient code. The LSP decoding circuit 110 has a first LSP codebook 111 in which a plurality of sets of LSPs are stored. It inputs the first LP coefficient code output from the code separation circuit 1010 via the input terminal 31, reads the LSP corresponding to the first LP coefficient code from the first LSP codebook 111, outputs the read LSP as a first LSP to the LSP encoding circuit 130, and outputs the first LSP to the LSP-LPC conversion circuit 1110 via the output terminal 33. Here, the decoding of the LSP from the LP coefficient code follows the LSP decoding method of method A and uses the LSP codebook of method A.
The LSP encoding circuit 130 receives the first LSP output from the LSP decoding circuit 110 as an input, sequentially reads the second LSP and the LP coefficient codes corresponding thereto from the second LSP codebook 131 in which a plurality of sets of LSPs are stored, respectively, selects the second LSP having the smallest error with respect to the first LSP, outputs the LP coefficient code corresponding thereto to the code multiplexing circuit 1020 as the second LP coefficient code via the output terminal 32, and outputs the second LSP to the LSP-LPC conversion circuit 1110 via the output terminal 34. Here, the second LSP selection method, i.e., the LSP quantization and coding method, uses the LSP codebook of method B according to the LSP quantization method and coding method of method B. Here, for quantization and coding of LSPs, see, for example, the description in section 5.2.5 of "document 3".
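The codebook search performed by the LSP encoding circuit 130 can be sketched as a nearest-neighbour search. The sketch below is only illustrative: the function name `quantize_lsp`, the toy codebook values, and the use of a plain squared Euclidean error are assumptions, since the actual error measure and codebook structure are those defined by method B.

```python
def quantize_lsp(first_lsp, codebook):
    """Return (code, vector) of the codebook entry closest to first_lsp.

    The error measure is a plain squared Euclidean distance (an assumption);
    a real codec may use a weighted error and split or predictive codebooks.
    """
    best_code, best_error = 0, float("inf")
    for code, candidate in enumerate(codebook):
        error = sum((q - c) ** 2 for q, c in zip(first_lsp, candidate))
        if error < best_error:
            best_code, best_error = code, error
    return best_code, codebook[best_code]


# Toy 3-entry codebook of 2-dimensional LSP vectors (hypothetical values).
toy_codebook = [[0.10, 0.30], [0.20, 0.50], [0.40, 0.80]]
second_code, second_lsp = quantize_lsp([0.22, 0.47], toy_codebook)
```

Here `second_code` plays the role of the second LP coefficient code and `second_lsp` the role of the second LSP.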
This completes the description of the LP coefficient code conversion circuit 1100 in fig. 2, and the description returns to fig. 1 again.
The LSP-LPC conversion circuit 1110 inputs the first LSP and the second LSP output by the LP coefficient code conversion circuit 1100, converts the first LSP into the first LP coefficient $a_{1,i}$, and converts the second LSP into the second LP coefficient $a_{2,i}$. It outputs the first LP coefficient $a_{1,i}$ to the target signal calculation circuit 1700, the speech decoding circuit 1500, and the impulse response calculation circuit 1120, and outputs the second LP coefficient $a_{2,i}$ to the target signal calculation circuit 1700 and the impulse response calculation circuit 1120. Here, the conversion from LSP to LP coefficient is described in section 5.2.4 of "document 3".
The ACB code conversion circuit 1200 converts the first ACB code input from the code separation circuit 1010 using the correspondence between the codes in method A and the codes in method B, thereby obtaining a second ACB code. The ACB code conversion circuit 1200 outputs the second ACB code to the code multiplexing circuit 1020 as a code that can be decoded using the ACB decoding method in method B. Further, the ACB code conversion circuit 1200 outputs the ACB delay corresponding to the second ACB code to the target signal calculation circuit 1700 as the second ACB delay.
Here, the conversion of the code is explained with reference to fig. 3. For example, assume that when the ACB code $i_T^{(A)}$ in method A is 56, the corresponding ACB delay $T^{(A)}$ is 76, and that when the ACB code $i_T^{(B)}$ in method B is 53, the corresponding ACB delay $T^{(B)}$ is 76. When ACB code conversion is performed from method A to method B so that the ACB delay value stays the same (76 in this case), ACB code 56 in method A is associated with ACB code 53 in method B. This completes the description of the code conversion, and the description returns to fig. 1 again.
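A minimal sketch of this delay-preserving code conversion follows. Only the pairs (code 56, delay 76) for method A and (code 53, delay 76) for method B come from the description; every other table entry, and the helper name `build_code_converter`, is hypothetical and included only to make the tables non-trivial.

```python
def build_code_converter(delay_table_a, delay_table_b):
    """Map each method-A code to the method-B code with the same delay.

    Both tables map code -> delay. Codes of method A whose delay has no
    counterpart in method B are left out of the result.
    """
    code_for_delay_b = {delay: code for code, delay in delay_table_b.items()}
    return {
        code_a: code_for_delay_b[delay]
        for code_a, delay in delay_table_a.items()
        if delay in code_for_delay_b
    }


# Only 56 -> 76 (method A) and 53 -> 76 (method B) are taken from the text;
# the neighbouring entries are hypothetical.
delay_table_a = {55: 75, 56: 76, 57: 77}
delay_table_b = {52: 75, 53: 76, 54: 77}
converter = build_code_converter(delay_table_a, delay_table_b)
second_acb_code = converter[56]
second_acb_delay = delay_table_b[second_acb_code]
```

Because the conversion is a pure table lookup, it can be precomputed once and applied per subframe without any search.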
The speech decoding circuit 1500 inputs the first ACB code, the first FCB code, and the first gain code output from the code separation circuit 1010, and inputs the first LP coefficient from the LSP-LPC conversion circuit 1110. Next, the speech decoding circuit 1500 decodes the ACB delay, the FCB signal, and the gain from the first ACB code, the first FCB code, and the first gain code, using the ACB signal decoding method, the FCB signal decoding method, and the gain decoding method of method A, respectively, and takes these as the first ACB delay, the first FCB signal, and the first gain. The speech decoding circuit 1500 generates an ACB signal using the first ACB delay and takes this as the first ACB signal. The speech decoding circuit 1500 generates speech from the first ACB signal, the first FCB signal, the first gain, and the first LP coefficient, and outputs the speech to the target signal calculation circuit 1700.
Fig. 4 is a schematic diagram of the configuration of the speech decoding circuit 1500. Referring to fig. 4, the speech decoding circuit 1500 has an excitation signal information decoding circuit 1600, an excitation signal calculation circuit 1540, an excitation signal storage circuit 1570, and a synthesis filter 1580, wherein the excitation signal information decoding circuit 1600 has an ACB decoding circuit 1510, an FCB decoding circuit 1520, and a gain decoding circuit 1530. Referring to fig. 4, each constituent element of the speech decoding circuit 1500 will be described.
The excitation signal information decoding circuit 1600 decodes the excitation signal information from the code corresponding to the excitation signal information. The first ACB code, the first FCB code, and the first gain code output from the code separation circuit 1010 are input via input terminals 51, 52, and 53, respectively, and the ACB delay, the FCB signal, and the gain are decoded from the first ACB code, the first FCB code, and the first gain code, respectively, and are used as the first ACB delay, the first FCB signal, and the first gain, respectively. Here, the first gain is composed of ACB gain and FCB gain, which are taken as the first ACB gain and the first FCB gain, respectively. The excitation signal information decoding circuit 1600 receives the past excitation signal output from the excitation signal storage circuit 1570. The excitation signal information decoding circuit 1600 generates an ACB signal using the past excitation signal and the first ACB delay, and takes this as the first ACB signal. The excitation signal information decoding circuit 1600 outputs the first ACB signal, the first FCB signal, the first ACB gain, and the first FCB gain to the excitation signal calculation circuit 1540.
Next, the ACB decoding circuit 1510, the FCB decoding circuit 1520, and the gain decoding circuit 1530, which are components of the excitation signal information decoding circuit 1600, will be described in detail.
The ACB decoding circuit 1510 inputs the first ACB code output from the code separation circuit 1010 via the input terminal 51, and inputs the past excitation signal output from the excitation signal storage circuit 1570. Next, as in the ACB code conversion circuit 1200 described above, the ACB decoding circuit 1510 uses the correspondence between the ACB code and the ACB delay in method A shown in fig. 3 to obtain the first ACB delay $T^{(A)}$ corresponding to the first ACB code. From the past excitation signal, a signal of $L_{sfr}^{(A)}$ samples (the subframe length) is cut out starting at the point $T^{(A)}$ samples before the start of the current subframe, thereby generating the first ACB signal. Here, when $T^{(A)}$ is smaller than $L_{sfr}^{(A)}$, a vector of $T^{(A)}$ samples is cut out, and this vector is repeatedly concatenated to form a signal of $L_{sfr}^{(A)}$ samples. The first ACB signal is then output to the excitation signal calculation circuit 1540. Here, the details of the method of generating the first ACB signal are described in sections 6.1 and 5.6 of "document 3".
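The cut-out and repetition step can be sketched as follows, assuming an integer delay and a past excitation buffer whose last element is the sample just before the current subframe. The function name `generate_acb_signal` is hypothetical, and fractional delays, which real codecs often support, are not handled here.

```python
def generate_acb_signal(past_excitation, delay, subframe_length):
    """Build the ACB signal from the past excitation (integer delay only).

    past_excitation[-1] is assumed to be the sample immediately before the
    current subframe.
    """
    if delay <= 0 or delay > len(past_excitation):
        raise ValueError("delay must lie within the stored past excitation")
    start = len(past_excitation) - delay
    if delay >= subframe_length:
        # Enough history: cut out one subframe starting `delay` samples back.
        return past_excitation[start:start + subframe_length]
    # Delay shorter than the subframe: repeat the last `delay` samples.
    segment = past_excitation[start:]
    repeats = -(-subframe_length // delay)  # ceiling division
    return (segment * repeats)[:subframe_length]


history = [1, 2, 3, 4, 5, 6]
long_delay_signal = generate_acb_signal(history, 4, 3)
short_delay_signal = generate_acb_signal(history, 2, 5)
```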
The FCB decoding circuit 1520 receives the first FCB code output from the code separation circuit 1010 via the input terminal 52, and outputs a first FCB signal corresponding to the first FCB code to the excitation signal calculation circuit 1540. The FCB signal is represented as a multi-pulse signal defined by a pulse position and a pulse polarity, and thus the first FCB code is composed of a code (pulse position code) corresponding to the pulse position and a code (pulse polarity code) corresponding to the pulse polarity. Here, a method of generating the FCB signal expressed as a multipulse signal is described in detail in sections 6.1 and 5.7 of "document 3".
The gain decoding circuit 1530 inputs the first gain code output from the code separating circuit 1010 via the input terminal 53. The gain decoding circuit 1530 incorporates a table in which a plurality of gains are stored, and reads the gain corresponding to the first gain code from the table. Then, the gain decoding circuit 1530 outputs the first ACB gain corresponding to the ACB gain and the first FCB gain corresponding to the FCB gain among the read gains to the excitation signal calculation circuit 1540. Here, in the case of encoding the first ACB gain and the first FCB gain together, a plurality of two-dimensional vectors composed of the first ACB gain and the first FCB gain are stored in a table. In addition, in the case of encoding the first ACB gain and the first FCB gain separately, two tables are built, and a plurality of first ACB gains are stored in one table and a plurality of first FCB gains are stored in the other table.
The excitation signal calculation circuit 1540 inputs the first ACB signal output by the ACB decoding circuit 1510, the first FCB signal output by the FCB decoding circuit 1520, and the first ACB gain and the first FCB gain output by the gain decoding circuit 1530. The excitation signal calculation circuit 1540 adds a signal obtained by multiplying the first ACB gain by the first ACB signal and a signal obtained by multiplying the first FCB gain by the first FCB signal, thereby obtaining a first excitation signal. Then, the excitation signal calculation circuit 1540 outputs the first excitation signal to the synthesis filter 1580 and the excitation signal storage circuit 1570.
The excitation signal storage circuit 1570 inputs the first excitation signal output by the excitation signal calculation circuit 1540, and stores and holds it. The excitation signal storage circuit 1570 outputs the past first excitation signal, which was input and stored previously, to the ACB decoding circuit 1510.
The synthesis filter 1580 inputs the first excitation signal output by the excitation signal calculation circuit 1540, and inputs the first LP coefficient output by the LSP-LPC conversion circuit 1110 via the input terminal 61. The synthesis filter 1580 then generates speech by driving a linear prediction filter having the first LP coefficient with the first excitation signal, and outputs the speech signal to the target signal calculation circuit 1700 via the output terminal 63.
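The excitation calculation and the synthesis filtering can be sketched together as below. The sketch assumes the sign convention $A(z) = 1 + \sum_i a_i z^{-i}$, so that the synthesis filter $1/A(z)$ computes $s(n) = u(n) - \sum_i a_i s(n-i)$; the function names and the zero initial filter memory are assumptions.

```python
def compute_excitation(acb_signal, fcb_signal, acb_gain, fcb_gain):
    """Excitation = ACB gain * ACB signal + FCB gain * FCB signal."""
    return [acb_gain * v + fcb_gain * c for v, c in zip(acb_signal, fcb_signal)]


def synthesis_filter(excitation, lp_coefficients, memory=None):
    """Drive 1/A(z), with A(z) = 1 + sum_i a_i z^-i, by the excitation.

    `memory` holds past output samples, most recent first; it defaults to
    zeros (an assumption made for this sketch).
    """
    order = len(lp_coefficients)
    history = list(memory) if memory is not None else [0.0] * order
    speech = []
    for u in excitation:
        s = u - sum(a * past for a, past in zip(lp_coefficients, history))
        speech.append(s)
        history = ([s] + history)[:order]
    return speech


excitation = compute_excitation([1.0, 0.0], [0.0, 1.0], 0.5, 2.0)
decoded = synthesis_filter([1.0, 0.0, 0.0], [-0.5])
```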
This completes the description of the speech decoding circuit 1500 in fig. 4, and the description returns to fig. 1 again.
The target signal calculation circuit 1700 receives the first LP coefficient and the second LP coefficient from the LSP-LPC conversion circuit 1110, receives the second ACB delay corresponding to the second ACB code from the ACB code conversion circuit 1200, receives the decoded speech from the speech decoding circuit 1500, receives the impulse response signal from the impulse response calculation circuit 1120, and receives the past second excitation signal stored in the second excitation signal storage circuit 1620. The target signal calculation circuit 1700 calculates a first target signal from the decoded speech and the first and second LP coefficients. Next, the target signal calculation circuit 1700 obtains the second ACB signal and the optimum ACB gain from the past second excitation signal, the impulse response signal, the first target signal, and the second ACB delay. Then, the target signal calculation circuit 1700 outputs the first target signal and the optimum ACB gain to the gain code generation circuit 1400, and outputs the second ACB signal to the gain code generation circuit 1400 and the second excitation signal calculation circuit 1610.
Fig. 5 is a schematic diagram of the configuration of the target signal calculation circuit 1700. Referring to fig. 5, the target signal calculation circuit 1700 has a weighting signal calculation circuit 1710, an ACB signal generation circuit 1720, and an optimal ACB gain calculation circuit 1730. The respective constituent elements of the target signal calculation circuit 1700 are explained with reference to fig. 5.
The weighting signal calculation circuit 1710 receives the decoded speech $s(n)$ output from the synthesis filter 1580 of the speech decoding circuit 1500 via the input terminal 57, and receives the first LP coefficient $a_{1,i}$ and the second LP coefficient $a_{2,i}$ output from the LSP-LPC conversion circuit 1110 via the input terminal 36 and the input terminal 35, respectively. The weighting signal calculation circuit 1710 first constructs the auditory weighting filter $W(z)$ using the first LP coefficient.
Then, the weighting signal calculation circuit 1710 drives the auditory weighting filter with the decoded speech, thereby generating an auditory weighted speech signal. Next, the weighting signal calculation circuit 1710 constructs the auditory weighted synthesis filter $W(z)/A_2(z)$ using the first LP coefficient and the second LP coefficient.
Then, the weighting signal calculation circuit 1710 outputs the first target signal $x(n)$, obtained by subtracting the zero-input response of the auditory weighted synthesis filter from the auditory weighted speech signal, to the ACB signal generation circuit 1720 and the optimal ACB gain calculation circuit 1730, and also outputs it to the second target signal calculation circuit 1430 via the output terminal 78.
The ACB signal generation circuit 1720 inputs the first target signal output by the weighting signal calculation circuit 1710, inputs the second ACB delay $T_{lag}^{(B)}$ output by the ACB code conversion circuit 1200 via the input terminal 37, inputs the impulse response signal $h(n)$ output from the impulse response calculation circuit 1120 via the input terminal 74, and inputs the past second excitation signal $u(n)$ output from the second excitation signal storage circuit 1620 via the input terminal 75.
The ACB signal generation circuit 1720 calculates the filtered past excitation signal of delay $k$,
$$y_k(n), \quad n = 0, \ldots, L_{sfr}^{(B)} - 1,$$
as the convolution of the impulse response signal with the signal cut out from the past second excitation signal at delay $k$. Here, the delay $k$ is the second ACB delay. The signal cut out from the past second excitation signal at delay $k$ is the second ACB signal $v(n)$.
Then, the ACB signal generation circuit 1720 outputs the second ACB signal to the second target signal calculation circuit 1430 and the second excitation signal calculation circuit 1610 via the output terminal 76, and outputs the filtered past excitation signal $y_k(n)$ of delay $k$ to the optimal ACB gain calculation circuit 1730.
The optimal ACB gain calculation circuit 1730 receives the first target signal $x(n)$ output from the weighting signal calculation circuit 1710, and receives the filtered past excitation signal $y_k(n)$ of delay $k$ output from the ACB signal generation circuit 1720.
Next, the optimal ACB gain calculation circuit 1730 calculates the optimum ACB gain $g_p$ from the first target signal $x(n)$ and the filtered past excitation signal $y_k(n)$ of delay $k$ according to the following equation. The optimum ACB gain $g_p$ is the gain that minimizes the distance between the first target signal $x(n)$ and the filtered past excitation signal $y_k(n)$ of delay $k$ scaled by that gain.
$$g_p = \frac{\sum_{n=0}^{L_{sfr}^{(B)}-1} x(n)\, y_k(n)}{\sum_{n=0}^{L_{sfr}^{(B)}-1} y_k(n)\, y_k(n)}$$
Then, the optimal ACB gain calculation circuit 1730 outputs the optimum ACB gain $g_p$ to the ACB gain coding circuit 1410 via the output terminal 77.
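As a rough illustration of the filtering and gain computation, the following sketch convolves a cut-out excitation segment with an impulse response (truncated to the subframe length) and then evaluates the ratio of the cross-correlation to the energy. The function names, the toy signals, and the choice to return 0 when the filtered signal has zero energy are assumptions.

```python
def filter_with_impulse_response(signal, impulse_response):
    """Convolve signal with impulse_response, truncated to len(signal)."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for i in range(n + 1):
            if n - i < len(impulse_response):
                acc += signal[i] * impulse_response[n - i]
        output.append(acc)
    return output


def optimal_gain(target, filtered):
    """Least-squares gain: sum(x * y) / sum(y * y)."""
    energy = sum(y * y for y in filtered)
    if energy == 0.0:
        return 0.0  # assumption: no contribution if the signal is silent
    return sum(x * y for x, y in zip(target, filtered)) / energy


second_acb_signal = [1.0, 0.0, 0.0]          # toy v(n)
impulse_response = [1.0, 0.5, 0.25]          # toy h(n)
filtered = filter_with_impulse_response(second_acb_signal, impulse_response)
gain = optimal_gain([2.0, 1.0, 0.5], filtered)
```

In this toy case the target is exactly twice the filtered signal, so the least-squares gain is 2.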
For a detailed description of the method of calculating the second ACB signal and the method of calculating the optimum ACB gain, see sections 6.1 and 5.6 of "document 3". This completes the description of the target signal calculation circuit 1700 in fig. 5, and the description returns to fig. 1 again.
The impulse response calculation circuit 1120 inputs the first LP coefficient and the second LP coefficient output by the LSP-LPC conversion circuit 1110, and constructs an auditory weighted synthesis filter using the first LP coefficient and the second LP coefficient.
Then, the impulse response calculation circuit 1120 outputs the impulse response signal of the auditory weighted synthesis filter to the target signal calculation circuit 1700 and the gain code generation circuit 1400. Here, the transfer function of the auditory weighted synthesis filter is expressed by the following equation.
$$\frac{W(z)}{A_2(z)} = \frac{A_1(z/r_1)}{A_2(z)\, A_1(z/r_2)}$$
where
$$\frac{1}{A_2(z)} = \frac{1}{1 + \sum_{i=1}^{P} a_{2,i}\, z^{-i}}$$
is the transfer function of the linear prediction filter having the second LP coefficients $a_{2,i}$, $i = 1, \ldots, P$, and
$$W(z) = \frac{A_1(z/r_1)}{A_1(z/r_2)} = \frac{1 + \sum_{i=1}^{P} r_1^{\,i}\, a_{1,i}\, z^{-i}}{1 + \sum_{i=1}^{P} r_2^{\,i}\, a_{1,i}\, z^{-i}}$$
is the transfer function of the auditory weighting filter having the first LP coefficients $a_{1,i}$, $i = 1, \ldots, P$.
Here, $P$ is the linear prediction order (e.g., 10), and $r_1$ and $r_2$ are coefficients that control the weighting (e.g., 0.94 and 0.6).
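One way to obtain the impulse response of $W(z)/A_2(z)$ is to pass a unit impulse through the numerator $A_1(z/r_1)$ and then through the two all-pole sections $1/A_1(z/r_2)$ and $1/A_2(z)$. The sketch below does this with zero initial filter states; the function names and the cascade order are assumptions, and the default weighting values are the example values given in the text.

```python
def bandwidth_expand(lp_coefficients, factor):
    """Coefficients of A(z / factor): a_i is scaled by factor ** i."""
    return [(factor ** (i + 1)) * a for i, a in enumerate(lp_coefficients)]


def fir_filter(signal, coefficients):
    """y(n) = x(n) + sum_i c_i x(n - i), i.e. filtering by A(z)."""
    output = []
    for n, x in enumerate(signal):
        acc = x
        for i, c in enumerate(coefficients, start=1):
            if n - i >= 0:
                acc += c * signal[n - i]
        output.append(acc)
    return output


def all_pole_filter(signal, coefficients):
    """y(n) = x(n) - sum_i c_i y(n - i), i.e. filtering by 1 / A(z)."""
    output = []
    for n, x in enumerate(signal):
        acc = x
        for i, c in enumerate(coefficients, start=1):
            if n - i >= 0:
                acc -= c * output[n - i]
        output.append(acc)
    return output


def weighted_synthesis_impulse_response(a1, a2, length, r1=0.94, r2=0.6):
    """Impulse response of A1(z/r1) / (A2(z) * A1(z/r2)), zero initial state."""
    impulse = [1.0] + [0.0] * (length - 1)
    numerator_out = fir_filter(impulse, bandwidth_expand(a1, r1))
    weighted = all_pole_filter(numerator_out, bandwidth_expand(a1, r2))
    return all_pole_filter(weighted, a2)


# With r1 == r2 the weighting filter W(z) reduces to 1, leaving 1 / A2(z).
h = weighted_synthesis_impulse_response([-0.5], [-0.5], 3, r1=0.5, r2=0.5)
```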
The FCB code generation circuit 1800 inputs the first FCB code output by the code separation circuit 1010, and converts the first FCB code into a code that can be decoded using method B. The FCB code generation circuit 1800 outputs the converted FCB code to the code multiplexing circuit 1020 as a second FCB code, and outputs a second FCB signal corresponding to the second FCB code to the gain code generation circuit 1400 and the second excitation signal calculation circuit 1610. Here, the FCB signal is composed of a plurality of pulses, and is expressed as a multi-pulse signal defined by the position (pulse position) and polarity (pulse polarity) of the pulse. The FCB code is composed of a code corresponding to a pulse position (pulse position code) and a code corresponding to a pulse polarity (pulse polarity code). For a method of expressing the FCB signal using a multi-pulse signal, see the description in section 5.7 of "document 3".
Fig. 6 is a schematic diagram of the FCB code generation circuit 1800 of fig. 1. Referring to fig. 6, the FCB code generation circuit 1800 has an FCB code conversion circuit 1300 and an FCB signal generation circuit 1820. Referring to fig. 6, each constituent element of the FCB code generation circuit 1800 will be described.
The FCB code conversion circuit 1300 converts the first FCB code $i_P^{(A)}$, input from the code separation circuit 1010 via the input terminal 85, using the correspondence between the codes in method A and the codes in method B, thereby obtaining a second FCB code $i_P^{(B)}$. Then, the FCB code conversion circuit 1300 outputs the second FCB code, as a code that can be decoded by the FCB decoding method in method B, to the code multiplexing circuit 1020 via the output terminal 55, and outputs the pulse position $P_i^{(A)}$ and the pulse polarity $S_i^{(A)}$ corresponding to the second FCB code to the FCB signal generation circuit 1820.
The transformation of the pulse position code is explained with reference to fig. 7.
For example, assume that when the pulse position code $i_P^{(A)}$ in method A is 6, the corresponding pulse position $P_0^{(A)}$ is 30, and that when the pulse position code $i_P^{(B)}$ in method B is 1, the corresponding pulse position $P_0^{(B)}$ is 30. When the pulse position code is converted from method A to method B so that the pulse position value stays the same (30 in this case), the pulse position code 6 in method A is associated with the pulse position code 1 in method B.
As for the pulse polarity code, the code may be converted so that the polarity (positive or negative) corresponding to the code before conversion is the same as the polarity corresponding to the code after conversion.
The description of the conversion of the pulse position code and the pulse polarity code is ended, and the description returns to fig. 6 again.
The FCB signal generation circuit 1820 inputs the pulse position and the pulse polarity output from the FCB code conversion circuit 1300. The FCB signal generation circuit 1820 outputs an FCB signal defined by the pulse position and the pulse polarity as the second FCB signal c (n) to the optimum FCB gain calculation circuit 1440 and the second excitation signal calculation circuit 1610 through the output terminal 86.
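Generating a multi-pulse FCB signal from pulse positions and polarities can be sketched as below. Unit pulse amplitude, polarities given as +1 or -1, and the summation of coinciding pulses are assumptions of this sketch; the name `generate_fcb_signal` is hypothetical.

```python
def generate_fcb_signal(pulse_positions, pulse_polarities, subframe_length):
    """Place a unit pulse of the given polarity (+1 or -1) at each position."""
    signal = [0.0] * subframe_length
    for position, polarity in zip(pulse_positions, pulse_polarities):
        # Pulses that share a position are summed (an assumption).
        signal[position] += float(polarity)
    return signal


second_fcb_signal = generate_fcb_signal([1, 3], [1, -1], 5)
```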
This completes the description of the FCB code generation circuit 1800 in fig. 6, and the description returns to fig. 1 again.
The gain code generation circuit 1400 receives the first target signal, the second ACB signal, and the optimal ACB gain output from the target signal calculation circuit 1700, receives the second FCB signal output from the FCB code generation circuit 1800, receives the impulse response signal output from the impulse response calculation circuit 1120, and receives the first LSP output from the LP coefficient code conversion circuit 1100.
The gain code generation circuit 1400 first calculates a second target signal from the first target signal, the second ACB signal, the optimal ACB gain, and the impulse response signal, calculates an optimal FCB gain from the second target signal, the second FCB signal, and the impulse response signal, calculates a corrected FCB gain from the optimal FCB gain, and determines a voice determination value from the first LSP.
Next, the gain code generation circuit 1400 reads ACB gains sequentially from the ACB gain codebook and, for each ACB gain, calculates a first square error between the ACB gain and the optimum ACB gain and a second square error between the ACB gain and the corrected ACB gain.
The gain code generation circuit 1400 selects the ACB gain, and the corresponding ACB gain code, that minimize an evaluation function calculated from the first square error, the second square error, and a weight coefficient calculated from the speech determination value.
Further, the gain code generation circuit 1400 reads FCB gains sequentially from the FCB gain codebook and, for each FCB gain, calculates a third square error between the FCB gain and the optimum FCB gain and a fourth square error between the FCB gain and the corrected FCB gain.
The gain code generation circuit 1400 selects the FCB gain, and the corresponding FCB gain code, that minimize an evaluation function calculated from the third square error, the fourth square error, and the weight coefficient calculated from the speech determination value.
Finally, the gain code generation circuit 1400 outputs the second gain code composed of the selected ACB gain code and FCB gain code to the code multiplexing circuit 1020 via the output terminal 56 as a code that can be decoded using the gain decoding method in method B.
Fig. 8 is a schematic diagram of the gain code generation circuit 1400. Referring to fig. 8, the gain code generation circuit 1400 includes an ACB gain coding circuit 1410, an ACB gain codebook 1411, an FCB gain coding circuit 1420, an FCB gain codebook 1421, a second target signal calculation circuit 1430, an optimum FCB gain calculation circuit 1440, an optimum FCB gain correction circuit 1450, and a speech/non-speech recognition circuit 1460. Referring to fig. 8, each constituent element of the gain code generation circuit 1400 will be described in detail.
The second target signal calculation circuit 1430 receives the second ACB signal v (n) output from the ACB signal generation circuit 1720 via the input terminal 92, receives the first target signal x (n) output from the weighting signal calculation circuit 1710 via the input terminal 93, receives the impulse response signal h (n) output from the impulse response calculation circuit 1120 via the input terminal 94, and receives the second ACB gain output from the ACB gain encoding circuit 1410.
The second target signal calculation circuit 1430 calculates the filtered second ACB signal
$$y(n) = v(n) * h(n), \quad n = 0, \ldots, L_{sfr}^{(B)} - 1,$$
as the convolution of the second ACB signal and the impulse response signal, and subtracts from the first target signal $x(n)$ the signal obtained by multiplying $y(n)$ by the second ACB gain $\hat{g}_p$, thereby obtaining a second target signal $x_2(n)$:
$$x_2(n) = x(n) - \hat{g}_p\, y(n).$$
Then, the second target signal calculation circuit 1430 outputs the second target signal $x_2(n)$ to the optimum FCB gain calculation circuit 1440.
The optimum FCB gain calculation circuit 1440 receives the second FCB signal $c(n)$ output from the FCB signal generation circuit 1820 via the input terminal 91, receives the impulse response signal $h(n)$ output from the impulse response calculation circuit 1120 via the input terminal 94, and receives the second target signal $x_2(n)$ output from the second target signal calculation circuit 1430. It calculates the filtered second FCB signal
$$z(n), \quad n = 0, \ldots, L_{sfr}^{(B)} - 1,$$
as the convolution of the second FCB signal and the impulse response signal, and calculates the optimum FCB gain $g_c$ from the second target signal $x_2(n)$ and the filtered second FCB signal $z(n)$ according to the following equation. The optimum FCB gain $g_c$ is the gain that minimizes the distance between the second target signal $x_2(n)$ and the filtered second FCB signal $z(n)$ scaled by that gain.
$$g_c = \frac{\sum_{n=0}^{L_{sfr}^{(B)}-1} x_2(n)\, z(n)}{\sum_{n=0}^{L_{sfr}^{(B)}-1} z(n)\, z(n)}$$
Then, the optimum FCB gain calculation circuit 1440 outputs the optimum FCB gain to the optimum FCB gain correction circuit 1450 and the FCB gain coding circuit 1420.
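The second target signal and the optimum FCB gain can be sketched as follows, taking the already filtered signals $y(n)$ and $z(n)$ as inputs. The function names, the toy values, and the handling of a zero-energy $z(n)$ are assumptions.

```python
def second_target(first_target, filtered_acb, acb_gain):
    """x2(n) = x(n) - g_p_hat * y(n)."""
    return [x - acb_gain * y for x, y in zip(first_target, filtered_acb)]


def optimal_fcb_gain(target2, filtered_fcb):
    """g_c = sum(x2 * z) / sum(z * z)."""
    energy = sum(z * z for z in filtered_fcb)
    if energy == 0.0:
        return 0.0  # assumption: no FCB contribution for a silent z(n)
    return sum(x * z for x, z in zip(target2, filtered_fcb)) / energy


x2 = second_target([3.0, 1.0], [1.0, 0.0], 1.0)
g_c = optimal_fcb_gain(x2, [1.0, 0.5])
```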
The speech/non-speech recognition circuit 1460 inputs the first LSP output by the LSP decoding circuit 110 via the input terminal 98. It calculates the LSP variation from the first LSP and its long-time average, and determines a speech determination value from the LSP variation.
The procedure for determining the LSP variation is as follows. The long-time average $\bar{q}_j(n)$ of the LSP in the nth frame is calculated according to the following equation.
$$\bar{q}_j(n) = \beta \cdot \bar{q}_j(n-1) + (1-\beta) \cdot \hat{q}_j^{(N_{sfr})}(n), \quad j = 1, \ldots, N_P$$
Here, $N_P$ is the linear prediction order, and $\beta$ is, for example, 0.9.
The LSP variation $d_q(n)$ of the nth frame is defined by the following equation.
$$d_q(n) = \sum_{j=1}^{N_P} \sum_{m=1}^{N_{sfr}} \frac{D_{q,j}^{(m)}(n)}{\bar{q}_j(n)}$$
Here, $D_{q,j}^{(m)}(n)$ is the error between $\bar{q}_j(n)$ and $\hat{q}_j^{(m)}(n)$, and can be defined, for example, as
$$D_{q,j}^{(m)}(n) = \left( \bar{q}_j(n) - \hat{q}_j^{(m)}(n) \right)^2$$
or
$$D_{q,j}^{(m)}(n) = \left| \bar{q}_j(n) - \hat{q}_j^{(m)}(n) \right|$$
and the latter is used here. An interval with a large variation $d_q(n)$ can be associated with a speech interval, and an interval with a small variation $d_q(n)$ can be associated with a non-speech interval. The speech determination value $V_s$ is determined by thresholding the variation $d_q(n)$:
$$V_s = \begin{cases} 1, & d_q(n) \ge C_{vs} \\ 0, & d_q(n) < C_{vs} \end{cases}$$
Here, $C_{vs}$ is a constant (e.g., 2.2), $V_s = 1$ corresponds to a speech interval, and $V_s = 0$ corresponds to a non-speech interval. The speech determination value is output to the optimal ACB gain correction circuit 1480, the ACB gain encoding circuit 1410, the optimal FCB gain correction circuit 1450, and the FCB gain encoding circuit 1420.
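The speech/non-speech decision can be sketched in three small steps: update the long-time LSP average, compute the normalized variation using the absolute-difference error, and compare it with the threshold. The function names and toy LSP values are assumptions; the default constants are the example values given in the text.

```python
def update_long_term_lsp(previous_average, last_subframe_lsp, beta=0.9):
    """q_bar_j(n) = beta * q_bar_j(n-1) + (1 - beta) * q_hat_j of the last subframe."""
    return [beta * avg + (1.0 - beta) * q
            for avg, q in zip(previous_average, last_subframe_lsp)]


def lsp_variation(average_lsp, subframe_lsps):
    """d_q(n): sum over subframes m and orders j of |q_bar_j - q_hat_j(m)| / q_bar_j."""
    return sum(abs(avg - q) / avg
               for lsp in subframe_lsps
               for avg, q in zip(average_lsp, lsp))


def speech_decision(variation, threshold=2.2):
    """V_s = 1 (speech) if d_q >= C_vs, otherwise 0 (non-speech)."""
    return 1 if variation >= threshold else 0


average = update_long_term_lsp([1.0], [3.0], beta=0.5)
variation = lsp_variation([0.5, 1.0], [[0.5, 1.0], [1.0, 2.0]])
flag = speech_decision(variation)
```

In the toy frame above the first subframe matches the average exactly and the second deviates by 100% in both orders, giving a variation of 2.0, which falls below the example threshold of 2.2.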
The optimum ACB gain correction circuit 1480 receives the optimum ACB gain output by the ACB signal generation circuit 1720 via the input terminal 97, and receives the speech determination value output by the speech/non-speech recognition circuit 1460. In the optimum ACB gain correction circuit 1480, when the speech determination value $V_s$ is 0 (non-speech interval), the long-time average of the optimum ACB gain is used as the corrected ACB gain. In the non-speech interval, the long-time average of the optimum ACB gain is calculated according to the following equation.
$$\bar{g}_p(n) = \alpha \cdot \bar{g}_p(n-1) + (1-\alpha) \cdot g_p(n)$$
Here, $g_p(n)$ is the optimum ACB gain in the nth subframe, $\bar{g}_p(n)$ is the long-time average of the optimum ACB gain in the nth subframe, and $\alpha$ is, for example, 0.9. In addition, an average value, a median value, a mode value, or the like can also be applied as the long-time average.
On the other hand, in the optimum ACB gain correction circuit 1480, when the speech determination value $V_s$ is 1 (speech interval), the optimum ACB gain itself is used as the corrected ACB gain.
The optimal ACB gain correction circuit 1480 outputs the corrected ACB gain to the ACB gain coding circuit 1410.
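A small sketch of this correction step is given below. It updates the long-time average only in non-speech intervals, as the description states; the class name `GainCorrector`, the initial value of the average, and the exact update timing are assumptions.

```python
class GainCorrector:
    """Return the optimum gain in speech, its running average in non-speech."""

    def __init__(self, alpha=0.9, initial_average=0.0):
        self.alpha = alpha
        self.average = initial_average  # initial value is an assumption

    def correct(self, optimal_gain, speech_flag):
        if speech_flag == 1:
            # Speech interval: the optimum gain is used unchanged.
            return optimal_gain
        # Non-speech interval: g_bar(n) = alpha * g_bar(n-1) + (1 - alpha) * g(n)
        self.average = (self.alpha * self.average
                        + (1.0 - self.alpha) * optimal_gain)
        return self.average


corrector = GainCorrector(alpha=0.5, initial_average=1.0)
speech_result = corrector.correct(3.0, 1)
non_speech_result = corrector.correct(3.0, 0)
```

The same structure would apply to the FCB gain, with a separate running average.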
The ACB gain coding circuit 1410 receives the optimum ACB gain gp output from the ACB signal generation circuit 1720, the corrected ACB gain output from the optimum ACB gain correction circuit 1480, and the speech determination value output from the speech/non-speech recognition circuit 1460 via the input terminal 97.
The ACB gain coding circuit 1410 reads ACB gains sequentially from the ACB gain codebook 1411 and, for each ACB gain, calculates a first square error between the ACB gain and the optimum ACB gain received via the input terminal 97, calculates a second square error between the ACB gain and the corrected ACB gain, and calculates an evaluation function defined by the following equation from the first square error, the second square error, and the weight coefficient calculated from the speech determination value.
$$E_{gp} = \mu \cdot \left( g_p - \hat{g}_p \right)^2 + (1-\mu) \cdot \left( \tilde{g}_p - \hat{g}_p \right)^2$$
Here, $g_p$ is the optimum ACB gain, $\tilde{g}_p$ is the corrected ACB gain, $\hat{g}_p$ is the ACB gain read sequentially from the ACB gain codebook, and $\mu$ is the weight coefficient. For example, when the speech determination value $V_s$ is 1 (speech interval), the weight coefficient $\mu$ is 1.0, and when $V_s$ is 0 (non-speech interval), $\mu$ is 0.2.
The ACB gain coding circuit 1410 selects an ACB gain that minimizes the evaluation function, outputs the selected ACB gain to the second target signal calculation circuit 1430 as a second ACB gain, outputs the selected ACB gain to the second excitation signal calculation circuit 1610 through the output terminal 95, and outputs a code corresponding to the second ACB gain to the gain code multiplexing circuit 1470 as an ACB gain code.
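The gain selection can be sketched as an exhaustive search over a scalar gain codebook using the weighted evaluation function. The weight values 1.0 and 0.2 are the example values from the text; the function names and the toy codebook are assumptions.

```python
def weight_for(speech_flag):
    """Example weights from the text: 1.0 in speech, 0.2 in non-speech."""
    return 1.0 if speech_flag == 1 else 0.2


def select_gain(codebook, optimal_gain, corrected_gain, speech_flag):
    """Return (code, gain) minimizing mu*(g - g_hat)^2 + (1-mu)*(g_tilde - g_hat)^2."""
    mu = weight_for(speech_flag)

    def cost(candidate):
        return (mu * (optimal_gain - candidate) ** 2
                + (1.0 - mu) * (corrected_gain - candidate) ** 2)

    best_code = min(range(len(codebook)), key=lambda code: cost(codebook[code]))
    return best_code, codebook[best_code]


toy_gain_codebook = [0.2, 0.6, 1.0]  # hypothetical codebook entries
speech_choice = select_gain(toy_gain_codebook, 1.0, 0.5, 1)
non_speech_choice = select_gain(toy_gain_codebook, 1.0, 0.5, 0)
```

In speech the search tracks the optimum gain alone, while in non-speech the smoothed (corrected) gain dominates the choice, which is the effect the weight coefficient is meant to produce.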
The optimum FCB gain correction circuit 1450 inputs the optimum FCB gain output from the optimum FCB gain calculation circuit 1440, and also inputs the speech determination value output from the speech/non-speech recognition circuit 1460.
In the optimum FCB gain correction circuit 1450, when a speech determination value V is determinedsIs 0 (non-speech interval), the long-term average of the optimum FCB gainAs a modified FCB gain. In the non-speech interval, the long-time average of the optimal FCB gain is calculated according to the following equation.
g ‾ c ( n ) = α · g ‾ c ( n - 1 ) + ( 1 - α ) · g c ( n )
Here, the number of the first and second electrodes,
gc(n)
is the best FCB gain in the nth subframe,
Figure C0381767500392
is the long-time average of the best FCB gain in the nth subframe, and α is, for example, 0.9. In addition, the average value, the median value, the mode value, and the like can be applied to the long-time averaging.
On the other hand, when the speech determination value $V_s$ is 1 (speech interval), the optimum FCB gain correction circuit 1450 uses the optimum FCB gain itself as the corrected FCB gain.
The optimal FCB gain correction circuit 1450 outputs the corrected FCB gain to the FCB gain encoding circuit 1420.
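The gain correction rule can be sketched as a small stateful object. This is only an illustration under stated assumptions: the class name `GainCorrector` is hypothetical, the initial value of the average is not given in the text and is a free parameter here, and the average is updated only in non-speech subframes, which is one reading of the description.

```python
class GainCorrector:
    """Recursive long-time average used as the corrected gain in non-speech subframes."""

    def __init__(self, alpha: float = 0.9, initial_average: float = 0.0) -> None:
        self.alpha = alpha              # smoothing factor; 0.9 is the example value in the text
        self.average = initial_average  # assumed starting value (not specified in the text)

    def correct(self, gain: float, speech_flag: int) -> float:
        """Return the corrected gain for one subframe."""
        if speech_flag == 1:
            # Speech interval: the optimum gain is passed through unchanged.
            return gain
        # Non-speech interval: avg(n) = alpha * avg(n-1) + (1 - alpha) * g(n)
        self.average = self.alpha * self.average + (1.0 - self.alpha) * gain
        return self.average


corrector = GainCorrector(alpha=0.9, initial_average=1.0)
for gain, flag in [(2.0, 1), (2.0, 0), (0.0, 0)]:
    print(flag, corrector.correct(gain, flag))
```

Because $\alpha$ is close to 1, a single outlying gain moves the corrected value only slightly, so the corrected gain varies slowly from subframe to subframe.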
The FCB gain coding circuit 1420 receives the optimum FCB gain output from the optimum FCB gain calculation circuit 1440, the corrected FCB gain output from the optimum FCB gain correction circuit 1450, and the speech determination value output from the speech/non-speech recognition circuit 1460. The FCB gain coding circuit 1420 calculates a first square error between each FCB gain read sequentially from the FCB gain codebook 1421 and the optimum FCB gain, calculates a second square error between that FCB gain and the corrected FCB gain, and calculates an evaluation function, defined by the following equation, from the first square error, the second square error, and a weight coefficient determined from the speech determination value.
$$E_{gc} = \mu \cdot (g_c - \hat{g}_c)^2 + (1 - \mu) \cdot (\tilde{g}_c - \hat{g}_c)^2$$
Here, $g_c$ is the optimum FCB gain, $\tilde{g}_c$ is the corrected FCB gain, $\hat{g}_c$ is the FCB gain read sequentially from the FCB gain codebook, and $\mu$ is the weight coefficient. For example, when the speech determination value $V_s$ is 1 (speech interval), the weight coefficient $\mu$ is 1.0, and when $V_s$ is 0 (non-speech interval), $\mu$ is 0.2.
Then, the FCB gain coding circuit 1420 selects the FCB gain that minimizes the evaluation function, outputs the selected FCB gain to the second excitation signal calculation circuit 1610 as the second FCB gain via the output terminal 96, and outputs the code corresponding to the second FCB gain to the gain code multiplexing circuit 1470 as the FCB gain code.
The gain code multiplexing circuit 1470 receives the ACB gain code output from the ACB gain coding circuit 1410, receives the FCB gain code output from the FCB gain coding circuit 1420, and outputs the second gain code obtained by multiplexing the ACB gain code and the FCB gain code to the code multiplexing circuit 1020 as a code that can be decoded by the gain decoding method in method B.
The description of the gain code generation circuit 1400 in fig. 8 is thus completed, and the description returns to fig. 1 again.
The second excitation signal calculation circuit 1610 inputs the second ACB signal output by the target signal calculation circuit 1700, the second FCB signal output by the FCB code generation circuit 1800, and the second ACB gain and the second FCB gain output by the gain code generation circuit 1400. The second excitation signal calculation circuit 1610 adds a signal obtained by multiplying the second ACB signal by the second ACB gain and a signal obtained by multiplying the second FCB signal by the second FCB gain, thereby obtaining a second excitation signal. The second excitation signal is then output to the second excitation signal storage circuit 1620.
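The computation of the second excitation signal is a gain-weighted sum of the two codebook signals. A minimal sketch, assuming both signals are given as equal-length sequences of samples for one subframe (the function name `second_excitation` is hypothetical):

```python
from typing import List, Sequence


def second_excitation(acb_signal: Sequence[float], fcb_signal: Sequence[float],
                      acb_gain: float, fcb_gain: float) -> List[float]:
    """Return acb_gain * acb_signal + fcb_gain * fcb_signal, sample by sample."""
    if len(acb_signal) != len(fcb_signal):
        raise ValueError("ACB and FCB signals must cover the same subframe length")
    return [acb_gain * a + fcb_gain * f for a, f in zip(acb_signal, fcb_signal)]


# Two-sample toy subframe, for illustration only.
print(second_excitation([1.0, 2.0], [0.5, -1.0], acb_gain=0.5, fcb_gain=2.0))
```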
The second excitation signal storage circuit 1620 inputs the second excitation signal output from the second excitation signal calculation circuit 1610, and stores and holds the second excitation signal. Also, the second excitation signal that was input and stored and held in the past is output to the target signal calculation circuit 1700. This concludes the description of the first embodiment of the present invention.
Second embodiment
The second embodiment of the present invention is explained below. Fig. 9 is a schematic configuration diagram of a second embodiment of the transcoding apparatus of the present invention. In fig. 9, the LP coefficient code conversion circuit 100 and the gain code conversion circuit 400 in fig. 12 are replaced with the LP coefficient code conversion circuit 1100 and the gain code conversion circuit 2400, respectively, and a connection line is added between the LP coefficient code conversion circuit 1100 and the gain code conversion circuit 2400. Hereinafter, the description of the same or equivalent elements as those shown in fig. 12 will be omitted, and only the differences will be described.
The LP coefficient code conversion circuit 1100 is the same as that in the first embodiment described using fig. 1, except for its connection to other circuits: here the first LSP is output to the gain code conversion circuit 2400.
The gain code conversion circuit 2400 inputs the first gain code output by the code separation circuit 1010, and inputs the first LSP output by the LP coefficient code conversion circuit 1100.
The gain code conversion circuit 2400 calculates a corrected ACB gain and a corrected FCB gain from first gains (a first ACB gain and a first FCB gain) obtained by decoding the first gain code using the gain decoding method in the method a, and determines a speech determination value from the first LSP.
Next, the gain code conversion circuit 2400 calculates a first square error between each ACB gain read sequentially from the ACB gain codebook and the first ACB gain, and calculates a second square error between that ACB gain and the corrected ACB gain.
Then, the gain code conversion circuit 2400 selects the ACB gain, and the corresponding ACB gain code, that minimize an evaluation function calculated from the first square error, the second square error, and a weight coefficient determined from the speech determination value.
The gain code conversion circuit 2400 likewise calculates a third square error between each FCB gain read sequentially from the FCB gain codebook and the first FCB gain, and calculates a fourth square error between that FCB gain and the corrected FCB gain. Then, the gain code conversion circuit 2400 selects the FCB gain, and the corresponding FCB gain code, that minimize an evaluation function calculated from the third square error, the fourth square error, and the weight coefficient determined from the speech determination value.
Finally, the gain code conversion circuit 2400 outputs the second gain code composed of the selected ACB gain code and FCB gain code to the code multiplexing circuit 1020 as a code that can be decoded using the gain decoding method in method B.
Fig. 10 is a schematic diagram of the configuration of the gain code conversion circuit 2400 of fig. 9. Referring to fig. 10, the gain code conversion circuit 2400 has a speech/non-speech recognition circuit 1460, a gain code separation circuit 2490, an ACB gain decoding circuit 2470, an ACB gain codebook 2471, an ACB gain correction circuit 2440, an ACB gain encoding circuit 2410, an ACB gain codebook 1411, an FCB gain decoding circuit 2480, an FCB gain codebook 2481, an FCB gain correction circuit 2450, an FCB gain encoding circuit 2420, an FCB gain codebook 1421, and a gain code multiplexing circuit 1470. Referring to fig. 10, each constituent element of the gain code conversion circuit 2400 of the present embodiment is described. Note that, in fig. 10, the speech/non-speech recognition circuit 1460 and the gain code multiplexing circuit 1470 are basically the same as those shown in fig. 8, and therefore, description thereof will be omitted below.
The gain code separation circuit 2490 receives the first gain code output from the code separation circuit 1010 through the input terminal 45, separates from the first gain code the codes corresponding to the ACB gain and the FCB gain, namely the first ACB gain code and the first FCB gain code, outputs the first ACB gain code to the ACB gain decoding circuit 2470, and outputs the first FCB gain code to the FCB gain decoding circuit 2480.
The ACB gain decoding circuit 2470 has an ACB gain codebook 2471 storing a plurality of sets of ACB gains, and inputs the first ACB gain code output by the gain code separation circuit 2490, thereby reading the ACB gain corresponding to the first ACB gain code from the ACB gain codebook 2471, and outputs the read ACB gain as the first ACB gain to the ACB gain correction circuit 2440 and to the ACB gain encoding circuit 2410. Here, the decoding of ACB gain from ACB gain code is according to the decoding method of ACB gain in method a and is performed using the ACB gain codebook of method a.
The FCB gain decoding circuit 2480 has an FCB gain codebook 2481 storing a plurality of sets of FCB gains, and inputs the first FCB gain code output from the gain code separation circuit 2490, thereby reading the FCB gain corresponding to the first FCB gain code from the FCB gain codebook 2481, and outputs the read FCB gain as the first FCB gain to the FCB gain correction circuit 2450 and to the FCB gain encoding circuit 2420. Here, the decoding of the FCB gain from the FCB gain code is performed according to the FCB gain decoding method in method a and using the FCB gain codebook of method a.
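Separation and decoding of the first gain code amount to splitting a packed code into two indices and looking each one up in the corresponding method A codebook. The sketch below is purely illustrative: the bit widths `ACB_BITS` and `FCB_BITS`, the placement of the ACB code in the high bits, and the codebook contents are all assumptions, since the text does not specify the bit layout of method A.

```python
from typing import Sequence, Tuple

ACB_BITS = 3  # hypothetical width of the ACB gain code
FCB_BITS = 5  # hypothetical width of the FCB gain code


def separate_gain_code(gain_code: int) -> Tuple[int, int]:
    """Split a packed gain code into (acb_code, fcb_code); ACB is assumed to occupy the high bits."""
    acb_code = (gain_code >> FCB_BITS) & ((1 << ACB_BITS) - 1)
    fcb_code = gain_code & ((1 << FCB_BITS) - 1)
    return acb_code, fcb_code


def decode_gains(gain_code: int, acb_codebook: Sequence[float],
                 fcb_codebook: Sequence[float]) -> Tuple[float, float]:
    """Decode both gains by table lookup in the (assumed) method A codebooks."""
    acb_code, fcb_code = separate_gain_code(gain_code)
    return acb_codebook[acb_code], fcb_codebook[fcb_code]


# Hypothetical codebooks sized to match the assumed bit widths.
acb_codebook_a = [0.2 * i for i in range(1 << ACB_BITS)]
fcb_codebook_a = [float(i) for i in range(1 << FCB_BITS)]
print(decode_gains((5 << FCB_BITS) | 17, acb_codebook_a, fcb_codebook_a))
```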
The ACB gain correction circuit 2440 inputs the first ACB gain output by the ACB gain decoding circuit 2470, and inputs the speech determination value output by the speech/non-speech recognition circuit 1460. When the speech determination value $V_s$ is 0 (non-speech interval), the long-time average of the first ACB gain is used as the corrected ACB gain.
The ACB gain correction circuit 2440 calculates a long-time average of the first ACB gain in the non-speech interval according to the following equation.
$$\bar{g}_{qp}(n) = \alpha \cdot \bar{g}_{qp}(n-1) + (1 - \alpha) \cdot g_{qp}(n)$$
Here, $g_{qp}(n)$ is the first ACB gain in the nth subframe, $\bar{g}_{qp}(n)$ is the long-time average of the first ACB gain in the nth subframe, and $\alpha$ is, for example, 0.9. A mean value, a median value, a mode value, or the like may also be used as the long-time average.
On the other hand, when the speech determination value $V_s$ is 1 (speech interval), the ACB gain correction circuit 2440 uses the first ACB gain itself as the corrected ACB gain.
The ACB gain correction circuit 2440 outputs the corrected ACB gain to the ACB gain encoding circuit 2410.
The FCB gain correction circuit 2450 receives the first FCB gain output from the FCB gain decoding circuit 2480, and receives the speech determination value output from the speech/non-speech recognition circuit 1460.
In the FCB gain correction circuit 2450, when the speech determination value $V_s$ is 0 (non-speech interval), the long-time average of the first FCB gain is used as the corrected FCB gain. In the non-speech interval, the long-time average of the first FCB gain is calculated according to the following equation.
$$\bar{g}_{qc}(n) = \alpha \cdot \bar{g}_{qc}(n-1) + (1 - \alpha) \cdot g_{qc}(n)$$
Here, $g_{qc}(n)$ is the first FCB gain in the nth subframe, $\bar{g}_{qc}(n)$ is the long-time average of the first FCB gain in the nth subframe, and $\alpha$ is, for example, 0.9. A mean value, a median value, a mode value, or the like may also be used as the long-time average.
On the other hand, when the speech determination value $V_s$ is 1 (speech interval), the FCB gain correction circuit 2450 uses the first FCB gain itself as the corrected FCB gain.
The FCB gain correction circuit 2450 outputs the corrected FCB gain to the FCB gain coding circuit 2420.
The ACB gain encoding circuit 2410 receives the first ACB gain output from the ACB gain decoding circuit 2470, the modified ACB gain output from the ACB gain modifying circuit 2440, and the speech determination value output from the speech/non-speech recognition circuit 1460.
The ACB gain encoding circuit 2410 calculates a first square error between each ACB gain read sequentially from the ACB gain codebook 1411 and the first ACB gain, calculates a second square error between that ACB gain and the corrected ACB gain, and calculates an evaluation function, defined by the following equation, from the first square error, the second square error, and a weight coefficient determined from the speech determination value.
$$E_{gqp} = \mu \cdot (g_{qp} - \hat{g}_{qp})^2 + (1 - \mu) \cdot (\tilde{g}_{qp} - \hat{g}_{qp})^2$$
Here, $g_{qp}$ is the first ACB gain, $\tilde{g}_{qp}$ is the corrected ACB gain, $\hat{g}_{qp}$ is the ACB gain read sequentially from the ACB gain codebook 1411, and $\mu$ is the weight coefficient. For example, when the speech determination value $V_s$ is 1 (speech interval), the weight coefficient $\mu$ is set to 1.0, and when $V_s$ is 0 (non-speech interval), $\mu$ is set to 0.2.
Then, the ACB gain encoding circuit 2410 selects an ACB gain that minimizes the evaluation function, uses the selected ACB gain as a second ACB gain, and outputs a code corresponding to the second ACB gain to the gain code multiplexing circuit 1470 as a second ACB gain code.
The FCB gain coding circuit 2420 receives the first FCB gain output from the FCB gain decoding circuit 2480, the modified FCB gain output from the FCB gain modifying circuit 2450, and the speech determination value output from the speech/non-speech recognition circuit 1460.
The FCB gain encoding circuit 2420 calculates a third square error between each FCB gain read sequentially from the FCB gain codebook 1421 and the first FCB gain, calculates a fourth square error between that FCB gain and the corrected FCB gain, and calculates an evaluation function, defined by the following equation, from the third square error, the fourth square error, and a weight coefficient determined from the speech determination value.
$$E_{gqc} = \mu \cdot (g_{qc} - \hat{g}_{qc})^2 + (1 - \mu) \cdot (\tilde{g}_{qc} - \hat{g}_{qc})^2$$
Here, $g_{qc}$ is the first FCB gain, $\tilde{g}_{qc}$ is the corrected FCB gain, $\hat{g}_{qc}$ is the FCB gain read sequentially from the FCB gain codebook 1421, and $\mu$ is the weight coefficient. For example, when the speech determination value $V_s$ is 1 (speech interval), the weight coefficient $\mu$ is set to 1.0, and when $V_s$ is 0 (non-speech interval), $\mu$ is set to 0.2.
Then, the FCB gain coding circuit 2420 selects an FCB gain that minimizes the evaluation function, uses the selected FCB gain as a second FCB gain, and outputs a code corresponding to the second FCB gain to the gain code multiplexing circuit 1470 as a second FCB gain code.
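The gain path of the second embodiment (decode with the method A codebook, correct, re-encode with the method B codebook) can be summarized for a single gain in one function. This is a sketch under assumptions, not the patented circuit itself: the function name `convert_gain_code`, the codebook contents, and the choice to update the long-time average only in non-speech subframes are hypothetical. The same routine would apply to either the ACB or the FCB gain.

```python
from typing import Sequence, Tuple


def convert_gain_code(first_code: int,
                      codebook_a: Sequence[float],
                      codebook_b: Sequence[float],
                      prev_average: float,
                      speech_flag: int,
                      alpha: float = 0.9) -> Tuple[int, float]:
    """Convert one method A gain code to a method B gain code.

    Returns (second_code, updated_long_time_average).
    """
    first_gain = codebook_a[first_code]  # decoding is a table lookup
    if speech_flag == 1:
        # Speech: no smoothing, and the weight mu is 1.0.
        corrected, average, mu = first_gain, prev_average, 1.0
    else:
        # Non-speech: corrected gain is the recursive long-time average, mu is 0.2.
        average = alpha * prev_average + (1.0 - alpha) * first_gain
        corrected, mu = average, 0.2

    def cost(g_hat: float) -> float:
        return mu * (first_gain - g_hat) ** 2 + (1.0 - mu) * (corrected - g_hat) ** 2

    second_code = min(range(len(codebook_b)), key=lambda i: cost(codebook_b[i]))
    return second_code, average


# Hypothetical codebooks and a fluctuating non-speech gain sequence.
codebook_a = [0.2, 0.6, 1.0, 1.4]
codebook_b = [0.0, 0.5, 1.0, 1.5]
average = 0.6
for code in [3, 0, 3, 0]:
    second_code, average = convert_gain_code(code, codebook_a, codebook_b, average, speech_flag=0)
    print(code, "->", second_code)
```

In this toy run the decoded gain alternates between the extremes of the method A codebook, while the selected method B codes stay near the middle of the method B codebook, which illustrates the reduced temporal variation intended for non-speech intervals.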
Third embodiment
The code conversion apparatus according to each embodiment of the present invention described above may be realized under computer control, for example by a digital signal processor. Fig. 11 is a schematic diagram of an apparatus configuration in which, as a third embodiment of the present invention, the transcoding process of each of the above embodiments is realized by a program executed by a computer (processor). The computer 1, executing the program read from the recording medium 6, performs a code conversion process that converts a first code, obtained by encoding speech with a first codec device, into a second code decodable by a second codec device. To this end, a program for causing the following processes to be executed is recorded in the recording medium 6:
(a) deriving a first linear prediction coefficient from the first code string;
(b) deriving information of the excitation signal from the first code string;
(c) deriving an excitation signal from information of the excitation signal;
(d) generating a first speech signal by driving a filter having the first linear prediction coefficient using the excitation signal;
(e) calculating a gain (optimal gain) that minimizes a distance between a second voice signal generated from information obtained from a second code string and the first voice signal;
(f) correcting the optimal gain;
(g) obtaining gain information of the second code string by calculating a first square error from the corrected optimum gain and a gain read from the gain codebook of the second method, calculating a second square error from the optimum gain and the gain read from the gain codebook, and selecting from the gain codebook the gain that minimizes an evaluation function based on the first square error and the second square error.
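Step (e) can be illustrated for the common case in which the distance is the squared error between a target signal and a gain-scaled candidate signal. That choice of distance is an assumption made here for illustration, and the function name `optimal_gain` is hypothetical. Under it, the minimizing gain has the closed form $g = \sum_i x_i y_i / \sum_i y_i^2$, where $x$ is the target and $y$ is the candidate.

```python
from typing import Sequence


def optimal_gain(target: Sequence[float], candidate: Sequence[float]) -> float:
    """Gain g minimizing sum((target[i] - g * candidate[i])**2)."""
    numerator = sum(x * y for x, y in zip(target, candidate))
    denominator = sum(y * y for y in candidate)
    # An all-zero candidate carries no information; return 0.0 by convention (assumption).
    return numerator / denominator if denominator > 0.0 else 0.0


print(optimal_gain([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```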
The program is read from the recording medium 6 via the recording medium reading device 5 and the interface 4 into the memory 3 and executed. The program may be stored in a nonvolatile memory such as a mask ROM or a flash memory, and the recording medium may include a medium such as a CD-ROM, FD, Digital Versatile Disk (DVD), Magnetic Tape (MT), or removable HDD, in addition to the nonvolatile memory, and may also include a communication medium that carries the program and performs wired or wireless communication when the program is transmitted from a server apparatus using a computer.
In the fourth embodiment of the present invention, the computer 1, executing the program read from the recording medium 6, performs a code conversion process that converts a first code, obtained by encoding speech with a first codec device, into a second code decodable by a second codec device. To this end, a program for causing the following processes to be executed is recorded in the recording medium 6:
(a) decoding the gain information from the first code string;
(b) correcting the decoded gain (decoding gain);
(g) the gain information of the second code string is obtained by calculating a first square error from the corrected decoding gain (corrected decoding gain) and the gain read from the gain codebook in the second method, calculating a second square error from the decoding gain and the gain read from the gain codebook, and selecting a gain from the gain codebook that minimizes an evaluation function based on the first square error and the second square error.
The present invention has been described above based on the above embodiments, but the present invention is not limited thereto, and it is a matter of course that various modifications and improvements which can be made by those skilled in the art within the scope of the invention of the claims are included.
Industrial applicability
As described above, according to the present invention, an effect is obtained that deterioration in the sound quality of background noise in a non-speech section can be reduced.
The reason is that the present invention is constituted as follows: a first speech signal is obtained by driving a synthesis filter having a first linear prediction coefficient with an excitation signal from a first code string, a second speech signal is generated from information obtained from a second code string, an optimum gain is derived from the first speech signal and the second speech signal, the optimum gain is further corrected, and then gain information in the second code string is obtained based on the corrected optimum gain, and gain read from a gain codebook in the second method. The above effects can also be achieved by the present invention configured as follows: the gain information is decoded from the first code string, and the decoded gain is corrected, so that the gain information in the second code string is obtained based on the corrected gain, the decoded gain, and the gain read from the gain codebook in the second method, and the second gain is obtained in the non-speech section using an evaluation function that reduces the temporal variation of the second gain.

Claims (31)

1. A transcoding method for converting a first code string according to a first method into a second code string according to a second method, comprising the steps of:
obtaining a first linear prediction coefficient and information of an excitation signal from the first code string, and driving a filter having the first linear prediction coefficient using the excitation signal obtained from the information of the excitation signal, thereby generating a first speech signal;
calculating an optimal gain based on a second voice signal generated from information obtained from a second code string and the first voice signal;
correcting the optimal gain;
gain information of the second code string is found based on the corrected optimal gain, which is referred to as a corrected optimal gain, the optimal gain, and a gain read from a gain codebook of the second method.
2. The transcoding method of claim 1, comprising the steps of:
calculating a first squared error from the modified optimal gain and the gain read from the gain codebook;
calculating a second squared error from the optimal gain and the gain read from the gain codebook;
gain information in a second code string is obtained by selecting, from the gain codebook, a gain that minimizes an evaluation function based on the first squared error and the second squared error.
3. The transcoding method of claim 2, wherein the evaluation function is composed of the first squared error, the second squared error, and a weight coefficient.
4. The transcoding method of claim 2, wherein there is a step of determining a speech discrimination value for recognizing a speech interval/non-speech interval based on the first linear prediction coefficient;
the evaluation function is obtained by weighted averaging the first squared error and the second squared error using a weight coefficient;
the method also includes a step of calculating the evaluation function by setting the weighting coefficients to predetermined values in accordance with the speech discrimination value and the speech section and the non-speech section, respectively.
5. The transcoding method of claim 1, wherein a gain that minimizes a distance between second voice information generated from information obtained from a second code string and the first voice information is found as an optimum gain.
6. The transcoding method of claim 1 or 2, wherein the modified optimal gain is a gain based on a long time average of the optimal gain.
7. The transcoding method of claim 1, comprising the steps of:
determining a voice discrimination value for recognizing a voice interval/non-voice interval based on the first linear prediction coefficient;
when the speech discrimination value indicates a non-speech section, gain information of the second code string is obtained using an evaluation function that reduces temporal variation of a gain in the second code string.
8. A transcoding method for converting a first code string according to a first method into a second code string according to a second method, comprising the steps of:
decoding gain information from the first code string;
correcting the decoded gain;
obtaining gain information of the second code string based on the corrected decoding gain, the decoding gain, and a gain read from a gain codebook of the second method,
here, the decoded gain is referred to as a decoding gain, and the corrected decoding gain is referred to as a correction decoding gain.
9. The transcoding method of claim 8, comprising the steps of:
determining a voice discrimination value for recognizing a voice interval/non-voice interval based on the first linear prediction coefficient;
when the speech discrimination value indicates a non-speech section, gain information of the second code string is obtained using an evaluation function that reduces temporal variation of a gain in the second code string.
10. The transcoding method of claim 8, comprising the steps of:
calculating a first squared error from the modified decoded gain and the gain read from the gain codebook;
calculating a second squared error from the decoded gain and the gain read from the gain codebook;
gain information in a second code string is obtained by selecting, from the gain codebook, a gain that minimizes an evaluation function based on the first squared error and the second squared error.
11. The transcoding method of claim 10, wherein the evaluation function is composed of the first squared error, the second squared error, and a weight coefficient.
12. The transcoding method of claim 10, wherein there is a step of determining a speech discrimination value for recognizing a speech interval/non-speech interval based on the first linear prediction coefficient;
the evaluation function is obtained by weighted averaging the first squared error and the second squared error using a weight coefficient;
the method also includes a step of calculating the evaluation function by setting the weighting coefficients to predetermined values in accordance with the speech discrimination value and the speech section and the non-speech section, respectively.
13. The transcoding method of claim 8 or 10, wherein the modified decoding gain is a gain based on a long-time average of the decoding gains.
14. A code conversion apparatus for converting a first code string according to a first method into a second code string according to a second method, comprising:
a speech decoding circuit that obtains a first linear prediction coefficient and information of an excitation signal from the first code string, and drives a filter having the first linear prediction coefficient using the excitation signal obtained from the information of the excitation signal, thereby generating a first speech signal;
an optimum gain calculation circuit that calculates an optimum gain based on a second voice signal generated from information obtained from a second code string and the first voice signal;
an optimum gain correction circuit for correcting the optimum gain;
and a gain coding circuit for obtaining gain information in the second code string based on the corrected optimal gain, which is called a corrected optimal gain, the optimal gain, and a gain read from the gain codebook of the second method.
15. The transcoding apparatus of claim 14,
a voice/non-voice recognition circuit for outputting a voice discrimination value for recognizing a voice section/non-voice section based on the first linear prediction coefficient;
when the speech discrimination value indicates a non-speech section, the gain coding circuit obtains gain information in the second code string using an evaluation function that reduces temporal variation of a gain in the second code string.
16. The transcoding device of claim 14, wherein the gain encoding circuit has means for calculating a first square error from the corrected optimal gain and the gain read from the gain codebook, calculating a second square error from the optimal gain and the gain read from the gain codebook, and obtaining gain information of the second code string by selecting, from the gain codebook, a gain that minimizes an evaluation function based on the first square error and the second square error.
17. The transcoding apparatus of claim 16, wherein the evaluation function is composed of the first squared error, the second squared error, and a weight coefficient.
18. The transcoding apparatus of claim 14 or 16, wherein the modified optimal gain is a gain based on a long time average of the optimal gain.
19. The transcoding apparatus of claim 14, wherein the optimum gain calculation circuit outputs, as the optimum gain, a gain that minimizes a distance between second voice information and the first voice information, the second voice information being generated from information obtained from a second code string.
20. The transcoding apparatus of claim 16,
a voice/non-voice recognition circuit for outputting a voice discrimination value for recognizing a voice section/non-voice section based on the first linear prediction coefficient;
the gain encoding circuit obtains the evaluation function by performing weighted averaging of the first square error and the second square error using a weight coefficient, and at this time, calculates the evaluation function by setting the weight coefficient to a predetermined value in correspondence with each of a speech section and a non-speech section based on the speech discrimination value.
21. A code conversion apparatus for converting a first code string according to a first method into a second code string according to a second method, comprising:
a gain decoding circuit for decoding gain information from the first code string;
a gain correction circuit for correcting the decoded gain;
a gain coding circuit for obtaining gain information in the second code string based on the corrected decoding gain, the decoding gain, and a gain read from the gain codebook of the second method,
here, the decoded gain is referred to as a decoding gain, and the corrected decoding gain is referred to as a correction decoding gain.
22. The transcoding apparatus of claim 21,
a voice/non-voice recognition circuit for outputting a voice discrimination value for recognizing a voice section/non-voice section based on the first linear prediction coefficient;
when the speech discrimination value indicates a non-speech section, the gain coding circuit obtains gain information in the second code string using an evaluation function that reduces temporal variation of a gain of the second code string.
23. The transcoding device of claim 21, wherein the gain encoding circuit has means for calculating a first square error from the corrected decoding gain and a gain read from the gain codebook, calculating a second square error from the decoding gain and the gain read from the gain codebook, and obtaining gain information of the second code string by selecting, from the gain codebook, a gain that minimizes an evaluation function based on the first square error and the second square error.
24. The transcoding device of claim 23, wherein the evaluation function is composed of the first squared error, the second squared error, and a weight coefficient.
25. The transcoding apparatus of claim 21 or 23, wherein the modified decoding gain is a gain based on a long time average of the decoding gain.
26. The transcoding apparatus of claim 23,
a voice/non-voice recognition circuit for outputting a voice discrimination value for recognizing a voice section/non-voice section based on the first linear prediction coefficient;
the gain encoding circuit obtains the evaluation function by performing weighted averaging of the first square error and the second square error using a weight coefficient, and at this time, calculates the evaluation function by setting the weight coefficient to a predetermined value in correspondence with each of a speech section and a non-speech section based on the speech discrimination value.
27. A code conversion device which inputs code string data obtained by multiplexing codes obtained by encoding a speech signal by a first method to a code separation circuit, converts the code string data into codes according to a second method different from the first method based on the codes separated by the code separation circuit, supplies the converted codes to a code multiplexing circuit, and outputs the code string data obtained by multiplexing the converted codes from the code multiplexing circuit,
comprising:
a circuit for generating first and second linear prediction coefficients decoded by a first method and a second method based on the linear prediction coefficient code separated by the code separation circuit;
an adaptive codebook conversion circuit, referred to as an ACB codebook conversion circuit, includes a device that converts an Adaptive Codebook (ACB) code of a first method input from the code separation circuit using a correspondence between a code of the first method and a code of a second method to obtain an ACB code of the second method and outputs it to the code multiplexing circuit, and outputs an ACB delay corresponding to the second ACB code as a second ACB delay to a target signal calculation circuit;
a speech decoding circuit that receives and decodes excitation signal information including an ACB code, a Fixed Codebook (FCB) code, and a gain code of a first method separated by the code separation circuit as input, and synthesizes and outputs a decoded speech signal by driving a synthesis filter having a first linear prediction coefficient decoded by the first method based on a linear prediction coefficient code separated by the code separation circuit using an excitation signal obtained from the excitation signal information;
a fixed codebook code generating circuit, referred to as an FCB code generating circuit, which inputs the FCB code of the first method output by the code separating circuit, converts the FCB code into a code that can be decoded using a second method, outputs the converted code to the code multiplexing circuit as a second FCB code, and outputs a second FCB signal corresponding to the second FCB code;
an impulse response calculation circuit for outputting an impulse response signal of a hearing weighted synthesis filter composed of the first linear prediction coefficient and the second linear prediction coefficient;
the target signal calculation circuit;
a gain code generation circuit;
wherein the target signal calculation circuit has:
a weighted signal calculation circuit that receives the decoded speech output from the synthesis filter of the speech decoding circuit, generates a hearing weighted speech signal by driving a hearing weighted filter using the first linear prediction coefficient with the decoded speech, and generates a first target signal obtained by subtracting a zero input response of the hearing weighted synthesis filter using the first and second linear prediction coefficients from the hearing weighted speech signal;
an ACB signal generation circuit that receives the first target signal output by the weighted signal calculation circuit, the ACB delay output by the ACB code conversion circuit, the impulse response signal output by the impulse response calculation circuit, and a past second excitation signal output by a second excitation signal storage circuit that stores and holds it, calculates a filter-processed past excitation signal of delay k by convolving a signal cut out of the past second excitation signal at the delay k with the impulse response signal, and outputs it as a second ACB signal;
an optimum ACB gain calculation circuit that receives the first target signal output from the weighted signal calculation circuit and the filtered past excitation signal with the delay k output from the ACB signal generation circuit, and derives and outputs an optimum ACB gain from the first target signal and the filtered past excitation signal with the delay k;
wherein the gain code generation circuit
receives as input the first target signal output by the target signal calculation circuit, the second ACB signal, the optimal ACB gain, the second FCB signal output by the FCB code generation circuit, the impulse response signal output by the impulse response calculation circuit, and the first linear prediction coefficient;
and has:
means for calculating a second target signal from the first target signal, the second ACB signal, the optimal ACB gain, and the impulse response signal, and calculating an optimal FCB gain from the second target signal, the second FCB signal, and the impulse response signal;
means for determining a corrected ACB gain from said optimum ACB gain;
means for inputting said calculated optimum FCB gain and calculating a modified FCB gain from said optimum FCB gain;
means for determining a speech decision value from the first linear prediction coefficient;
means for calculating a first squared error from an ACB gain read sequentially from the ACB gain codebook and the optimal ACB gain, and a second squared error from the ACB gain and the corrected ACB gain;
means for selecting an ACB gain and a corresponding ACB gain code that minimize a first evaluation function calculated from a weighting coefficient calculated from the speech decision value, the first squared error, and the second squared error;
means for calculating a third squared error from an FCB gain read sequentially from the FCB gain codebook and the optimal FCB gain, and a fourth squared error from the FCB gain and the modified FCB gain;
means for selecting an FCB gain and a corresponding FCB gain code that minimize a second evaluation function calculated from a weighting coefficient calculated from the speech decision value, the third squared error, and the fourth squared error; and
means for outputting a second gain code composed of the selected ACB gain code and FCB gain code to the code multiplexing circuit as a code decodable using a gain decoding method in the second method.
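The ACB signal generation and optimum ACB gain steps of claim 27 can be illustrated with a minimal Python sketch. It is not the claimed circuit: the function names, the periodic repetition used when the delay k is shorter than the subframe, and the least-squares gain formula are assumptions taken from common CELP practice, since the claim only states that the gain is derived from the first target signal and the filtered past excitation signal.

```python
def filtered_past_excitation(past_exc, h, k, n):
    """Cut n samples out of past_exc at delay k and filter them with h.

    Assumption: when k < n the k-sample segment is repeated periodically,
    a common CELP convention that the claim does not spell out.
    """
    if not 1 <= k <= len(past_exc):
        raise ValueError("delay k must lie within the stored excitation")
    start = len(past_exc) - k
    segment = [past_exc[start + (i % k)] for i in range(n)]
    # Convolution with the impulse response, truncated to the n-sample subframe.
    return [
        sum(segment[j] * h[i - j] for j in range(i + 1) if i - j < len(h))
        for i in range(n)
    ]


def optimum_gain(target, filtered):
    """Least-squares gain g minimizing sum((target - g * filtered) ** 2)."""
    energy = sum(y * y for y in filtered)
    if energy == 0.0:
        return 0.0  # silent candidate: no meaningful gain
    return sum(x * y for x, y in zip(target, filtered)) / energy
```

With a stored excitation of `[0, 0, 1, 0]`, an impulse response of `[1.0, 0.5]`, delay 2 and a two-sample subframe, the filtered signal is `[1.0, 0.5]`, and a target of `[2.0, 1.0]` yields a gain of 2.0.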
28. The code conversion apparatus according to claim 27, having a second excitation signal calculation circuit which inputs a second ACB signal output by the target signal calculation circuit, a second FCB signal output by the FCB code generation circuit, a second ACB gain and a second FCB gain output by the gain code generation circuit, and adds a signal obtained by multiplying the second ACB signal by the second ACB gain and a signal obtained by multiplying the second FCB signal by the second FCB gain to obtain a second excitation signal, and outputs the second excitation signal to the second excitation signal storage circuit;
the second excitation signal storage circuit receives the second excitation signal output by the second excitation signal calculation circuit, stores and holds it, and outputs the previously stored second excitation signal to the target signal calculation circuit.
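The excitation update of claim 28 can be sketched as follows. The names `second_excitation` and `ExcitationStore`, and the fixed-length history buffer, are hypothetical and chosen only for illustration; the claim itself specifies just the gain-scaled sum and the store-and-hold behaviour.

```python
def second_excitation(acb_signal, fcb_signal, g_acb, g_fcb):
    """Sum of the gain-scaled second ACB signal and second FCB signal."""
    return [g_acb * a + g_fcb * f for a, f in zip(acb_signal, fcb_signal)]


class ExcitationStore:
    """Holds the most recent `size` excitation samples (size > 0 assumed)."""

    def __init__(self, size):
        self.buf = [0.0] * size

    def push(self, excitation):
        # Append the new subframe and keep only the newest samples.
        self.buf = (self.buf + list(excitation))[-len(self.buf):]

    def past(self):
        # Past excitation handed back to the target signal calculation.
        return list(self.buf)
```

A typical loop would call `second_excitation` once per subframe, `push` the result, and read `past()` when processing the next subframe.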
29. The code conversion apparatus according to claim 27, wherein
the gain code generation circuit includes:
a second target signal calculation circuit that inputs the second ACB signal output by the ACB signal generation circuit, the first target signal output by the weighted signal calculation circuit, the impulse response signal output by the impulse response calculation circuit, and the second ACB gain output by the ACB gain coding circuit, calculates a filter-processed second ACB signal by convolution of the second ACB signal and the impulse response signal, and subtracts a signal obtained by multiplying the filter-processed second ACB signal by the second ACB gain from the first target signal, thereby deriving a second target signal, and outputs the second target signal;
an optimum FCB gain calculation circuit that inputs the second FCB signal output by the FCB signal generation circuit, the impulse response signal output by the impulse response calculation circuit, and the second target signal output by the second target signal calculation circuit, calculates a filter-processed second FCB signal by convolution of the second FCB signal and the impulse response signal, and calculates an optimum FCB gain that minimizes a distance between the second target signal and the filter-processed second FCB signal;
a speech/non-speech recognition circuit for calculating the variation of the linear prediction coefficient from the first linear prediction coefficient and the long-term average thereof, and determining a speech determination value;
an optimum ACB gain correction circuit that inputs the optimum ACB gain output by the ACB signal generation circuit and the speech determination value output by the speech/non-speech recognition circuit, takes a long-time average of the optimum ACB gain as a corrected ACB gain when the speech determination value indicates a non-speech section, and outputs the optimum ACB gain itself as the corrected ACB gain when the speech determination value indicates a speech section;
an ACB gain encoding circuit that inputs the optimum ACB gain output by the ACB signal generation circuit, the corrected ACB gain output by the optimum ACB gain correction circuit, and the speech determination value output by the speech/non-speech recognition circuit, calculates a first square error from an ACB gain read from the ACB gain codebook and the optimum ACB gain, calculates a second square error from the ACB gain and the corrected ACB gain, calculates an evaluation function from a weight coefficient calculated from the speech determination value, the first square error, and the second square error, selects the ACB gain that minimizes the evaluation function, outputs the selected ACB gain as a second ACB gain to the second target signal calculation circuit and to the second excitation signal calculation circuit, and outputs the code corresponding to the second ACB gain as an ACB gain code to a gain code multiplexing circuit;
an optimum FCB gain correction circuit that inputs the optimum FCB gain output by the optimum FCB gain calculation circuit and the speech determination value output by the speech/non-speech recognition circuit, and takes a long-time average of the optimum FCB gain as a corrected FCB gain when the speech determination value indicates a non-speech section, and takes the optimum FCB gain itself as a corrected FCB gain when the speech determination value indicates a speech section, and outputs the corrected FCB gain to an FCB gain encoding circuit;
an FCB gain encoding circuit that receives the optimum FCB gain output by the optimum FCB gain calculation circuit, the corrected FCB gain output by the optimum FCB gain correction circuit, and the speech determination value output by the speech/non-speech recognition circuit, calculates a third square error from an FCB gain read sequentially from the FCB gain codebook and the optimum FCB gain, calculates a fourth square error from the FCB gain and the corrected FCB gain, calculates an evaluation function from a weight coefficient calculated from the speech determination value, the third square error, and the fourth square error, selects the FCB gain that minimizes the evaluation function, outputs the selected FCB gain as a second FCB gain to the second excitation signal calculation circuit, and outputs a code corresponding to the second FCB gain as an FCB gain code to the gain code multiplexing circuit;
and a gain code multiplexing circuit to which the ACB gain code outputted from the ACB gain coding circuit and the FCB gain code outputted from the FCB gain coding circuit are inputted, and which outputs a second gain code obtained by multiplexing the ACB gain code and the FCB gain code to the code multiplexing circuit as a code which can be decoded by using a gain decoding method in the second method.
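The second target signal and optimum FCB gain calculations of claim 29 can be sketched in Python under two assumptions that the claim does not state explicitly: the "distance" is taken to be the squared error, and it is measured against the filter-processed second FCB signal. All function names are hypothetical.

```python
def convolve_truncated(signal, h):
    """Convolve signal with impulse response h, keeping len(signal) samples."""
    n = len(signal)
    return [
        sum(signal[j] * h[i - j] for j in range(i + 1) if i - j < len(h))
        for i in range(n)
    ]


def second_target(first_target, acb_signal, h, g_acb):
    """First target minus the gain-scaled, filter-processed second ACB signal."""
    filtered_acb = convolve_truncated(acb_signal, h)
    return [x - g_acb * y for x, y in zip(first_target, filtered_acb)]


def optimum_fcb_gain(target2, fcb_signal, h):
    """Gain minimizing the squared error between target2 and g * filtered FCB."""
    z = convolve_truncated(fcb_signal, h)
    energy = sum(v * v for v in z)
    if energy == 0.0:
        return 0.0  # all-zero filtered FCB signal: gain is undefined, use 0
    return sum(x * v for x, v in zip(target2, z)) / energy
```

For example, with `h = [1.0, 0.5]`, a first target of `[3.0, 2.0]`, an ACB signal of `[1.0, 0.0]` and a second ACB gain of 1.0, the second target is `[2.0, 1.5]`; an FCB signal of `[0.0, 1.0]` then gives an optimum FCB gain of 1.5.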
30. A code conversion device which inputs code string data obtained by multiplexing codes obtained by encoding a speech signal by a first method to a code separation circuit, converts the code string data into codes according to a second method different from the first method based on the codes separated by the code separation circuit, supplies the converted codes to a code multiplexing circuit, and outputs the code string data obtained by multiplexing the converted codes from the code multiplexing circuit,
comprising:
a circuit for generating first and second linear prediction coefficients decoded by a first method and a second method based on the linear prediction coefficient code separated by the code separation circuit;
an ACB code conversion circuit that inputs the first ACB code output by the code separation circuit, thereby converting the first ACB code into a code that can be decoded using a second method, and outputs the converted ACB code to the code multiplexing circuit as a second ACB code;
an FCB code conversion circuit that inputs the first FCB code output by the code separation circuit, thereby converting the first FCB code into a code that can be decoded using a second method, and outputs the converted FCB code to the code multiplexing circuit as a second FCB code; and
a gain code conversion circuit that inputs the first gain code output by the code separation circuit, thereby converting the first gain code into a code that can be decoded using a second method, and outputs the converted gain code as a second gain code to the code multiplexing circuit;
the gain code conversion circuit includes:
a means for receiving the first gain code outputted from the code separation circuit and the first linear prediction coefficient, and calculating a corrected ACB gain and a corrected FCB gain from a first Adaptive Codebook (ACB) gain and a first Fixed Codebook (FCB) gain obtained by decoding the first gain code using a gain decoding method of a first method;
means for determining a speech decision value from the first linear prediction coefficient;
a means for calculating a first square error from the ACB gain read from the ACB gain codebook and the first ACB gain, calculating a second square error from the ACB gain and the corrected ACB gain, and selecting an ACB gain and a corresponding ACB gain code that minimize a first evaluation function calculated from a weight coefficient calculated from the speech determination value, the first square error, and the second square error;
a means for calculating a third square error from an FCB gain read sequentially from the FCB gain codebook and the first FCB gain, calculating a fourth square error from the FCB gain and the corrected FCB gain, and selecting an FCB gain and a corresponding FCB gain code that minimize a second evaluation function calculated from a weight coefficient calculated from the speech determination value, the third square error, and the fourth square error; and
a means for outputting a second gain code composed of the selected ACB gain code and the FCB gain code to a code multiplexing circuit as a code decodable using a gain decoding method in the second method.
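The gain selection of claim 30 amounts to a codebook search that minimizes a weighted combination of two squared errors. The sketch below is illustrative only: `select_gain` is a hypothetical name, the weighted-average form follows the wording of the evaluation function used elsewhere in these claims, and the two weight values are placeholders, since the claims say only that the weight is predetermined for speech and non-speech sections.

```python
def select_gain(codebook, reference_gain, corrected_gain, is_speech,
                w_speech=1.0, w_nonspeech=0.2):
    """Return (code, gain) minimizing w * e1 + (1 - w) * e2.

    e1 is the squared error against the reference gain (the first-method or
    optimum gain), e2 the squared error against the corrected gain.
    The weight values are placeholders, not taken from the patent.
    """
    w = w_speech if is_speech else w_nonspeech
    best_code, best_cost = None, float("inf")
    for code, gain in enumerate(codebook):  # read entries sequentially
        e1 = (gain - reference_gain) ** 2
        e2 = (gain - corrected_gain) ** 2
        cost = w * e1 + (1.0 - w) * e2
        if cost < best_cost:
            best_code, best_cost = code, cost
    return best_code, codebook[best_code]
```

The same routine serves for both the ACB and the FCB gain: only the codebook and the two reference values change. With the placeholder weights, a speech frame tracks the reference gain closely, while a non-speech frame is pulled toward the smoothed, corrected gain.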
31. The code conversion apparatus according to claim 30, wherein
the gain code conversion circuit includes:
a speech/non-speech recognition circuit for calculating the variation of the linear prediction coefficient from the first linear prediction coefficient and the long-term average thereof, and determining a speech determination value;
a gain code separation circuit which inputs the first gain code outputted from the code separation circuit, and separates a first ACB gain code and a first FCB gain code corresponding to the ACB gain and the FCB gain from the first gain code, thereby outputting the first ACB gain code to the ACB gain decoding circuit and outputting the first FCB gain code to the FCB gain decoding circuit;
an ACB gain decoding circuit that has a first ACB gain codebook storing a plurality of sets of ACB gains, receives the first ACB gain code output by the gain code separation circuit, reads the ACB gain corresponding to the first ACB gain code from the first ACB gain codebook, and outputs the read ACB gain as the first ACB gain to both the ACB gain correction circuit and the ACB gain encoding circuit, the ACB gain code being decoded by the ACB gain decoding method of the first method using the ACB gain codebook of the first method;
an FCB gain decoding circuit that has a first FCB gain codebook storing a plurality of sets of FCB gains, receives the first FCB gain code output by the gain code separation circuit, reads the FCB gain corresponding to the first FCB gain code from the first FCB gain codebook, and outputs the read FCB gain as the first FCB gain to both the FCB gain correction circuit and the FCB gain encoding circuit, the FCB gain code being decoded by the FCB gain decoding method of the first method using the FCB gain codebook of the first method;
an ACB gain correction circuit that inputs the first ACB gain output by the ACB gain decoding circuit and the speech decision value output by the speech/non-speech recognition circuit, and takes a long-time average of the first ACB gain as a corrected ACB gain when the speech decision value indicates a non-speech section, and takes the first ACB gain itself as a corrected ACB gain when the speech decision value indicates a speech section, and outputs the corrected ACB gain to an ACB gain encoding circuit;
an FCB gain correction circuit that inputs the first FCB gain output by the FCB gain decoding circuit and the speech determination value output by the speech/non-speech recognition circuit, and takes a long-time average of the first FCB gain as a corrected FCB gain when a speech determination value indicates a non-speech interval, and takes the first FCB gain itself as a corrected FCB gain when the speech determination value indicates a speech interval, and outputs the corrected FCB gain to an FCB gain encoding circuit;
an ACB gain encoding circuit that receives the first ACB gain output by the ACB gain decoding circuit, the corrected ACB gain output by the ACB gain correction circuit, and the speech decision value output by the speech/non-speech recognition circuit, calculates a first square error from an ACB gain read sequentially from the ACB gain codebook and the first ACB gain, calculates a second square error from the ACB gain and the corrected ACB gain, calculates a first evaluation function from a weight coefficient calculated from the speech decision value, the first square error, and the second square error, selects an ACB gain that minimizes the first evaluation function, uses the selected ACB gain as a second ACB gain, and outputs a code corresponding to the second ACB gain as a second ACB gain code to a gain code multiplexing circuit;
an FCB gain encoding circuit that receives the first FCB gain output from the FCB gain decoding circuit, the corrected FCB gain output from the FCB gain correction circuit, and the speech decision value output from the speech/non-speech recognition circuit, calculates a third square error from an FCB gain read sequentially from the FCB gain codebook and the first FCB gain, calculates a fourth square error from the FCB gain and the corrected FCB gain, calculates a second evaluation function from a weight coefficient calculated from the speech decision value, the third square error, and the fourth square error, selects an FCB gain that minimizes the second evaluation function, uses the selected FCB gain as a second FCB gain, and outputs a code corresponding to the second FCB gain as a second FCB gain code to a gain code multiplexing circuit;
and a gain code multiplexing circuit to which the ACB gain code outputted from the ACB gain coding circuit and the FCB gain code outputted from the FCB gain coding circuit are inputted, and which outputs a second gain code obtained by multiplexing the ACB gain code and the FCB gain code to the code multiplexing circuit as a code decodable by a gain decoding method in the second method.
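The speech/non-speech decision and the gain correction of claim 31 can be sketched as two small stateful helpers. Several details here are assumptions rather than content of the claims: the variation is measured as a squared distance from an exponentially smoothed long-term average, a variation above the threshold is read as speech, the smoothing constant and threshold are placeholders, and the long-time average of the gain is updated in every frame.

```python
class SpeechDetector:
    """Flags speech when the LPC vector departs from its long-term average."""

    def __init__(self, order, alpha=0.9, threshold=0.1):
        self.avg = [0.0] * order      # long-term average of the coefficients
        self.alpha = alpha            # placeholder smoothing constant
        self.threshold = threshold    # placeholder decision threshold

    def update(self, lpc):
        variation = sum((c - a) ** 2 for c, a in zip(lpc, self.avg))
        self.avg = [self.alpha * a + (1.0 - self.alpha) * c
                    for c, a in zip(lpc, self.avg)]
        return variation > self.threshold  # True means speech (assumed mapping)


class GainCorrector:
    """Passes the gain through in speech, smooths it in non-speech sections."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha
        self.avg = None

    def correct(self, gain, is_speech):
        # Assumption: the long-time average is updated on every frame.
        if self.avg is None:
            self.avg = gain
        else:
            self.avg = self.alpha * self.avg + (1.0 - self.alpha) * gain
        return gain if is_speech else self.avg
```

One `GainCorrector` instance would be kept for the ACB gain and another for the FCB gain, both driven by the same decision value.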
CNB038176750A 2002-07-24 2003-07-09 Method and apparatus for transcoding between different speech encoding/decoding systems and recording medium Expired - Fee Related CN1327410C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2002215766A JP4238535B2 (en) 2002-07-24 2002-07-24 Code conversion method and apparatus between speech coding and decoding systems and storage medium thereof
JP215766/2002 2002-07-24

Publications (2)

Publication Number Publication Date
CN1672192A (en) 2005-09-21
CN1327410C (en) 2007-07-18

Family

ID=30767940

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB038176750A Expired - Fee Related CN1327410C (en) 2002-07-24 2003-07-09 Method and apparatus for transcoding between different speech encoding/decoding systems and recording medium

Country Status (3)

Country Link
JP (1) JP4238535B2 (en)
CN (1) CN1327410C (en)
WO (1) WO2004010416A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2867648A1 (en) * 2003-12-10 2005-09-16 France Telecom TRANSCODING BETWEEN INDICES OF MULTI-IMPULSE DICTIONARIES USED IN COMPRESSION CODING OF DIGITAL SIGNALS
DE102006051673A1 (en) * 2006-11-02 2008-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for reworking spectral values and encoders and decoders for audio signals
EP2980797A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder, method and computer program using a zero-input-response to obtain a smooth transition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08146997A (en) * 1994-11-21 1996-06-07 Hitachi Ltd Device and system for code conversion
JPH10207491A (en) * 1997-01-23 1998-08-07 Toshiba Corp Method of discriminating background sound/voice, method of discriminating voice sound/unvoiced sound, method of decoding background sound
JP2002198870A (en) * 2000-12-27 2002-07-12 Mitsubishi Electric Corp Echo processing device

Also Published As

Publication number Publication date
JP4238535B2 (en) 2009-03-18
JP2004061558A (en) 2004-02-26
WO2004010416A1 (en) 2004-01-29
CN1672192A (en) 2005-09-21

Similar Documents

Publication Publication Date Title
RU2713605C1 (en) Audio encoding device, an audio encoding method, an audio encoding program, an audio decoding device, an audio decoding method and an audio decoding program
JPH0353300A (en) Sound encoding and decoding system
JP4304360B2 (en) Code conversion method and apparatus between speech coding and decoding methods and storage medium thereof
US7630884B2 (en) Code conversion method, apparatus, program, and storage medium
JP2002268696A (en) Sound signal encoding method, method and device for decoding, program, and recording medium
JP3266178B2 (en) Audio coding device
WO2002071394A1 (en) Sound encoding apparatus and method, and sound decoding apparatus and method
US20030033142A1 (en) Method of converting codes between speech coding and decoding systems, and device and program therefor
KR100796836B1 (en) Apparatus and method of code conversion and recording medium that records program for computer to execute the method
CN1327410C (en) Method and apparatus for transcoding between different speech encoding/decoding systems and recording medium
US7319953B2 (en) Method and apparatus for transcoding between different speech encoding/decoding systems using gain calculations
JP3417362B2 (en) Audio signal decoding method and audio signal encoding / decoding method
EP1536413B1 (en) Method and device for voice code conversion
JP4510977B2 (en) Speech encoding method and speech decoding method and apparatus
JP3319396B2 (en) Speech encoder and speech encoder / decoder
JPH1069297A (en) Voice coding device
EP1560201B1 (en) Code conversion method and device for code conversion
JP3274451B2 (en) Adaptive postfilter and adaptive postfiltering method
JP3845316B2 (en) Speech coding apparatus and speech decoding apparatus
JPH04301900A (en) Audio encoding device
JP3071800B2 (en) Adaptive post filter
JPH034300A (en) Voice encoding and decoding system
JP2004020675A (en) Method and apparatus for encoding/decoding speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070718

Termination date: 20130709