GB1485803A - Method and apparatus for the analysis and synthesis of speech

Info

Publication number: GB1485803A
Application number: GB41575/74A
Authority: GB
Original assignee: Gretag AG
Current assignee: Gretag AG
Priority date: 1974-07-22
Filing date: 1974-09-24
Publication date: 1977-09-14
Also published as: CH581878A5; CA1039407A; US3909533A

Abstract

1485803 Analysis synthesis communication system GRETAG AG 24 Sept 1974 [22 July 1974] 41575/74 Heading H4R
In an analysis-synthesis telephone system the analyser includes a synthesizer which is identical to the synthesizer used at the receiver of the system and is adjusted in accordance with the transmitted parameters; the output of this synthesizer is compared with stored samples of the original speech, and the transmitted parameters are adjusted to minimize the comparator output. As described, the speech input from 1, after filtering in filter 2, which has a cut-off frequency in the region of 3 to 5 kHz, is converted to digital form in the A-to-D converter 3 operating at a sampling frequency of 6 to 10 kHz, and a short sample of speech, 10 to 30 ms in length, is stored in unit 5. Simultaneously the pitch detector 4 determines, in conventional manner, whether the speech is voiced or unvoiced and, if voiced, the pitch period. The pitch detector output is fed to coder 10 for transmission to the synthesizer S, and also to a generator 6, which generates pitch pulses at the appropriate frequency if the speech is voiced, or a pseudo-random pulse stream if the speech is unvoiced. The output from generator 6 is fed to a vocal tract model 7, which is a linear digital filter simulating the transfer characteristic of the vocal tract, and the output from the model is compared at 8 with the incoming speech to provide an error signal, which is used to adjust the parameters of the vocal tract model so as to reduce the error signal from the comparator. When the error signal has been reduced to a predetermined level, the parameters set in the model 7 are coded in the coder 10 and transmitted. At the receiver, a pulse/noise generator 6' and a vocal tract model 7', similar to those in the transmitter, are used to synthesize a signal suitable for conversion back to analogue form and reproduction on speaker 13.
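
The behaviour of generator 6 described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `excitation`, the unit pulse amplitude, the +1/-1 noise pulses, the seed and the chosen frame size (8 kHz sampling, 20 ms frame, both merely picked from within the stated ranges) are all assumptions made for illustration.

```python
import random

def excitation(voiced, pitch_period, n_samples, seed=0):
    """Return one frame of excitation samples x_n (hypothetical helper)."""
    if voiced:
        # Voiced: a unit pulse at the start of every pitch period, zero elsewhere.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    # Unvoiced: a pseudo-random stream of +1/-1 pulses from a seeded generator.
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]

SAMPLE_RATE_HZ = 8000   # assumed value within the 6 to 10 kHz range
FRAME_MS = 20           # assumed value within the 10 to 30 ms range
frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000

# A pitch period of 80 samples corresponds to 100 Hz at 8 kHz sampling.
voiced_frame = excitation(True, pitch_period=80, n_samples=frame_len)
unvoiced_frame = excitation(False, pitch_period=0, n_samples=frame_len)
```

Seeding the noise source in this sketch simply makes the unvoiced stream reproducible; whether the transmitter and receiver generators share a sequence is not stated in the abstract.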
Since similar filters, generators and vocal tract models are used for transmission and reception, it is suggested that only one filter, one generator and one vocal tract model be used, with switching to change their function between transmission and reception.
Vocal tract model, Fig. 2a: the model includes a number of storage locations U_1 to U_8, whose contents form the state vector U_n of the model on the nth cycle. Eight linear combinations of these values are formed in the feedback matrix 22; the nth sample x_n of the excitation sequence, multiplied by the input vector components b_1 to b_8 in multipliers 23, is added to each of the linear combinations from the matrix 22 in adders 26 to provide the new state vector U_(n+1) for the (n+1)th cycle. The output sequence y_n is calculated as a linear combination of the values in the store 21, which are multiplied by the output vector components c_1 to c_8 in multipliers 24 and added, in 27, to the input pulse x_n multiplied by a scalar quantity d. The components of matrix A and vectors b and c and the scalar quantity d can be divided into two groups: those that are invariable and normally have simple values such as 0, 1 or -1, and those that are changed by the optimization process.
Parameter computer, Fig. 3a (not shown): this includes a primary model (29), a unit (30) and eight primary part models (31 to 38). The primary model (29) is identical to the vocal tract model and can therefore be used as such; it is fed with the excitation function x_n and yields the synthetic speech signal y_n, the partial derivatives ∂y_n/∂c_i, each of which is equal to the corresponding component U_i of the state vector U, and the derivative ∂y_n/∂d in respect of the transition coefficient d, which is equal to the corresponding term of the excitation sequence x_n.
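
The recursion of the vocal tract model, new state = A times old state plus b times x_n, and y_n = c times state plus d times x_n, can be sketched in a few lines of Python. The function names `step` and `run` and the second-order coefficient values are hypothetical and chosen only to keep the example small; the model described has eight state components.

```python
def step(A, b, c, d, U, x):
    """One cycle: return (y_n, U_next) from the state U_n and input sample x_n."""
    order = len(U)
    # Output: linear combination of the stored state plus the direct term d * x_n.
    y = sum(c[i] * U[i] for i in range(order)) + d * x
    # New state: feedback matrix applied to the old state, plus b * x_n.
    U_next = [sum(A[i][j] * U[j] for j in range(order)) + b[i] * x
              for i in range(order)]
    return y, U_next

def run(A, b, c, d, xs):
    """Drive the model from a zero state with the excitation sequence xs."""
    U = [0.0] * len(b)
    ys = []
    for x in xs:
        y, U = step(A, b, c, d, U, x)
        ys.append(y)
    return ys

# Illustrative second-order coefficients (assumed, not from the specification).
A = [[0.5, 0.0],
     [1.0, 0.0]]
b = [1.0, 0.0]
c = [0.0, 1.0]
d = 1.0

impulse_response = run(A, b, c, d, [1.0, 0.0, 0.0, 0.0])
# impulse_response == [1.0, 0.0, 1.0, 0.5]
```

Because y_n is linear in c and d, its derivative with respect to c_i is simply the stored state component U_i and its derivative with respect to d is x_n, which is why the primary model yields these derivatives without extra computation.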
The unit 30 (not shown) is a dual model with respect to the vocal tract model, and from it can be derived the partial derivatives of y_n with respect to the input vector components b_i. In addition, the components U_1 to U_8 of the state vector formed by a matrix which is the transpose of the matrix 22 of the vocal tract model are fed to the primary part models 31 to 38, and the state vectors of the primary part models each provide partial derivatives of y_n with respect to the coefficients of the matrix A. The partial derivatives of y_n are fed to computer stages, Fig. 4 (not shown), to derive the partial derivatives of the error signal in accordance with the selected error measure; these partial derivatives are fed back to the parameter computer, Fig. 3a, to increment the parameters so that y_n more closely approaches the incoming speech signal S_n. In an alternative arrangement the parameter computer includes a primary model (39), equivalent to the vocal tract model, and a dual part model (40) (as in 29 and 30 in Fig. 3a), but the state vector components U_1 to U_8 of the primary model (39) are fed to primary dual part models (41 to 48) to obtain the partial derivatives with respect to the matrix coefficients.
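
The way output derivatives are turned into parameter increments can be sketched as plain gradient descent. This is an illustration under stated assumptions, not the procedure of the specification: it assumes a summed squared error as the error measure, adapts only c and d (whose derivatives come directly from the primary model) while holding A and b fixed, uses a fixed step size and iteration count instead of stopping at a predetermined error level, and uses hypothetical second-order coefficients and function names.

```python
def simulate(A, b, c, d, xs):
    """Run the model; return the outputs y_n and the state U_n seen at each cycle."""
    order = len(b)
    U = [0.0] * order
    ys, states = [], []
    for x in xs:
        states.append(U)
        ys.append(sum(ci * ui for ci, ui in zip(c, U)) + d * x)
        U = [sum(A[i][j] * U[j] for j in range(order)) + b[i] * x
             for i in range(order)]
    return ys, states

def adapt_output_params(A, b, c, d, xs, target, step_size=0.02, iterations=200):
    """Gradient descent on E = sum_n (y_n - s_n)^2 with respect to c and d only."""
    c = list(c)
    errors = []
    for _ in range(iterations):
        ys, states = simulate(A, b, c, d, xs)
        residual = [y - s for y, s in zip(ys, target)]
        errors.append(sum(r * r for r in residual))
        # dE/dc_i = sum_n 2 (y_n - s_n) U_i[n], since dy_n/dc_i = U_i[n].
        grad_c = [sum(2.0 * r * U[i] for r, U in zip(residual, states))
                  for i in range(len(c))]
        # dE/dd = sum_n 2 (y_n - s_n) x_n, since dy_n/dd = x_n.
        grad_d = sum(2.0 * r * x for r, x in zip(residual, xs))
        c = [ci - step_size * g for ci, g in zip(c, grad_c)]
        d = d - step_size * grad_d
    return c, d, errors

# Illustrative setup (all values assumed): a pulse every 4 samples as excitation,
# and a target generated by the same model with known c and d.
A = [[0.5, 0.0],
     [1.0, 0.0]]
b = [1.0, 0.0]
xs = [1.0 if n % 4 == 0 else 0.0 for n in range(16)]
target, _ = simulate(A, b, [0.5, -0.25], 1.0, xs)

c_fit, d_fit, errors = adapt_output_params(A, b, [0.0, 0.0], 0.0, xs, target)
```

Adapting b and the matrix coefficients in the same way would need the derivatives supplied by the dual model and the part models, which this sketch leaves out.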