EP0138954B1 - Speech pattern processing utilizing speech pattern compression - Google Patents

Speech pattern processing utilizing speech pattern compression

Info

Publication number
EP0138954B1
Authority
EP
European Patent Office
Prior art keywords
speech
signals
signal
representative
speech pattern
Prior art date
Legal status
Expired
Application number
EP19840901491
Other languages
German (de)
French (fr)
Other versions
EP0138954A1 (en)
EP0138954A4 (en)
Inventor
Bishnu Saroop Atal
Current Assignee
AT&T Corp
Original Assignee
American Telephone and Telegraph Co Inc
AT&T Corp
Priority date
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc, AT&T Corp
Publication of EP0138954A1
Publication of EP0138954A4
Application granted
Publication of EP0138954B1
Expired

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0018 - Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A speech pattern is compressed to a degree not previously attainable by analyzing the pattern to generate (210, 215, 275, 280) a sequence of signals representative of its acoustic features at a first rate. Responsive to the acoustic feature signals, a sequence of speech event representative signals is generated (225, 275, 280). A sequence of coded signals corresponding to the speech pattern is formed (235, 275, 280) at a rate less than said first rate responsive to the speech event representative signals.

Description

  • This invention relates to speech processing and, particularly, to the compression of speech patterns and to the synthesis of speech patterns from such compressed patterns.
  • It is generally accepted that a speech signal requires a bandwidth of at least 4 kHz for reasonable intelligibility. In digital speech processing systems such as speech synthesizers, recognizers, or coders, the channel capacity needed for transmission or the memory required for storage of the digital elements of the full 4 kHz bandwidth waveform is very large. Many techniques have been devised to reduce the number of digital codes needed to represent a speech signal. Waveform coding techniques such as Pulse Code Modulation (PCM), Differential Pulse Code Modulation (DPCM), Delta Modulation, or adaptive predictive coding result in natural sounding, high quality speech at bit rates between 16 and 64 kbps. The speech quality obtained from waveform coders, however, degrades as the bit rate is reduced below 16 kbps.
  • An alternative speech coding technique disclosed in U.S. Patent 3,624,302 utilizes a small number, e.g., 12-16, of slowly varying parameters which may be processed to produce a low distortion replica of a speech pattern. Such parameters, e.g., Linear Prediction Coefficient (LPC) or log area parameters, generated by linear prediction analysis can be spectrum limited to 50 Hz without significant band limiting distortion. Encoding of the LPC or log area parameters generally requires sampling at a rate of twice the bandwidth, i.e., 100 frames per second for 50 Hz parameters, and quantizing each resulting frame of log area parameters. Each frame of 12 log area parameters can be quantized using 48 bits. Consequently, 12 log area parameters each having a 50 Hz bandwidth result in a total bit rate of 100 frames/sec × 48 bits/frame = 4800 bits/sec.
  • Further reduction of bandwidth decreases the bit rate, but the resulting increase in distortion interferes with the intelligibility of speech synthesized from the lower bandwidth parameters. It is well known that sounds in speech patterns do not occur at a uniform rate and techniques have been devised to take into account such nonuniform occurrences. U.S. Patent 4,349,700 discloses arrangements that permit recognition of speech patterns having diverse sound patterns utilizing dynamic programming. U.S. Patent 4,038,503 discloses a technique for nonlinear warping of time intervals of speech patterns so that the sound features are represented in a more uniform manner. These arrangements, however, require storing and processing acoustic feature signals that are sampled at a rate corresponding to the most rapidly changing feature in the pattern. It is an object of the invention to provide an improved speech representation and/or speech synthesis arrangements having reduced digital storage and processing requirements.
  • In IEEE Transactions on Communications, Vol. COM-30, No. 4, April 1982, pages 674-686, Viswanathan and others disclose a method as set out in the preamble of claim 1. A speech pattern is analyzed by linear prediction encoding and the LPC parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission.
  • In the invention as set out in the claims, high compression efficiency is combined with accurate reproduction by a coding procedure based on the individual sounds constituting the speech pattern and determined at the centroids of the individual sounds.
  • Description of the drawing
    • Fig. 1 depicts a flowchart illustrating the general method of the invention;
    • Fig. 2 depicts a block diagram of a speech pattern coding circuit illustrative of the invention;
    • Figs. 3-8 depict detailed flowcharts illustrating the operation of the circuit of Fig. 2;
    • Fig. 9 depicts a speech synthesizer illustrative of the invention;
    • Fig. 10 depicts a flow chart illustrating the operation of the circuit of Fig. 9;
    • Fig. 11 shows a waveform illustrating a speech event timing signal obtained in the circuit of Fig. 2; and
    • Fig. 12 shows waveforms illustrative of a speech pattern and the speech event feature signals associated therewith.
    General description
  • It is well known in the art to represent a speech pattern by a sequence of acoustic feature signals derived from a linear prediction or other spectral analysis. Log area parameter signals sampled at closely spaced time intervals have been used in speech synthesis to obtain efficient representation of a speech pattern. In accordance with the invention, log area parameters are transformed into a sequence of individual sound or speech event feature signals φk(n) such that the log area parameters satisfy
    yi(n) = Σ(k=1..m) aik φk(n),  i = 1, 2, ..., p,  n = 1, 2, ..., N    (1)
  • The speech event feature signals φk(n) are sequential and occur at the speech event rate of the pattern, which is substantially lower than the log area parameter frame rate. In equation (1), p is the total number of log area parameters yi(n) determined by linear prediction analysis, m corresponds to the number of speech events in the pattern, n is the index of samples in the speech pattern at the sampling rate of the log area parameters, φk(n) is the kth speech event signal at sampling instant n, and aik is a combining coefficient corresponding to the contribution of the kth speech event function to the ith log area parameter. Equation (1) may be expressed in matrix form as
    Y = A Φ    (2)
    where Y is a p×N matrix whose (i,n) element is yi(n), A is a p×m matrix whose (i,k) element is aik, and Φ is an m×N matrix whose (k,n) element is φk(n). Since each speech event k occupies only a small segment of the speech pattern, the signal φk(n) representative thereof should be non-zero over only a small range of the sampling intervals of the total pattern. Each log area parameter yi(n) in equation (1) is a linear combination of the speech event functions φk(n), and the bandwidth of each yi(n) parameter is the maximum bandwidth of any one of the speech event functions φk(n). It is therefore readily seen that the direct coding of the yi(n) signals will take more bits than the coding of the φk(n) speech event signals and the combining coefficient signals aik in equation (1).
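  • The storage advantage of equation (2) can be illustrated numerically. The sketch below (Python with NumPy; the sizes, the Gaussian event shapes, and the 40-sample event support are illustrative assumptions, not values from the patent) builds a synthetic Y = A Φ and compares coding Y directly against coding A together with the compact, mostly zero event signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: p log area parameters over N frames and m speech events.
p, N, m = 12, 500, 12

A = rng.normal(size=(p, m))                  # combining coefficients a_ik
n = np.arange(N)
Phi = np.stack([np.exp(-0.5 * ((n - (k + 0.5) * N / m) / 8.0) ** 2)
                for k in range(m)])          # compact event signals phi_k(n)

Y = A @ Phi                                  # log area parameters, equation (2)

support = 40                                 # assumed nonzero span per event
print(f"direct coding of Y: {Y.size} values")
print(f"coding A plus truncated Phi: {A.size + m * support} values")
```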
  • Fig. 1 shows a flow chart illustrative of the general method of the invention. In accordance with the invention, a speech pattern is analyzed to form a sequence of signals representative of log area parameter acoustic feature signals. It is to be understood, however, that LPC, Partial Autocorrelation (PARCOR) or other speech features (see, e.g., U.S. patent 3,624,302) may be used instead of log area parameters. The feature signals are then converted into a set of speech event representative signals that are encoded at a lower bit rate for transmission or storage.
  • With reference to Fig. 1, box 101 is entered in which an electrical signal corresponding to a speech pattern is low pass filtered to remove unwanted higher frequency noise and speech components and the filtered signal is sampled at twice the low pass filtering cutoff frequency. The speech pattern samples are then converted into a sequence of digitally coded signals corresponding to the pattern as per box 110. Since the storage required for the sample signals is too large for most practical applications, they are utilized to generate log area parameter signals as per box 120 by linear prediction techniques well known in the art. The log area parameter signals y,(n) are produced at a constant sampling rate high enough to accurately represent the fastest expected event in the speech pattern. Typically, a sampling interval between two and five milliseconds is selected.
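  • A minimal sketch of the front end of boxes 101 and 110 follows (Python with SciPy; the input rate, filter order, and cutoff are assumptions chosen to match the 3.5 kHz/8 kHz figures given in the detailed description; the linear prediction step of box 120 is sketched further below).

```python
import numpy as np
from scipy.signal import butter, lfilter

def front_end(x, fs_in=48000, cutoff=3500.0, fs_out=8000):
    """Low pass filter the speech waveform and sample it at twice the
    (approximate) cutoff frequency, per boxes 101 and 110."""
    b, a = butter(8, cutoff / (fs_in / 2))   # 8th order Butterworth low pass
    filtered = lfilter(b, a, x)
    return filtered[::fs_in // fs_out]       # decimate 48 kHz -> 8 kHz
```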
  • After the log area parameter signals are stored, the times of occurrence of the successive speech events in the pattern are detected and signals representative of the event timing are generated and stored as per box 130. This is done by partitioning the pattern into prescribed smaller segments, e.g., 0.25 second intervals. For each successive interval having a beginning frame nb and an ending frame ne, a matrix of log area parameter signals is formed corresponding to the log area parameters yi(n) of the segment. The redundancy in the matrix is reduced by factoring out the first four principal components um(n) so that
    yi(n) ≈ Σ(m=1..4) cim um(n),  nb ≤ n ≤ ne    (3)
    and
    Figure imgb0005
  • The first four principal components may be obtained by methods well known in the art, such as described in the article "An Efficient Linear Prediction Vocoder" by M. R. Sambur, Bell System Technical Journal, Vol. 54, No. 10, pp. 1693-1723, December 1975. The resulting um(n) functions may be linearly combined to define the desired speech event signals as
    φk(n) = Σ(m=1..4) bkm um(n)    (5)
    by selecting coefficients bkm such that each φk(n) is most compact in time. In this way, the speech pattern is represented by a sequence of successive compact (minimum spreading) speech event feature signals φk(n), each of which can be efficiently coded. In order to obtain the shapes and locations of the speech event signals, a distance measure
    Figure imgb0007
    is minimized to choose the optimum φ(n) and its location is obtained from a speech event timing signal
    Figure imgb0008
  • In terms of equations (5), (6), and (7), a speech event signal φk(n) with minimum spreading is centered at each negative-going zero crossing of v(L).
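  • Equations (6) and (7) themselves are not reproduced in this text. One plausible reading, consistent with the centroid interpretation above, takes θ(L) as the normalized second moment of the event signal about frame L; the sketch below shows that measure and the event centroid whose coincidence with L would mark a negative-going zero crossing of v(L). Both formulas here are assumptions, not the patent's exact expressions.

```python
import numpy as np

def spread(phi, L):
    """Normalized second moment of an event signal phi about frame L --
    an assumed stand-in for the distance measure theta(L) of equation (6)."""
    w = phi ** 2
    return np.sum((np.arange(len(phi)) - L) ** 2 * w) / np.sum(w)

def centroid(phi):
    """Centroid of an event signal; per the text, speech events are
    located where the timing signal v(L) crosses zero going negative,
    i.e., where the centroid coincides with the trial location L."""
    w = phi ** 2
    return np.sum(np.arange(len(phi)) * w) / np.sum(w)
```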
  • Subsequent to the generation of the v(L) signals in box 130, box 140 is entered and the speech event signals φk(n) are accurately determined using the process of box 130 with the speech event occurrence signals from the negative-going zero crossings of v(L). Having generated the sequence of speech event representative signals, the combining coefficients aik in equations (1) and (2) may be generated by minimizing the mean-squared error
    E = Σn Σi [ yi(n) − Σ(k=1..M) aik φk(n) ]²    (8)
    where M is the total number of speech events within the range of index n over which the sum is performed. The partial derivatives of E with respect to the coefficients aik are set equal to zero and the coefficients aik are obtained from the set of simultaneous linear equations
    Σ(k=1..M) aik Gkr = cir,  r = 1, 2, ..., M    (9)
    where
    Gkr = Σn φk(n) φr(n)    (10)
    cir = Σn yi(n) φr(n)    (11)
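  • In NumPy terms, the minimization of equation (8) is an ordinary least squares fit, sketched below with Y and Φ laid out as in equation (2).

```python
import numpy as np

def fit_combining_coefficients(Y, Phi):
    """Minimize the mean-squared error of equation (8): find the matrix
    A of coefficients a_ik such that A @ Phi best approximates Y.
    Y is p x N (log area parameters); Phi is M x N (event signals)."""
    A_t, *_ = np.linalg.lstsq(Phi.T, Y.T, rcond=None)
    return A_t.T
```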
  • Detailed description
  • Fig. 2 shows a speech coding arrangement that includes electroacoustic transducer 201, filter and sampler circuit 203, analog to digital converter 205, and speech sample store 210, which cooperate to convert a speech pattern into a stored sequence of digital codes representative of the pattern. Central processor 275 may comprise a microprocessor such as the Motorola type MC68000 controlled by permanently stored instructions in read only memories (ROM) 215, 220, 225, 230 and 235. Processor 275 is adapted to direct the operations of arithmetic processor 280 and stores 210, 240, 245, 250, 255 and 260 so that the digital codes from store 210 are compressed into a compact set of speech event feature signals. The speech event feature signals are then supplied to utilization device 285 via input output interface 265. The utilization device may be a digital communication facility, a storage arrangement for delayed transmission, or a store associated with a speech synthesizer. The Motorola MC68000 integrated circuit is described in the publication MC68000 16-Bit Microprocessor User's Manual, second edition, Motorola, Inc., 1980, and arithmetic processor 280 may comprise the TRW type MPY-16HJ integrated circuit.
  • Referring to Fig. 2, a speech pattern is applied to electroacoustic transducer 201 and the electrical signal therefrom is supplied to low pass filter and sampler circuit 203 which is operative to limit the upper end of the signal bandwidth to 3.5 kHz and to sample the filtered signal at an 8 kHz rate. Analog to digital converter 205 converts the sampled signal from filter and sampler 203 into a sequence of digital codes, each representative of the magnitude of a signal sample. The resulting digital codes are sequentially stored in speech sample store 210.
  • Subsequent to the storage of the sampled speech pattern codes in store 210, central processor 275 causes the instructions stored in log area parameter program store 215 to be transferred to the random access memory associated with the central processor. The flow chart of Fig. 3 illustrates the sequence of operations performed by the controller responsive to the instructions from store 215.
  • Referring to Fig. 3, box 305 is initially entered and frame count index n is reset to 1. The speech samples of the current frame are then transferred from store 210 to arithmetic processor 280 via central processor 275 as per box 310. The occurrence of an end of speech sample signal is checked in decision box 315. Until the detection of the end of speech pattern signal, control is passed to box 325 and an LPC analysis is performed for the frame in processors 275 and 280. The LPC parameter signals of the current frame are then converted to log area parameter signals yi(n) as per box 330 and the log area parameter signals are stored in log area parameter store 240 (box 335). The frame count is incremented by one in box 345 and the speech samples of the next frame are read (box 310). When the end of speech pattern signal occurs, control is passed to box 320 and a signal corresponding to the number of frames in the pattern is stored in processor 275.
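  • The per-frame analysis of boxes 325-330 can be sketched as follows (Python with NumPy; the Hamming window, the order-12 analysis matching the 12 parameters cited earlier, and the sign convention of the log area ratio are assumptions).

```python
import numpy as np

def frame_to_log_area(frame, order=12):
    """One pass of boxes 325-330: autocorrelation LPC analysis by the
    Levinson-Durbin recursion, then conversion of the reflection
    coefficients k_i to log area parameters."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
    r[0] += 1e-9                             # guard against silent frames
    a, err, ks = np.zeros(order), r[0], []
    for i in range(order):                   # Levinson-Durbin recursion
        k = (r[i + 1] - a[:i] @ r[i:0:-1]) / err
        if i > 0:
            a[:i] -= k * a[i - 1::-1]
        a[i] = k
        ks.append(k)
        err *= 1.0 - k * k
    k = np.clip(np.array(ks), -0.999, 0.999)
    return np.log((1.0 - k) / (1.0 + k))     # log area parameters y_i
```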
  • Central processor 275 is operative after the log area parameter storing operation is completed to transfer the stored instructions of ROM 220 into its random access memory. The instruction codes from store 220 correspond to the operations illustrated in the flow chart of Figs. 4 and 5. These instruction codes are effective to generate a signal v(L) from which the occurrences of the speech events in the speech pattern may be detected and located.
  • Referring to Fig. 4, the frame count of the log area parameters is initially reset in processor 275 as per box 403 and the log area parameters yi(n) for an initial time interval n1 to n2 of the speech pattern are transferred from log area parameter store 240 to processor 275 (box 410). After determining whether the end of the speech pattern has been reached in decision box 415, box 420 is entered and the redundancy of the log area parameter signals is removed by factoring out the first four principal components ui(n), i = 1,...,4, as aforementioned.
  • The log area parameters of the current time interval are then represented by
    yi(n) ≈ Σ(m=1..4) cim um(n),  n1 ≤ n ≤ n2
    from which a set of speech event signals φk(n) is to be obtained. The ui(n) signals over the interval may be combined through use of parameters bi, i = 1,...,4, in box 425 so that a set of signals
    φ(n) = Σ(i=1..4) bi ui(n)
    is produced such that each φk is most compact over the range n1 to n2. This is accomplished through use of the θ(L) function of equation (6). A signal v(L) representative of the speech event timing of the speech pattern is then formed in accordance with equation (7) in box 430 and the v(L) signal is stored in timing parameter store 245. Frame counter n is incremented by a constant value, e.g., 5, selected on the basis of how close adjacent speech event signals φk(n) are expected to occur (box 435), and box 410 is reentered to generate the φk(n) and v(L) signals for the next time interval of the speech pattern.
  • When the end of the speech pattern is detected in decision box 415, the frame count of the speech pattern is stored (box 440) and the generation of the speech event timing parameter signal for the speech pattern is completed. Fig. 11 illustrates the speech event timing parameter signal for an exemplary utterance. Each negative going zero crossing in Fig. 11 corresponds to the centroid of a speech event feature signal φk(n).
  • Referring to Fig. 5, box 501 is entered in which speech event index I is reset to zero and frame index n is again reset to one. After indices I and n are initialized, the successive frames of speech event timing parameter signal are read from store 245 (box 505) and zero crossings therein are detected in processor 275 as per box 510. Whenever a zero crossing is found, the speech event index I is incremented (box 515) and the speech event location frame is stored in speech event location store 250 (box 520). The frame index n is then incremented in box 525 and a check is made for the end of the speech pattern frames in box 530. Until the end of speech pattern frames signal is detected, box 505 is reentered from box 530 after each iteration to detect the subsequent speech event location frames of the pattern.
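  • The zero crossing search of boxes 505-530 reduces to a few lines of NumPy (a sketch; the patent iterates frame by frame, and per Fig. 11 the crossings of interest are the negative-going ones).

```python
import numpy as np

def event_locations(v):
    """Frames at which the timing signal v(L) crosses zero going
    negative; each such frame marks the centroid of a speech event."""
    v = np.asarray(v)
    return np.where((v[:-1] > 0) & (v[1:] <= 0))[0] + 1
```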
  • Upon detection of the end of speech pattern signal in box 530, central processor 275 addresses speech event feature signal generation program store 225 and causes its contents to be transferred to the processor. Central processor 275 and arithmetic processor 280 are thereby adapted to form a sequence of speech event feature signals φk(n) responsive to the log area parameter signals in store 240 and the speech event location signals in store 250. The speech event feature signal generation instructions are illustrated in the flow chart of Fig. 6.
  • Initially, location index I is set to one as per box 601 and the locations of the speech events in store 250 are transferred to central processor 275 (box 605). As per box 610, the limit frames for a prescribed number of speech event locations, e.g., 5, are determined. The log area parameters for the speech pattern interval defined by the limit frames are read from store 240 and are placed in a section of the memory of central processor 275 (box 615). The redundancy in the log area parameters is removed by factoring out the number of principal components corresponding to the prescribed number of events (box 620). Immediately thereafter, the speech event feature signal φL(n) for the current location L is generated.
  • The minimization of equation (6) to determine φL(n) is accomplished by forming the derivative
    Figure imgb0016
    where
    Figure imgb0017
    m is the prescribed number of speech events and r can be either 1, 2,..., or m. The derivative of equation (13) is set equal to zero to determine the minimum and
    Figure imgb0018
    is obtained. From equation (14)
    Figure imgb0019
    so that equation (15) can be changed to
    Figure imgb0020
    φ(n) in equation (17) can be replaced by the right side of equation 14. Thus,
    Figure imgb0021
    where
    Figure imgb0022
    Rearranging equation (18) yields
    Figure imgb0023
    Since ui(n) is the principal component of matrix Y,
    Figure imgb0024
    equation (20) can be simplified to
    Figure imgb0025
    where
    Figure imgb0026
    Equation (22) can be expressed in matrix notation as
    Figure imgb0027
    where
    Figure imgb0028
  • Equation (25) has exactly m solutions and the solution which minimizes θ(L) is the one for which λ is minimum. The coefficients b1, b2, ..., bm for which λ = θ(L) attains its minimum value result in the optimum speech event feature signal φL(n).
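  • Equations (13) through (25) are not reproduced in this text; what survives is that equation (25) is an m x m eigenvalue problem whose minimum-eigenvalue solution gives the most compact event signal. A sketch under that reading follows (the construction of the matrix S from equations (13)-(24) is left abstract and is an assumption).

```python
import numpy as np

def most_compact_event(S, U):
    """Solve a symmetric m x m eigenvalue problem standing in for
    equation (25).  U is m x N and holds the principal components
    u_m(n); the eigenvector with minimum eigenvalue lambda = theta(L)
    yields the most compact speech event signal phi_L(n)."""
    lam, B = np.linalg.eigh(S)   # eigenvalues ascending, eigenvectors in columns
    return B[:, 0] @ U           # phi_L(n) = sum_m b_m u_m(n)
```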
  • In Fig. 6, the speech event feature signal φL(n) is generated in box 625 and is stored in store 255. Until the end of the speech pattern is detected in decision box 635, the loop including boxes 605, 610, 615, 620, 625 and 630 is iterated so that the complete sequence of speech events for the speech pattern is formed.
  • Fig. 12 shows waveforms illustrating a speech pattern and the speech event feature signals generated therefrom in accordance with the invention. Waveform 1201 corresponds to a portion of a speech pattern and waveforms 1205-1 through 1205-n correspond to the sequence of speech event feature signals φL(n) obtained from the waveform in the circuit of Fig. 2. Each feature signal is representative of the acoustic characteristics of a speech event of the pattern of waveform 1201. The speech event feature signals may be combined with the coefficients aik of equation (1) to re-form log area parameter signals that are representative of the acoustic features of the speech pattern.
  • Upon completion of the operations shown in Fig. 6, the sequence of speech event feature signals for the speech pattern is stored in store 255. Each speech event feature signal φI(n) is encoded and transferred to utilization device 285 as illustrated in the flow chart of Fig. 7. Central processor 275 is adapted to receive the speech event signal encoding program instruction set stored in ROM 235.
  • Referring to Fig. 7, the speech event index I is reset to one as per box 701 and the speech event feature signal φI(n) is read from store 255. The sampling rate RI for the current speech event feature signal is selected in box 710 by one of the many methods well known in the art. For example, the instruction codes perform a Fourier analysis and generate a signal corresponding to the upper band limit of the feature signal, from which a sampling rate signal RI is determined. As is well known in the art, the sampling rate need only be sufficient to adequately represent the feature signal. Thus, a slowly changing feature signal may utilize a lower sampling rate than a rapidly changing feature signal and the sampling rate for each feature signal may be different.
  • Once a sampling rate signal has been determined for speech event feature signal φI(n), it is encoded at rate RI as per box 715. Any of the well-known encoding schemes can be used. For example, each sample may be converted into a PCM, ADPCM or delta modulated signal and concatenated with a signal indicative of the feature signal location in the speech pattern and a signal representative of the sampling rate RI. The coded speech event feature signal is then transferred to utilization device 285 via input output interface 265. Speech event index I is then incremented (box 720) and decision box 725 is entered to determine if the last speech event signal has been coded. The loop including boxes 705 through 725 is iterated until the last speech event signal has been encoded (I > IF), at which time the coding of the speech event feature signals is completed.
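  • The patent leaves the rate selection method open; one well known approach consistent with the Fourier analysis mentioned above is sketched below (the 500 Hz frame rate, corresponding to 2 ms frames, and the 99.5% energy fraction are assumptions).

```python
import numpy as np

def select_rate(phi, frame_rate=500.0, energy_frac=0.995):
    """Boxes 710-715, one plausible method: Fourier analyze the event
    signal, find the upper band limit containing most of its energy,
    and return twice that limit as the sampling rate R_I."""
    spec = np.abs(np.fft.rfft(phi)) ** 2
    freqs = np.fft.rfftfreq(len(phi), d=1.0 / frame_rate)
    cum = np.cumsum(spec) / np.sum(spec)
    band_limit = freqs[np.searchsorted(cum, energy_frac)]
    return 2.0 * max(band_limit, freqs[1])   # Nyquist rate for the event
```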
  • The speech event feature signals must be combined in accordance with equation (1) to form replicas of the log area feature signals therein. Accordingly, the combining coefficients for the speech pattern are generated and encoded as shown in the flow chart of Fig. 8. After the speech event feature signal encoding, central processor 275 is conditioned to read the contents of ROM 230. The instruction codes permanently stored in the ROM control the formation and encoding of the combining coefficients.
  • The combining coefficients are produced for the entire speech pattern by matrix processing in central processor 275 and arithmetic processor 280. Referring to Fig. 8, the log area parameters of the speech pattern are transferred to processor 275 as per box 801. A speech event feature signal coefficient matrix G is generated (box 805) in accordance with
    G = Φ Φᵀ
    and a Y-Φ correlation matrix C is formed (box 810) in accordance with
    C = Y Φᵀ
    The combining coefficient matrix is then produced as per box 815 according to the relationship
    A = C G⁻¹
  • The elements of matrix A are the combining coefficients aik of equation (1). These combining coefficients are encoded, as is well known in the art, in box 820 and the encoded coefficients are transferred to utilization device 285.
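  • In matrix terms, the three steps of boxes 805-815 amount to a normal equations solve, equivalent to the least squares sketch given earlier (a sketch; G is inverted implicitly rather than explicitly).

```python
import numpy as np

def combining_matrix(Y, Phi):
    """Boxes 805-815: form the coefficient matrix G and the Y-Phi
    correlation matrix C, then the combining coefficient matrix A."""
    G = Phi @ Phi.T                   # (k,r) element: sum_n phi_k(n) phi_r(n)
    C = Y @ Phi.T                     # (i,r) element: sum_n y_i(n) phi_r(n)
    return np.linalg.solve(G, C.T).T  # A = C G^-1, solved without forming G^-1
```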
  • In accordance with the invention, the linear predictive parameters, sampled at a rate corresponding to the most rapid change therein, are converted into a sequence of speech event feature signals that are encoded at the much lower speech event occurrence rate. The speech pattern is thereby further compressed to reduce transmission and storage requirements without adversely affecting intelligibility. Utilization device 285 may be a communication facility connected to one of the many speech synthesizer circuits using an LPC all pole filter known in the art.
  • The circuit of Fig. 2 is adapted to compress a spoken message into a sequence of coded speech event feature signals which are transmitted via utilization device 285 to a synthesizer. In the synthesizer, the speech event feature signals and the combining coefficients of the message are decoded and recombined to form the message log area parameter signals. These log area parameter signals are then utilized to produce a replica of the original message.
  • Fig. 9 depicts a block diagram of a speech synthesizer circuit illustrative of the invention and Fig. 10 shows a flow chart illustrating its operation. Store 915 of Fig. 9 is adapted to store the successive coded speech event feature signals and combining signals received from utilization device 285 of Fig. 2 via line 901 and interface circuit 904. Store 920 receives the sequence of excitation signals required for synthesis via line 903. The excitation signals may comprise a succession of pitch period and voiced/unvoiced signals generated responsive to the voice message by methods well known in the art. Microprocessor 910 is adapted to control the operation of the synthesizer and may be the aforementioned Motorola-type MC68000 integrated circuit. LPC feature signal store 925 is utilized to store the successive log area parameter signals of the spoken message which are formed from the speech event feature signals and combining signals of store 915. Formation of a replica of the spoken message is accomplished in LPC synthesizer 930 responsive to the LPC feature signals from store 925 and the excitation signals from store 920 under control of microprocessor 910.
  • The synthesizer operation is directed by microprocessor 910 under control of permanently stored instruction codes resident in a read only memory associated therewith. The operation of the synthesizer is described in the flow chart of Fig. 10. Referring to Fig. 10, the coded speech event feature signals, the corresponding combining signals, and the excitation signals of the spoken message are received by interface 904 and are transferred to speech event feature signal and combining coefficient signal store 915 and to excitation signal store 920 as per box 1010. The log area parameter signal index I is then reset to one in processor 910 (box 1020) so that the reconstruction of the first log area feature signal yl(n) is initiated.
  • The formation of the log area signal requires combining the speech event feature signals with the combining coefficients of index I in accordance with equation (1). Speech event feature signal location counter L is reset to one by processor 910 as per box 1025 and the current speech event feature signal samples are read from store 915 (box 1030). The signal sample sequence is filtered to smooth the speech event feature signal as per box 1035 and the current log area parameter signal is partially formed in box 1040. Speech event location counter L is incremented to address the next speech event feature signal in store 915 (box 1045) and the occurrence of the last feature signal is tested in decision box 1050. Until the last speech event feature signal has been processed, the loop including boxes 1030 through 1050 is iterated so that the current log area parameter signal is generated and stored in LPC feature signal store 925 under control of processor 910.
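  • A sketch of this reconstruction loop in NumPy follows; the 3-point moving average stands in for the unspecified smoothing filter of box 1035 (an assumption).

```python
import numpy as np

def rebuild_log_area(A, Phi, smooth_len=3):
    """Boxes 1025-1050: smooth each decoded speech event feature signal,
    then combine the events with the coefficients of equation (1) to
    re-form the log area parameter signals y_i(n) = sum_k a_ik phi_k(n)."""
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.array([np.convolve(phi, kernel, mode="same") for phi in Phi])
    return A @ smoothed
```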
  • Upon storage of a log area feature signal in store 925, box 1055 is entered from box 1050 and the log area index signal I is incremented (box 1055) to initiate the formation of the next log area parameter signal. The loop from box 1030 through box 1050 is reentered via decision box 1060. After the last log area parameter signal is stored, processor 910 causes a replica of the spoken message to be formed in LPC synthesizer 930.
  • The synthesizer circuit of Fig. 9 may be readily modified to store the speech event feature signal sequences corresponding to a plurality of spoken messages and to selectively generate replicas of these messages by techniques well known in the art. For such an arrangement, the speech event feature signal generating circuit of Fig. 2 may receive a sequence of predetermined spoken messages and utilization device 285 may comprise an arrangement to permanently store the speech event feature signals and corresponding combining coefficients for the messages and to generate a read only memory containing said spoken message speech event and combining signals. The read only memory containing the coded speech event and combining signals can be inserted as store 915 in the synthesizer circuit of Fig. 9.

Claims (11)

1. A method for compressing speech patterns including the steps of: analyzing (101, 110, 120) a speech pattern to derive a set of signals (yi(n)) representative of acoustic features of the speech pattern at a first rate, generating (130, 140, 150) a sequence of coded signals representative of said speech pattern in response to said set of acoustic feature signals at a second rate less than said first rate, characterized in that the generating step includes: generating (420, 425) a sequence of signals (φk(n)) each representative of an individual sound of said speech pattern, each being a linear combination of said acoustic feature signals; determining (510) the time frames of the speech pattern at which the centroids of individual sounds occur in response to said set of acoustic feature signals; generating (625) a sequence of individual sound feature signals (φL(I)(n)) jointly responsive to said acoustic feature signals and said centroid time frame determination; generating (805-815) a set of individual sound representative signal combining coefficients (aik) jointly responsive to said individual sound representative signals and said acoustic feature signals; and forming said coded signal responsive to said sequence of individual sound feature signals (715) and said combining coefficients (820).
2. A method for compressing speech patterns, as claimed in claim 1, wherein the step of determining the time frames of the speech pattern at which the centroids of individual sounds occur comprises producing (430) a signal (v(L)) representative of the timing of the individual sounds in said speech pattern responsive to the acoustic feature signals of the speech pattern, and detecting each negative-going zero crossing in said individual sound time signal.
3. A method for compressing speech patterns as claimed in claim 1 or claim 2, wherein said coded signal forming step comprises generating (710) a signal representative of the bandwidth of each speech representative signal; sampling said speech event feature signal at a rate corresponding to its bandwidth representative signal; coding (715) each sampled speech event feature signal; and producing a sequence of encoded speech event coded signals at a rate corresponding to the rate of occurrence of speech events in said speech pattern.
4. A method for compressing speech patterns as claimed in any of the preceding claims, wherein said acoustic feature signals are linear predictive parameter signals representative of the speech pattern.
5. A method for compressing speech patterns as claimed in claim 4 wherein said linear predictive parameter signals are log area parameter signals representative of the speech pattern.
6. A method for compressing speech patterns as claimed in claim 4 wherein said linear predictive parameter signals are partial autocorrelation signals representative of the speech pattern.
7. Apparatus for compressing speech patterns, including means (210, 215, 225, 280) for analyzing a speech pattern to derive a set of signals representative of acoustic features of the speech pattern at a first rate and means (220-260) for generating a sequence of coded signals representative of said speech pattern in response to said set of acoustic feature signals at a second rate less than said first rate, characterized in that the generating means includes: means (220) for generating a sequence of signals (φk(n)) each representative of an individual sound of said speech pattern, each being a linear combination of said acoustic feature signals, and determining the time frames of the speech pattern at which the centroids of individual sounds occur in response to said set of acoustic feature signals, means (230) for generating a set of individual sound representative signal combining coefficients (aIk) jointly responsive to said individual sound representative signals and said acoustic feature signals, means (225) for generating a sequence of individual sound feature signals (φL(I)(n)) jointly responsive to said acoustic feature signals and said centroid time frame determination, and means (235) for forming said coded signal responsive to said sequence of individual sound feature signals and said combining coefficients.
8. Apparatus for compressing speech patterns as claimed in claim 7, wherein the means for determining the time frames of the speech pattern at which the centroids of individual sounds occur comprises means (220) for producing a signal representative of the timing of the individual sounds in said speech pattern responsive to the acoustic feature signals of the speech pattern, and for detecting each negative-going zero crossing in said individual sound time signal.
9. Apparatus for compressing speech patterns as claimed in claim 7 or claim 8, wherein the means for forming said coded signal comprises means (part of 235) for generating a signal representative of the bandwidth of each speech representative signal; means (part of 235) for sampling each individual sound articulatory configuration representative signal in said speech pattern at a rate corresponding to its bandwidth signal; means (part of 235) for coding each individual sound articulatory configuration representative signal; and means (part of 235) for producing a sequence of said coded individual sound articulatory configuration representative sample signals at a rate corresponding to the individual sound articulatory configuration representative signal bandwidths.
10. Apparatus as claimed in any of claims 7 to 9, wherein the means for analyzing a speech pattern comprises means (210, 215, 275, 280) for generating a set of linear predictive parameter signals representative of the acoustic features of the speech pattern.
11. Apparatus as claimed in any of claims 7 to 10 including means (285 or 910-930) for generating a speech pattern from the coded signal.
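As an illustrative gloss on the centroid determination recited in claims 1, 2, 7 and 8 (and no part of the claimed subject matter), the sketch below marks each negative-going zero crossing of an individual-sound timing signal v; how v itself is derived from the acoustic feature signals is left to the description, and the function name is hypothetical.

    import numpy as np

    def centroid_frames(v):
        # Frames at which the individual-sound timing signal crosses zero
        # in the negative-going direction, i.e. v >= 0 at frame n and
        # v < 0 at frame n + 1; claims 2 and 8 identify these frames as
        # the centroids of the individual sounds.
        v = np.asarray(v, dtype=float)
        return np.where((v[:-1] >= 0) & (v[1:] < 0))[0]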
EP19840901491 1983-04-12 1984-03-12 Speech pattern processing utilizing speech pattern compression Expired EP0138954B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US48423183A 1983-04-12 1983-04-12
US484231 1983-04-12

Publications (3)

Publication Number Publication Date
EP0138954A1 (en) 1985-05-02
EP0138954A4 (en) 1985-11-07
EP0138954B1 (en) 1988-10-26

Family

ID=23923295

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19840901491 Expired EP0138954B1 (en) 1983-04-12 1984-03-12 Speech pattern processing utilizing speech pattern compression

Country Status (5)

Country Link
EP (1) EP0138954B1 (en)
JP (1) JP2648138B2 (en)
CA (1) CA1201533A (en)
DE (1) DE3474873D1 (en)
WO (1) WO1984004194A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8503304A (en) * 1985-11-29 1987-06-16 Philips Nv METHOD AND APPARATUS FOR SEGMENTING AN ELECTRIC SIGNAL FROM AN ACOUSTIC SIGNAL, FOR EXAMPLE, A VOICE SIGNAL.

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3598921A (en) * 1969-04-04 1971-08-10 Nasa Method and apparatus for data compression by a decreasing slope threshold test
US3715512A (en) * 1971-12-20 1973-02-06 Bell Telephone Labor Inc Adaptive predictive speech signal coding system
JPS595916B2 (en) * 1975-02-13 1984-02-07 日本電気株式会社 Speech splitting/synthesizing device
JPS5326761A (en) * 1976-08-26 1978-03-13 Babcock Hitachi Kk Injecting device for reducing agent for nox
US4280192A (en) * 1977-01-07 1981-07-21 Moll Edward W Minimum space digital storage of analog information
FR2412987A1 (en) * 1977-12-23 1979-07-20 Ibm France PROCESS FOR COMPRESSION OF DATA RELATING TO THE VOICE SIGNAL AND DEVICE IMPLEMENTING THIS PROCEDURE

Also Published As

Publication number Publication date
EP0138954A1 (en) 1985-05-02
DE3474873D1 (en) 1988-12-01
WO1984004194A1 (en) 1984-10-25
CA1201533A (en) 1986-03-04
EP0138954A4 (en) 1985-11-07
JP2648138B2 (en) 1997-08-27
JPS60501076A (en) 1985-07-11

Similar Documents

Publication Publication Date Title
US4472832A (en) Digital speech coder
US4701954A (en) Multipulse LPC speech processing arrangement
US5495556A (en) Speech synthesizing method and apparatus therefor
KR100427753B1 (en) Method and apparatus for reproducing voice signal, method and apparatus for voice decoding, method and apparatus for voice synthesis and portable wireless terminal apparatus
US7191125B2 (en) Method and apparatus for high performance low bit-rate coding of unvoiced speech
US5018200A (en) Communication system capable of improving a speech quality by classifying speech signals
EP0342687B1 (en) Coded speech communication system having code books for synthesizing small-amplitude components
USRE32580E (en) Digital speech coder
EP0232456A1 (en) Digital speech processor using arbitrary excitation coding
US4991215A (en) Multi-pulse coding apparatus with a reduced bit rate
US6141637A (en) Speech signal encoding and decoding system, speech encoding apparatus, speech decoding apparatus, speech encoding and decoding method, and storage medium storing a program for carrying out the method
US4764963A (en) Speech pattern compression arrangement utilizing speech event identification
US5839098A (en) Speech coder methods and systems
US5621853A (en) Burst excited linear prediction
EP0138954B1 (en) Speech pattern processing utilizing speech pattern compression
Dankberg et al. Development of a 4.8-9.6 kbps RELP Vocoder
Rebolledo et al. A multirate voice digitizer based upon vector quantization
JP3166673B2 (en) Vocoder encoding / decoding device
JPH0480400B2 (en)
JP3271966B2 (en) Encoding device and encoding method
EP0987680A1 (en) Audio signal processing
WO2001009880A1 (en) Multimode vselp speech coder
GB2266213A (en) Digital signal coding
JPH11500837A (en) Signal prediction method and apparatus for speech coder
KR19980035867A (en) Speech data encoding / decoding device and method

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): DE FR GB

17P Request for examination filed

Effective date: 19850326

17Q First examination report despatched

Effective date: 19870317

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 3474873

Country of ref document: DE

Date of ref document: 19881201

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20030224

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20030225

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20030310

Year of fee payment: 20

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20040311

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20