US2243089A - System for the artificial production of vocal or other sounds - Google Patents

System for the artificial production of vocal or other sounds

Info

Publication number
US2243089A
Authority
US
United States
Prior art keywords
frequency
waves
speech
wave
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US273429A
Inventor
Homer W Dudley
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bell Telephone Laboratories Inc
Priority to US273429A
Priority to FR865087D (FR865087A)
Application granted
Publication of US2243089A
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Definitions

  • The invention relates to the general type of wave control or wave production that is disclosed in my United States Patent 2,151,091, granted March 21, 1939, and for its general object it contemplates certain improvements or modifications of the basic system and method therein disclosed.
  • a wave having the characteristics of speech may be produced artificially using as raw material wave energy having a frequency distribution which may be uniform over a requisitely wide frequency range, and, therefore, lacking in character or intelligibility.
  • the wave construction is accomplished by modifying such raw material wave energy in conformance with the fundamental pitch and energy distribution relations which characterize … and methods involved, reference is made to the patent disclosure.
  • the invention contemplates specifically two general ways of changing the type or character of the reproduced wave.
  • the wave used as raw material in reconstructing the output wave itself possesses or has imparted to it certain modifications or characteristics before it is acted upon by the products of analysis of the original wave or in such manner that it affects the character of the reconstructed wave.
  • a wave of either uniform tion with each other is imparted to it.
  • the present invention may also be used similarly to the system of my prior patent, but the point of view of the present invention is directed more to securing a deliberate change of character between the analyzed waves and the reconstructed waves.
  • Such change in character may be desirable for many purposes, such as rendering speech more understandable in the presence of noise or because of other conditions making reception difficult, or for purposes of imitation, impersonation, entertainment, research in speech or hearing, voice or musical training or instruction and kindred purposes.
  • Fig. 1 is a schematic circuit diagram of a complete system according to the invention.
  • Fig. 2 shows a modification which may replace the portion of Fig. 1 to the right of and below the broken line II-II;
  • Fig. 3 shows a diagram of frequency relations to be discussed.
  • a frequency pattern control circuit FP which comprises but one channel FP1 discriminates as to the frequency pattern. This discrimination includes discrimination as to the fundamental frequency when there is one.
  • the amplitude pattern control circuit branches into a number of channels, for example, ten channels AP1 to AP10, and determines what amplitude pattern we have. For simplicity, channels AP4 to AP9 are omitted from the drawing.
  • any suitable transmitting medium, such, for example, as lines L0 to L10
  • This transmitting medium may have a limited frequency range of transmission, of much less width than the frequency range of the speech signals to be communicated. If desired, it may be a single conductor pair or a radio link, the transmission through this medium then being,
  • the frequency pattern control channel FP1 is a circuit that analyzes speech sounds on the basis of the above classification and which, within certain limits, delivers to a load a wave whose frequency spectrum is of the same class as that of the speech sound applied to the input of the circuit.
  • the circuit operates so that when no signal or an unvoiced speech sound (Class 2, above) is … have our reproduction or reconstruction of the original speech signal for any further transmission in the ordinary manner, or for reproduction of sound by loud-speaker 5.
  • the system as shown uses a 275 cycle total transmission band in the transmission medium between the transmitting and receiving ends of the system, that is, in the lines L0 to L10.
  • This 275 cycle band is on the basis of eleven channels, each of 25 cycle pass-band. Ten of these are for amplitude pattern control and the other one is for frequency pattern control. As brought out in my above-mentioned patent, such a system is adequate for good quality transmission of speech.
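The channel budget stated above can be checked with a few lines of arithmetic. This is only a sketch in Python; the constant and function names are illustrative and do not come from the patent.

```python
# Illustrative names only; the figures themselves come from the text above.
CHANNEL_PASS_BAND_CYCLES = 25   # pass-band of each control channel
AMPLITUDE_CHANNELS = 10         # amplitude pattern control channels
FREQUENCY_CHANNELS = 1          # frequency pattern control channel


def total_control_band_cycles() -> int:
    """Total band needed in the transmission medium for all control channels."""
    channels = AMPLITUDE_CHANNELS + FREQUENCY_CHANNELS
    return channels * CHANNEL_PASS_BAND_CYCLES


print(total_control_band_cycles())
```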
  • the frequency pattern control channel FP1 will first be described with reference to its adjustment for use in natural reproduction of speech. It is a circuit for analyzing and reproducing the frequency spectrum of the source of energy in speech sounds. For speech applied to its input from line 1 it delivers an output wave that has discrete components and is of the same fundamental frequency as the input when a voiced speech sound is applied, and an output wave with a continuous spectrum when … It per… First, at the transmitting or analyzing end of the system it derives from the speech signal the fundamental or vocal cord frequency and expresses this as a current, the amplitude or magnitude of which is proportional to the fundamental frequency. Next, at the receiving or reproducing end of the system it uses this current to control the frequency set up by a relaxation oscillator so as to get back a wave of the original fundamental frequency,
  • the sounds of speech may be divided into three classes:
  • random noise from a gas-filled tube amplifier is supplied to the load.
  • a voiced or mixed speech sound (Classes 1 and 3, above) is applied to the input, the noise voltage is removed from the load and a wave of the same fundamental frequency as the speech sound and any desired frequency spectrum is supplied to the load.
  • the analyzing circuit of the frequency pattern control channel FP1, or the portion of this channel at the sending or analyzing end of the system, will first be described. It comprises a detector D which may be, for example, a full-wave copper-oxide rectifier, an attenuation discrimination network or so-called equalizer E1 having its loss increasing with frequency, a frequency measuring circuit FM and a 25 cycle low-pass filter F30. The speech currents from line 1 are fed through the rectifier D, which feeds the equalizer E1, which in turn feeds the frequency measuring circuit FM.
  • This frequency measuring circuit may be any suitable circuit for delivering through the low-pass filter F30 a direct current that depends on the number of reversals per second of direction of the voltage wave applied to this circuit and is independent of its amplitude as long as the amplitude exceeds a certain threshold value.
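The behaviour described for the frequency measuring circuit (an output that depends on reversals per second and ignores amplitude above a threshold) can be imitated digitally by counting sign reversals. The sketch below is an assumed software analogue in Python, not the Riesz circuit; the function names, the sampled-signal representation and the threshold value are all hypothetical.

```python
import math


def count_reversals(samples, threshold):
    """Count changes of direction, ignoring samples whose magnitude is below threshold."""
    state = 0        # +1 or -1 once the wave has exceeded the threshold
    reversals = 0
    for x in samples:
        if x > threshold:
            new_state = 1
        elif x < -threshold:
            new_state = -1
        else:
            continue  # too weak to operate the (hypothetical) meter
        if state != 0 and new_state != state:
            reversals += 1
        state = new_state
    return reversals


def estimate_fundamental(samples, sample_rate, threshold=0.05):
    """A wave reverses twice per cycle, so frequency is about reversals / (2 * duration)."""
    duration_s = len(samples) / sample_rate
    return count_reversals(samples, threshold) / (2.0 * duration_s)


# Usage: a 120 cycle tone sampled at 8000 samples per second for one second.
tone = [math.sin(2.0 * math.pi * 120.0 * n / 8000.0) for n in range(8000)]
print(estimate_fundamental(tone, 8000))
```

Because only the sign of the wave is used once it clears the threshold, doubling or halving the input level leaves the estimate unchanged, while a wave that never reaches the threshold produces no output at all.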
  • this circuit may have the form shown in Fig. 2, of Patent 2,183,248 of R. R. Riesz issued Dec. 12, 1939. This form is preferred on account of its being stable and free from singing, free from false operation at high frequencies, positive in action upon application of the input wave, and economical of plate battery power.
  • the rectifier D modulates the various harmonics to give a.
  • the equalizer may be any suitable network having its loss increasing with frequency so as to insure that the fundamental frequency, which may vary for example from about … to some 300 cycles, comes out at a high power level compared to any upper harmonics that may be present.
  • the equalizer may practically cut off transmission above a frequency in the neighborhood of 300 cycles, for example.
  • the attenuation discrimination of the equalizer purifies the fundamental tone though it may vary substantially more than an octave.
  • the level of the unvoiced sounds must be adjusted to a value too low to cause operation of the frequency measuring circuit.
  • a voice amplifier G with its gain adjustable, may be provided, for example in the channel AP1 as shown in Fig. 1.
  • the direct current delivered by the frequency measuring circuit through the low-pass filter F30 may be made substantially directly proportional to the fundamental frequency applied to the frequency pattern control channel from line 1.
  • the direct current delivered by the low-pass filter F30, which may be substantially directly proportional to the fundamental frequency applied to the frequency pattern control channel from line 1, is transmitted through line L0 to the energy source of frequency patterns FPS. The latter comprises a relaxation oscillator 40 and a noise source including gas-filled tube 41 followed by switching amplifier 43.
  • the relaxation oscillator comprises gas-filled tube 60 together with circuit elements as described more fully in the Riesz patent above re… currents to the receiving or reproducing end.
  • the noise source 42 comprises gas-filled tube 41 having its grid tied to the cathode and having suitable resistances in its plate circuit as shown together with a plate battery.
  • This tube while shown as a triode may advantageously be a multigrid tube, such as a pentode or tetrode. It is found that this type of circuit produces a continuous energy spectrum of noise in the audio range.
  • an equalizer or amplitude limiter (not shown) may be included between tube 41 and switching amplifier 43 to make the output flat over the frequency band.
  • a resistance source of noise may be used as disclosed in the Riesz patent referred to.
  • the function of the switching amplifier is to determine whether or not the continuous spectrum of noise is permitted to pass through to the circuit 14. This is accomplished under control of currents supplied across the grid resistor 13 from the control line L0.
  • the bias voltage on the switching amplifier in the absence of any voltage drop across resistance 13 is insufficient to block transmission. When voiced sounds are impressed on the system the voltage developed across resistance 13 is sufficient to block transmission.
  • the initial grid bias on the relaxation oscillator 16 is sufficiently negative so that in the absence of voltage applied across resistance 12 the tube will not oscillate.
  • the resulting direct current voltage across resistance 12 is of the right magnitude and sign to cause the tube to oscillate at some low frequency for a weak input current and at higher frequencies for stronger input current.
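The switching between the noise source and the relaxation oscillator can be sketched as a single excitation generator driven by the pitch control value. This Python sketch rests on several assumptions not found in the text: the control value is taken as directly proportional to the fundamental, the oscillator is modelled as an ideal sawtooth, and the noise is uniform random samples. All names are hypothetical.

```python
import random


def excitation(control, sample_rate, cycles_per_unit=1.0, seed=0):
    """Return one excitation sample per control sample.

    control: sequence of pitch-control values; zero or less is treated as
    "no signal or unvoiced", so noise is passed and the oscillator is idle.
    """
    rng = random.Random(seed)
    phase = 0.0
    out = []
    for c in control:
        if c <= 0.0:
            # Switching amplifier unblocked: noise reaches the output.
            out.append(rng.uniform(-1.0, 1.0))
            phase = 0.0
        else:
            # Oscillator runs; a stronger control value gives a higher frequency.
            phase += (c * cycles_per_unit) / sample_rate
            if phase >= 1.0:
                phase -= 1.0  # sawtooth "relaxes" once per cycle
            out.append(2.0 * phase - 1.0)
    return out


buzz = excitation([100.0] * 8000, 8000)   # voiced: about 100 cycles in one second
hiss = excitation([0.0] * 8000, 8000)     # unvoiced: noise only
```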
  • the speech bands chosen may be, for instance, one band from 0 to 250 cycles and nine adjacent bands each 300 cycles wide, starting at 250 cycles. These bands are selected by filters F1 to F10 in the amplitude pattern control channels AP1 to AP10, respectively.
  • the channel AP1 transmits information about the amplitudes in the speech range 0 to 250 cycles
  • the channel AP2 transmits information about the amplitudes in the speech range 250 to 550 cycles
  • the channel AP3 information about the amplitudes in the range 550 to 850 cycles, etc.
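The band layout just described (one band from 0 to 250 cycles followed by nine adjacent 300 cycle bands) can be generated programmatically. A minimal Python sketch; the function name and its parameters are illustrative only.

```python
def analyzer_bands(first_width=250, band_width=300, n_bands=10):
    """Return (low, high) band edges in cycles for the amplitude pattern channels."""
    bands = [(0, first_width)]
    low = first_width
    for _ in range(n_bands - 1):
        bands.append((low, low + band_width))
        low += band_width
    return bands


for channel, (low, high) in enumerate(analyzer_bands(), start=1):
    print(f"channel {channel}: {low} to {high} cycles")
```

With these figures the ten bands together span 0 to 2950 cycles.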
  • the output from the 0 to 250 cycle speech band-pass filter F1 is fed to detector D1, which may be, for instance, like the detector D. The syllabic frequencies in the output from the detector are passed through a 25 cycle low-pass filter F31 and the resulting variable direct current is passed through line L1.
  • This variable direct current is then applied to a biasing resistor B to give a grid bias to a signal shaping network or push-pull variable gain amplifier SN1, which accordingly varies its gain in amplifying the waves received from the energy source of frequency patterns FPS through 0 to 250 cycle speech band-pass filter F1', so that the average power in this band of waves varies in accordance with the average power in the corresponding band of the speech signals.
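One amplitude pattern channel can be imitated in software as "rectify, smooth, then use the result as a gain". The Python sketch below makes several assumptions: the band-pass filters are omitted, the 25 cycle low-pass filter is replaced by a simple one-pole smoother, and the gain is applied as a plain multiplication. The names are hypothetical.

```python
import math


def envelope(samples, sample_rate, cutoff=25.0):
    """Full-wave rectify, then smooth with a one-pole low-pass (stand-in for D1 and F31)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / sample_rate)
    level = 0.0
    out = []
    for x in samples:
        level += alpha * (abs(x) - level)
        out.append(level)
    return out


def shape(carrier, control):
    """Vary the level of the carrier band with the slowly varying control (stand-in for SN1)."""
    return [c * g for c, g in zip(carrier, control)]


# Usage: a steady 200 cycle tone yields a nearly constant control value.
speech_band = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
control = envelope(speech_band, 8000)
carrier = [1.0] * 8000            # placeholder for the filtered output of the energy source
shaped = shape(carrier, control)
```

The control signal changes only at syllabic rates, which is why a 25 cycle channel suffices to carry it.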
  • The energy from the amplifier SN1 is then fed through a 0 to 250 cycle speech band-pass filter F1" to the speech receiving circuit 4, where it is combined with the outputs from nine other speech band-pass filters of channels
  • the filters F1' and F1" may have the same pass-band as filter F1, and that channels
  • the equalizer 46 renders all of the harmonic components of the current wave of the relaxation oscillator 40 equal in amplitude. This may be followed by amplifier 41, if desired.
  • it is desirable to have the delay in the frequency pattern control channel and in all of the amplitude pattern control channels the same. If the frequency pattern control channel FP1 tends to have more inherent delay than the amplitude pattern control channels, it is desirable to introduce a certain amount of delay in the amplitude pattern control circuit AP … they may be sent through any other type of channel such as carrier channels or on a time division basis, as in my prior Patents 2,151,091 and 2,098,956, November 16, 1937, or in other suitable manner.
  • These waves are of the type to give a voiced sound or of the type to give an unvoiced sound, as determined by the frequency pattern channel.
  • the result is a reconstructed speech or sound corresponding to the original.
  • the present invention contemplates the use of other sources of waves for purposes of reconstruction of sounds.
  • the drawing shows a phonograph 24 with record 25 and picktacts 23.
  • The switch 15 may be either closed or open depending on whether the noise waves are to be used or not. An adjusted amount of the waves from the record 25 or microphone 21 can be passed directly to the output circuit 4 by closing switch S3 and adjusting pad 30.
  • variable gain amplifier like SN1 in the leads from the microphone 21 and from the reproducer 26 and connected to channel L0 at terminals X in the same way that SN1 is connected to channel L1, so that sounds from the phonograph or microphone 21 are suppressed when only unvoiced sounds are spoken into microphone M.
  • This variable gain amplifier is preferably adjusted to have an abrupt cut-off so that it is unblocked and opened for transmission in response to even a small amplitude current in line L0, the amplitude modulation occurring principally in the SN circuits of the synthesizer.
  • This arrangement permits the noise to be applied through switch 15 by itself for the unvoiced sounds, the phonograph sounds being cut off momentarily just as the relaxation oscillator energy is suppressed when only unvoiced sounds are to be reproduced,
  • the waves applied to circuit 1 determine which of the SN circuits are unblocked and the extent to which they are individually unblocked. These waves will be referred to as the spectrum control waves or spectrum since they determine the amplitude pattern or amplitude distribution over the frequency band.
  • the waves introduced at 10 will be called the pitch control or pitch waves since they furnish the actual frequencies passing through the SN circuits. For example, if a talker had a fundamental frequency of 220, the harmonics supplied from oscillator 10 would be multiples of 220, whereas another talker might have a fundamental of, say, 190 and the harmonics supplied from oscillator would be multiples of 190. … supplying music waves to circuit 10 would fix the pitch of the reconstructed speech. In either case the spectrum waves would not affect the pitch.
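The two fundamentals given as examples can be expanded into their harmonic series to show that the pitch waves alone fix which frequencies are available to the SN circuits. A small Python sketch; the function name and the 1000 cycle upper limit are chosen only for illustration.

```python
def harmonics(fundamental, upper_limit):
    """Whole-number multiples of the fundamental up to and including upper_limit."""
    count = int(upper_limit // fundamental)
    return [k * fundamental for k in range(1, count + 1)]


print(harmonics(220, 1000))
print(harmonics(190, 1000))
```

The spectrum waves then only decide how strongly each of these components is passed, not where they lie.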
  • The pitch may be supplied by one talker at 21 and the spectrum by a second talker at M. It is understood, of course, that both talkers may be from recordings, a suitable reproducer being used in place of microphone M.
  • the pitch may be supplied from a record, orchestra or other instrumental music and the spectrum from a talker at M. In this case the speech is automatically made into a song.
  • the pitch may be supplied from a singer, ac-
  • the pitch may be supplied by one singer and the spectrum from a person whispering at M.
  • Variation may be introduced into either or both of the pitch and spectrum waves to secure vibrato or tremolo effects.
  • a bass voice contains harmonics of the order of 7,000 cycles or higher.
  • the principal difference between a man's voice and a woman's voice is in the fundamental pitch.
  • Probably the main reason why a woman's high voice sounds thinner than a man's low voice is that there are fewer harmonics in a given frequency range in the case of a woman's voice, since they are based on a fundamental frequency of about twice that of a man's voice.
  • there is further the psychology of the lower frequency energy present in the man's voice.
  • the circuit may be used as first described for speech transmission, except that the FP channel is disconnected from microphone M and connected to a second microphone P into which the pitch control sounds are directed.
  • This offers the possibility of producing a hybrid voice effect in which the second voice controls only the pitch of the first voice at M.
  • a violin or other means may supply a tune to microphone P while a talker speaks words into M in proper cadence. The result is a song. Other effects may be obtained by varying the method.
  • the invention contemplates, however, the use of filters in the synthesizer having different pass-bands from those in the analyzer. This is indicated in Fig. 2, in which each of the filters F1' ... F10' and the corresponding filters F1" ... F10" have pass ranges different from the corresponding filters F1 ... F10.
  • this has the effect of changing the character, for example, from that of a man's voice to that of a woman's voice or vice versa.
  • the pass ranges of the filters in the synthesizer may be moved upward in the frequency scale by 10 per cent as indicated in the following table in which the total utilized range is assumed to be greater than that shown in Fig. 1.
  • the talker were a man.
  • the synthesizer filters had their pass-bands shifted downward by, say, 10 per cent, and the pitch were lowered a suitable amount, the effect would be to translate a woman's voice into a man's voice.
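A uniform 10 per cent shift of the synthesizer pass ranges can be sketched as a scaling of each band edge. The Python below is an illustration under that assumption only; the patent's own table of shifted ranges is not reproduced in this excerpt, and the function name and sample bands are hypothetical.

```python
def shift_bands(bands, percent):
    """Scale every band edge by (100 + percent) / 100; negative percent shifts downward."""
    return [
        (low * (100 + percent) / 100, high * (100 + percent) / 100)
        for low, high in bands
    ]


analyzer = [(0, 250), (250, 550), (550, 850)]   # first three analyzer bands, in cycles
print(shift_bands(analyzer, 10))                # synthesizer ranges moved upward
print(shift_bands(analyzer, -10))               # synthesizer ranges moved downward
```

Scaling both edges keeps adjacent bands contiguous, so no part of the range is left uncovered by the shift.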
  • measurements can be made for all sounds and the needed shifts in response frequencies provided for, including upward shifts at certain frequencies and downward shifts at others. These shifts are made merely by adjusting the pass ranges of the synthesizer filters. Such changes can be carried out to any desired detail. Similar changes can be made for translating from an adult voice to a child voice or from the human voice to speech having the characteristics of animals, either very small animals or large animals. For example, a parrot's speech is not only high pitched but confined to a relatively narrow frequency range because of the smallness of the talking apparatus which determines the resonance. If an attempt is made to convert from human speech to that of a parrot by merely tuning the talking circuit, it is found that the articulation is very poor and that the amplitude in the resonance region is unduly high.
  • Fig. 3 represents by the series of dashes a, b, c, etc., the band-pass ranges of the analyzer filters suitable for human speech
  • the dotted curve r indicates in a general way what the resonance characteristics of a parrot or other small animal may
  • the pass ranges of the synthesizer filters would need to be shifted downward in the frequency spectrum and in some cases at least broadened. In some cases it might be necessary to shift some of the ranges downward and others upward to meet the requirements of a double humped resonance curve. Measurements may be made in any case to determine the type of shift necessary. As a further example, the following table indicates the kind of shifts that might be necessary to imitate the speech of a small animal.
  • Analyzer bands    Synthesizer bands
    0-250             0-500
    250-550           500-1000
    550-850           1000-1500
    850-1150          1500-1700
    1150-1450         1700-1900
    1450-1750         1900-2000
    1750-2050         2000-2100
    2050-2350         2100-2200
    2350-2950         2200-2400
    2950-3500         2400-2600
    3500-4500         2600-3200
    4500-6000         3200-4500
    6000-7500         4500-6000
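The table can be read as a lookup from each analyzer band to the synthesizer band that its control current governs. The Python sketch below encodes the thirteen pairs as recovered from the table; the names `BAND_TABLE` and `synthesizer_band_for` are hypothetical, and treating the upper edge of each band as exclusive is an assumption.

```python
# (analyzer band, synthesizer band), all edges in cycles, as read from the table.
BAND_TABLE = [
    ((0, 250), (0, 500)),
    ((250, 550), (500, 1000)),
    ((550, 850), (1000, 1500)),
    ((850, 1150), (1500, 1700)),
    ((1150, 1450), (1700, 1900)),
    ((1450, 1750), (1900, 2000)),
    ((1750, 2050), (2000, 2100)),
    ((2050, 2350), (2100, 2200)),
    ((2350, 2950), (2200, 2400)),
    ((2950, 3500), (2400, 2600)),
    ((3500, 4500), (2600, 3200)),
    ((4500, 6000), (3200, 4500)),
    ((6000, 7500), (4500, 6000)),
]


def synthesizer_band_for(frequency):
    """Return the synthesizer band controlled by the analyzer band containing frequency."""
    for (low, high), synth in BAND_TABLE:
        if low <= frequency < high:   # upper edge exclusive (an assumption)
            return synth
    return None                       # outside the analyzed range


print(synthesizer_band_for(300))
```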
  • Various other effects may be obtained by controlling the spectrum with or without controlling the pitch. For example, a bass voice may be translated to soprano and vice versa. Thus a soprano singer of great musical ability would be enabled to sing in the bass register or some other register with the same characteristic quality.
  • In Fig. 2 the control channels are indicated as being carried through a switching mechanism designated merely by the box 50 in the drawing.
  • incoming channel 1 can be connected to outgoing channel 9 or any incoming channel can be connected to any outgoing channel by means of the switching mechanism.
  • Some interesting effects are possible by this means. For example, if channels 1, 2 ... 10 are active and the sound e as in meet is spoken into the microphone M it is heard from the loud-speaker as the same sound. With only channels 1 and 9 operative the sound remains practically the same. If,
  • the currents in the individual lines L1, L2, etc. are in the form of slowly varying direct currents suitable for controlling relays. This offers the possibility of operating relays in the switching mechanism to make predetermined changes in the circuit connections of the individual channels under control of currents in these channels.
  • the relay 51 is shown controlled directly from the channel L1 and is symbolic of any relay switching mechanism necessary for carrying out the desired switching functions.
  • relay 51 may be arranged to switch channel L1 to one of the other channels leading to a higher frequency filter F2', F3', etc., so that when low frequency energy of sufficient magnitude appears in channel L1, it causes the production of higher frequency components in the reconstructed speech.
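The switching mechanism and its relay control can be pictured as a routing table that is swapped when the level in one channel passes a threshold. The Python sketch below is an assumed model: control currents are plain numbers keyed by channel number, channels not listed in a connection table are treated as disconnected, and the threshold value is arbitrary. None of the names come from the patent.

```python
def route(incoming, connections):
    """Connect incoming control channels to outgoing ones.

    incoming: dict of channel number -> control level
    connections: dict of incoming channel -> outgoing channel
    """
    outgoing = {}
    for src, dst in connections.items():
        if src in incoming:
            outgoing[dst] = incoming[src]
    return outgoing


def relay_connections(level, threshold, straight, switched):
    """Pick the switched connection table once the controlling level is high enough."""
    return switched if level >= threshold else straight


straight = {1: 1, 2: 2}   # normal operation: each channel feeds its own filter
switched = {1: 2}         # relay operated: channel 1 now controls the next higher band
levels = {1: 0.9, 2: 0.2}
print(route(levels, relay_connections(levels[1], 0.5, straight, switched)))
```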
  • Various modifications may be made in the character of the switching and in the manner of controlling the relays governing the switching. Where a microphone pick-up has been disclosed it will be obvious that any other type of device … the record 25. The nature of the … lized otherwise.
  • a translation from the first to the second record can be modified in a wide variety of ways, by use of the pitch microphone P, the microphone 21, or
  • the shifting of the pass ranges of the reconstructing filters relative to the analyzer filters may be used along with the alternative sources of pitch control waves 25, 21, etc. to secure desired effects.
  • a source of music-representing waves, means to derive from each of a plurality of frequency subdivisions of the speech waves an index of the energy content of such frequency subdivision varying with time, means to modulate said music-representing waves in a plurality of circuits in accordance with said indices, and means to translate the resulting modulated wave energy into sound energy.
  • the method comprising subdividing a sound-representing wave into a number of relatively narrow frequency bands, subdividing a second wave of band characteristics into a corresponding number of frequency bands certain of which are of different band width from the corresponding bands of the first wave, deriving an index of the energy content of each of the subdivided bands of the first wave, and controlling transmission of each of the subdivided bands of the second-mentioned wave in accordance with a different one of said indices.
  • changed voice comprising analyzing the speech waves to determine their energy content in each of a plurality of distinct frequency regions, generating waves having a frequency band extending over the range of the sounds to be produced,
  • the distinct frequency regions of the generated waves so controlled being displaced in frequency with respect to the respective distinct frequency regions of the analyzed waves.
  • the method of reproducing speech with changed voice comprising analyzing the speech waves to determine their energy content in each of a plurality of distinct frequency regions, generating waves having a frequency band extending over the range of the sounds to be produced, and varying the energy content of each of a plurality of distinct frequency regions of said generated waves in accordance with the energy content of a respective region of the analyzed waves, the distinct frequency regions of the generated waves so controlled being of respectively different frequency width from the corresponding distinct frequency regions of the analyzed speech.
  • a source of waves of a range of acoustic frequencies, a second source of waves of a range of acoustic frequencies, a set of filters for subdividing the waves from the first source into relatively narrow frequency bands, a set of filters for subdividing the waves from the second source into a corresponding number of relatively narrow frequency bands certain of which differ in width from the corresponding bands into which the waves from the first source are divided, means determining the energy content of each of the subdivided bands from said first source, means arranged to respond to the waves of the subdivided bands from said second source, and means for controlling the response of said second means to each of said subdivided bands from said second source in accordance with the energy content of the corresponding subdivided band from said first source.
  • means for analyzing speech waves to determine their fundamental frequency and their energy rate in each of a number of frequency regions, means for generating waves extending over a wide range of acoustic frequencies comprised of a fundamental frequency and harmonics thereof, means causing the funda-
  • means for producing the effect of a small animal talking under control of human speech comprising means for generating acoustic frequency waves extending over the relatively narrow frequency range appropriate to said animal, means for subdividing said waves into a number of narrow bands, means for subdividing the human speech waves into a corresponding number of narrow bands, means determining the energy rate of each of said latter bands, and a plurality of means controlled individually in accordance with the energy rate of said individual bands for producing sound in accordance with each of said bands of said gen… …ing number of narrow bands, means determining the energy pattern of each of said latter bands, and a plurality of means controlled individually in accordance with the energy pattern of said individual bands for producing sound in accordance with each of said bands of generated waves.
  • the combination according to claim 9 including means to derive the fundamental frequency of said speech waves, and means to control the frequency of the generated waves thereby.
  • a system comprising a plurality of wave transmission channels for respectively transmitting subbands of a frequency band of waves that represents a voice wave, means for producing a complex tone having its fundamental frequency in controlled relation to that of said voice wave but having the relative amplitudes of its components independent of the relative amplitudes of the components of said voice wave, analyzing circuits responsive to waves from said channels for determining the energy flow, varying in time, in each of said subbands, means controlled by said analyzing circuits for producing from the energy of said complex tone a similitude of said energy, varying in time, in said subbands, and means for interchanging the control between individual analyzing circuits and the means for producing said similitude of said energy in said subbands.
  • a system according to claim 12 including means controlled by said voice wave for actuating said means for interchanging said control.
  • a source of wave energy of band characteristic, a plurality of transmission control means operative at different frequency regions within said band, means actuating one of said control means in accordance with voice energy in a low frequency part of the voice range to control transmission of low frequency waves from said source, means actuating another of said control means in accordance with an intermediate range of voice energy to control transmission of waves of intermediate frequency from said source, means actuating another of said control means in accordance with a high frequency range of voice energy to control transmission of waves of high frequency from said source, means making certain of the frequency ranges whose transmission is controlled by said several transmission control means different in width from the corresponding ranges of said voice wave, and means for combining the effects of the transmitted ranges.
  • the method of simulating the effect of a talking image or cartoon comprising supplying a frequency band of waves appropriate in pitch and band width to said image or cartoon, subdividing said band of waves into a number of relatively narrow frequency subbands, subdividing speech waves into a corresponding number of frequency subbands, utilizing the energy in each of said subbands of the speech waves to control transmission of the corresponding subbands of said characteristic sound, and making certain of the frequency subbands of said characteristic wave of respectively different width from the corresponding subbands of said speech wave.
  • …ment in the form of a musical selection, subdividing said wave into relatively narrow frequency subbands, subdividing a voice wave into a corresponding number of relatively narrow frequency subbands, determining the energy rate in each of said frequency subbands in said voice wave, controlling the amplitude of the waves in each of the subbands of said musical wave in accordance with the energy rate of the corresponding subband of said voice wave and combining the effects of the subbands of said musical wave whose amplitudes are so controlled.
  • the method of producing sound effects comprising taking thel voice of a singer in the form of a musical selection, subdividlng said wave into relatively narrow frequency subbands, subdividlng anothervoice wave into a corresponding number of relatively narrow frequency subbands, determining the energy flow in each of said frequency subbands of' said other voice wave,controlling the amplitude of the waves in 19.
  • the method of enablinga speaker to change his voice comprising analyzing his voice wave to determine itsv fundamental frequency and the energy rate in each of a number 0f relatively narrow frequency subbands, generating4 waves coveringv the speech range under control of said fundamental frequency, subdividinglsaid generated waves into a corresponding number of relatively narrow frequency subbands certain of which differ in band width from the corresponding frequency subbands of said voice wave, controlling the amplitudes of the waves in the various subbands of said generated waves in accordance with the energy rate of the correspond- 22.
  • Means for producing the effect of a singing comprising analyzing his voice wave to determine itsv fundamental frequency and the energy rate in each of a number 0f relatively narrow frequency subbands, generating4 waves coveringv the speech range under control of said fundamental frequency, subdividinglsaid generated waves into a corresponding number of relatively narrow frequency subbands certain of which differ in band width from the corresponding frequency subbands of said voice wave, controlling the amplitudes of the waves in the various subbands of said generated waves in accord
  • musical instrument comprising means for producing waves characteristic' of such instrument and representing a musical selection, an input circuit, an output circuit, a plurality of transmission control devices connected in parallel paths between said input circuit and said output circuit, means for 'subdividlng said characteristic wave on a frequency basis into a corresponding plurality of frequency subbands, means to apply the waves in each of said subbands to a respective one of said control devices for transmission therethrough, means causing waves to flow in said input circuit representing words spoken in cadence with said musical selection, means subdividing said latter waves into a plurality of frequency subbands, means to determine the energy 4characteristic of each of said latter subbands,
  • two vinput circuits for sound-representing waves a common output circuit, a. source of complex tone waves based on a fundamental frequency, means for transmitting waves from said source into said output circuit, a plurality of amplitude patternv control channels for exercising control of the waves so transmitted in each of a number of frequency regions in accordance with frequency regions of the sound-representing waves in one of said input circuits and a pitch control circuit for controlling said fundamental frequency in accordance withother sound-representing'waves in said other input circuit.
  • a system comprising a, plurality of channels for transmitting energy having syllabic time rates of change in the channels, respectively, a source of waves having a sound spectrum the relative amplitudes of the components of which are independent of the relative amplitudes of the components of said transmitted energy, means controlled by the energy in said channels for varying at said rates, respectively, the energy in dierent frequency regions of Waves from said source, and switching means for interchanging said channels in their control'of the energy in said diierent frequency regions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Description

May 27, 1941. H. W. DUDLEY
SYSTEM FOR THE ARTIFICIAL PRODUCTION OF VOCAL OR OTHER SOUNDS
Filed May 13, 1939. Patented May 27, 1941.
Homer W. Dudley, Garden City, N. Y., assignor to Bell Telephone Laboratories, Incorporated, New York, N. Y., a corporation of New York. Application May 13, 1939, Serial No. 273,429. 26 Claims. (Cl. 179-1)
The present invention relates to wave transmission or control, involving waves of frequency band characteristics, for intelligence transmission, sound production or reproduction or other purposes.
The invention relates to the general type of wave control or wave production that is disclosed in my United States Patent 2,151,091, granted March 21, 1939, and for its general object it contemplates certain improvements or modifications of the basic system and method therein disclosed.
In accordance with the disclosure of my prior patent referred to, a wave having the characteristics of speech, for example, may be produced artificially using as raw material wave energy having a frequency distribution which may be uniform over a requisitely wide frequency range, and, therefore, lacking in character or intelligibility. The wave construction is accomplished by modifying such raw material wave energy in conformance with the fundamental pitch and energy distribution relations which characterize [...] and methods involved, reference is made to the patent disclosure.
While the system of my prior patent is susceptible of many types of uses and embodiments, one of its aims is to secure faithful transmission and reproduction of speech, music or other waves, although it is, of course, not limited to this aspect.
The present invention may also be used similarly to the system of my prior patent, but the point of view of the present invention is directed more to securing a deliberate change of character between the analyzed waves and the reconstructed waves. Such change in character may be desirable for many purposes, such as rendering speech more understandable in the presence of noise or because of other conditions making reception difficult, or for purposes of imitation, impersonation, entertainment, research in speech or hearing, voice or musical training or instruction and kindred purposes.
The invention contemplates in its objects the realization of these and any other purposes for which it may be desired to produce a controlled change in character between the analyzed wave and the reconstructed wave.
The invention contemplates specifically two general ways of changing the type or character of the reproduced wave. In accordance with one method, the wave used as raw material in reconstructing the output wave itself possesses or has imparted to it certain modifications or characteristics before it is acted upon by the products of analysis of the original wave or in such manner that it affects the character of the reconstructed wave. In accordance with the other mentioned method, a wave of either uniform [...]tion with each other.
The nature and objects of the invention will be more fully apparent from the detailed description to follow, in connection with the accompanying drawing, in which:
Fig. 1 is a schematic circuit diagram of a complete system according to the invention;
Fig. 2 shows a modification which may replace the portion of Fig. 1 to the right of and below the broken line II-II; and
Fig. 3 shows a diagram of frequency relations to be discussed.
Referring now to the drawing, in Fig. 1, speech or music currents from line or circuit 1 energize a frequency pattern control circuit FP and an amplitude pattern control circuit AP. The frequency pattern control circuit, which comprises but one channel FP1, discriminates as to the frequency pattern. This discrimination includes discrimination as to the fundamental frequency when there is one. The amplitude pattern control circuit branches into a number of channels, for example, ten channels AP1 to AP10, and determines what amplitude pattern we have. For simplicity, channels AP4 to AP9 are omitted from the drawing.
The information obtained from the speech analysis effected in these two circuits FP and AP is in the form of electrical currents which can be transmitted through any suitable transmitting medium, such for example, as lines L0 to L10, to the receiving or reproducing end of the system. This transmitting medium may have a limited frequency range of transmission, of much less width than the frequency range of the speech signals to be communicated. If desired, it may be a single conductor pair or a radio link, the transmission through this medium then being, [...] have our reproduction or reconstruction of the original speech signal for any further transmission in the ordinary manner, or for reproduction of sound by loud-speaker 5.
The system as shown uses a 275 cycle total transmission band in the transmission medium between the transmitting and receiving ends of the system, that is, in the lines L0 to L10. This 275 cycle band is on the basis of eleven channels, each of 25 cycle pass-band. Ten of these are for amplitude pattern control and the other one is for frequency pattern control. As brought out in my above-mentioned patent, such a system is adequate for good quality transmission of speech.
The frequency pattern control channel FP1 will first be described with reference to its adjustment for use in natural reproduction of speech. It is a circuit for analyzing and reproducing the frequency spectrum of the source of energy in speech sounds. For speech applied to its input from line 1 it delivers an output wave that has discrete components and is of the same fundamental frequency as the input when a voiced speech sound is applied, and an output wave with a continuous spectrum when an unvoiced speech sound is applied. It performs three functions. First, at the transmitting or analyzing end of the system it derives from the speech signal the fundamental or vocal cord frequency and expresses this as a current, the amplitude or magnitude of which is proportional to the fundamental frequency. Next, at the receiving or reproducing end of the system it uses this current to control the frequency set up by a relaxation oscillator so as to get back a wave of the original fundamental frequency, rich in upper harmonics. Finally, it provides for another source of energy at the receiving end, having a continuous spectrum, when there is no fundamental in the speech. This condition occurs when sounds are unvoiced, as for example, in whispering and the unvoiced consonants.
From the standpoint of the source of acoustic energy the sounds of speech may be divided into three classes: [...] frequency spectrum of such sounds is, in general, continuous and devoid of discrete components. 3. Mixed sounds. For these sounds the acoustic energy is derived from both the above sources and the frequency spectrum consists of discrete components superposed on a continuous spectrum.
The frequency pattern control channel FP1 is a circuit that analyzes speech sounds on the basis of the above classification and which, within certain limits, delivers to a load a wave whose frequency spectrum is of the same class as that of the speech sound applied to the input of the circuit.
The circuit operates so that when no signal or an unvoiced speech sound (Class 2, above) is applied to the input, random noise from a gas-filled tube amplifier is supplied to the load. The spectrum of such random noise is a continuous one and so is similar to that of the source of acoustic energy in unvoiced sounds but any other source with a similar spectrum could be used. When a voiced or mixed speech sound (Classes 1 and 3, above) is applied to the input, the noise voltage is removed from the load and a wave of the same fundamental frequency as the speech sound and any desired frequency spectrum is supplied to the load.
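The choice between the two energy sources can be illustrated in modern terms. The following Python sketch is illustrative only and is not part of the patent disclosure: the function name `excitation`, the sample rate, the number of harmonics and the convention that a fundamental of None or 0 means "no signal or unvoiced sound" are all assumptions. A sum of sine harmonics stands in for the relaxation oscillator, and pseudo-random samples stand in for the gas-filled tube noise.

```python
import math
import random


def excitation(f0_hz, n_samples, sample_rate=8000, n_harmonics=10, seed=0):
    """Return source samples: a harmonic buzz when voiced, noise otherwise.

    f0_hz is the measured fundamental in cycles per second; None or 0
    stands for "no signal or unvoiced sound" (hypothetical convention).
    """
    if not f0_hz:
        # Unvoiced or silent input: continuous-spectrum noise source.
        rng = random.Random(seed)
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

    # Voiced input: discrete components at multiples of the fundamental,
    # keeping only harmonics below half the sample rate.
    harmonics = [k for k in range(1, n_harmonics + 1)
                 if k * f0_hz < sample_rate / 2]
    if not harmonics:
        raise ValueError("fundamental is too high for this sample rate")

    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        total = sum(math.sin(2 * math.pi * k * f0_hz * t) for k in harmonics)
        samples.append(total / len(harmonics))
    return samples
```

Under these assumptions a call such as `excitation(200, 160)` yields a wave that repeats every 40 samples, while `excitation(None, 160)` yields noise with no such periodicity.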
The analyzing circuit of the frequency pattern control channel FP1, or the portion of this channel at the sending or analyzing end of the system, will first be described. It comprises a detector D which may be, for example, a full-wave copper-oxide rectifier, an attenuation discrimination network or so-called equalizer E1 having its loss increasing with frequency, a frequency measuring circuit FM and a 25 cycle low-pass filter F30. The speech currents from line 1 are fed through the rectifier D, which feeds the equalizer E1, which in turn feeds the frequency measuring circuit FM. This frequency measuring circuit may be any suitable circuit for delivering through the low-pass filter F30 a direct current that depends on the number of reversals per second of direction of the voltage wave applied to this circuit and is independent of its amplitude as long as the amplitude exceeds a certain threshold value. For example, this circuit may have the form shown in Fig. 2 of Patent 2,183,248 of R. R. Riesz issued Dec. 12, 1939. This form is preferred on account of its being stable and free from singing, free from false operation at high frequencies, positive in action upon application of the input wave, and economical of plate battery power. In order to have the output current of the frequency measuring circuit controlled by the fundamental of the voice, the rectifier D modulates the various harmonics to give a strong fundamental and the harmonics are suppressed by the equalizer E1. The equalizer may be any suitable network having its loss increasing with frequency so as to insure that the fundamental frequency, which may vary for example from about [...] to some 300 cycles, comes out at a high power level compared to any upper harmonics that may be present. The equalizer may practically cut off transmission above a frequency in the neighborhood of 300 cycles, for example.
For practical purposes the attenuation discrimination of the equalizer purifies the fundamental tone though it may vary substantially more than an octave. The level of the unvoiced sounds must be adjusted to a value too low to cause operation of the frequency measuring circuit. If desired, for this purpose a voice amplifier G, with its gain adjustable, may be provided, for example in the channel AP1 as shown in Fig. 1. The direct current delivered by the frequency measuring circuit through the low-pass filter F30 may be made substantially directly proportional to the fundamental frequency applied to the frequency pattern control channel from line 1.
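The principle of the frequency measuring circuit, an output that depends on the number of reversals per second and not on amplitude once a threshold is exceeded, can be sketched numerically. The Python below is a rough illustration under stated assumptions, not a model of the Riesz circuit: the function name `fundamental_from_reversals`, the sampled representation of the wave and the threshold value are hypothetical, and the rectifier, equalizer and low-pass filter are omitted.

```python
def fundamental_from_reversals(samples, sample_rate, threshold=0.05):
    """Estimate frequency from the count of sign reversals per second.

    Returns 0.0 when the peak amplitude is below the threshold, which
    mimics a circuit that does not operate on weak (unvoiced) input.
    """
    if max(abs(s) for s in samples) < threshold:
        return 0.0

    # One reversal is counted whenever consecutive samples differ in sign.
    reversals = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = (len(samples) - 1) / sample_rate
    # A periodic wave reverses direction twice per cycle.
    return reversals / (2.0 * duration_s)
```

Because only the count of reversals enters the result, halving the amplitude of an input that stays above the threshold leaves the estimate unchanged, which corresponds to the amplitude independence described above.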
The direct current delivered by the low-pass filter F30, which may be substantially directly proportional to the fundamental frequency applied to the frequency pattern control channel from line 1, is transmitted through line L0 to the energy source of frequency patterns FPS. The latter comprises a relaxation oscillator 40 and a noise source including gas-filled tube 41 followed by switching amplifier 43.
The relaxation oscillator comprises gas-filled tube 60 together with circuit elements as described more fully in the Riesz patent above referred to. This oscillator generates a wave which is rich in harmonics and the fundamental frequency of which is controlled by the voltage appearing across resistance 12 due to current flow in the line L0.
The noise source 42 comprises gas-filled tube 41 having its grid tied to the cathode and having suitable resistances in its plate circuit as shown together with a plate battery. This tube while shown as a triode may advantageously be a multigrid tube, such as a pentode or tetrode. It is found that this type of circuit produces a continuous energy spectrum of noise in the audio range. If desired, an equalizer or amplitude limiter (not shown) may be included between tube 41 and switching amplifier 43 to make the output flat over the frequency band. Instead of the type of noise source shown, a resistance source of noise may be used as disclosed in the Riesz patent referred to.
The function of the switching amplifier is to determine whether or not the continuous spectrum of noise is permitted to pass through to the circuit 14. This is accomplished under control of currents supplied across the grid resistor 13 from the control line L0. The bias voltage on the switching amplifier in the absence of any voltage drop across resistance 13 is insufficient to block transmission. When voiced sounds are impressed on the system the voltage developed across resistance 13 is sufficient to block transmission.
Conversely, the initial grid bias on the relaxation oscillator 16 is sufficiently negative so that in the absence of voltage applied across resistance 12 the tube will not oscillate. When a voiced wave is applied to the frequency measuring circuit the resulting direct current voltage across resistance 12 is of the right magnitude and sign to cause the tube to oscillate at some low frequency for a weak input current and at higher frequencies for stronger input current.
The equalizer 46 renders all of the harmonic components of the current wave of the relaxation oscillator 40 equal in amplitude. This may be followed by amplifier 47, if desired.
When the switch 15 is closed and the switch 20 is in its left-hand position to make contact with terminals 11 both the noise source and the relaxation oscillator are connected to the synthesizing portion of the system through circuit 10. This is the condition for natural reproduction of speech or music. In this condition switch S3 (its [...]
[...] currents to the receiving or reproducing end, where the output of the energy source of frequency patterns FPS appearing in circuit 1[...] is shaped accordingly. For transmitting a speech frequency range from 0 to 2950 cycles, for example, the speech bands chosen may be, for instance, one band from 0 to 250 cycles and nine adjacent bands each 300 cycles wide, starting at 250 cycles. These bands are selected by filters F1 to F10 in the amplitude pattern control channels AP1 to AP10, respectively. Thus, of these amplitude pattern control channels used to transmit information about the amplitude pattern, the channel AP1 transmits information about the amplitudes in the speech range 0 to 250 cycles, the channel AP2 transmits information about the amplitudes in the speech range 250 to 550 cycles, the channel AP3 information about the amplitudes in the range 550 to 850 cycles, etc.
Considering channel AP1, for example, the output from the 0 to 250 cycle speech band-pass filter F1 is fed to detector D1, which may be, for instance, like the detector D. The syllabic frequencies in the output from the detector are passed through a 25 cycle low-pass filter F31 and the resulting variable direct current is passed through line L1. This variable direct current is then applied to a biasing resistor B to give a grid bias to a signal shaping network or push-pull variable gain amplifier SN1, which accordingly varies its gain in amplifying the waves received from the energy source of frequency patterns FPS through 0 to 250 cycle speech band-pass filter F1', so that the average power in this band of waves varies in accordance with the average power in the corresponding band of the speech signals. The energy from the amplifier SN1 is then fed through a 0 to 250 cycle speech band-pass filter F1" to the speech receiving circuit 4, where it is combined with the outputs from nine other speech band-pass filters of channels AP2 to AP10 to give a reproduction of the original speech signal.
It will be understood that in the description thus far the filters F1' and F1" may have the same pass-band as filter F1, and that channels AP2 to AP10 may be like AP1 except as to frequencies involved. However, a modified arrangement in which the pass ranges are not the same will be described later on.
In the circuit design it is desirable to have the delay in the frequency pattern control channel and in all of the amplitude pattern control channels the same. If the frequency pattern control channel FP1 tends to have more inherent delay than the amplitude pattern control channels, it is desirable to introduce a certain amount of delay in the amplitude pattern control circuit AP [...] they may be sent through any other type of channel such as carrier channels or on a time division basis, as in my prior Patents 2,151,091 and 2,098,956, November 16, 1937, or in other suitable manner.
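The band layout and the transmission band figures given in the description can be checked with a few lines of arithmetic. The Python sketch below is only a worked restatement of those figures; the function names `analyzer_bands` and `control_bandwidth` are hypothetical and do not appear in the patent.

```python
def analyzer_bands(first_edge=250, width=300, n_bands=10):
    """Return (low, high) band edges in cycles per second.

    One band from 0 to first_edge, followed by adjacent bands of equal
    width, as in the example layout for a 0 to 2950 cycle speech range.
    """
    bands = [(0, first_edge)]
    low = first_edge
    for _ in range(n_bands - 1):
        bands.append((low, low + width))
        low += width
    return bands


def control_bandwidth(n_amplitude_channels=10, n_pitch_channels=1,
                      lowpass_hz=25):
    """Total band occupied by the control currents, in cycles per second."""
    return (n_amplitude_channels + n_pitch_channels) * lowpass_hz
```

With the default values the ten bands run from 0 to 2950 cycles, and eleven control channels of 25 cycles each occupy 275 cycles in total, in agreement with the figures stated in the description.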
In the operation of the system, the ten circuits SN1 ... SN10 have their tubes biased so that when no voltage is developed across resistance B, the circuits are blocked. Thus even though the noise from circuit 14 is impressed on these circuits it does not get through them in the absence of applied speech currents.
When speech or other sounds are directed against the microphone M in circuit 1 the sound is analyzed for its fundamental frequency and for the flow of energy in each of the ten frequency ranges passed by filters F1 ... F10 in the manner described. The currents which represent these energy flows control the transmission, through the SN1 ... SN10 circuits, individually, of the waves from the generating system FPS. These waves, in turn, are of the type to give a voiced sound or of the type to give an unvoiced sound, as determined by the frequency pattern channel. The result is a reconstructed speech or sound corresponding to the original.
The present invention contemplates the use of other sources of waves for purposes of reconstruction of sounds. For example, the drawing shows a phonograph 24 with record 25 and pick-up [...] contacts 23. The switch 15 may be either closed or open depending on whether the noise waves are to be used or not. An adjusted amount of the waves from the record 25 or microphone 27 can be passed directly to the output circuit 4 by closing switch S3 and adjusting pad 30.
The invention contemplates use of a variable gain amplifier like SN1 in the leads from the microphone 27 and from the reproducer 26 and connected to channel L0 at terminals X in the same way that SN1 is connected to channel L1, so that sounds from the phonograph or microphone 27 are suppressed when only unvoiced sounds are spoken into microphone M. This variable gain amplifier is preferably adjusted to have an abrupt cut-off so that it is unblocked and opened for transmission in response to even a small amplitude current in line L0, the amplitude modulation occurring principally in the SN circuits of the synthesizer. This arrangement permits the noise to be applied through switch 15 by itself for the unvoiced sounds, the phonograph sounds being cut off momentarily just as the relaxation oscillator energy is suppressed when only unvoiced sounds are to be reproduced, when the system is used for talking. The control of the noise source from line L0 is unaffected by this circuit modification. The relaxation oscillator is without effect since switch 20 is in its right-hand position.
Some very interesting and novel effects can be obtained by use of the substitute source 25 or 27. For instance, if an instrumental rendition of the air of a song is used in circuit 10, and if the words of the song are spoken or sung into the microphone M, the reconstructed sounds appear as a song with instrumental accompaniment or as a singing instrument. With switch S3 open, no sounds emerge except when sounds are applied at M. With switch S3 closed, pad 30 may be adjusted to give a background simulating instrumental accompaniment. With switch 15 open all of the reconstructed sounds must come from source 25 or 27 and in the case of words spoken into M the articulation is less perfect than if switch 15 is closed. However, the provision of more filters in the AP lines to further subdivide and extend the range would increase the fidelity of the sounds at M without the use of the source 42.
In discussing the different effects obtainable, it will be convenient at this point to adopt terms which distinguish the waves applied to circuit 10 from those applied to circuit 1. It is seen that the waves applied to circuit 1 determine which of the SN circuits are unblocked and the extent to which they are individually unblocked. These waves will be referred to as the spectrum control waves or spectrum since they determine the amplitude pattern or amplitude distribution over the frequency band. The waves introduced at 10 will be called the pitch control or pitch waves since they furnish the actual frequencies passing through the SN circuits. For example, if a talker had a fundamental frequency of 220, the harmonics supplied from oscillator 40 would be multiples of 220 whereas another talker might have a fundamental of, say, 190 and the harmonics supplied from the oscillator would be multiples of 190. Supplying music waves to circuit 10 would fix the pitch of the reconstructed speech. In either case the spectrum waves would not affect the pitch.
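The separation between pitch waves and spectrum waves can be made concrete with a small numerical sketch. The Python below is an illustration under simplifying assumptions and is not taken from the patent: the function name `shape_harmonics` is hypothetical, the pitch wave is idealized as exact harmonics of a fundamental, and each band's control current is reduced to a single gain value.

```python
def shape_harmonics(f0_hz, band_gains, bands):
    """Map each harmonic of f0_hz to the gain of the band containing it.

    bands is a list of (low, high) edges and band_gains the matching list
    of gains derived from the spectrum control waves (assumed given).
    Harmonics at or above the top band edge are not transmitted.
    """
    shaped = {}
    top_edge = bands[-1][1]
    k = 1
    while k * f0_hz < top_edge:
        freq = k * f0_hz
        for (low, high), gain in zip(bands, band_gains):
            if low <= freq < high:
                # The pitch wave supplies the frequency, the spectrum
                # wave supplies only the amplitude applied to it.
                shaped[freq] = gain
                break
        k += 1
    return shaped
```

For example, with bands of 0 to 250, 250 to 550 and 550 to 850 cycles and gains of 0.0, 1.0 and 0.5, a fundamental of 220 gives components at 220, 440 and 660, while a fundamental of 190 gives components at 190, 380, 570 and 760. The band gains are the same in both cases and only the frequencies present differ.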
A few of the sound effects obtainable with this system will now be given by way of example, it being understood that these are but a few of the many possibilities.
The pitch may be supplied by one talker at 27 and the spectrum by a second talker at M. It is understood, of course, that both talkers may be from recordings, a suitable reproducer being used in place of microphone M.
The pitch may be supplied from a record, orchestra or other instrumental music and the spectrum from a talker at M. In this case the speech is automatically made into a song.
The pitch may be supplied from a singer, actual or recorded, and the spectrum from a second singer at M. This presents an interesting case since the singer at M may sing off pitch without affecting the result. Thus a poor singer at M can be made to appear as a good singer provided the good singer is supplying the pitch. Also, a talker or singer at M can supply a foreign accent, nasal twang or southern drawl to a good voice, such as that of a prima donna. This effect might be used to change a song sung in one language into a song sung in a different language.
The pitch may be supplied by one singer and the spectrum from a person whispering at M.
The effects of various animated objects may be obtained. For example, if any sound of sustained character, such as the rustling of leaves, Niagara Falls, the roar of surf, machinery noise, bird songs, airplane drone, rainfall, tap dancing, thunder, etc., is used as the pitch and speech is used as the spectrum, the result is articulated sound of the type used as the pitch control so that we would then have "talking leaves," "the voice of raindrops," etc.
Or, [...] voice or second instrument for the spectrum with either the same or different tunes in the two cases.
Variation may be introduced into either or both of the pitch and spectrum waves to secure vibrato or tremolo effects.
In the modification in which a musical selection is sent from the record 25 or pick-up 27 into the synthesizer and words are sung or spoken into the microphone M, it might be thought that the pitch of the voice would need to correspond with the tone range of the music in order to produce the effect of a song. For example, if the music were predominately in the high register and the output waves are to be in the soprano range, it might be thought that a bass voice at M would be unable to control the proper SN circuits to modulate the music suitably. This is found, however, not to be the case. If a man's voice is analyzed, for example, a bass voice, it is found to have frequencies extending over as high a range as those of a soprano voice. For example, a bass voice contains harmonics of the order of 7,000 cycles or higher. The principal difference between a man's voice and a woman's voice is in the fundamental pitch. Probably the main reason why a woman's high voice sounds thinner than a man's low voice is that there are fewer harmonics in a given frequency range in the case of a woman's voice, since they are based on a fundamental frequency of about twice that of a man's voice. There is further the psychology of the lower frequency energy present in the man's voice. It is apparent, therefore, that a bass voice spoken into microphone M would contain the necessary frequencies to produce control currents in all of the lines L1 ... L10 and to control the transmission of musical waves, in any part of the spectrum, applied to the circuit 10. Conversely, a woman's voice would also cover a sufficient range to actuate all of the synthesizing branches so that songs could be produced in any register.
When sounds from the record or microphone 27 are used for reconstructing the waves, not only is the pitch obtained from these sources, but also enough of the spectrum is obtained to bring out its salient characteristics, such as the typical resonances of a violin or a horn. So long as this spectrum does not interfere too much with the spectrum patterns transmitted through the AP channels, intelligibility can be passed over the latter.
If it is desired to control only the pitch characteristics by means of a second sound input, the circuit may be used as first described for speech transmission, except that the FP channel is disconnected from microphone M and connected to a second microphone P into which the pitch control sounds are directed. This offers the possibility of producing a hybrid voice effect in which the second voice controls only the pitch of the first voice at M. Also a violin or other means may supply a tune to microphone P while a talker speaks words into M in proper cadence. The result is a song. Other effects may be obtained by varying the method.
In the circuit of Fig. 1 the filters appearing in any one channel, such as L1, are all indicated as having the same pass range, for example, 0 to 250 cycles per second. The invention contemplates, however, the use of filters in the synthesizer having different pass-bands from those in the analyzer. This is indicated in Fig. 2 in which each of the filters F1' ... F10' and the corresponding filters F1" ... F10" have pass ranges different from the corresponding filters F1 ... F10. In the case of speech this has the effect of changing the character, for example, from that of a man's voice to that of a woman's voice or vice versa. For example, the pass ranges of the filters in the synthesizer may be moved upward in the frequency scale by 10 per cent as indicated in the following table in which the total utilized range is assumed to be greater than that shown in Fig. 1.
Analyzer bands    Synthesizer bands
[...]
In addition to changing the transmission ranges of the filters it will ordinarily be desirable to change the pitch of the voice, which can be done very simply by shifting the movable slider along the resistance 12 of the relaxation oscillator so as to vary the ratio between the generated frequency and the amplitude of the frequency control wave. A woman's voice has a fundamental frequency about twice as high as a man's voice. This means that there are twice as many harmonics in a man's voice of given frequency band as in a woman's voice of the same band. Changing the transmission bands of the filters alone would produce the effect of a woman's voice of low pitch, such as contralto, if the talker were a man. By changing the pitch as well as the pass-bands of the filters a simulation can be obtained. If, on the other hand, the synthesizer filters had their pass-bands shifted downward by, say, 10 per cent, and the pitch were lowered a suitable amount, the effect would be to translate a woman's voice into a man's voice.
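The effect of moving the synthesizer pass ranges up or down by a fixed percentage can be sketched as a simple scaling of band edges. The Python below is illustrative only: the function name `shift_bands` is hypothetical, the bands used in the usage note are the analyzer bands of Fig. 1 and not the wider range assumed for the table, and the accompanying change of pitch is not modeled.

```python
def shift_bands(bands, percent):
    """Scale every (low, high) band edge by the given percentage.

    A positive percent moves the pass ranges upward in frequency, a
    negative percent moves them downward.
    """
    factor = 1.0 + percent / 100.0
    return [(low * factor, high * factor) for low, high in bands]
```

Applied with a value of 10, the 0 to 250 and 250 to 550 cycle bands become approximately 0 to 275 and 275 to 605 cycles. Applied with a value of -10, the 250 to 550 cycle band becomes approximately 225 to 495 cycles.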
To obtain even better simulation of a female voice, for example, measurements can be made for all sounds and the needed shifts in response frequencies provided for, including upward shifts at certain frequencies and downward shifts at others. These shifts are made merely by adjusting the pass ranges of the synthesizer filters. Such changes can be carried out to any desired detail. Similar changes can be made for translating from an adult voice to a child voice or from the human voice to speech having the characteristics of animals, either very small animals or large animals. For example, a parrot's speech is not only high pitched but confined to a relatively narrow frequency range because of the smallness of the talking apparatus which determines the resonance. If an attempt is made to convert from human speech to that of a parrot by merely tuning the talking circuit, it is found that the articulation is very poor and that the amplitude in the resonance region is unduly high.
Fig. 3 represents by the series of dashes a, b, c, etc., the band-pass ranges of the analyzer filters suitable for human speech. The dotted curve r indicates in a general way what the resonance characteristics of a parrot or other small animal may be. By shifting the pass ranges of the synthesizer filters to the frequency positions indicated at c', d', etc., and making these bands narrower so that a greater number of them can be included within the resonance range of curve r, and by using suitably higher pitch, the output speech may be made to appear like that of a parrot or small animal and enough frequency components of the original speech will be utilized to permit of good articulation.
On the other hand, if human speech were to be made to simulate the speech of a wolf, lion or giant, or other large being where the resonances are broad and low, the pass ranges of the synthesizer filters would need to be shifted downward in the frequency spectrum and in some cases at least broadened. In some cases it might be necessary to shift some of the ranges downward and others upward to meet the requirements of a double humped resonance curve. Measurements may be made in any case to determine the type of shift necessary. As a further example, the following table indicates the kind of shifts that might be necessary to imitate the speech of a small animal.
Analyzer bands    Synthesizer bands
0-250             0-500
250-550           500-1000
550-850           1000-1500
850-1150          1500-1700
1150-1450         1700-1900
1450-1750         1900-2000
1750-2050         2000-2100
2050-2350         2100-2200
2350-2950         2200-2400
2950-3500         2400-2600
3500-4500         2600-3200
4500-6000         3200-4500
6000-7500         4500-6000
Various other effects may be obtained by controlling the spectrum with or without controlling the pitch. For example, a bass voice may be translated to soprano and vice versa. Thus a soprano singer of great musical ability would be enabled to sing in the bass register or some other register with the same characteristic quality.
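The band table above amounts to a one-to-one pairing of analyzer bands with synthesizer bands. As a hedged illustration only, the sketch below stores the thirteen pairs as data and looks up which synthesizer band is controlled by the analyzer band containing a given frequency. The function name `synthesizer_band_for`, the treatment of the figures as cycles per second, and the half-open band edges are assumptions, not details given in the patent.

```python
from bisect import bisect_right

# Band edges as (low, high), taken row by row from the table.
ANALYZER_BANDS = [
    (0, 250), (250, 550), (550, 850), (850, 1150), (1150, 1450),
    (1450, 1750), (1750, 2050), (2050, 2350), (2350, 2950),
    (2950, 3500), (3500, 4500), (4500, 6000), (6000, 7500),
]
SYNTHESIZER_BANDS = [
    (0, 500), (500, 1000), (1000, 1500), (1500, 1700), (1700, 1900),
    (1900, 2000), (2000, 2100), (2100, 2200), (2200, 2400),
    (2400, 2600), (2600, 3200), (3200, 4500), (4500, 6000),
]

# Lower edges of the analyzer bands, used for binary search.
_LOWER_EDGES = [low for low, _ in ANALYZER_BANDS]


def synthesizer_band_for(freq):
    """Return the synthesizer band paired with the analyzer band holding freq.

    Bands are treated as half-open, [low, high), which is an assumption.
    """
    if not ANALYZER_BANDS[0][0] <= freq < ANALYZER_BANDS[-1][1]:
        raise ValueError("frequency outside the analyzed range")
    index = bisect_right(_LOWER_EDGES, freq) - 1
    return SYNTHESIZER_BANDS[index]


print(synthesizer_band_for(300))   # analyzer band 250-550
print(synthesizer_band_for(2000))  # analyzer band 1750-2050
```

Note that in this table the synthesizer bands between 1500 and 2200 are only 100 to 200 wide, so several control channels are packed into a narrow region, in keeping with the narrow resonance range described for a small animal.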
In Fig. 2 the control channels are indicated as being carried through a switching mechanism designated merely by the box 50 in the drawing. This is to indicate that the various channels can be interchanged at will by the switching mechanism. For example, incoming channel 1 can be connected to outgoing channel 9 or any incoming channel can be connected to any outgoing channel by means of the switching mechanism. Some interesting effects are possible by this means. For example, if channels 1, 2 ... 10 are active and the sound e as in meet is spoken into the microphone M it is heard from the loud-speaker as the same sound. With only channels 1 and 9 operative the sound remains practically the same. If,
possible, by manipulating the switching mechanism, including the shifting of energy sources, to cause the word yes spoken into the microphone M to emerge from the loud-speaker as no, the word we to emerge as you, etc. This suggests the desirability of an automatic control of the switching mechanism to produce predetermined types of translation. The currents in the individual lines L1, L2, etc. are in the form of slowly varying direct currents suitable for controlling relays. This offers the possibility of operating relays in the switching mechanism to make predetermined changes in the circuit connections of the individual channels under control of currents in these channels. The relay 51 is shown controlled directly from the channel L1 and is symbolic of any relay switching mechanism necessary for carrying out the desired switching functions. For example, relay 51 may be arranged to switch channel L1 to one of the other channels leading to a higher frequency filter F2', F3', etc., so that when low frequency energy of sufficient magnitude appears in channel L1, it causes the production of higher frequency components in the reconstructed speech. Various modifications may be made in the character of the switching and in the manner of controlling the relays governing the switching. Where a microphone pick-up has been disclosed it will be obvious that any other type of device [...] utilized otherwise.
A translation from the first to the second record can be modified in a wide variety of ways, by use of the pitch microphone P, the microphone 27, or the record 25. The nature of the effects produced has been discussed above. In talking movies, these possibilities may sometimes save retakes, or may be used variously, as in a series of steps, to work up the desired type of final sound record.
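The channel switching described earlier for the switching mechanism 50 and the relay 51 can be pictured as a routing table applied to the slowly varying control levels. The sketch below is only an illustration under stated assumptions: the function names `route` and `relay_routing`, the numeric threshold, the choice of channel 2 as the relay target, and the summing of levels that arrive at the same outgoing channel are all hypothetical and are not specified in the patent.

```python
def route(levels, routing):
    """Apply a routing table to per-channel control levels.

    levels:  dict mapping incoming channel number -> control level
    routing: dict mapping incoming channel -> outgoing channel;
             channels not listed pass straight through.
    Levels landing on the same outgoing channel are summed (an assumption).
    """
    out = {channel: 0.0 for channel in levels}
    for source, level in levels.items():
        target = routing.get(source, source)
        out[target] = out.get(target, 0.0) + level
    return out


def relay_routing(levels, threshold=0.5, target=2):
    """Mimic a relay driven by channel 1: when the level in channel 1
    reaches the (assumed) threshold, switch channel 1 to a higher channel."""
    if levels.get(1, 0.0) >= threshold:
        return {1: target}
    return {}


# Manual interchange: incoming channel 1 connected to outgoing channel 9.
swapped = route({1: 1.0, 9: 0.0}, {1: 9, 9: 1})

# Relay-controlled interchange: strong low-frequency energy is moved upward.
levels = {1: 0.75, 2: 0.25, 3: 0.0}
relayed = route(levels, relay_routing(levels))

print(swapped)
print(relayed)
```

In this sketch the routing decision depends on the same control levels that are being routed, which is the sense in which the text speaks of changes made "under control of currents in these channels".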
The shifting of the pass ranges of the reconstructing filters relative to the analyzer filters may be used along with the alternative sources of pitch control waves 25, 27, etc. to secure desired effects.
It is recognized that the system disclosed herein is susceptible of wide variation and modification of an obvious nature in view of the examples given. All such variations and modifications are intended to be protected by this application.
Moreover, the various examples, circuit details, frequency ranges and sound effects disclosed are to be construed as illustrative rather than limiting and the scope of the invention is defined in the claims which follow.
What is claimed is:
1. In combination, a source of music-representing waves, a source of speech waves, means to derive from each of a plurality of frequency subdivisions of the speech waves an index of the energy content of such frequency subdivision varying with time, means to modulate said music-representing waves in a plurality of circuits in accordance with said indices, and means to translate the resulting modulated wave energy into sound energy.
2. The method comprising subdividing a sound-representing wave into a number of relatively narrow frequency bands, subdividing a second wave of band characteristics into a corresponding number of frequency bands certain of which are of different band width from the corresponding bands of the first wave, deriving an index of the energy content of each of the subdivided bands of the first wave, and controlling transmission of each of the subdivided bands of the second-mentioned wave in accordance with a different one of said indices.
3. The method of producing vocal modulation of a musical wave comprising subdividing into narrow frequency bands the energy spectra of both the vocal wave and the musical wave, deriving an index of the energy content, varying with time, of each of the subdivided vocal wave bands,
4. The method of reproducing speech with changed voice comprising analyzing the speech waves to determine their energy content in each of a plurality of distinct frequency regions, generating waves having a frequency band extending over the range of the sounds to be produced, and varying the energy content of each of a plurality of distinct frequency regions of said generated waves in accordance with the energy content of a respective region of the analyzed waves, the distinct frequency regions of the generated waves so controlled being displaced in frequency with respect to the respective distinct frequency regions of the analyzed waves.
5. The method of reproducing speech with changed voice comprising analyzing the speech waves to determine their energy content in each of a plurality of distinct frequency regions, generating waves having a frequency band extending over the range of the sounds to be produced, and varying the energy content of each of a plurality of distinct frequency regions of said generated waves in accordance with the energy content of a respective region of the analyzed waves, the distinct frequency regions of the generated waves so controlled being of respectively different frequency width from the corresponding distinct frequency regions of the analyzed speech.
6. In combination, a source of waves of a range of acoustic frequencies, a second source of waves of a range of acoustic frequencies, a set of filters for subdividing the waves from the first source into relatively narrow frequency bands, a set of filters for subdividing the waves from the second source into a corresponding number of relatively narrow frequency bands certain of which differ in width from the corresponding bands into which the waves from the first source are divided, means determining the energy content of each of the subdivided bands from said first source, means arranged to respond to the waves of the subdivided bands from said second source, and means for controlling the response of said second means to each of said subdivided bands from said second source in accordance with the energy content of the corresponding subdivided band from said first source.
7. In combination, means for analyzing speech waves to determine their fundamental frequency and their energy rate in each of a number of frequency regions, means for generating waves extending over a wide range of acoustic frequencies comprised of a fundamental frequency and harmonics thereof, means causing the fundamental frequency of the generated waves to be different from the fundamental frequency of said speech waves but definitely related thereto, means producing sounds from said generated waves, and means controlled by said several energy rates for determining the effectiveness of said means in producing sound from each of several frequency regions of said generated waves, different in frequency from the regions of the speech waves used to determine said energy rates.
8. In combination, means for producing the effect of a small animal talking under control of human speech, comprising means for generating acoustic frequency waves extending over the relatively narrow frequency range appropriate to said animal, means for subdividing said waves into a number of narrow bands, means for subdividing the human speech waves into a corresponding number of narrow bands, means determining the energy rate of each of said latter bands, and a plurality of means controlled individually in accordance with the energy rate of said individual bands for producing sound in accordance with each of said bands of said generated waves.
9. [...] corresponding number of narrow bands, means determining the energy pattern of each of said latter bands, and a plurality of means controlled individually in accordance with the energy pattern of said individual bands for producing sound in accordance with each of said bands of generated waves.
10. The combination according to claim 8 including means to derive the fundamental frequency of said speech waves, and means to control the frequency of the generated waves thereby.
11. The combination according to claim 9 including means to derive the fundamental frequency of said speech waves, and means to control the frequency of the generated waves thereby.
12. A system comprising a plurality of wave transmission channels for respectively transmitting subbands of a frequency band of waves that represents a voice wave, means for producing a complex tone having its fundamental frequency in controlled relation to that of said voice wave but having the relative amplitudes of its components independent of the relative amplitudes of the components of said voice wave, analyzing circuits responsive to waves from said channels for determining the energy flow, varying in time, in each of said subbands, means controlled by said analyzing circuits for producing from the energy of said complex tone a similitude of said energy, varying in time, in said subbands, and means for interchanging the control between individual analyzing circuits and the means for producing said similitude of said energy in said subbands.
13. A system according to claim 12 including means controlled by said voice wave for actuating said means for interchanging said control.
14. In combination, a source of wave energy of band characteristic, a plurality of transmission control means operative at different frequency regions within said band, means actuating one of said control means in accordance with voice energy in a low frequency part of the voice range to control transmission of low frequency waves from said source, means actuating another of said control means in accordance with an intermediate range of voice energy to control transmission of waves of intermediate frequency from said source, means actuating another of said control means in accordance with a high frequency range of voice energy to control transmission of waves of high frequency from said source, means making certain of the frequency ranges whose transmission is controlled by said several transmission control means different in width from the corresponding ranges of said voice wave, and means for combining the effects of the transmitted ranges.
15. The method of simulating the effect of a talking object which gives off a characteristic sound of its own, such as a waterfall, machine, etc., which comprises subdividing said characteristic sound into a number of relatively narrow frequency subbands, subdividing speech waves into a corresponding number of relatively narrow frequency subbands, utilizing the energy in each of said subbands of the speech waves to control transmission of the corresponding subbands of said characteristic sound, and making certain of the frequency subbands of said characteristic wave of respectively different width from the corresponding subbands of said speech wave.
16. The method of simulating the effect of a talking image or cartoon comprising supplying a frequency band of waves appropriate in pitch and band width to said image or cartoon, subdividing said band of waves into a number of relatively narrow frequency subbands, subdividing speech waves into a corresponding number [...]
17. [...]ment in the form of a musical selection, subdividing said wave into relatively narrow frequency subbands, subdividing a voice wave into a corresponding number of relatively narrow frequency subbands, determining the energy rate in each of said frequency subbands in said voice wave, controlling the amplitude of the waves in each of the subbands of said musical wave in accordance with the energy rate of the corresponding subband of said voice wave and combining the effects of the subbands of said musical wave whose amplitudes are so controlled.
18. The method of producing sound effects comprising taking the voice of a singer in the form of a musical selection, subdividing said wave into relatively narrow frequency subbands, subdividing another voice wave into a corresponding number of relatively narrow frequency subbands, determining the energy flow in each of said frequency subbands of said other voice wave, controlling the amplitude of the waves in [...] corresponding subband of said other voice wave and combining the effects of the subbands of said musical wave whose amplitudes are so controlled.
19. The method of enabling a speaker to change his voice comprising analyzing his voice wave to determine its fundamental frequency and the energy rate in each of a number of relatively narrow frequency subbands, generating waves covering the speech range under control of said fundamental frequency, subdividing said generated waves into a corresponding number of relatively narrow frequency subbands certain of which differ in band width from the corresponding frequency subbands of said voice wave, controlling the amplitudes of the waves in the various subbands of said generated waves in accordance with the energy rate of the corresponding [...]
22. Means for producing the effect of a singing musical instrument comprising means for producing waves characteristic of such instrument and representing a musical selection, an input circuit, an output circuit, a plurality of transmission control devices connected in parallel paths between said input circuit and said output circuit, means for subdividing said characteristic wave on a frequency basis into a corresponding plurality of frequency subbands, means to apply the waves in each of said subbands to a respective one of said control devices for transmission therethrough, means causing waves to flow in said input circuit representing words spoken in cadence with said musical selection, means subdividing said latter waves into a plurality of frequency subbands, means to determine the energy characteristic of each of said latter subbands, [...]
23. [...] band in accordance with the characteristic energy rates of speech at corresponding frequencies of the speech band, and means comprising a second speech input circuit for speech waves from another wave source to control said fundamental frequency independently of said first speech wave source.
24. In a sound system, two input circuits for sound-representing waves, a common output circuit, a source of complex tone waves based on a fundamental frequency, means for transmitting waves from said source into said output circuit, a plurality of amplitude pattern control channels for exercising control of the waves so transmitted in each of a number of frequency regions in accordance with frequency regions of the sound-representing waves in one of said input circuits and a pitch control circuit for controlling said fundamental frequency in accordance with other sound-representing waves in said other input circuit.
25. A system comprising a plurality of channels for transmitting energy having syllabic time rates of change in the channels, respectively, a source of waves having a sound spectrum the relative amplitudes of the components of which are independent of the relative amplitudes of the components of said transmitted energy, means controlled by the energy in said channels for varying at said rates, respectively, the energy in different frequency regions of waves from said source, and switching means for interchanging said channels in their control of the energy in said different frequency regions.
[...] controlling the amplitudes of the components of the generated waves in different frequency regions in accordance with the energy flow in respective bands of said subdivided waves, and controlling the pitch of the generated waves independently of said reproduced waves from said first record in order to effect controlled changes in the rerecording.
HOMER W. DUDLEY.
US273429A 1939-05-13 1939-05-13 System for the artificial production of vocal or other sounds Expired - Lifetime US2243089A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US273429A US2243089A (en) 1939-05-13 1939-05-13 System for the artificial production of vocal or other sounds
FR865087D FR865087A (en) 1939-05-13 1940-04-22 Signaling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US273429A US2243089A (en) 1939-05-13 1939-05-13 System for the artificial production of vocal or other sounds

Publications (1)

Publication Number Publication Date
US2243089A true US2243089A (en) 1941-05-27

Family

ID=23043901

Family Applications (1)

Application Number Title Priority Date Filing Date
US273429A Expired - Lifetime US2243089A (en) 1939-05-13 1939-05-13 System for the artificial production of vocal or other sounds

Country Status (1)

Country Link
US (1) US2243089A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2458227A (en) * 1941-06-20 1949-01-04 Hartford Nat Bank & Trust Co Device for artificially generating speech sounds by electrical means
US2635146A (en) * 1949-12-15 1953-04-14 Bell Telephone Labor Inc Speech analyzing and synthesizing communication system
US2672512A (en) * 1949-02-02 1954-03-16 Bell Telephone Labor Inc System for analyzing and synthesizing speech
US2857465A (en) * 1955-11-21 1958-10-21 Bell Telephone Labor Inc Vocoder transmission system
US2928902A (en) * 1957-05-14 1960-03-15 Vilbig Friedrich Signal transmission
US3090837A (en) * 1959-04-29 1963-05-21 Ibm Speech bandwidth compression system
US3180936A (en) * 1960-12-01 1965-04-27 Bell Telephone Labor Inc Apparatus for suppressing noise and distortion in communication signals
US3470323A (en) * 1944-06-30 1969-09-30 Bell Telephone Labor Inc Signaling system
US3524930A (en) * 1968-07-08 1970-08-18 Us Army Resonance synthesizer for speech research
US3532821A (en) * 1967-11-29 1970-10-06 Hitachi Ltd Speech synthesizer
US4591673A (en) * 1982-05-10 1986-05-27 Lee Lin Shan Frequency or time domain speech scrambling technique and system which does not require any frame synchronization
