CN101116135B - Sound synthesis - Google Patents

Sound synthesis

Info

Publication number
CN101116135B
CN101116135B CN2006800046437A CN200680004643A
Authority
CN
China
Prior art keywords
noise
parameter
sound
component
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2006800046437A
Other languages
Chinese (zh)
Other versions
CN101116135A (en)
Inventor
M. Szczerba
A. C. Den Brinker
A. J. Gerrits
A. W. J. Oomen
M. Klein Middelink
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of CN101116135A publication Critical patent/CN101116135A/en
Application granted granted Critical
Publication of CN101116135B publication Critical patent/CN101116135B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 7/00 Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/18 Selecting circuits
    • G10H 1/22 Selecting circuits for suppressing tones; Preference networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2230/00 General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
    • G10H 2230/025 Computing or signal processing architecture features
    • G10H 2230/041 Processor load management, i.e. adaptation or optimization of computational load or data throughput in computationally intensive musical processes to avoid overload artifacts, e.g. by deliberately suppressing less audible or less relevant tones or decreasing their complexity
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/471 General musical sound synthesis principles, i.e. sound category-independent synthesis methods
    • G10H 2250/481 Formant synthesis, i.e. simulating the human speech production mechanism by exciting formant resonators, e.g. mimicking vocal tract filtering as in LPC synthesis vocoders, wherein musical instruments may be used as excitation signal to the time-varying filter estimated from a singer's speech
    • G10H 2250/495 Use of noise in formant synthesis

Abstract

A device (1) is arranged for synthesizing sound represented by sets of parameters, each set comprising noise parameters (NP) representing noise components of the sound and optionally also other parameters representing other components, such as transients and sinusoids. Each set of parameters may correspond with a sound channel, such as a MIDI voice. In order to reduce the computational load, the device comprises a selection unit (2) for selecting a limited number of sets from the total number of sets on the basis of a perceptual relevance value, such as the amplitude or energy. The device further comprises a synthesizing unit (3) for synthesizing the noise components using the noise parameters of the selected sets only.

Description

Sound synthesis
The present invention relates to the synthesis of sound. More specifically, the present invention relates to a device and a method for synthesizing sound, the sound being represented by sets of parameters, each set comprising noise parameters representing a noise component of the sound and possibly other parameters representing other components.
Representing sound by sets of parameters is well known. So-called parametric coding techniques are used to encode sound efficiently by representing the sound with a series of parameters. A suitable decoder can use this series of parameters to substantially reconstruct the original sound. The series of parameters may be divided into sets, each set corresponding to an individual sound source (sound channel), such as a (human) speaker or a musical instrument.
The popular MIDI (Musical Instrument Digital Interface) protocol allows music to be represented by sets of instructions for musical instruments. Each instruction is assigned to a particular instrument. Each instrument can use one or more channels (called "voices" in MIDI). The number of channels that can be used simultaneously is called the polyphony level or polyphony. MIDI instructions can be transmitted and/or stored efficiently.
Synthesizers typically contain sound definition data, for example a sound bank or patch (timbre) data. In a sound bank, samples of instrument sounds are stored as sound data, whereas patch data define control parameters for sound generators.
MIDI instructions cause the synthesizer to retrieve sound data from the sound bank and to synthesize the sound represented by these data. As in conventional wavetable synthesis, the sound data may be actual sound samples, that is, digitized sounds (waveforms). However, sound samples typically require large amounts of memory, which is not feasible in relatively small devices, in particular hand-held consumer devices such as mobile (cellular) phones.
Alternatively, sound samples may be represented by parameters, for example amplitude, frequency, phase and/or envelope shape parameters, which allow the sound samples to be reconstructed. The amount of memory required to store sound sample parameters is typically far smaller than that required to store the actual sound samples. However, the synthesis of the sound may involve a substantial amount of computation. This is particularly the case when many parameter sets, representing different sound channels ("voices" in MIDI), have to be synthesized simultaneously (high polyphony). The computational burden typically grows linearly with the number of channels ("voices") to be synthesized, that is, linearly with the degree of polyphony. This makes it very difficult to use this technique in hand-held devices.
The paper "Parametric Audio Coding Based Wavetable Synthesis" by M. Szczerba, W. Oomen and M. Klein Middelink, Audio Engineering Society Convention Paper No. 6063, Berlin (Germany), May 2004, discloses an SSC (Sinusoidal Coding) wavetable synthesizer. The SSC encoder decomposes an audio input into transient, sinusoidal and noise components and generates a parametric representation for each of these components. These parametric representations are stored in a sound bank. The SSC decoder (synthesizer) uses the parametric representations to reconstruct the original audio input. To reconstruct the noise component, the temporal envelopes of the individual channels are combined with their respective gains and added, after which white noise is mixed with the combined temporal envelope to produce a temporally shaped noise signal. Filter coefficients are derived from the spectral envelope parameters of the individual channels and are used to filter the temporally shaped noise signal, thus producing a noise signal that is shaped both in time and in frequency.
Although this known arrangement is very effective, determining the temporal and spectral envelopes for a large number of sound channels requires a considerable computational load. In many modern audio systems 64 sound channels can be used, and even more channels are envisaged. This makes the known arrangement unsuitable for relatively small devices having limited computational resources.
On the other hand, there is an increasing demand for sound synthesis in hand-held consumer devices such as mobile phones. Consumers now expect their hand-held devices to produce a wide range of sounds, such as different ring tones.
It is therefore an object of the present invention to overcome these and other problems of the prior art and to provide a device and a method for synthesizing the noise components of sound which are more efficient and reduce the computational load.
Accordingly, the present invention provides a device for synthesizing sound, the sound being represented by sets of parameters, each set comprising noise parameters representing a noise component of the sound, the device comprising:
- selection means for selecting a limited number of sets from the total number of sets on the basis of a perceptual relevance value, and
- synthesizing means for synthesizing the noise components using the noise parameters of the selected sets only.
By selecting a limited number of parameter sets and synthesizing using only these selected sets, effectively discarding the remaining sets, the computational load of the synthesis can be reduced considerably. By using a perceptual relevance value to make the selection, the perceptual effect of omitting some parameter sets is surprisingly small.
It might be expected that using, for example, only 5 out of 64 parameter sets would severely affect the perceived quality of the reconstructed (that is, synthesized) sound. However, the inventors have found that, by suitably choosing the five sets as in this example, the sound quality is hardly affected. When the number of sets is reduced further, the sound quality degrades, but this degradation is gradual, and selecting as few as three sets may still be acceptable.
In addition to the noise parameters representing the noise components of the sound, the parameter sets may also comprise other parameters representing other components of the sound. Accordingly, each parameter set may comprise both noise parameters and other parameters, such as sinusoidal and/or transient parameters. However, it is also possible for the sets to contain noise parameters only.
It is noted that the selection of the sets of noise parameters is preferably independent of any other parameters, such as sinusoidal and transient parameters. However, in some embodiments the selection means are also arranged to select the limited number of sets from the total number of sets on the basis of one or more other parameters representing other sound components. That is, any sinusoidal and/or transient parameters contained in a set may be taken into account and may thus influence the selection of the noise parameters of that set.
In a preferred embodiment, the device comprises a decision part for deciding which parameter sets are to be selected, and a selection part for selecting parameter sets on the basis of information provided by the decision part. However, embodiments can be envisaged in which the decision part and the selection part are combined into a single integral unit. Alternatively, the device may comprise a selection part for selecting parameter sets on the basis of perceptual relevance values contained in the parameter sets. If the parameter sets contain perceptual relevance values, or any other values that allow the selection to be made without a separate decision process, a decision part is no longer necessary.
The synthesis device of the present invention may comprise a single filter for spectrally shaping the noise of all selected sets, and a Levinson-Durbin unit for determining the filter coefficients of this filter, the single filter preferably being constituted by a Laguerre filter. In this way a very efficient synthesis can be achieved.
Advantageously, the device of the present invention may further comprise gain compensation means for compensating the gains of the selected noise components for any energy loss caused by any rejected noise components. As the energy of any rejected noise components is distributed over the selected noise components, this gain compensation allows the total energy of the noise to remain substantially unaffected by the selection process.
In addition, the present invention provides an encoding device for representing sound by sets of parameters, each set comprising noise parameters representing a noise component of the sound, the device comprising a relevance detector for providing a relevance value representing the perceptual relevance of the respective noise parameters. This relevance value is preferably added to each set and may be determined using a perceptual model. The resulting parameter sets can be converted back into sound by the synthesis device defined above.
The present invention also provides a user device comprising a synthesis device as defined above. This user device, which is preferably but not necessarily portable, and more preferably hand-held, may be constituted by a mobile (cellular) phone, a CD player, a DVD player, an MP3 player, a PDA (Personal Digital Assistant) or any other suitable apparatus.
The present invention further provides a method of synthesizing sound represented by sets of parameters, each set comprising noise parameters representing a noise component of the sound, the method comprising the steps of:
- selecting a limited number of sets from the total number of sets on the basis of a perceptual relevance value, and
- synthesizing the noise components using the noise parameters of the selected sets only.
In the method of the present invention, the perceptual relevance value may be indicative of the noise amplitude and/or the noise energy.
The parameter sets may contain noise parameters only, but may also contain other parameters representing other components of the sound, such as sinusoids and/or transients.
The method of the present invention may comprise the further step of compensating the gains of the selected noise components for any energy loss caused by any rejected noise components. By applying this step, the total noise energy is substantially unaffected by the selection process.
The present invention additionally provides a computer program product for carrying out the method defined above. The computer program product may comprise a set of computer-executable instructions stored on an optical or magnetic carrier, such as a CD or DVD, or downloadable from a remote server, for example via the Internet.
The present invention will further be explained below with reference to the exemplary embodiments illustrated in the accompanying drawings, in which:
Fig. 1 schematically shows a noise synthesis device according to the present invention.
Fig. 2 schematically shows sets of parameters representing sound as used in the present invention.
Fig. 3 schematically shows the selection part of the device of Fig. 1 in more detail.
Fig. 4 schematically shows the synthesis part of the device of Fig. 1 in more detail.
Fig. 5 schematically shows a sound synthesis device incorporating the device of the present invention.
Fig. 6 schematically shows an audio encoding device.
The noise synthesis device 1, shown merely by way of non-limiting example in Fig. 1, comprises a selection unit (selection means) 2 and a synthesis unit (synthesizing means) 3. In accordance with the present invention, the selection unit 2 receives noise parameters NP, selects a limited number of them, and passes the selected parameters NP' to the synthesis unit 3. The synthesis unit 3 synthesizes shaped noise, that is, noise whose temporal and/or spectral envelope is shaped, using only the selected noise parameters NP'. An exemplary embodiment of the synthesis unit 3 will be discussed in more detail below with reference to Fig. 4.
The noise parameters NP may be part of audio parameter sets S1, S2, ..., SN, as shown in Fig. 2. In the example shown, each parameter set Si (i = 1...N) comprises transient parameters TP representing transient sound components, sinusoidal parameters SP representing sinusoidal sound components, and noise parameters NP representing noise sound components. The sets Si may be produced using an SSC encoder as mentioned above, or any other suitable encoder. It will be understood that some encoders may not produce transient parameters (TP), while others may not produce sinusoidal parameters (SP). The parameters may, but need not, conform to the MIDI format.
Each set Si may represent an active sound channel (or "voice" in MIDI systems).
The selection of the noise parameters is illustrated in more detail in Fig. 3, which schematically shows an embodiment of the selection unit 2 of the device 1. The exemplary selection unit 2 of Fig. 3 comprises a decision part 21 and a selection part 22. Both the decision part 21 and the selection part 22 receive the noise parameters NP. The decision part 21 only requires those parameters on which a suitable selection decision can be based.
A suitable decision parameter is the gain gi. In a preferred embodiment, gi is the gain of the temporal envelope of the noise of the set Si (cf. Fig. 2). However, the amplitudes of the individual noise components may also be used, or an energy value may be derived from the parameters. It will be clear that amplitude and energy are indicative of the perceptibility of the noise and therefore constitute perceptual relevance values. Advantageously, a perceptual model (for example taking the acoustic and psycho-acoustic properties of the human ear into account) may be used to determine and, optionally, weight the appropriate parameters.
The decision part 21 decides which noise parameters will be used for the noise synthesis. The decision is made using an optimization criterion applied to the perceptual relevance values, for example finding the five highest gains among the set of gains gi. The indices of the corresponding sets (for example 2, 3, 12, 23 and 41) are fed to the selection part 22. In some embodiments, the selection parameters (that is, the relevance values) may already be contained in the noise parameters NP. In such embodiments, the decision part 21 may be omitted.
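By way of illustration only, the decision step may be expressed in a few lines of code. The sketch below assumes that each parameter set exposes a scalar noise gain gi that is used directly as the perceptual relevance value; the function and variable names (select_sets, M) are illustrative and do not appear in the patent.

```python
import numpy as np

def select_sets(noise_gains, M=5):
    """Return the indices of the M sets with the largest noise gains.

    noise_gains -- per-set temporal-envelope gains g_i, used here as
                   the perceptual relevance values.
    """
    gains = np.asarray(noise_gains, dtype=float)
    M = min(M, gains.size)
    # argpartition finds the M largest entries without a full sort
    selected = np.argpartition(gains, -M)[-M:]
    return np.sort(selected)

# Example: 64 channels ("voices"), keep the 5 perceptually strongest,
# yielding set indices such as [2, 3, 12, 23, 41] in the text above.
gains = np.random.rand(64)
print(select_sets(gains, M=5))
```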
The selection part 22 is arranged to select the noise parameters of the sets indicated by the decision part 21. The noise parameters of the remaining sets are discarded. As a result, only a limited number of noise parameters are passed on to the synthesis unit (3 in Fig. 1) and synthesized. The computational load of the synthesis unit is thereby reduced considerably.
The inventors have realized that the number of noise parameters used for the synthesis can be reduced significantly without any substantial loss of sound quality. The number of selected sets can be relatively small, for example 5 selected out of a total of 64 (7.8%). In general, the number of selected sets should be at least approximately 4.5% of the total number, and preferably at least 10%, to avoid any perceptible loss of sound quality. If the number of selected sets is reduced to below approximately 4.5%, the quality of the synthesized sound degrades gradually, but this may still be acceptable for some applications. It will be understood that higher percentages, such as 15%, 20%, 30% or 40%, may also be used, although this increases the computational load.
The decision as to which sets are included and which are not is made by the decision part 21 on the basis of perceptual relevance values, for example the amplitudes (levels) of the noise components, articulation data obtained from the sound bank (control envelope generators, LF oscillators, etc.), and information obtained from the MIDI data, for example note-on velocity and articulation-related controllers. Other perceptual relevance values may also be used. Typically, the M sets having the largest relevance values, for example the highest noise amplitudes (or gains), are selected.
In addition, or alternatively, the decision part 21 may use other parameters from each set. For example, the sinusoidal parameters may be used to reduce the number of noise parameters. Using the sinusoidal (and/or transient) parameters, a masking curve can be constructed, and noise parameters whose amplitudes lie below the masking curve can be ignored. The noise parameters of a set can thus be compared with the masking curve; if they fall below the curve, the noise parameters of that set are rejected.
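A rough sketch of such a masking-based rejection is given below. It is only illustrative: the masking threshold is approximated by a fixed offset below the strongest sinusoid in each frequency band, whereas an actual implementation would use a proper psycho-acoustic model; the names and the offset value are assumptions, not taken from the patent.

```python
import numpy as np

def reject_masked_sets(noise_levels_db, sine_levels_db, offset_db=20.0):
    """Keep only sets whose noise rises above a crude masking curve.

    noise_levels_db -- (n_sets, n_bands) noise level per set and band
    sine_levels_db  -- (n_sets, n_bands) sinusoid level per set and band
    offset_db       -- assumed masker-to-threshold offset (illustrative)
    """
    noise_levels_db = np.asarray(noise_levels_db, dtype=float)
    sine_levels_db = np.asarray(sine_levels_db, dtype=float)
    # Masking curve per band: strongest sinusoid over all sets, minus an offset
    masking_db = sine_levels_db.max(axis=0) - offset_db
    # A set is kept if its noise exceeds the masking curve in at least one band
    keep = (noise_levels_db > masking_db).any(axis=1)
    return np.flatnonzero(keep)
```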
It will be understood that the selection and synthesis of the noise of the sets Si (Fig. 2) is typically carried out per time unit, for example per time frame. The noise parameters and the other parameters may therefore relate to a certain time unit only. Time units such as time frames may overlap.
An exemplary embodiment of the synthesis unit 3 of Fig. 1 is shown in more detail in Fig. 4. In this embodiment, the noise is produced using a temporal (time-domain) envelope and a spectral (frequency-domain) envelope.
Temporal envelope generators 311, 312 and 313 receive envelope parameters bi (i = 1...M) corresponding to the respective selected sets Si. In accordance with the present invention, the number M of selected sets is smaller than the number N of available sets. The temporal envelope parameters bi define the temporal envelopes output by the generators 311-313. Multipliers 331, 332 and 333 multiply these temporal envelopes by the respective gains gi. The resulting gain-adjusted temporal envelopes are added by an adder 341 and fed to a further multiplier 339, where they are multiplied by (white) noise produced by a noise generator 305. The resulting noise signal, which is shaped in time but typically still has a substantially flat spectrum, is fed to an (optional) overlap-and-add circuit 360. In this circuit the noise segments of subsequent time frames are combined into a continuous signal, which is fed to a filter 390.
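The temporal-shaping path just described can be paraphrased as follows. The sketch assumes that the envelope parameters bi have already been expanded into per-sample envelopes, and the overlap-add uses a plain Hann window; neither assumption comes from the patent.

```python
import numpy as np

def shape_noise_frame(envelopes, gains, rng=None):
    """Temporally shape white noise for one frame (units 311-341, 305 and 339).

    envelopes -- (M, frame_len) temporal envelopes of the M selected sets
    gains     -- (M,) gains g_1..g_M (possibly already gain-compensated)
    """
    if rng is None:
        rng = np.random.default_rng()
    envelopes = np.asarray(envelopes, dtype=float)
    gains = np.asarray(gains, dtype=float)
    combined = (gains[:, None] * envelopes).sum(axis=0)  # multipliers 331-333, adder 341
    white = rng.standard_normal(envelopes.shape[1])      # noise generator 305
    return combined * white                              # multiplier 339

def overlap_add(frames, hop):
    """Combine the per-frame noise segments into a continuous signal (circuit 360)."""
    frames = np.asarray(frames, dtype=float)
    frame_len = frames.shape[1]
    window = np.hanning(frame_len)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + frame_len] += window * frame
    return out
```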
As stated above, the gains g1 to gM correspond to the selected sets. As there are N available sets, the gains gM+1 to gN correspond to the rejected sets. In the preferred embodiment shown in Fig. 4, the gains gM+1 to gN are not discarded but are used to adjust the gains g1 to gM. This gain compensation serves to reduce, or even eliminate, the influence of the noise parameter selection on the level (that is, the amplitude) of the synthesized noise.
To this end, the embodiment of Fig. 4 additionally comprises an adder 343 and a scaling unit 349. The adder 343 adds the gains gM+1 to gN and feeds the resulting total gain to the scaling unit 349, which applies a scaling factor 1/M to produce a compensation gain gc, M being the number of selected sets as stated above. This compensation gain gc is then added to each of the gains g1 to gM by adders 334, 335, ..., the number of adders being equal to M. By distributing the total gain of the rejected components over the selected components, the noise energy remains substantially constant, since level changes caused by the selection of noise components are avoided.
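As a plain-scalar illustration (assuming the gains can be treated as simple scalars per frame, which the patent does not state explicitly), the compensation amounts to:

```python
import numpy as np

def compensate_gains(gains, selected_idx):
    """Distribute the total gain of the rejected sets over the selected ones.

    Computes g_c = (g_{M+1} + ... + g_N) / M and returns g_i + g_c for the
    selected sets, so that the overall noise level stays roughly constant.
    """
    gains = np.asarray(gains, dtype=float)
    selected = gains[selected_idx]
    rejected_sum = gains.sum() - selected.sum()   # adder 343
    g_c = rejected_sum / len(selected_idx)        # scaling unit 349 (factor 1/M)
    return selected + g_c                         # adders 334, 335, ...
```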
It will be understood that the adder 343, the scaling unit 349 and the adders 334, 335, ... are optional and may be absent in other embodiments. If desired, the scaling unit 349 may alternatively be arranged between the adder 341 and the multiplier 339.
The filter 390, which in a preferred embodiment is a Laguerre filter, serves to shape the spectrum of the noise signal. Spectral envelope parameters ai derived from the selected sets Si are fed to an autocorrelation unit 321, which computes the autocorrelation of these parameters. The resulting autocorrelations are added by an adder 342 and fed to a unit 370, which determines the filter coefficients of the spectral shaping filter 390. In a preferred embodiment, the unit 370 is arranged to determine the filter coefficients using the well-known Levinson-Durbin algorithm. The resulting linear filter coefficients are then converted into Laguerre filter coefficients by a conversion unit 380. The Laguerre filter 390 then shapes the spectral envelope of the (white) noise.
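The Levinson-Durbin recursion of unit 370 is standard and is sketched below for reference; it turns the summed autocorrelation produced by adder 342 into linear prediction coefficients. The subsequent conversion to Laguerre coefficients (unit 380) and the Laguerre filtering itself (filter 390) are not reproduced here.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: autocorrelation -> linear prediction coefficients.

    r     -- autocorrelation sequence, r[0] being the zero-lag value
    order -- prediction order
    Returns (a, err) with the convention A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update earlier coefficients
        a[i] = k
        err *= (1.0 - k * k)                  # remaining prediction error
    return a, err
```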
As an alternative to determining the autocorrelation function of each set of parameters ai, a more efficient method may be used. The power spectra of the selected sets (that is, of the selected active channels or "voices") are computed and added, and the autocorrelation function is then obtained by applying an inverse Fourier transform to the summed power spectrum. The resulting autocorrelation function is fed to the Levinson-Durbin unit 370.
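This alternative follows from the Wiener-Khinchin relation between power spectrum and autocorrelation. A sketch, assuming the spectral envelope of each selected set is available as a sampled magnitude response from 0 Hz up to the Nyquist frequency (scaling constants are omitted, since the Levinson-Durbin coefficients are invariant to a common scale factor of the autocorrelation):

```python
import numpy as np

def autocorrelation_from_spectra(magnitudes, n_lags):
    """Sum the power spectra of the selected sets and inverse-transform.

    magnitudes -- (M, n_bins) sampled magnitude responses of the selected sets,
                  covering 0..Nyquist
    Returns the first n_lags values of the autocorrelation of the sum.
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    power = (np.abs(magnitudes) ** 2).sum(axis=0)      # summed power spectrum
    # Mirror to a full symmetric spectrum so that the inverse FFT is real
    full = np.concatenate([power, power[-2:0:-1]])
    r = np.fft.ifft(full).real
    return r[:n_lags]

# The result can be passed directly to the Levinson-Durbin sketch above.
```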
It will be understood that the parameters ai, bi, gi and λ are all part of the noise parameters denoted NP in Fig. 1 and Fig. 2. In the selection unit embodiment of Fig. 3, the decision part 21 uses only the gain parameters gi. However, embodiments can be envisaged in which some or all of the parameters ai, bi, gi and λ, and possibly also other parameters (for example relating to sinusoidal and/or transient components), are used by the decision part. It is noted that the parameter λ may be a constant and need not be part of the noise parameters NP.
Fig. 5 schematically shows a sound synthesizer in which the present invention may be used. The synthesizer 5 comprises a noise synthesizer 51, a sinusoid synthesizer 52 and a transient synthesizer 53. Their output signals (synthesized transients, sinusoids and noise) are added by an adder 54 to form a synthesized audio output signal. The noise synthesizer 51 advantageously comprises a device as defined above (1 in Fig. 1).
The synthesizer 5 may be part of an audio (sound) decoder (not shown). The audio decoder may comprise a demultiplexer for demultiplexing an incoming bit stream and separating out the sets of transient parameters (TP), sinusoidal parameters (SP) and noise parameters (NP).
The audio encoding device 6, shown merely by way of non-limiting example in Fig. 6, encodes an audio signal s(n) in three stages.
In the first stage, any transient signal components in the audio signal s(n) are encoded using a transient parameter extraction (TPE) unit 61. The parameters are supplied to a multiplexing (MUX) unit 68 and to a transient synthesis (TS) unit 62. While the multiplexing unit 68 suitably combines and multiplexes the parameters for transmission to a decoder, such as the device 5 of Fig. 5, the transient synthesis unit 62 reconstructs the encoded transients. These reconstructed transients are subtracted from the original audio signal s(n) in a first combination unit 63 to form an intermediate signal from which the transients have substantially been removed.
In the second stage, any sinusoidal signal components (that is, sines and cosines) in the intermediate signal are encoded by a sinusoid parameter extraction (SPE) unit 64. The resulting parameters are fed to the multiplexing unit 68 and to a sinusoid synthesis (SS) unit 65. The sinusoids reconstructed by the sinusoid synthesis unit 65 are subtracted from the intermediate signal in a second combination unit 66, resulting in a residual signal.
In the third stage, the residual signal is encoded using a time/frequency envelope data extraction (TFE) unit 67. It is noted that, since the transients and sinusoids have been removed in the first and second stages, the residual signal is assumed to be a noise signal. The time/frequency envelope data extraction (TFE) unit 67 therefore represents the residual noise by suitable noise parameters.
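The signal flow of this three-stage cascade can be summarized in a few lines. The sketch only shows the subtractions; the extraction and synthesis functions passed in stand for units 61/62, 64/65 and 67 and are placeholders, not the patent's algorithms.

```python
def encode_three_stage(s, extract_transients, synth_transients,
                       extract_sinusoids, synth_sinusoids, extract_noise):
    """Three-stage parametric encoding of an audio signal s(n), as in Fig. 6."""
    tp = extract_transients(s)                      # TPE unit 61
    intermediate = s - synth_transients(tp)         # TS unit 62 + combiner 63
    sp = extract_sinusoids(intermediate)            # SPE unit 64
    residual = intermediate - synth_sinusoids(sp)   # SS unit 65 + combiner 66
    noise_params = extract_noise(residual)          # TFE unit 67: residual treated as noise
    return tp, sp, noise_params
```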
An overview of prior-art noise modeling and coding techniques is given in chapter 5 of the thesis "Audio Representations for Data Compression and Compressed Domain Processing" by S. N. Levine, Stanford University (USA), 1999, the entire contents of which are herewith incorporated in this document.
The parameters resulting from all three stages are suitably combined and multiplexed by the multiplexing (MUX) unit 68, which may also apply additional coding to the parameters, for example Huffman coding or time-differential coding, so as to reduce the bandwidth required for transmission.
It is noted that the parameter extraction (that is, encoding) units 61, 64 and 67 may quantize the extracted parameters. Alternatively or additionally, quantization may be carried out in the multiplexing (MUX) unit 68. It is further noted that s(n) is a digital signal, n denoting the sample number, and that the sets Si(n) are transmitted as digital signals. However, the invention may also be applied to analog signals.
After being combined and multiplexed (and optionally encoded and/or quantized) in the MUX unit 68, the parameters are transmitted via a transmission medium, such as a satellite link, a glass fiber cable, a copper cable or any other suitable medium.
The audio encoding device 6 further comprises a relevance detector (RD) 69. This relevance detector 69 receives certain parameters, such as the noise gains gi, and determines their acoustic (perceptual) relevance (cf. Fig. 3). The resulting relevance values are fed back to the multiplexer 68, where they are inserted into the sets Si(n) that constitute the output bit stream. A decoder can then use the relevance values contained in the sets to select suitable noise parameters without having to determine their perceptual relevance itself. In this way the decoder can be simpler and faster.
Although the relevance detector (RD) 69 is shown in Fig. 6 as being connected to the multiplexer 68, the relevance detector 69 could instead be connected directly to the time/frequency envelope data extraction (TFE) unit 67. The operation of the relevance detector 69 may be similar to that of the decision part 21 shown in Fig. 3.
The audio encoding device 6 shown in Fig. 6 has three stages. However, the audio encoding device 6 may also have fewer than three stages, for example only two stages producing sinusoidal and noise parameters, or more than three stages producing additional parameters. Embodiments can therefore be envisaged in which the units 61, 62 and 63 are absent. The audio encoding device 6 of Fig. 6 may advantageously be arranged to produce audio parameters that can be decoded (synthesized) by the synthesis device shown in Fig. 1.
The synthesis device of the present invention may be used in portable devices, in particular hand-held consumer devices such as cellular phones, PDAs (Personal Digital Assistants), watches, gaming devices, solid-state audio players, electronic musical instruments, digital telephone answering machines, portable CD and/or DVD players, and so on.
It is clear from the above that the present invention also provides a method of synthesizing sound represented by sets of parameters, wherein each parameter set comprises noise parameters representing a noise component of the sound, and optionally also other parameters representing other components, such as transients and/or sinusoids. The method of the present invention essentially comprises the steps of:
- selecting a limited number of sets from the total number of sets on the basis of a perceptual relevance value, and
- synthesizing the noise components using the noise parameters of the selected sets only.
The method of the present invention may additionally comprise the optional step of compensating the gains of the selected noise components for any energy loss caused by rejected noise components. Further optional method steps may be derived from the description above.
Additionally, the present invention provides an encoding device for representing sound by sets of parameters, each set comprising noise parameters representing a noise component of the sound and preferably also transient and/or sinusoidal parameters, the device comprising a relevance detector for providing a relevance value representing the perceptual relevance of the respective noise parameters.
The present invention is based on the insight that, when synthesizing the noise components of sound, selecting only a limited number of sound channels does not substantially degrade the synthesized sound. The present invention benefits from the further insight that selecting the sound channels on the basis of a perceptual relevance value minimizes, or even eliminates, any distortion of the synthesized sound.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted by multiple (circuit) elements or by their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.

Claims (18)

1. A device (1) for synthesizing sound, the sound being represented by sets of parameters, each set comprising noise parameters (NP) representing a noise component of the sound, the device comprising:
- selection means (2) for selecting, on the basis of a perceptual relevance value, from the total number of sets a subset of sets having the largest perceptual relevance values, wherein said perceptual relevance value is indicative of the amplitude and/or the energy of the noise component, and
- synthesizing means (3) for synthesizing the noise components using the noise parameters of the selected sets only.
2. The device according to claim 1, wherein each parameter set further comprises other parameters (SP; TP) representing transient and/or sinusoidal components of the sound.
3. The device according to claim 2, wherein the selection means (2) are further arranged to select the limited number of sets from the total number of sets on the basis of one or more other parameters (SP; TP) representing other components of the sound.
4. The device according to claim 1, wherein the noise parameters (NP) define a temporal envelope and/or a spectral envelope of the noise.
5. The device according to claim 1, wherein each parameter set corresponds to a single sound channel.
6. The device according to claim 1, comprising a decision part (21) for deciding which parameter sets are to be selected, and a selection part (22) for selecting parameter sets on the basis of information provided by the decision part (21).
7. The device according to claim 1, comprising a selection part (22) for selecting parameter sets on the basis of perceptual relevance values contained in the parameter sets.
8. The device according to claim 1, wherein the synthesizing means (3) comprise a single filter (390) for spectrally shaping the noise of all selected sets, and a Levinson-Durbin unit (370) for determining the filter coefficients of the filter (390).
9. The device according to claim 1, further comprising gain compensation means (343, 349) for compensating the gains of the selected noise components for any energy loss caused by any rejected noise components.
10. An audio synthesizer (5) comprising a device (1) for synthesizing sound according to claim 1.
11. A user device comprising a device (1) for synthesizing sound according to claim 1.
12. A method of synthesizing sound, the sound being represented by sets of parameters, each set comprising noise parameters (NP) representing a noise component of the sound, the method comprising the steps of:
- selecting, on the basis of a perceptual relevance value, from the total number of sets a subset of sets having the largest perceptual relevance values, wherein said perceptual relevance value is indicative of the amplitude and/or the energy of the noise component, and
- synthesizing the noise components using the noise parameters of the selected sets only.
13. The method according to claim 12, wherein each parameter set further comprises other parameters (SP; TP) representing transient and/or sinusoidal components of the sound.
14. The method according to claim 13, wherein the step of selecting the limited number of sets from the total number of sets is also carried out on the basis of one or more other parameters (SP; TP) representing other components of the sound.
15. The method according to claim 12, wherein the noise parameters define a temporal envelope and/or a spectral envelope of the noise.
16. The method according to claim 12, wherein each parameter set corresponds to a single sound channel.
17. The method according to claim 12, further comprising the step of compensating the gains of the selected noise components for any energy loss caused by any rejected noise components.
18. The method according to claim 12, wherein each parameter set comprises a perceptual relevance value.
CN2006800046437A 2005-02-10 2006-02-01 Sound synthesis Expired - Fee Related CN101116135B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP05100948 2005-02-10
EP05100948.8 2005-02-10
PCT/IB2006/050338 WO2006085244A1 (en) 2005-02-10 2006-02-01 Sound synthesis

Publications (2)

Publication Number Publication Date
CN101116135A CN101116135A (en) 2008-01-30
CN101116135B true CN101116135B (en) 2012-11-14

Family

ID=36540169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006800046437A Expired - Fee Related CN101116135B (en) 2005-02-10 2006-02-01 Sound synthesis

Country Status (6)

Country Link
US (1) US7781665B2 (en)
EP (1) EP1851752B1 (en)
JP (1) JP5063364B2 (en)
KR (1) KR101207325B1 (en)
CN (1) CN101116135B (en)
WO (1) WO2006085244A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5063363B2 * 2005-02-10 2012-10-31 Koninklijke Philips Electronics N.V. Speech synthesis method
JP2009543112A * 2006-06-29 2009-12-03 NXP B.V. Decoding speech parameters
US20080184872A1 (en) * 2006-06-30 2008-08-07 Aaron Andrew Hunt Microtonal tuner for a musical instrument using a digital interface
US9111525B1 (en) * 2008-02-14 2015-08-18 Foundation for Research and Technology—Hellas (FORTH) Institute of Computer Science (ICS) Apparatuses, methods and systems for audio processing and transmission
CN102057356A (en) * 2008-06-11 2011-05-11 高通股份有限公司 Method and system for measuring task load
JP6821970B2 * 2016-06-30 2021-01-27 Yamaha Corporation Speech synthesizer and speech synthesis method
CN113053353B * 2021-03-10 2022-10-04 Du Xiaoman Technology (Beijing) Co., Ltd. Training method and device of speech synthesis model
CN113470691A * 2021-07-08 2021-10-01 Zhejiang Dahua Technology Co., Ltd. Automatic gain control method of voice signal and related device thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004040553A1 (en) * 2002-10-31 2004-05-13 Nec Corporation Bandwidth expanding device and method
US6766293B1 (en) * 1997-07-14 2004-07-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for signalling a noise substitution during audio signal coding

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2581047B2 * 1986-10-24 1997-02-12 Yamaha Corporation Tone signal generation method
US5029509A (en) * 1989-05-10 1991-07-09 Board Of Trustees Of The Leland Stanford Junior University Musical synthesizer combining deterministic and stochastic waveforms
DE69028072T2 (en) * 1989-11-06 1997-01-09 Canon Kk Method and device for speech synthesis
FR2679689B1 (en) * 1991-07-26 1994-02-25 Etat Francais METHOD FOR SYNTHESIZING SOUNDS.
US5248845A (en) * 1992-03-20 1993-09-28 E-Mu Systems, Inc. Digital sampling instrument
US5763800A (en) * 1995-08-14 1998-06-09 Creative Labs, Inc. Method and apparatus for formatting digital audio data
JPH11513820A * 1995-10-23 1999-11-24 The Regents of the University of California Control structure for speech synthesis
US5686683A (en) * 1995-10-23 1997-11-11 The Regents Of The University Of California Inverse transform narrow band/broad band sound synthesis
WO1997017692A1 (en) * 1995-11-07 1997-05-15 Euphonics, Incorporated Parametric signal modeling musical synthesizer
JPH1091194A (en) * 1996-09-18 1998-04-10 Sony Corp Method of voice decoding and device therefor
US5886276A (en) * 1997-01-16 1999-03-23 The Board Of Trustees Of The Leland Stanford Junior University System and method for multiresolution scalable audio signal encoding
US5977469A (en) 1997-01-17 1999-11-02 Seer Systems, Inc. Real-time waveform substituting sound engine
EP0878790A1 (en) 1997-05-15 1998-11-18 Hewlett-Packard Company Voice coding system and method
US5920843A * 1997-06-23 1999-07-06 Microsoft Corporation Signal parameter track time slice control point, step duration, and staircase delta determination, for synthesizing audio by plural functional components
US7756892B2 (en) * 2000-05-02 2010-07-13 Digimarc Corporation Using embedded data with file sharing
US5900568A (en) * 1998-05-15 1999-05-04 International Business Machines Corporation Method for automatic sound synthesis
US6240386B1 (en) 1998-08-24 2001-05-29 Conexant Systems, Inc. Speech codec employing noise classification for noise compensation
WO2000011649A1 (en) 1998-08-24 2000-03-02 Conexant Systems, Inc. Speech encoder using a classifier for smoothing noise coding
US6493666B2 (en) 1998-09-29 2002-12-10 William M. Wiese, Jr. System and method for processing data from and for multiple channels
JP3707300B2 * 1999-06-02 2005-10-19 Yamaha Corporation Expansion board for musical sound generator
JP4220108B2 2000-06-26 2009-02-04 Dai Nippon Printing Co., Ltd. Acoustic signal coding system
JP2002140067A (en) * 2000-11-06 2002-05-17 Casio Comput Co Ltd Electronic musical instrument and registration method for electronic musical instrument
SG118122A1 (en) * 2001-03-27 2006-01-27 Yamaha Corp Waveform production method and apparatus
PL365018A1 (en) * 2001-04-18 2004-12-27 Koninklijke Philips Electronics N.V. Audio coding
WO2002087241A1 (en) * 2001-04-18 2002-10-31 Koninklijke Philips Electronics N.V. Audio coding with partial encryption
EP1451809A1 (en) * 2001-11-23 2004-09-01 Koninklijke Philips Electronics N.V. Perceptual noise substitution
ES2354427T3 (en) * 2003-06-30 2011-03-14 Koninklijke Philips Electronics N.V. IMPROVEMENT OF THE DECODED AUDIO QUALITY THROUGH THE ADDITION OF NOISE.
US7676362B2 (en) * 2004-12-31 2010-03-09 Motorola, Inc. Method and apparatus for enhancing loudness of a speech signal
JP2009543112A * 2006-06-29 2009-12-03 NXP B.V. Decoding speech parameters

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6766293B1 (en) * 1997-07-14 2004-07-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for signalling a noise substitution during audio signal coding
WO2004040553A1 (en) * 2002-10-31 2004-05-13 Nec Corporation Bandwidth expanding device and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Marek Szczerba, Werner Oomen and Marc Klein Middelink, "Parametric Audio Coding Based Wavetable Synthesis", AES 116th Convention, Berlin, Germany, 2004. *

Also Published As

Publication number Publication date
JP2008530608A (en) 2008-08-07
KR101207325B1 (en) 2012-12-03
CN101116135A (en) 2008-01-30
KR20070104465A (en) 2007-10-25
EP1851752A1 (en) 2007-11-07
EP1851752B1 (en) 2016-09-14
WO2006085244A1 (en) 2006-08-17
JP5063364B2 (en) 2012-10-31
US20080184871A1 (en) 2008-08-07
US7781665B2 (en) 2010-08-24

Similar Documents

Publication Publication Date Title
CN101116135B (en) Sound synthesis
KR101315075B1 (en) Sound synthesis
JP5934922B2 (en) Decoding device
EP2054875B1 (en) Enhanced coding and parameter representation of multichannel downmixed object coding
JP4705203B2 (en) Voice quality conversion device, pitch conversion device, and voice quality conversion method
US20060050898A1 (en) Audio signal processing apparatus and method
EP1701336B1 (en) Sound processing apparatus and method, and program therefor
JP2007187905A (en) Signal-encoding equipment and method, signal-decoding equipment and method, and program and recording medium
JP2003108197A (en) Audio signal decoding device and audio signal encoding device
CN101213592B (en) Device and method of parametric multi-channel decoding
JP2796408B2 (en) Audio information compression device
JP4403721B2 (en) Digital audio decoder
KR100264389B1 (en) Computer music cycle with key change function
D'Aguanno et al. MP3 window-switching pattern analysis for general purposes beat tracking on music with drums
JP2001083971A (en) Composing device for waveform signal, and compressing and extenting device for time axis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121114

Termination date: 20180201

CF01 Termination of patent right due to non-payment of annual fee