WO2016009654A1 - Noise suppression system, noise suppression method, and recording medium storing program - Google Patents
- Publication number
- WO2016009654A1 (PCT/JP2015/003604)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- signal
- ratio
- prior
- model
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
Definitions
- The present invention relates to a noise suppression technique and, more particularly, to a noise suppression system, a noise suppression method, and a program suited to a system or application that extracts a desired signal by suppressing a noise component contained in an input signal.
- Patent Document 1 discloses a configuration that obtains a provisional estimated speech by suppressing noise contained in the input speech signal and then corrects the provisional estimated speech using a standard pattern of speech in which no speech information is missing, so that the noise component can be removed with high accuracy.
- In that configuration, an expected value calculation process obtains an expected value from the probability that each probability distribution constituting the standard pattern outputs the provisional estimated speech and from the mean of each probability distribution constituting the standard pattern.
- The expected value is used as the correction value for the provisional estimated speech.
- Patent Document 2 discloses a method for removing noise.
- The noise removal method first obtains a first signal-to-noise ratio for each frequency, obtains a weight for each frequency based on the first signal-to-noise ratio, and weights the frequency-domain signal with that weight.
- The estimated noise for each frequency is obtained from the weighted frequency-domain signal.
- The noise removal method then obtains a second signal-to-noise ratio from the frequency-domain signal and the estimated noise for each frequency, determines a suppression coefficient based on the second signal-to-noise ratio, and weights the frequency-domain signal with the suppression coefficient.
- The present invention was devised in view of the above-described problems. Its object is to provide a technique that, for an input signal in which noise is mixed into a desired signal, avoids a decrease in noise suppression accuracy and suppresses the noise component with high accuracy even when the noise level fluctuates.
- A noise suppression system with the following configuration is provided.
- The noise suppression system includes a prior SN ratio estimation / expected value calculation unit. This unit takes an estimated value of the prior SN (signal-to-noise) ratio of the signal to the noise, estimated from an input signal in which the signal and the noise are mixed, and corrects it based on a prior SN ratio model, or on a signal model and a noise model, to obtain an expected value of the prior SN ratio.
- The noise suppression system further includes a noise suppression coefficient calculation unit that calculates a noise suppression coefficient using the expected value of the prior SN ratio, and a noise suppression unit that suppresses the noise contained in the input signal by multiplying the input signal by the noise suppression coefficient.
- A noise suppression method corrects an estimated value of the prior SN ratio of the signal to the noise, estimated from an input signal in which the signal and the noise are mixed, based on a prior SN ratio model, or on a signal model and a noise model, to obtain an expected value of the prior SN ratio. The method further calculates a noise suppression coefficient using the expected value of the prior SN ratio and multiplies the input signal by the noise suppression coefficient, thereby suppressing the noise component contained in the input signal.
- A program for causing a computer to execute the following processing is provided.
- The processing includes correcting an estimated value of the prior SN ratio of the signal to the noise, estimated from an input signal in which the signal and the noise are mixed, based on a prior SN ratio model, or on a signal model and a noise model, to obtain an expected value of the prior SN ratio.
- The processing further includes calculating a noise suppression coefficient using the expected value of the prior SN ratio, and suppressing the noise component contained in the input signal by multiplying the input signal by the noise suppression coefficient.
- A computer-readable recording medium (non-transitory computer-readable recording medium) storing the program is also provided.
- According to the present invention, for an input signal in which noise is mixed into a desired signal, it is possible to avoid a reduction in noise suppression accuracy and to suppress the noise component with high accuracy even when the magnitude of the noise fluctuates.
- FIG. 12 is a diagram schematically illustrating a basic concept common to the embodiments.
- A noise suppression system (10) according to an aspect of the present invention includes a prior SN ratio estimation / expected value calculation unit (11), a noise suppression coefficient calculation unit (12), and a noise suppression unit (13).
- The prior SN ratio estimation / expected value calculation unit (11) corrects the estimated value of the SN ratio of the signal to the noise (the prior SN ratio estimated value), estimated from the input signal in which the signal and the noise are mixed, and obtains an expected value (R_snE). The correction is based on a prior SN ratio model, or on a signal model and a noise model.
- The noise suppression coefficient calculation unit (12) calculates the noise suppression coefficient (W_0) using the expected value of the prior SN ratio (R_snE). The noise suppression unit (13) may be configured to suppress the noise component contained in the input signal by multiplying the input signal by the noise suppression coefficient (W_0) and to output an estimated value of the signal.
- The noise suppression system (100 in FIG. 1) includes a first prior SN ratio estimation unit (101 in FIG. 1), a storage unit (105 in FIG. 1), and a prior SN ratio expected value calculation unit (102 in FIG. 1).
- The first prior SN ratio estimation unit (101) receives an input signal in which a signal and noise are mixed, estimates the signal and the noise from the input signal, and estimates the prior SN ratio from the estimated signal and noise.
- The storage unit (105) stores a prior SN ratio model (M_sn) prepared in advance.
- The prior SN ratio expected value calculation unit (102) corrects the prior SN ratio estimated by the first prior SN ratio estimation unit (101) using the prior SN ratio model stored in the storage unit (105), and thereby calculates the expected value (R_snE) of the prior SN ratio.
- The noise suppression coefficient calculation unit (103 in FIG. 1) calculates the noise suppression coefficient (W_0) using the expected value (R_snE) of the prior SN ratio.
- The noise suppression unit (104 in FIG. 1) multiplies the input signal by the noise suppression coefficient (W_0) to suppress the noise component contained in the input signal, and outputs an estimated value of the signal.
- The first prior SN ratio estimation unit (101), the storage unit (105), and the prior SN ratio expected value calculation unit (102) together correspond to the prior SN ratio estimation / expected value calculation unit (11) introduced above.
- Instead of using a prior SN ratio model prepared in advance, the prior SN ratio model may be estimated from a speech model and a noise model that are prepared in advance.
- In this case, the noise suppression system includes a first speech and first noise estimation unit (305 in FIG. 6), a storage unit (307 in FIG. 6), a storage unit (308 in FIG. 6), and a prior SN ratio expected value calculation unit (306 in FIG. 6).
- The first speech and first noise estimation unit (305) receives an input signal in which a signal and noise are mixed, and estimates the signal and the noise from the input signal.
- The storage unit (307) stores a speech model (M_s) prepared in advance.
- The storage unit (308) stores a noise model (M_n) prepared in advance.
- The prior SN ratio expected value calculation unit (306) receives the signal and the noise estimated by the first speech and first noise estimation unit (305), corrects the prior SN ratio of the signal to the noise using the speech model and the noise model stored in the storage units (307, 308), and calculates the expected value (R_snE) of the prior SN ratio.
- The noise suppression coefficient calculation unit (303 in FIG. 6) calculates the noise suppression coefficient (W_0) using the expected value (R_snE) of the prior SN ratio.
- The first speech and first noise estimation unit (305), the storage units (307, 308), and the prior SN ratio expected value calculation unit (306) together correspond to the prior SN ratio estimation / expected value calculation unit (11) introduced above.
- The noise suppression system (400 in FIG. 9) includes a first speech and first noise estimation unit (405 in FIG. 9), which receives an input signal in which a signal and noise are mixed and estimates the signal and the noise from the input signal, and a storage unit (407 in FIG. 9) that stores a speech model prepared in advance. The noise suppression system (400) further includes a prior SN ratio expected value calculation unit (406 in FIG. 9).
- The prior SN ratio expected value calculation unit (406) receives the signal and the noise estimated by the first speech and first noise estimation unit (405 in FIG. 9), obtains a noise model (M_n), and corrects the SN ratio of the signal to the noise (the prior SN ratio) using the speech model and the noise model.
- The prior SN ratio expected value calculation unit (406) thereby calculates the expected value (R_snE) of the prior SN ratio.
- The noise suppression coefficient calculation unit (403 in FIG. 9) calculates the noise suppression coefficient using the expected value of the prior SN ratio.
- The noise suppression unit (404 in FIG. 9) may be configured to suppress the noise component contained in the input signal by multiplying the input signal by the noise suppression coefficient and to output an estimated value of the signal.
- The first speech and first noise estimation unit (405), the storage unit (407), and the prior SN ratio expected value calculation unit (406) together correspond to the prior SN ratio estimation / expected value calculation unit (11) introduced above.
- FIG. 1 is a diagram illustrating a configuration of a noise suppression system 100 according to the first embodiment.
- The noise suppression system 100 includes a first prior SN ratio estimation unit 101, a prior SN ratio expected value calculation unit 102, a noise suppression coefficient calculation unit 103, a noise suppression unit 104, and a storage unit 105 that stores a prior SN ratio model (M_sn).
- The prior SN ratio and the posterior SN ratio are distinguished and defined as follows.
- Prior SN ratio = desired signal power / noise power
- Posterior SN ratio = (power of the mixed signal of desired signal and noise) / noise power
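The two definitions above can be illustrated with a minimal Python sketch. The function names and the sample powers are hypothetical, and the mixed power is formed by simply adding signal and noise power, which assumes the two are uncorrelated; none of this is taken from the patent text.

```python
import numpy as np

def prior_snr(signal_power, noise_power):
    """Prior SN ratio: desired-signal power divided by noise power."""
    return np.asarray(signal_power, dtype=float) / np.asarray(noise_power, dtype=float)

def posterior_snr(mixed_power, noise_power):
    """Posterior SN ratio: power of the mixed input divided by noise power."""
    return np.asarray(mixed_power, dtype=float) / np.asarray(noise_power, dtype=float)

# Hypothetical per-frequency powers for a single frame.
signal_power = np.array([4.0, 1.0, 0.25])
noise_power = np.array([1.0, 1.0, 1.0])
# Assumption: signal and noise are uncorrelated, so their powers add.
mixed_power = signal_power + noise_power

xi = prior_snr(signal_power, noise_power)
gamma = posterior_snr(mixed_power, noise_power)
```

Under the uncorrelated assumption the posterior SN ratio is the prior SN ratio plus one, which is why the two quantities need to be kept apart.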
- The first prior SN ratio estimation unit 101 receives an input signal X_0 in which a desired signal and noise are mixed.
- The first prior SN ratio estimation unit 101 estimates the ratio R_sn1 of the desired signal power to the noise power contained in the input signal X_0 (the prior SN ratio), and outputs the estimated prior SN ratio R_sn1.
- The input signal X_0 is the frequency spectrum (frequency amplitude spectrum, frequency power spectrum, or the like) of a mixed signal in which the desired signal and noise are mixed. It is obtained by converting the time-domain signal into a frequency-domain signal (a complex signal with a real part and an imaginary part) by a discrete Fourier transform (DFT) or the like. The same applies to the input signal X_0 in the subsequent embodiments.
- DFT: Discrete Fourier Transform
- The prior SN ratio expected value calculation unit 102 receives the prior SN ratio R_sn1 output from the first prior SN ratio estimation unit 101 and the prior SN ratio model M_sn stored in advance in the storage unit 105.
- The prior SN ratio model M_sn contains patterns of the prior SN ratio.
- The prior SN ratio expected value calculation unit 102 compares the prior SN ratio R_sn1 with the prior SN ratio model M_sn, corrects R_sn1 using M_sn, and outputs the corrected value as the expected value R_snE of the prior SN ratio.
- The noise suppression coefficient calculation unit 103 receives the expected value R_snE of the prior SN ratio output from the prior SN ratio expected value calculation unit 102.
- The noise suppression coefficient calculation unit 103 calculates the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio, and outputs W_0.
- The noise suppression unit 104 receives the noise suppression coefficient W_0 output from the noise suppression coefficient calculation unit 103 and the input signal X_0.
- The noise suppression unit 104 multiplies the input signal X_0 by the noise suppression coefficient W_0 to suppress the noise component contained in X_0, and outputs the estimated value S_0 of the desired signal.
- The first prior SN ratio estimation unit 101, the prior SN ratio expected value calculation unit 102, the noise suppression coefficient calculation unit 103, the noise suppression unit 104, and the storage unit 105 may be integrated and mounted in a single apparatus. Alternatively, they may be configured as a distributed system in which the units are connected to one another via communication means such as a network. At least some of the processes and functions of the first prior SN ratio estimation unit 101, the prior SN ratio expected value calculation unit 102, and the noise suppression coefficient calculation unit 103 may be realized by a program executed on a computer. Likewise, at least part of the processing and functions of the noise suppression unit 104 and the storage unit 105 (read control and write control) may be realized by a program executed on a computer. The same applies to the other embodiments.
- The prior SN ratio R_sn1 is corrected by the prior SN ratio model M_sn, which takes fluctuations in the magnitude of the noise into account.
- With the noise suppression coefficient W_0 calculated using the expected value R_snE of the prior SN ratio, the noise component can be suppressed with high accuracy, without removing the desired signal component, even when the noise level fluctuates.
- FIG. 5 is a flowchart showing processing of the noise suppression system of the second embodiment.
- FIG. 2 is a diagram illustrating the configuration of the noise suppression system 200 according to the second embodiment.
- The noise suppression system 200 according to the second embodiment acquires (extracts) a desired signal from a mixed signal in which the desired signal and noise are mixed.
- In the following, the desired signal is described as an audio signal, but the desired signal is not limited to an audio signal.
- The noise suppression system 200 includes a first prior SN ratio estimation unit 201, a prior SN ratio expected value calculation unit 202, a noise suppression coefficient calculation unit 203, a noise suppression unit 204, and a storage unit 205 that stores and holds a prior SN ratio model M_sn in advance.
- The first prior SN ratio estimation unit 201 receives an input signal X_0 in which a desired signal and noise are mixed. It then estimates the ratio R_sn1 of the desired signal power to the noise power contained in the input signal X_0 (the prior SN ratio) and outputs the estimated R_sn1.
- The prior SN ratio expected value calculation unit 202 receives the prior SN ratio R_sn1 output from the first prior SN ratio estimation unit 201 and the prior SN ratio model M_sn stored in advance in the storage unit 205.
- The prior SN ratio expected value calculation unit 202 compares the estimated prior SN ratio R_sn1 with the prior SN ratio model M_sn, and outputs the value corrected by M_sn as the expected value R_snE of the prior SN ratio.
- The noise suppression coefficient calculation unit 203 receives the output R_snE of the prior SN ratio expected value calculation unit 202, calculates the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio, and outputs W_0.
- The noise suppression unit 204 receives the noise suppression coefficient W_0 output from the noise suppression coefficient calculation unit 203 and the input signal X_0.
- The noise suppression unit 204 multiplies the input signal X_0 by the noise suppression coefficient W_0 to suppress the noise component contained in the input signal, and outputs the estimated value S_0 of the desired signal.
- X_0(f, t) is the frequency spectrum (frequency amplitude spectrum, frequency power spectrum, or the like) of a mixed signal in which a desired signal and noise are mixed.
- The time-domain signal is converted into a frequency-domain signal (a complex signal with a real part and an imaginary part) by, for example, a discrete Fourier transform (DFT).
- A power component is obtained by squaring the component.
- f is a frequency index (for example, running from the DC (direct current) component (index 0) to the Nyquist frequency).
- t is an index of time (discrete time).
- X_0, S, and N at the time index t are vectors whose elements are the components in the frequency direction.
- S on the right side is the frequency spectrum of the desired audio component.
- N is the frequency spectrum of the noise component.
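The equation these definitions refer to did not survive extraction; from the roles of S and N on the right side it is presumably the additive model X_0(f, t) = S(f, t) + N(f, t). The following Python sketch, offered under that assumption, frames a time-domain signal, applies the DFT, and checks that the mixed spectrum is the sum of the two component spectra. The function name, frame length, hop size, window, and test signals are all hypothetical choices, not values from the patent.

```python
import numpy as np

def frame_spectra(x, frame_len=256, hop=128):
    """Split a time-domain signal into windowed frames and take the DFT of each.

    Returns a complex array of shape (num_frames, frame_len // 2 + 1): rows are
    time indices t, columns are frequency indices f from DC (0) to Nyquist.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

rng = np.random.default_rng(0)
n = np.arange(2048)
s = np.sin(2 * np.pi * 0.05 * n)        # stand-in for the desired signal
d = 0.3 * rng.standard_normal(n.size)   # stand-in for the noise

S = frame_spectra(s)
N = frame_spectra(d)
X0 = frame_spectra(s + d)               # spectrum of the mixed input

power = np.abs(X0) ** 2                 # power component: squared magnitude
```

Because the DFT is linear, `X0` equals `S + N` up to rounding, which is what the additive model states in the frequency domain.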
- FIG. 3 is a diagram illustrating the configuration of the first prior S / N ratio estimation unit 201.
- The first prior SN ratio estimation unit 201 includes a first noise estimation unit 2011, a first speech estimation unit 2012, and a prior SN ratio estimation unit 2013.
- The first noise estimation unit 2011 receives the input signal X_0, estimates the noise component contained in X_0, and outputs a first estimated noise N_1.
- The first speech estimation unit 2012 receives the input signal X_0 and the first estimated noise N_1, and outputs a first estimated speech S_1.
- The first noise estimation unit 2011 estimates the noise component contained in the input signal X_0 and outputs the first estimated noise N_1.
- NE[] is a noise estimation operator (noise estimator).
- For the noise estimation operator, a known method such as minimum statistics or weighted noise estimation can be used.
- The right side of (Equation 2) is calculated by the noise estimation operator NE[] for each component of the vector X_0, and the output corresponds component by component to the vector X_0.
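As a rough illustration of what a noise estimation operator NE[] might do, the sketch below tracks the minimum of a recursively smoothed power spectrum over a sliding window. This is a deliberately simplified stand-in inspired by minimum statistics (it omits bias compensation and adaptive smoothing) and is not the estimator of the patent; the function name and the parameters `alpha` and `window` are hypothetical.

```python
import numpy as np

def estimate_noise_min_tracking(power, alpha=0.8, window=8):
    """Simplified minimum-tracking noise estimate.

    power: array of shape (T, F) holding |X_0(f, t)|^2 per frame.
    Returns an array of shape (T, F) with a noise power estimate per component.
    """
    power = np.asarray(power, dtype=float)
    # Recursive smoothing along time, independently for each frequency index.
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, power.shape[0]):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * power[t]
    # Noise estimate: minimum of the smoothed power over the last `window` frames.
    noise = np.empty_like(power)
    for t in range(power.shape[0]):
        start = max(0, t - window + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)
    return noise

# Hypothetical input: stationary noise floor with one loud frame (e.g. speech onset).
power = np.full((20, 4), 2.0)
power[10] = 100.0
N1 = estimate_noise_min_tracking(power)
```

The estimate is computed separately for every frequency component, matching the component-wise description of NE[], and the loud frame does not raise the estimate at that frame because the window minimum still sees the quieter preceding frames.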
- The first speech estimation unit 2012 estimates the speech component contained in the input signal X_0 by suppressing the noise component contained in X_0, and outputs a first estimated speech S_1.
- NS[] is a noise suppression operator (noise suppressor); for example, the spectral subtraction (SS) method described in Non-Patent Document 1 can be used.
- The right side of (Equation 3) is calculated by the noise suppression operator NS[] for each component of the vectors X_0 and N_1, and the output corresponds component by component to X_0 and N_1.
- That is, y_i = NS[X_i, N_i], where y_i is the i-th component of the output vector and X_i and N_i are the i-th components of the vectors X_0 and N_1, respectively.
- In addition, a Wiener filter (WF) method, an MMSE STSA (Minimum Mean Square Error Short Time Spectral Amplitude) method, an MMSE LSA (Minimum Mean Square Error Log Spectral Amplitude) method, or the like can be used.
- WF: Wiener Filter
- MMSE STSA: Minimum Mean Square Error Short Time Spectral Amplitude
- MMSE LSA: Minimum Mean Square Error Log Spectral Amplitude
- The calculation is performed for each component of the vectors S_1 and N_1, and the output corresponds component by component to S_1 and N_1; for example, S_1 / N_1 is (S_11 / N_11, S_12 / N_12, ..., S_1n / N_1n).
- The prior SN ratio R_sn1 is given by the following (Equation 5).
- (Equation 5) is calculated for each component of the vectors X_0 and S_1.
- When the first speech estimation unit 2012 uses the WF method, the MMSE STSA method, or the MMSE LSA method, the first speech estimation unit 2012 can itself obtain the prior SN ratio.
- The prior SN ratio estimated by the first speech estimation unit 2012 may then be used as the output of the first prior SN ratio estimation unit 201 (the prior SN ratio R_sn1). In this case, the prior SN ratio estimation unit 2013 in FIG. 3 is not necessary.
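The first speech estimate and the component-wise ratio S_1 / N_1 described above can be sketched as follows. This is a minimal power-domain spectral subtraction with a spectral floor; the floor value, the function names, and the sample numbers are assumptions for illustration, and the exact form of the original equations is not reproduced here.

```python
import numpy as np

def spectral_subtraction(x_power, n_power, floor=1e-3):
    """NS[] stand-in: subtract noise power from input power, component by component.

    A small fraction of the input power is kept as a floor so that the
    estimated speech power never becomes zero or negative.
    """
    x_power = np.asarray(x_power, dtype=float)
    n_power = np.asarray(n_power, dtype=float)
    return np.maximum(x_power - n_power, floor * x_power)

def prior_snr_estimate(s1_power, n1_power, eps=1e-12):
    """Component-wise ratio S_1 / N_1, guarding against division by zero."""
    s1_power = np.asarray(s1_power, dtype=float)
    n1_power = np.asarray(n1_power, dtype=float)
    return s1_power / np.maximum(n1_power, eps)

# Hypothetical power spectra for one frame.
X0_power = np.array([5.0, 2.0, 1.0])
N1 = np.array([1.0, 1.0, 1.0])

S1 = spectral_subtraction(X0_power, N1)
R_sn1 = prior_snr_estimate(S1, N1)
```

In the third component the subtraction would give zero, so the floor takes over; without it the logarithm used later in the feature conversion would be undefined.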
- The prior SN ratio R_sn1 may be calculated, for example, as a value for each frequency index f as in (Equation 6) below, as a value for each frequency band B (for example, a mel frequency band) that combines a plurality of frequency indexes f as in (Equation 7), or as a single value over all f as in (Equation 8).
- At the time index t, there are as many values of the prior SN ratio R_sn1 as there are frequency indexes f or frequency bands B. Accordingly, the prior SN ratio R_sn1 at t is a vector whose elements are the components in the frequency direction.
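The three granularities (per frequency index, per band, and full band) can be sketched as below. The equations themselves were lost in extraction, so the aggregation rule used here, summing signal power and noise power over a band before dividing, is an assumption, as are the function name and the band edges.

```python
import numpy as np

def band_prior_snr(s_power, n_power, band_edges):
    """Prior SN ratio per band: summed signal power over summed noise power.

    band_edges: increasing frequency indices; band b covers
    indices band_edges[b] up to (not including) band_edges[b + 1].
    """
    s_power = np.asarray(s_power, dtype=float)
    n_power = np.asarray(n_power, dtype=float)
    return np.array([s_power[lo:hi].sum() / n_power[lo:hi].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

# Hypothetical estimated speech and noise power for one frame.
S1 = np.array([4.0, 2.0, 1.0, 1.0])
N1 = np.array([1.0, 1.0, 1.0, 1.0])

per_frequency = S1 / N1                        # one value per frequency index f
per_band = band_prior_snr(S1, N1, [0, 2, 4])   # one value per band B
full_band = band_prior_snr(S1, N1, [0, 4])     # one value over all f
```

Mel-spaced band edges would be passed in the same way; only the `band_edges` list changes.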
- FIG. 4 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 202.
- The prior SN ratio expected value calculation unit 202 includes a feature amount conversion unit 2021, an expected value calculation unit 2022, and a feature amount inverse conversion unit 2023.
- The feature amount conversion unit 2021 receives the prior SN ratio R_sn1 output from the first prior SN ratio estimation unit 201, and outputs the feature amount F_sn1 of the prior SN ratio R_sn1.
- The expected value calculation unit 2022 receives the feature amount F_sn1 and the prior SN ratio model (prior SN ratio pattern) M_sn prepared in advance, and outputs the feature amount F_snE of the expected value of the prior SN ratio.
- The feature amount inverse conversion unit 2023 receives the feature amount F_snE and outputs the expected value R_snE of the prior SN ratio.
- The feature amount conversion unit 2021 converts the prior SN ratio R_sn1 into the feature amount F_sn1 and outputs F_sn1.
- As the feature amount, for example, the logarithmic value of (Equation 9) below, or the value obtained by applying a discrete cosine transform (DCT) to that logarithm (the cepstrum) as in (Equation 10), can be used.
- DCT: Discrete Cosine Transform
- The log in (Equation 9) is the natural logarithm. The same applies to every log below. A common logarithm may be used instead of the natural logarithm.
- The right side of (Equation 9) takes the logarithm of each component of the vector R_sn1, and the output corresponds component by component to R_sn1.
- C[] is a cosine transform operator (DCT operator).
- The right side of (Equation 10) applies the cosine transform to the vector log R_sn1, and the output corresponds to the components of the vector R_sn1.
- The logarithmic calculation in (Equation 10) is the same as in (Equation 9).
- The feature amount F_sn1 can be calculated for each time index t. Its difference from the feature amount at a past time (for example, t - 1) may also be taken and used as a first-order difference feature amount. The difference may be taken once more and used as a second-order difference feature amount.
- The feature amount F_sn1 at the time index t is a multidimensional vector, because it has as many elements as the number of cepstrum dimensions, first-order difference feature amounts, and second-order difference feature amounts.
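The feature conversion described above (logarithm, optional cosine transform, optional time differences) can be sketched as follows. The orthonormal DCT-II is one common choice and is assumed here because the patent text does not say which DCT variant is meant; the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v is the transform, C.T @ y its inverse."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def to_feature(r_sn1, use_dct=True):
    """Log of the prior SN ratio, optionally followed by a DCT (cepstrum-like)."""
    log_snr = np.log(np.asarray(r_sn1, dtype=float))
    if use_dct:
        return dct_matrix(log_snr.size) @ log_snr
    return log_snr

def delta(features):
    """First-order difference along time: features[t] - features[t - 1].

    The first frame has no predecessor, so its difference is set to zero.
    """
    features = np.asarray(features, dtype=float)
    d = np.zeros_like(features)
    d[1:] = features[1:] - features[:-1]
    return d

# Hypothetical prior SN ratios for one frame.
R_sn1 = np.array([4.0, 1.0, 0.25, 2.0])
F_log = to_feature(R_sn1, use_dct=False)
F_cep = to_feature(R_sn1)
```

Applying `delta` twice gives the second-order difference feature; concatenating the static, first-order, and second-order parts yields the multidimensional vector mentioned above.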
- The expected value calculation unit 2022 receives the feature amount F_sn1 and the prior SN ratio model M_sn stored in advance in the storage unit 205, and outputs the feature amount F_snE of the expected value of the prior SN ratio.
- The prior SN ratio model M_sn is described here as a Gaussian mixture model (GMM) composed of G Gaussian distributions.
- GMM: Gaussian Mixture Model
- The prior SN ratio model M_sn is a Gaussian mixture model in which G (G > 1) Gaussian distributions with mean μ_sn,g and variance σ²_sn,g are mixed with weights w_sn,g.
- The expected value calculation unit 2022 calculates the feature amount F_snE of the expected value of the prior SN ratio as a weighted sum of the means μ_sn,g of the prior SN ratio model M_sn, as shown in (Equation 11) below.
- The weight P(g | F_sn1) is the posterior probability of the Gaussian distribution g given the feature amount F_sn1.
- P(g | F_sn1) is calculated, for example, as shown in (Equation 12).
- p(F_sn1 | g) is the probability that the Gaussian distribution g of the prior SN ratio model M_sn outputs the feature amount F_sn1, and is calculated as in (Equation 13) below.
- The feature amount F_sn1 and the mean μ_sn,g are both D-dimensional column vectors, and the variance σ²_sn,g is a D × D matrix.
- det[] is the determinant operator.
- T represents transposition, and (F_sn1 - μ_sn,g)^T is a D-dimensional row vector. The number of dimensions D can be changed as appropriate according to the type of input signal. When an audio signal is included, 10 or more dimensions are desirable.
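The images of (Equation 11) to (Equation 13) were lost in extraction. Based on the surrounding definitions (weighted sum of means, posterior probability as the weight, Gaussian output probability with a determinant and a transposed difference vector), they are presumably the standard Gaussian mixture expressions below. This is a reconstruction under that assumption, not a copy of the original equations.

```latex
\begin{align}
F_{snE} &= \sum_{g=1}^{G} P(g \mid F_{sn1}) \, \mu_{sn,g} \tag{11} \\
P(g \mid F_{sn1}) &= \frac{w_{sn,g} \, p(F_{sn1} \mid g)}
                          {\sum_{g'=1}^{G} w_{sn,g'} \, p(F_{sn1} \mid g')} \tag{12} \\
p(F_{sn1} \mid g) &= \frac{1}{(2\pi)^{D/2} \, \det[\sigma^2_{sn,g}]^{1/2}}
  \exp\!\left[ -\tfrac{1}{2} \, (F_{sn1} - \mu_{sn,g})^{T}
  \left(\sigma^2_{sn,g}\right)^{-1} (F_{sn1} - \mu_{sn,g}) \right] \tag{13}
\end{align}
```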
- The prior SN ratio model M_sn stored and held in advance in the storage unit 105 is expressed using the mean μ_sn,g and the variance σ²_sn,g, and the variance σ²_sn,g covers both fluctuations of the speech signal and fluctuations of the noise. For this reason, in (Equation 11), the posterior probability P(g
- The prior SN ratio model M_sn may be created in advance from the feature amounts F_sn1 computed for a large amount of input signals.
- The prior SN ratio model M_sn may be learned (created) using, for example, an expectation maximization (EM) algorithm.
- The prior SN ratio model M_sn can also be created by combining the speech model M_s and the noise model M_n. A method of combining the speech model M_s and the noise model M_n will be described in the next embodiment (see the description of the expected value calculation unit 3062 in FIG. 8).
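Given a prior SN ratio model in GMM form, the expected-value calculation performed by the expected value calculation unit 2022 can be sketched as follows: compute the posterior probability of each Gaussian for the observed feature, then take the posterior-weighted sum of the means. The sketch assumes standard multivariate Gaussian densities and works in the log domain for numerical stability; the function names and the toy model are hypothetical.

```python
import numpy as np

def gaussian_log_pdf(x, mean, cov):
    """Log density of a D-dimensional Gaussian with full covariance."""
    d = x.size
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def expected_feature(f_sn1, weights, means, covs):
    """Posterior-weighted sum of the GMM means for one observed feature vector.

    Returns (F_snE, posteriors). means has shape (G, D); covs holds G matrices
    of shape (D, D); weights has shape (G,).
    """
    f_sn1 = np.asarray(f_sn1, dtype=float)
    means = np.asarray(means, dtype=float)
    log_joint = np.array([np.log(w) + gaussian_log_pdf(f_sn1, m, c)
                          for w, m, c in zip(weights, means, covs)])
    log_joint -= log_joint.max()      # avoid underflow before exponentiating
    post = np.exp(log_joint)
    post /= post.sum()                # normalise to posterior probabilities
    return post @ means, post

# Toy model with G = 2 Gaussians in D = 2 dimensions (hypothetical values).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
covs = [np.eye(2), np.eye(2)]

F_mid, post_mid = expected_feature(np.array([5.0, 5.0]), weights, means, covs)
F_near, post_near = expected_feature(np.array([0.0, 0.0]), weights, means, covs)
```

An observation halfway between the two means is pulled to their average, while an observation on top of one mean is left essentially at that mean; this is the sense in which the model corrects the estimated feature.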
- The feature amount inverse conversion unit 2023 inversely converts the feature amount F_snE of the expected value of the prior SN ratio and outputs the expected value R_snE of the prior SN ratio.
- When the logarithmic value of (Equation 9) is used in the feature amount conversion unit 2021, the inverse conversion is performed by (Equation 14); when the value obtained by cosine-transforming the logarithmic value as in (Equation 10) is used, the inverse conversion is performed by (Equation 15).
- exp[] is the exponential operator.
- C^-1[] is the inverse cosine transform operator (Inverse Discrete Cosine Transform, IDCT).
- IDCT: Inverse Discrete Cosine Transform
- The right side of (Equation 14) can be written as exp[F_snE]; the exp function is calculated for each component of the vector F_snE and yields a vector such as (e^F_snE1, e^F_snE2, ..., e^F_snEn).
- (Equation 15) can be written as exp[C^-1[F_snE]].
- C^-1[F_snE] is the inverse cosine transform of the vector F_snE, and the output corresponds to the components of F_snE.
- The exponential calculation in (Equation 15) is the same as in (Equation 14).
- Because the inverse cosine transform C^-1 is a linear transform, the value C^-1[μ_sn,g] obtained by applying the inverse cosine transform to the mean μ_sn,g of the prior SN ratio model M_sn can be stored in the storage unit 205 in advance.
- By using the stored result C^-1[μ_sn,g] in (Equation 16), the inverse cosine transform no longer needs to be calculated at run time.
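The linearity argument can be checked numerically. The sketch below compares the direct route (weighted sum of means, inverse cosine transform, exponential) with the shortcut that uses inverse-transformed means prepared ahead of time. The orthonormal DCT-II, the variable names, and the sample values are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v is the transform, C.T @ y its inverse."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def inverse_feature(f_snE, c):
    """exp[C^-1[F_snE]]: inverse cosine transform, then exponential."""
    return np.exp(c.T @ f_snE)

c = dct_matrix(4)
# Hypothetical cepstral-domain means of a G = 2 model and posterior weights.
means = np.array([[1.0, 0.5, -0.2, 0.1],
                  [0.2, -0.3, 0.4, 0.0]])
post = np.array([0.3, 0.7])

# Direct route: weighted sum of means, then inverse transform at run time.
direct = inverse_feature(post @ means, c)

# Shortcut: inverse-transform each mean once, offline (row g is C^-1[mu_g]).
means_idct = means @ c
shortcut = np.exp(post @ means_idct)
```

Both routes give the same result, so storing `means_idct` in place of the cepstral means removes one matrix product per frame.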
- The noise suppression coefficient calculation unit 203 calculates and outputs the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio.
- For example, the noise suppression coefficient of the Wiener filter method can be calculated from the expected value R_snE of the prior SN ratio as follows.
- W_0 = R_snE / (1 + R_snE) (Equation 17)
- The right side of Equation 17 is calculated for each component of the vector R_snE and output as the corresponding component, for example (R_snE1/(1+R_snE1), R_snE2/(1+R_snE2), ..., R_snEn/(1+R_snEn)).
- That is, y_i = x_i / (1 + x_i), where y_i is the i-th component of the output vector and x_i is the i-th component of the vector R_snE.
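The component-wise Wiener gain y_i = x_i / (1 + x_i) can be written directly. The function name `wiener_gain` and the non-negativity check are illustrative assumptions.

```python
def wiener_gain(r_sne):
    """Per-component gain: W_0[i] = R_snE[i] / (1 + R_snE[i])."""
    if any(x < 0 for x in r_sne):
        raise ValueError("prior SN ratios are assumed to be non-negative")
    return [x / (1.0 + x) for x in r_sne]
```

A prior SN ratio of 0 gives a gain of 0 (full suppression), and the gain approaches 1 as the prior SN ratio grows.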
- In the noise suppression coefficient calculation unit 203, other noise suppression methods such as the MMSE STSA method and the MMSE LSA method may also be used to calculate the noise suppression coefficient from the expected value R_snE of the prior SN ratio.
- When the noise suppression coefficient calculation unit 203 uses a noise suppression method that requires the a posteriori SN ratio (the ratio of the input signal X_0, in which the desired signal and noise are mixed, to the noise), the a posteriori SN ratio (X_0 / N_1) may be calculated from the first estimated noise N_1 of the first prior SN ratio estimation unit 201 and used for calculating the noise suppression coefficient.
- The noise suppression unit 204 multiplies the input signal X_0 by the noise suppression coefficient W_0, thereby suppressing the noise component included in X_0, and outputs the estimated value S_0 of the desired signal.
- FIG. 5 is a flowchart explaining the processing procedure (operation) of the second embodiment described with reference to FIGS. 2 to 4.
- Step S601: The first prior SN ratio estimation unit 201 estimates the ratio R_sn1 of the desired signal to the noise included in the input signal X_0, in which the desired signal and noise are mixed.
- Step S602: The prior SN ratio expected value calculation unit 202 compares the prior SN ratio R_sn1 estimated by the first prior SN ratio estimation unit 201 with the prior SN ratio model M_sn in the storage unit 205, and calculates the expected value R_snE of the prior SN ratio, which is the value corrected by the prior SN ratio model M_sn.
- Step S603: The noise suppression coefficient calculation unit 203 calculates the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio.
- Step S604: The noise suppression unit 204 multiplies the input signal X_0 by the noise suppression coefficient W_0, thereby suppressing the noise component included in the input signal, and obtains the estimated value S_0 of the desired signal.
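Steps S601 to S604 can be strung together for a single frame as in the sketch below. Several parts are assumptions made only for illustration: the first speech estimate uses a floored spectral subtraction, the correction of step S602 is passed in as a hypothetical callable `correct_prior_snr`, and the Wiener gain is used in step S603. All noise values are assumed to be positive.

```python
def suppress_frame(x0, n1, correct_prior_snr, floor=1e-3):
    """Process one frame of per-bin values x0 given a noise estimate n1."""
    # S601: first speech estimate (assumed floored spectral subtraction),
    # then prior SN ratio R_sn1 = S_1 / N_1 per component.
    s1 = [max(x - n, floor * x) for x, n in zip(x0, n1)]
    r_sn1 = [s / n for s, n in zip(s1, n1)]
    # S602: correct R_sn1 with a model; the callable is a hypothetical stand-in.
    r_sne = correct_prior_snr(r_sn1)
    # S603: Wiener-type suppression coefficient W_0.
    w0 = [r / (1.0 + r) for r in r_sne]
    # S604: multiply the input by the coefficient to obtain S_0.
    return [w * x for w, x in zip(w0, x0)]
```

Passing an identity function as `correct_prior_snr` reduces the sketch to an uncorrected Wiener filter, which is a convenient baseline for checking what the model-based correction changes.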
- The prior SN ratio R_sn1 is corrected by the prior SN ratio model M_sn, which takes fluctuations in the noise level into account.
- The noise suppression coefficient calculated from the corrected expected value R_snE of the prior SN ratio can therefore suppress the noise component with high accuracy, without removing the desired signal component, even when the noise level fluctuates.
- A noise suppression system according to a third embodiment of the present invention will be described with reference to FIG. 6, FIG. 7, and FIG. 8.
- The first prior SN ratio estimation unit 201 in FIG. 2 is replaced by the first speech and first noise estimation unit 305 in FIG. 6.
- The prior SN ratio expected value calculation unit 202 of FIG. 2 is replaced by the prior SN ratio expected value calculation unit 306 of FIG. 6.
- The prior SN ratio model M_sn stored and held in the storage unit 205 of FIG. 2 is replaced, in FIG. 6, by the speech model M_s and the noise model M_n stored and held in the storage units 307 and 308, respectively.
- In these points the third embodiment differs from the second embodiment. Note that in FIG. 6 and elsewhere the speech model M_s and the noise model M_n are stored in separate storage units only for ease of explanation; they may of course be stored in the same storage unit.
- The operations of the noise suppression coefficient calculation unit 303 and the noise suppression unit 304 in FIG. 6 are the same as those of the noise suppression coefficient calculation unit 203 and the noise suppression unit 204 in FIG. 2.
- Descriptions of parts that are the same as in the second embodiment of FIG. 2 are omitted as appropriate to avoid duplication.
- The differences of the present embodiment from the second embodiment are described below, namely the first speech and first noise estimation unit 305, the prior SN ratio expected value calculation unit 306, the speech model M_s, and the noise model M_n.
- The first speech and first noise estimation unit 305 receives the input signal X_0, in which a desired signal and noise are mixed, and outputs an estimated value S_1 of the first desired signal (speech) and an estimated value N_1 of the first noise included in X_0.
- The prior SN ratio expected value calculation unit 306 receives the estimated value S_1 of the first desired signal (speech) and the estimated value N_1 of the first noise output from the first speech and first noise estimation unit 305, the speech model (speech pattern) M_s stored in advance in the storage unit 307, and the noise model (noise pattern) M_n stored in advance in the storage unit 308.
- The prior SN ratio expected value calculation unit 306 compares S_1 and N_1 with the speech model M_s and the noise model M_n, and calculates and outputs the expected value R_snE of the prior SN ratio.
- FIG. 7 is a diagram illustrating the configuration of the first speech and first noise estimation unit 305.
- The first speech and first noise estimation unit 305 includes a first noise estimation unit 3051 and a first speech estimation unit 3052.
- The first noise estimation unit 3051 receives the input signal X_0 and outputs the first estimated noise N_1.
- The first speech estimation unit 3052 receives the input signal X_0 and the first estimated noise N_1, and outputs the first estimated speech S_1.
- The operations of the first noise estimation unit 3051 and the first speech estimation unit 3052 in FIG. 7 are the same as those of the first noise estimation unit 2011 and the first speech estimation unit 2012 of the second embodiment, so their description is omitted.
- As the first estimated noise N_1, a noise component N_1' re-estimated using the input signal X_0 and the first estimated speech S_1 may be used (see the denominator on the right side of (Equation 5)).
- FIG. 8 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 306.
- The prior SN ratio expected value calculation unit 306 includes a feature amount conversion unit 3061s, a feature amount conversion unit 3061n, an expected value calculation unit 3062, and a feature amount inverse conversion unit 3063.
- The feature amount conversion unit 3061s receives the first estimated speech S_1 and outputs its feature amount F_s1.
- The feature amount conversion unit 3061n receives the first estimated noise N_1 and outputs its feature amount F_n1.
- The expected value calculation unit 3062 receives the feature amount F_s1, the feature amount F_n1, and the speech model M_s and noise model M_n prepared in advance, and outputs the feature amount F_snE of the expected value of the prior SN ratio.
- The feature amount inverse conversion unit 3063 receives the feature amount F_snE and outputs the expected value R_snE of the prior SN ratio.
- The operation of the feature amount inverse conversion unit 3063 is the same as that of the feature amount inverse conversion unit 2023 in FIG. 4.
- The feature amount conversion unit 3061s receives the first estimated speech S_1, converts it, and outputs the feature amount F_s1.
- As the feature amount, the logarithmic value of (Equation 19) or, as shown in (Equation 20), the value (cepstrum) obtained by cosine transform (discrete cosine transform) of the logarithmic value can be used.
- The right side of Equation 19 is calculated as the logarithm of each component of the vector S_1 and output as the corresponding component.
- The right side of Equation 20 is the cosine transform of the vector log S_1, output with components corresponding to those of the vector S_1.
- The logarithmic calculation in Equation 20 is the same as the calculation in Equation 19.
- The feature amount conversion unit 3061n receives the first estimated noise N_1, converts it, and outputs the feature amount F_n1.
- As the feature amount, the logarithmic value of (Equation 21) or, as shown in (Equation 22), the value (cepstrum) obtained by cosine transform (discrete cosine transform) of the logarithmic value can be used.
- The right side of Equation 21 is calculated as the logarithm of each component of the vector N_1 and output as the corresponding component.
- The right side of Equation 22 is the cosine transform of the vector log N_1, output with components corresponding to those of the vector N_1.
- The logarithmic calculation in Equation 22 is the same as the calculation in Equation 21.
- The feature amounts F_s1 and F_n1 can be calculated for each time index t.
- Alternatively, the difference from the feature amount at a past time (for example, t-1) may be taken and used as a first-order difference feature amount, and the difference may be taken once more to obtain a second-order difference feature amount.
- The feature amounts F_s1 and F_n1 at the time index t are multidimensional vectors, because they have as many components as the number of cepstrum dimensions and of first-order and second-order difference feature amounts.
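The first-order and second-order difference feature amounts can be appended to the static features as sketched below. The function name `add_deltas` and the boundary handling (the difference at the first frame is set to zero) are assumptions; the text only says that differences from a past time index are taken.

```python
def add_deltas(frames):
    """frames: list of per-frame feature vectors (lists of floats).

    Returns, per frame, the static features followed by the first-order
    and second-order differences with respect to the previous frame.
    """
    def diff(seq):
        prev = seq[0]  # assumed boundary rule: first difference is zero
        out = []
        for cur in seq:
            out.append([c - p for c, p in zip(cur, prev)])
            prev = cur
        return out

    d1 = diff(frames)
    d2 = diff(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

Each output vector is three times as long as the input vector, which matches the description of the feature amount as a multidimensional vector.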
- The expected value calculation unit 3062 receives the feature amount F_s1 output from the feature amount conversion unit 3061s, the feature amount F_n1 output from the feature amount conversion unit 3061n, the speech model M_s stored in the storage unit 307, and the noise model M_n stored in the storage unit 308, and outputs the feature amount F_snE of the expected value of the prior SN ratio.
- The prior SN ratio is the ratio of S_1 to N_1, as in (Equation 4) to (Equation 8);
- the feature amount is a logarithmic value, or a linear transformation of the logarithmic value, as in (Equation 9) and (Equation 10); and
- the feature amounts of speech and noise are logarithmic values, or linear transformations of the logarithmic values, as in (Equation 19) to (Equation 22).
- Taking these points into account, the feature amount F_sn1 of the prior SN ratio can be expressed using the feature amounts F_s1 and F_n1 as F_sn1 = F_s1 - F_n1 (Equation 23).
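The relation between the three feature amounts follows in two lines from the properties listed above. The derivation below is a reconstruction from the surrounding definitions, with C[] denoting the cosine transform; it is not copied from the original equations.

```latex
\begin{aligned}
\text{log feature:}\quad
F_{sn1} &= \log R_{sn1} = \log\frac{S_1}{N_1}
         = \log S_1 - \log N_1 = F_{s1} - F_{n1},\\
\text{cepstral feature:}\quad
F_{sn1} &= C[\log R_{sn1}] = C[\log S_1 - \log N_1]
         = C[\log S_1] - C[\log N_1] = F_{s1} - F_{n1}.
\end{aligned}
```

The second line uses only the linearity of the cosine transform, which is why the same subtraction holds for both kinds of feature amount.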
- The speech model M_s is a Gaussian mixture model in which G_s Gaussian distributions with average values μ_s,gs and variances σ²_s,gs are mixed with weights w_s,gs.
- The noise model M_n is a Gaussian mixture model in which G_n Gaussian distributions with average values μ_n,gn and variances σ²_n,gn are mixed with weights w_n,gn.
- g_s and g_n are the indices of the Gaussian distributions.
- In this embodiment, the speech model M_s and the noise model M_n are held in the storage units (307, 308) instead of the prior SN ratio model M_sn of the second embodiment.
- This embodiment can therefore reduce the required storage capacity compared with the second embodiment.
- This is because A + B < AB holds when the number of models of the speech model M_s is A (A > 2) and the number of models of the noise model M_n is B (B > 2).
- For example, when the number of models of the speech model M_s is 3 and the number of models of the noise model M_n is 2, 3 × 2 = 6 prior SN ratio models can be constructed from these five stored models. That is, the number of models stored in the storage unit can be reduced.
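The storage argument can be illustrated by building the prior SN ratio components from stored speech and noise components on demand. The combination rule here is an assumption for one-dimensional features: the mean is the difference of the means (consistent with F_sn1 = F_s1 - F_n1), the variance is the sum of the variances (which presumes independence of speech and noise), and the weight is the product of the weights. The function name `combine_models` is hypothetical.

```python
from itertools import product

def combine_models(speech, noise):
    """Each model is a list of (weight, mean, variance) tuples.

    Returns len(speech) * len(noise) prior SN ratio components.
    """
    combined = []
    for (ws, ms, vs), (wn, mn, vn) in product(speech, noise):
        # Assumed rule: weights multiply, means subtract, variances add.
        combined.append((ws * wn, ms - mn, vs + vn))
    return combined
```

With three speech components and two noise components, five stored components yield six combined ones, and the gap widens as either model grows.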
- When the noise feature amount F_n1 is judged to be unreliable, it may be replaced with the average value μ_n,gn of the noise model. This avoids a situation where speech is mistaken for noise and suppressed. Whether the noise feature amount F_n1 is reliable may be determined by comparing F_n1 with the noise model M_n.
- For example, when F_n1 falls within the range expected from the noise model, the reliability may be judged to be high, and when it is out of that range, the reliability may be judged to be low.
- As described above, the expected value of the feature amount of the prior SN ratio is calculated using the feature amount of the prior SN ratio and the prior SN ratio model constructed from the speech model and the noise model, and the noise suppression coefficient is obtained from that expected value.
- A noise suppression system according to a fourth embodiment of the present invention will be described with reference to FIGS. 9 and 10.
- Referring to FIG. 9, the noise suppression system according to the fourth embodiment differs from the third embodiment in that the prior SN ratio expected value calculation unit 306 in FIG. 6 is replaced by the prior SN ratio expected value calculation unit 406 in FIG. 9, and
- in that the noise model M_n stored and held in advance in the storage unit 308 in FIG. 6 is unnecessary in FIG. 9.
- The operations of the first speech and first noise estimation unit 405, the noise suppression coefficient calculation unit 403, and the noise suppression unit 404 in FIG. 9 are the same as those of the first speech and first noise estimation unit 305, the noise suppression coefficient calculation unit 303, and the noise suppression unit 304 in FIG. 6, respectively.
- The prior SN ratio expected value calculation unit 406 receives the output values S_1 and N_1 of the first speech and first noise estimation unit 405 and a speech model (speech pattern) M_s prepared in advance. Using the estimated S_1 and N_1 and the speech model M_s, it outputs the expected value R_snE of the prior SN ratio.
- FIG. 10 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 406.
- The prior SN ratio expected value calculation unit 406 includes a feature amount conversion unit 4061s, a feature amount conversion unit 4061n, an expected value calculation unit 4062, a feature amount inverse conversion unit 4063, and a noise model creation unit 4064.
- The noise model creation unit 4064 creates (sequentially updates) a noise model M_n from the feature amount F_n1 of the first estimated noise, and the model is input to the expected value calculation unit 4062.
- The operations of the feature amount conversion unit 4061s, the feature amount conversion unit 4061n, and the feature amount inverse conversion unit 4063 are the same as those of the feature amount conversion unit 3061s, the feature amount conversion unit 3061n, and the feature amount inverse conversion unit 3063 in FIG. 8, respectively, so their description is omitted.
- The noise model creation unit 4064 receives the feature amount F_n1 of the first estimated noise, creates (sequentially updates) the noise model M_n, and outputs it.
- In the following, the noise model is described as a single Gaussian distribution for simplicity of explanation. The fourth embodiment of the present invention is of course not limited to such a distribution.
- The noise model M_n is a single Gaussian distribution with an average value μ_n and a variance σ²_n.
- VAR[] is an operator that calculates a variance value.
- The average value μ_n(t) and the variance σ²_n(t) of the noise model M_n at the time index t are sequentially updated as in (Equation 26) and (Equation 27), respectively.
- The two time constants, one for calculating the average value and one for calculating the variance, take values from 0.0 to 1.0 and are usually set to values of 0.9 to 1.0.
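A sequential update with time constants close to 1.0 is commonly realized as exponential smoothing, and the sketch below uses that form. The exact expressions of (Equation 26) and (Equation 27) are not reproduced in this text, so the update rule, the function name `update_noise_model`, and the parameter names `alpha_mu` and `alpha_var` are assumptions. A one-dimensional feature is used for brevity.

```python
def update_noise_model(mu_prev, var_prev, f_n1, alpha_mu=0.95, alpha_var=0.95):
    """Return the updated (mean, variance) of a single-Gaussian noise model.

    alpha_mu and alpha_var are the time constants in [0.0, 1.0];
    values near 1.0 make the model change slowly.
    """
    # Assumed form: exponentially smoothed mean.
    mu = alpha_mu * mu_prev + (1.0 - alpha_mu) * f_n1
    # Assumed form: exponentially smoothed squared deviation from the new mean.
    var = alpha_var * var_prev + (1.0 - alpha_var) * (f_n1 - mu) ** 2
    return mu, var
```

Setting both time constants to 1.0 freezes the model, while smaller values let it follow changes in the noise more quickly at the cost of a noisier estimate.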
- The noise model M_n may be created by a method different from the one exemplified above.
- The expected value calculation unit 4062 receives the feature amount F_s1 output from the feature amount conversion unit 4061s, the feature amount F_n1 output from the feature amount conversion unit 4061n, the speech model (speech pattern) M_s stored in advance in the storage unit 407, and the noise model (noise pattern) M_n from the noise model creation unit 4064, and outputs the feature amount F_snE of the expected value of the prior SN ratio.
- The operation of the expected value calculation unit 4062 is basically the same as that of the expected value calculation unit 3062 in FIG. 8.
- In the expected value calculation unit 4062, however, it is difficult in terms of calculation amount to create a prior SN ratio model by combining the noise model M_n, which changes from moment to moment, with the speech model M_s.
- The amount of calculation can be reduced by the following approach.
- The difference between the feature amount F_sn1 of the prior SN ratio and the average value μ_sn,g of the prior SN ratio model is not calculated after first forming μ_sn,g from the average value μ_s,gs of the speech model and the average value μ_n of the noise model.
- Instead, the average value μ_n of the noise model is added to the feature amount F_sn1 of the prior SN ratio, and the difference from the average value μ_s,gs of the speech model M_s is calculated for the resulting value. For this reason, the calculation of the average value of the prior SN ratio model can be omitted.
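The saving comes from the identity F_sn1 - (μ_s,gs - μ_n) = (F_sn1 + μ_n) - μ_s,gs, which the following sketch checks for one-dimensional values. The function names are hypothetical, and the assumption that the prior SN ratio mean equals μ_s,gs - μ_n is read from the description above.

```python
def deviations_naive(f_sn1, speech_means, mu_n):
    """Rebuild every prior SN ratio mean whenever mu_n changes."""
    sn_means = [mu_s - mu_n for mu_s in speech_means]  # one subtraction per component
    return [f_sn1 - m for m in sn_means]

def deviations_shifted(f_sn1, speech_means, mu_n):
    """Shift the observation once, then compare with the fixed speech means."""
    shifted = f_sn1 + mu_n  # a single addition per frame
    return [shifted - mu_s for mu_s in speech_means]
```

Both functions return the same deviations, but the second one never materializes the combined means, so a noise model that changes every frame costs one extra addition instead of one per speech component.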
- In a tree-structured model, the Gaussian mixture distribution 1-1 of the first layer is composed of two Gaussian distributions.
- Each of the two Gaussian distributions of the first layer is in turn composed of a Gaussian mixture distribution of the second layer, 2-1 or 2-2.
- The two distributions of the Gaussian mixture distribution 2-1 (2-2) in the second layer are composed of the Gaussian mixture distributions 3-1 and 3-2 (3-3 and 3-4) of the third layer, respectively.
- The calculation amount can also be reduced, while maintaining the noise suppression accuracy, by reducing how often the variance σ²_sn,g of the prior SN ratio model is calculated.
- Because the noise model M_n is created from the input signal X_0, it is not necessary to prepare a noise model in advance.
- By sequentially updating the noise model M_n, a noise model suited to the noise included in the input signal X_0 can be used. As a result, noise can be suppressed with higher accuracy than in the third embodiment.
- The noise suppression system described in the above embodiments may be applied to a microphone unit.
- The present invention is also applicable to a case where a noise suppression program that realizes the functions of the noise suppression system of the above-described embodiments is supplied directly or remotely to a system or apparatus. Therefore, the present invention also covers a program installed in a computer in order to realize those functions on the computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded. According to the present invention, a non-transitory computer-readable medium that stores a program for causing a computer to execute the processing steps included in the embodiments is provided.
- The present invention is not limited to the above-described embodiments and may, for example, be configured by combining various embodiments. Further, the present invention may be applied to a system composed of a plurality of devices, or to a single device.
- Reference signs: 100, 200, 300, 400 noise suppression system; 101, 201 first prior SN ratio estimation unit; 102, 202, 306, 406 prior SN ratio expected value calculation unit; 103, 203, 303, 403 noise suppression coefficient calculation unit; 104, 204, 304, 404 noise suppression unit; 105, 205 prior SN ratio model (storage unit); 305, 405 first speech and first noise estimation unit; 307, 407 speech model (storage unit); 308 noise model (storage unit); 2011, 3051 first noise estimation unit; 2012, 3052 first speech estimation unit; 2013 prior SN ratio estimation unit; 2021, 3061s, 3061n, 4061s, 4061n feature amount conversion unit; 2022, 3062, 4062 expected value calculation unit; 2023, 3063, 4063 feature amount inverse conversion unit; 4064 noise model creation unit
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Noise Elimination (AREA)
Abstract
Description
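FIG. 1 is a diagram illustrating the configuration of a noise suppression system 100 according to the first embodiment. The noise suppression system 100 as the first embodiment of the present invention will be described with reference to FIG. 1. As shown in FIG. 1, the noise suppression system 100 includes a first prior SN ratio estimation unit 101, a prior SN ratio expected value calculation unit 102, a noise suppression coefficient calculation unit 103, a noise suppression unit 104, and a storage unit 105 that stores a prior SN ratio model (M_sn).
Next, a noise suppression system 200 according to a second embodiment of the present invention will be described with reference to FIGS. 2 to 5. FIG. 5 is a flowchart showing the processing of the noise suppression system of the second embodiment.
FIG. 2 is a diagram illustrating the configuration of the noise suppression system 200 according to the second embodiment. The noise suppression system 200 acquires (extracts) a desired signal from a mixed signal in which the desired signal and noise are mixed. In the following example the desired signal is described as a speech signal, but the desired signal is of course not limited to speech signals.
First, the processing of the first prior SN ratio estimation unit 201 in FIG. 2 will be described. The input signal X_0, in which the desired signal and noise are mixed, is modeled as in (Equation 1) below.
The first noise estimation unit 2011 estimates the noise component included in the input signal X_0 and outputs the first estimated noise N_1.
The first speech estimation unit 2012 estimates the speech component included in the input signal X_0 by suppressing the noise component included in X_0, and outputs the first estimated speech S_1.
In addition, the Wiener filter (WF) method, the MMSE STSA (Minimum Mean Square Error Short Time Spectral Amplitude) method, the MMSE LSA (Minimum Mean Square Error Log Spectral Amplitude) method, and the like can be used.
The prior SN ratio estimation unit 2013 receives the first estimated speech S_1 (the speech component included in the input signal X_0) from the first speech estimation unit 2012 and the first estimated noise N_1 from the first noise estimation unit 2011, estimates the SN ratio of the speech signal to the noise (= S_1/N_1), and outputs this value as the prior SN ratio R_sn1.
However, in the prior SN ratio estimation unit 2013, the first estimated noise N_1 in the denominator on the right side of (Equation 4) may be replaced by a noise component N_1' (= X_0 - S_1) re-estimated using the input signal X_0 and the first estimated speech S_1. In this case, the prior SN ratio R_sn1 is given by (Equation 5) below.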
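The two ways of forming the prior SN ratio can be compared with a short sketch. It follows the ratios stated in the text (S_1/N_1, and S_1 over the re-estimated noise X_0 - S_1); the function names and the small lower bound `eps` that guards against division by zero are assumptions.

```python
def prior_snr(s1, n1):
    """Prior SN ratio per component, R_sn1 = S_1 / N_1 (n1 assumed positive)."""
    return [s / n for s, n in zip(s1, n1)]

def prior_snr_reestimated(x0, s1, eps=1e-12):
    """Prior SN ratio using the re-estimated noise N_1' = X_0 - S_1."""
    # eps is an assumed safeguard for components where X_0 equals S_1.
    return [s / max(x - s, eps) for x, s in zip(x0, s1)]
```

When the first speech estimate was itself obtained as X_0 - N_1 without flooring, the two functions agree; they differ once the speech estimator applies flooring or a different suppression rule.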
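FIG. 4 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 202 in FIG. 2. Referring to FIG. 4, the prior SN ratio expected value calculation unit 202 includes a feature amount conversion unit 2021, an expected value calculation unit 2022, and a feature amount inverse conversion unit 2023.
The feature amount conversion unit 2021 converts the prior SN ratio R_sn1 into a feature amount F_sn1 and outputs F_sn1. As the feature amount, for example, the logarithmic value of (Equation 9) below or, as shown in (Equation 10), the value (cepstrum) obtained by cosine transform (Discrete Cosine Transform (DCT)) of the logarithmic value can be used.
F_sn1 = log R_sn1 (Equation 9)
The log in Equation 9 is the natural logarithm, and the same applies to log hereafter. A common logarithm may be used instead of the natural logarithm. The right side of Equation 9 is calculated as the logarithm of each component of the vector R_sn1 and output as the corresponding component. Here, being output for the components of the vector means y_i = log x_i (y_i is the i-th component of the output vector, and x_i is the i-th component of the vector R_sn1).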
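The forward feature conversion (logarithm, optionally followed by a cosine transform) can be sketched in plain Python. The orthonormal DCT-II normalization and the function names are assumptions, since the text does not specify the DCT variant.

```python
import math

def log_feature(r_sn1):
    """Natural logarithm of each component (positive inputs assumed)."""
    return [math.log(x) for x in r_sn1]

def dct_ortho(values):
    """Orthonormal DCT-II of a vector (assumed normalization)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out

def cepstral_feature(r_sn1):
    """Cosine transform of the log values (cepstrum-type feature)."""
    return dct_ortho(log_feature(r_sn1))
```

A flat prior SN ratio produces a cepstral vector whose energy sits entirely in the zeroth coefficient, which makes the sketch easy to sanity-check.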
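The expected value calculation unit 2022 receives the feature amount F_sn1 and the prior SN ratio model M_sn stored in advance in the storage unit 205, and outputs the feature amount F_snE of the expected value of the prior SN ratio. In the following, as an example, the prior SN ratio model M_sn is described as a Gaussian mixture model (GMM) composed of G Gaussian distributions. The present invention is of course not limited to this example.
Alternatively, the prior SN ratio model M_sn can be created by combining a speech model M_s and a noise model M_n. The method of combining the speech model M_s and the noise model M_n is described in the next embodiment (see the description of the expected value calculation unit 3062 in FIG. 8).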
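One common way to obtain an expected value from a Gaussian mixture model is to weight the component means by their posterior probabilities given the observed feature. The sketch below uses that form for a one-dimensional feature. The expression actually used in (Equation 11) is not reproduced in this text, so the formula and the function name `expected_feature` are assumptions.

```python
import math

def expected_feature(f_sn1, model):
    """model: list of (weight, mean, variance) tuples for a 1-D feature.

    Returns the posterior-weighted mean of the component means.
    """
    likelihoods = [
        w * math.exp(-0.5 * (f_sn1 - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in model
    ]
    total = sum(likelihoods)  # assumed non-zero for observations near the model
    return sum((lk / total) * mu for lk, (_, mu, _) in zip(likelihoods, model))
```

An observation far from every component pulls the result toward the nearest component mean, which is the sense in which the estimated prior SN ratio is corrected by the model.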
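The feature amount inverse conversion unit 2023 converts the feature amount F_snE of the expected value of the prior SN ratio and outputs the expected value R_snE of the prior SN ratio. When the feature amount conversion unit 2021 uses the logarithmic value of (Equation 9), the inverse transformation is performed by (Equation 14). When it uses the value obtained by cosine transforming the logarithmic value, as shown in (Equation 10), the inverse transformation is performed by (Equation 15).
The noise suppression coefficient calculation unit 203 calculates and outputs the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio. For example, the noise suppression coefficient of the Wiener filter method can be calculated from the expected value R_snE of the prior SN ratio as in the following equation.
W_0 = R_snE / (1 + R_snE) (Equation 17)
The right side of Equation 17 is calculated for each component of the vector R_snE and output as the corresponding component, for example (R_snE1/(1+R_snE1), R_snE2/(1+R_snE2), ..., R_snEn/(1+R_snEn)). Being output for the components of the vector means y_i = x_i/(1+x_i) (y_i is the i-th component of the output vector, and x_i is the i-th component of the vector R_snE).
The noise suppression unit 204 multiplies the input signal X_0 by the noise suppression coefficient W_0, thereby suppressing the noise component included in X_0, and outputs the estimated value S_0 of the desired signal.
The first prior SN ratio estimation unit 201 estimates the ratio R_sn1 of the desired signal to the noise included in the input signal X_0, in which the desired signal and noise are mixed.
The prior SN ratio expected value calculation unit 202 compares the prior SN ratio R_sn1 estimated by the first prior SN ratio estimation unit 201 with the prior SN ratio model M_sn in the storage unit 205, and calculates the expected value R_snE of the prior SN ratio, which is the value corrected by the prior SN ratio model M_sn.
The noise suppression coefficient calculation unit 203 calculates the noise suppression coefficient W_0 using the expected value R_snE of the prior SN ratio.
The noise suppression unit 204 multiplies the input signal X_0 by the noise suppression coefficient W_0, thereby suppressing the noise component included in the input signal, and obtains the estimated value S_0 of the desired signal.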
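Next, a noise suppression system according to a third embodiment of the present invention will be described with reference to FIG. 6, FIG. 7, and FIG. 8. Comparing the noise suppression system 200 according to the second embodiment in FIG. 2 with the noise suppression system 300 according to the third embodiment in FIG. 6, the third embodiment differs from the second embodiment in that
・the first prior SN ratio estimation unit 201 in FIG. 2 is replaced by the first speech and first noise estimation unit 305 in FIG. 6,
・the prior SN ratio expected value calculation unit 202 in FIG. 2 is replaced by the prior SN ratio expected value calculation unit 306 in FIG. 6, and
・the prior SN ratio model M_sn stored and held in the storage unit 205 in FIG. 2 is replaced, in FIG. 6, by the speech model M_s and the noise model M_n stored and held in the storage units 307 and 308, respectively.
Note that in FIG. 6 and elsewhere the speech model M_s and the noise model M_n are stored and held in separate storage units only for ease of explanation; they may of course be stored and held in the same storage unit.
FIG. 7 is a diagram illustrating the configuration of the first speech and first noise estimation unit 305. The first speech and first noise estimation unit 305 includes a first noise estimation unit 3051 and a first speech estimation unit 3052.
FIG. 8 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 306. The prior SN ratio expected value calculation unit 306 includes a feature amount conversion unit 3061s, a feature amount conversion unit 3061n, an expected value calculation unit 3062, and a feature amount inverse conversion unit 3063.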
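The feature amount conversion unit 3061s receives the first estimated speech S_1, converts it, and outputs the feature amount F_s1. As the feature amount, the logarithmic value of (Equation 19) or, as shown in (Equation 20), the value (cepstrum) obtained by cosine transform (discrete cosine transform) of the logarithmic value can be used.
F_s1 = log S_1 (Equation 19)
The right side of Equation 19 is calculated as the logarithm of each component of the vector S_1 and output as the corresponding component. Here, being output for the components of the vector means y_i = log x_i (y_i is the i-th component of the output vector, and x_i is the i-th component of the vector S_1).
F_s1 = C[log S_1] (Equation 20)
The right side of Equation 20 is the cosine transform of the vector log S_1, output with components corresponding to those of the vector S_1. Here, being output for the components of the vector means z_i = C[x_i] (z_i is the i-th component of the output vector, and x_i is the i-th component of the vector S_1). The logarithmic calculation in Equation 20 is the same as the calculation in Equation 19.
F_n1 = log N_1 (Equation 21)
The right side of Equation 21 is calculated as the logarithm of each component of the vector N_1 and output as the corresponding component. Here, being output for the components of the vector means y_i = log x_i (y_i is the i-th component of the output vector, and x_i is the i-th component of the vector N_1).
The feature amounts F_s1 and F_n1 can be calculated for each time index t. Alternatively, the difference from the feature amount at a past time (for example, t-1) may be taken and used as a first-order difference feature amount, and the difference may be taken once more to obtain a second-order difference feature amount. The feature amounts F_s1 and F_n1 at the time index t are multidimensional vectors, because they have as many components as the number of cepstrum dimensions and of first-order and second-order difference feature amounts.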
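The expected value calculation unit 3062 receives
・the feature amount F_s1 output from the feature amount conversion unit 3061s,
・the feature amount F_n1 output from the feature amount conversion unit 3061n,
・the speech model M_s stored in the storage unit 307, and
・the noise model M_n stored in the storage unit 308,
and outputs the feature amount F_snE of the expected value of the prior SN ratio.
・The speech model is described below as a Gaussian mixture model composed of G_s Gaussian distributions, and
・the noise model is described below as a Gaussian mixture model composed of G_n Gaussian distributions,
but the third embodiment of the present invention is of course not limited to the following example.
・Considering that the feature amount is a logarithmic value, or a linear transformation of the logarithmic value, as in (Equation 9) and (Equation 10), and
・that the feature amounts of speech and noise are logarithmic values, or linear transformations of the logarithmic values, as in (Equation 19) to (Equation 22),
the feature amount F_sn1 of the prior SN ratio can be expressed using the feature amounts F_s1 and F_n1 as follows.
・Using the feature amount F_sn1 (= F_s1 - F_n1) of the prior SN ratio from (Equation 23) and
・the prior SN ratio model constructed from the speech model M_s and the noise model M_n,
the feature amount F_snE of the expected value is calculated by (Equation 11) and output, in the same way as in the expected value calculation unit 2022 of FIG. 4.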
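A noise suppression system according to a fourth embodiment of the present invention will be described with reference to FIGS. 9 and 10. Referring to FIG. 9, the noise suppression system according to the fourth embodiment differs from the third embodiment in that
・the prior SN ratio expected value calculation unit 306 in FIG. 6 is replaced by the prior SN ratio expected value calculation unit 406 in FIG. 9, and
・the noise model M_n stored and held in advance in the storage unit 308 in FIG. 6 is unnecessary in FIG. 9.
These are the points of difference from the third embodiment.
FIG. 10 is a diagram illustrating the configuration of the prior SN ratio expected value calculation unit 406. Referring to FIG. 10, the prior SN ratio expected value calculation unit 406 includes a feature amount conversion unit 4061s, a feature amount conversion unit 4061n, an expected value calculation unit 4062, a feature amount inverse conversion unit 4063, and a noise model creation unit 4064. The noise model creation unit 4064 creates (sequentially updates) a noise model M_n from the feature amount F_n1 of the first estimated noise and inputs it to the expected value calculation unit 4062. The operations of the feature amount conversion unit 4061s, the feature amount conversion unit 4061n, and the feature amount inverse conversion unit 4063 are the same as those of the feature amount conversion unit 3061s, the feature amount conversion unit 3061n, and the feature amount inverse conversion unit 3063 in FIG. 8, respectively, so their description is omitted.
The noise model creation unit 4064 receives the feature amount F_n1 of the first estimated noise, creates (sequentially updates) the noise model M_n, and outputs it. In the following, the noise model is described as a single Gaussian distribution for simplicity of explanation. The fourth embodiment of the present invention is of course not limited to such a distribution.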
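The expected value calculation unit 4062 receives
・the feature amount F_s1 output from the feature amount conversion unit 4061s,
・the feature amount F_n1 output from the feature amount conversion unit 4061n,
・the speech model (speech pattern) M_s stored and held in advance in the storage unit 407, and
・the noise model (noise pattern) M_n from the noise model creation unit 4064,
and outputs the feature amount F_snE of the expected value of the prior SN ratio.
This application claims priority based on Japanese Patent Application No. 2014-145753, filed on July 16, 2014, the entire disclosure of which is incorporated herein.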
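101, 201 First prior SN ratio estimation unit
102, 202, 306, 406 Prior SN ratio expected value calculation unit
103, 203, 303, 403 Noise suppression coefficient calculation unit
104, 204, 304, 404 Noise suppression unit
105, 205 Prior SN ratio model (storage unit)
305, 405 First speech and first noise estimation unit
307, 407 Speech model (storage unit)
308 Noise model (storage unit)
2011, 3051 First noise estimation unit
2012, 3052 First speech estimation unit
2013 Prior SN ratio estimation unit
2021, 3061s, 3061n, 4061s, 4061n Feature amount conversion unit
2022, 3062, 4062 Expected value calculation unit
2023, 3063, 4063 Feature amount inverse conversion unit
4064 Noise model creation unit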
Claims (10)
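- A noise suppression system comprising:
prior SN ratio estimation and expected value calculation means for applying a correction, based on a prior SN ratio model or on a signal model and a noise model, to an estimated value of a prior SN ratio relating to a signal and noise estimated from an input signal in which the signal and the noise are mixed, and acquiring an expected value of the prior SN ratio;
noise suppression coefficient calculation means for calculating a noise suppression coefficient using the expected value of the prior SN ratio; and
noise suppression means for multiplying the input signal by the noise suppression coefficient to suppress noise included in the input signal.
- The noise suppression system according to claim 1, wherein the prior SN ratio estimation and expected value calculation means comprises:
prior SN ratio estimation means for receiving the input signal, estimating the signal and the noise from the input signal, and estimating the prior SN ratio from the estimated signal and noise;
storage means for storing a prior SN ratio model prepared in advance; and
prior SN ratio expected value calculation means for applying a correction to the prior SN ratio estimated by the prior SN ratio estimation means, using the prior SN ratio model stored in the storage means, and calculating the expected value of the prior SN ratio.
- The noise suppression system according to claim 1, wherein the prior SN ratio estimation and expected value calculation means comprises:
estimation means for receiving the input signal and estimating a signal and noise from the input signal;
storage means for storing a signal model and a noise model prepared in advance; and
prior SN ratio expected value calculation means for receiving the signal and the noise estimated by the estimation means, applying a correction to the prior SN ratio of the signal to the noise using the signal model and the noise model stored in the storage means, and calculating the expected value of the prior SN ratio.
- The noise suppression system according to claim 1, wherein the prior SN ratio estimation and expected value calculation means comprises:
estimation means for receiving the input signal and estimating a signal and noise from the input signal;
storage means for storing a signal model prepared in advance; and
prior SN ratio expected value calculation means for receiving the signal and the noise estimated by the estimation means, generating a noise model based on the noise, applying a correction to the prior SN ratio of the signal to the noise using the signal model stored in the storage means and the generated noise model, and calculating the expected value of the prior SN ratio.
- The noise suppression system according to claim 3 or 4, wherein the storage means stores and holds a tree-structured signal model as the signal model.
- A noise suppression method comprising:
applying a correction, based on a prior SN ratio model or on a signal model and a noise model, to an estimated value of a prior SN ratio relating to a signal and noise estimated from an input signal in which the signal and the noise are mixed, to acquire an expected value of the prior SN ratio;
calculating a noise suppression coefficient using the expected value of the prior SN ratio; and
suppressing a noise component included in the input signal by multiplying the input signal by the noise suppression coefficient.
- The noise suppression method according to claim 6, wherein
a prior SN ratio model prepared in advance is stored in storage means,
the input signal in which a signal and noise are mixed is received, the signal and the noise are estimated from the input signal, and the prior SN ratio of the estimated signal to the noise is estimated, and
in acquiring the expected value of the prior SN ratio, a value obtained by correcting the estimated prior SN ratio using the prior SN ratio model stored in the storage means is output as the expected value of the prior SN ratio.
- The noise suppression method according to claim 6, wherein
a signal model and a noise model prepared in advance are stored in storage means,
the input signal in which a signal and noise are mixed is received, and the signal and the noise are estimated from the input signal, and
in acquiring the expected value of the prior SN ratio, a value obtained by correcting the prior SN ratio of the estimated signal to the noise using the signal model and the noise model stored in the storage means is output as the expected value of the prior SN ratio.
- The noise suppression method according to claim 6, wherein
a signal model prepared in advance is stored in storage means,
the input signal in which the signal and noise are mixed is received, and the signal and the noise are estimated from the input signal, and
in acquiring the expected value of the prior SN ratio,
a noise model is generated based on the estimated noise, and
a value obtained by correcting the prior SN ratio of the estimated signal to the noise using the signal model stored in the storage means and the generated noise model is output as the expected value of the prior SN ratio.
- A recording medium storing a program that causes a computer to execute:
a process of applying a correction, based on a prior SN ratio model or on a signal model and a noise model, to an estimated value of a prior SN ratio relating to a signal and noise estimated from an input signal in which the signal and the noise are mixed, to acquire an expected value of the prior SN ratio;
a process of calculating a noise suppression coefficient using the expected value of the prior SN ratio; and
a process of suppressing a noise component included in the input signal by multiplying the input signal by the noise suppression coefficient.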
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/325,476 US10748551B2 (en) | 2014-07-16 | 2015-07-16 | Noise suppression system, noise suppression method, and recording medium storing program |
JP2016534288A JP6696424B2 (ja) | 2014-07-16 | 2015-07-16 | 雑音抑圧システムと雑音抑圧方法及びプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-145753 | 2014-07-16 | ||
JP2014145753 | 2014-07-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016009654A1 true WO2016009654A1 (ja) | 2016-01-21 |
Family
ID=55078160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/003604 WO2016009654A1 (ja) | 2014-07-16 | 2015-07-16 | 雑音抑圧システムと雑音抑圧方法及びプログラムを格納した記録媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10748551B2 (ja) |
JP (1) | JP6696424B2 (ja) |
WO (1) | WO2016009654A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102018206689A1 (de) * | 2018-04-30 | 2019-10-31 | Sivantos Pte. Ltd. | Verfahren zur Rauschunterdrückung in einem Audiosignal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003140700A (ja) * | 2001-11-05 | 2003-05-16 | Nec Corp | ノイズ除去方法及び装置 |
JP2005062890A (ja) * | 2003-08-19 | 2005-03-10 | Microsoft Corp | クリーン信号確率変数の推定値を識別する方法 |
JP2006071956A (ja) * | 2004-09-02 | 2006-03-16 | Hitachi Ltd | 音声信号処理装置及びプログラム |
JP2007033920A (ja) * | 2005-07-27 | 2007-02-08 | Nec Corp | 雑音抑圧システムと方法及びプログラム |
JP2013007975A (ja) * | 2011-06-27 | 2013-01-10 | Nippon Telegr & Teleph Corp <Ntt> | 雑音抑圧装置、方法及びプログラム |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6252909B1 (en) * | 1992-09-21 | 2001-06-26 | Aware, Inc. | Multi-carrier transmission system utilizing channels of different bandwidth |
KR100355271B1 (ko) * | 2000-10-11 | 2002-10-11 | 한국전자통신연구원 | 적응형 전송기법을 이용한 강우 감쇠 보상방법 |
JP4282227B2 (ja) | 2000-12-28 | 2009-06-17 | 日本電気株式会社 | ノイズ除去の方法及び装置 |
DE112010005895B4 (de) * | 2010-09-21 | 2016-12-15 | Mitsubishi Electric Corporation | Störungsunterdrückungsvorrichtung |
JP6339896B2 (ja) * | 2013-12-27 | 2018-06-06 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 雑音抑圧装置および雑音抑圧方法 |
Also Published As
Publication number | Publication date |
---|---|
US20170169837A1 (en) | 2017-06-15 |
US10748551B2 (en) | 2020-08-18 |
JP6696424B2 (ja) | 2020-05-20 |
JPWO2016009654A1 (ja) | 2017-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200105287A1 (en) | Deep neural network-based method and apparatus for combining noise and echo removal | |
CN109979476B (zh) | 一种语音去混响的方法及装置 | |
US20050182624A1 (en) | Method and apparatus for constructing a speech filter using estimates of clean speech and noise | |
JP5842056B2 (ja) | 雑音推定装置、雑音推定方法、雑音推定プログラム及び記録媒体 | |
KR101807961B1 (ko) | Lstm 및 심화신경망 기반의 음성 신호 처리 방법 및 장치 | |
JP2006238409A (ja) | 音声信号分離装置及び方法 | |
CN108010536B (zh) | 回声消除方法、装置、***及存储介质 | |
US9858946B2 (en) | Signal processing apparatus, signal processing method, and signal processing program | |
JP5344251B2 (ja) | 雑音除去システム、雑音除去方法および雑音除去プログラム | |
US20160042746A1 (en) | Noise suppressing device, noise suppressing method, and a non-transitory computer-readable recording medium storing noise suppressing program | |
JP5443547B2 (ja) | 信号処理装置 | |
WO2016009654A1 (ja) | 雑音抑圧システムと雑音抑圧方法及びプログラムを格納した記録媒体 | |
CN108806721A (zh) | 信号处理器 | |
JP5807914B2 (ja) | 音響信号解析装置、方法、及びプログラム | |
JP6059072B2 (ja) | モデル推定装置、音源分離装置、モデル推定方法、音源分離方法及びプログラム | |
JP2019035862A (ja) | 入力音マスク処理学習装置、入力データ処理関数学習装置、入力音マスク処理学習方法、入力データ処理関数学習方法、プログラム | |
CN103971697A (zh) | 基于非局部均值滤波的语音增强方法 | |
JP5374845B2 (ja) | 雑音推定装置と方法およびプログラム | |
US9654156B2 (en) | Nonlinear compensating apparatus and method, transmitter and communication system | |
WO2020162188A1 (ja) | 潜在変数最適化装置、フィルタ係数最適化装置、潜在変数最適化方法、フィルタ係数最適化方法、プログラム | |
JP5233330B2 (ja) | 音響分析条件正規化システム、音響分析条件正規化方法および音響分析条件正規化プログラム | |
US11152014B2 (en) | Audio source parameterization | |
JP2010049102A (ja) | 残響除去装置、残響除去方法、コンピュータプログラムおよび記録媒体 | |
WO2016092837A1 (ja) | 音声処理装置、雑音抑圧装置、音声処理方法および記録媒体 | |
CN117690421B (zh) | 降噪识别联合网络的语音识别方法、装置、设备及介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15822687 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15325476 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2016534288 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15822687 Country of ref document: EP Kind code of ref document: A1 |