WO2019223650A1 - A beamforming method, a multi-beam forming method, an apparatus, and an electronic device - Google Patents

A beamforming method, a multi-beam forming method, an apparatus, and an electronic device

Info

Publication number
WO2019223650A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
target sound
calculate
product
pointed
Prior art date
Application number
PCT/CN2019/087621
Other languages
English (en)
French (fr)
Inventor
周舒然
李志飞
Original Assignee
出门问问信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201810497069.8A external-priority patent/CN108717495A/zh
Priority claimed from CN201810496450.2A external-priority patent/CN108831498B/zh
Priority claimed from CN201810496448.5A external-priority patent/CN108551625A/zh
Application filed by 出门问问信息科技有限公司 filed Critical 出门问问信息科技有限公司
Publication of WO2019223650A1 publication Critical patent/WO2019223650A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208: Noise filtering
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones

Definitions

  • the embodiments of the present application relate to the field of speech processing technologies, and in particular, to a beam forming method, a multi-beam forming method, a device, and an electronic device.
  • Beamforming is a signal processing technique used with sensor arrays (such as microphone arrays) to receive directional signals and apply appropriate processing to the received sound signals. Beamforming allows the microphone assembly to receive sound signals selectively, so that, for example, sound information from one sound source is processed differently from sound information from other sound sources.
  • embodiments of the present application provide a beam forming method, a multi-beam forming method, a device, and an electronic device, so as to ensure that the sound in the target spatial direction is not distorted and to effectively suppress the sound in other spatial directions, thereby improving the signal-to-noise ratio of the sound in the target spatial direction.
  • an embodiment of the present invention provides a beamforming method, including:
  • the method further includes:
  • the calculating spatial filtering parameters includes:
  • the first limitation condition is specifically a white noise gain limitation
  • the second limitation condition is that a product of the spatial filtering parameter and the signal vector function is a first preset value.
  • calculating the delay time for the sound source to reach the microphone array includes:
  • the delay time is calculated according to the distance between the microphones, the propagation speed of sound, and the pointing angle of the sound source.
  • calculating the sound source direction according to the signal vector function and the delay time includes:
  • the spatial filtering parameter is a matrix.
  • the sound source may be directed at any plane-wave angle from 0° to 180°.
  • an embodiment of the present invention provides a beamforming apparatus, including:
  • a first obtaining unit configured to obtain spatial filtering parameters, which vary with angle and subband frequency
  • a determining unit configured to determine a sound source corresponding to the spatial filtering parameter obtained by the first obtaining unit
  • a second obtaining unit configured to obtain the original frequency domain signal corresponding to the sound source direction determined by the determining unit
  • a first calculation unit configured to calculate the product of the spatial filtering parameter and the original frequency domain signal; the product performs beamforming in a manner that suppresses the frequency domain signals other than the original frequency domain signal in the sound source direction.
  • an embodiment of the present invention provides a multi-beam beamforming method, including:
  • calculating the target beamforming output of the target sound source includes:
  • calculating the noise parameters according to the blocking matrix includes:
  • performing noise reduction on a signal directed by a non-target sound source other than the corresponding beamforming output to the target sound source according to the noise parameter includes:
  • an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
  • a first calculation unit configured to calculate the beamforming output corresponding to the target sound source direction
  • a second calculation unit configured to calculate a noise parameter by using a blocking matrix
  • a noise reduction unit configured to perform noise reduction, according to the noise parameter calculated by the second calculation unit, on the signals in the non-target sound source directions other than the beamforming output of the target sound source calculated by the first calculation unit.
  • the first calculation unit includes:
  • a first acquisition module configured to acquire spatial filtering parameters
  • a determining module configured to determine a target sound source direction corresponding to the spatial filtering parameter obtained by the first obtaining module
  • a second acquisition module configured to acquire the original frequency domain signal corresponding to the target sound source direction determined by the determining module
  • a calculation module is configured to calculate a product of the spatial filtering parameter and the original frequency domain signal corresponding to the target sound source pointing to obtain a beamforming output pointed by the target sound source.
  • the second calculation unit includes:
  • a first calculation module configured to calculate the frequency response of the sound signal arriving at each microphone in order
  • a construction module configured to construct the blocking matrix according to the frequency response calculated by the first calculation module
  • a second calculation module is configured to calculate the noise parameter according to the blocking matrix constructed by the construction module and the non-target sound source pointing to a corresponding original frequency domain signal.
  • the noise reduction unit includes:
  • a noise reduction module configured to perform noise reduction, according to the beamforming output of the target sound source, the multi-channel optimal filtering parameter, and the noise parameter, on the signals in the non-target sound source directions other than the beamforming output of the target sound source.
  • the present invention provides a multi-beam beamforming method, including:
  • the spatial filtering parameters vary with the angle of the sound source and the subband frequency.
  • the at least two sound source directions include a target sound source direction and at least one non-target sound source direction;
  • a product of the original frequency domain signal pointed by the target sound source, the corresponding enhanced speech pointed by the target sound source, and the energy ratio is calculated, and the speech corresponding to the product is output.
  • the method further includes:
  • calculating the product of the spatial filtering parameters and the original frequency domain signals corresponding to the at least two sound source directions respectively to obtain multi-beam beamforming includes:
  • the calculation of the enhanced speech pointed by the target sound source includes:
  • calculating the energy ratio according to the sum of the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source includes:
  • performing smoothing processing on a frame-by-frame basis for the current frame and the previous frame by using a smoothing parameter includes:
  • calculating the product of the original frequency domain signal pointed by the target sound source, the corresponding enhanced speech pointed by the target sound source, and the energy ratio, and outputting the speech corresponding to the product includes:
  • a product of the original frequency domain signal pointed by the target sound source, the corresponding enhanced speech pointed by the target sound source, and the energy ratio is calculated, and the speech corresponding to the product is output according to the smoothing processing result.
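The frame-by-frame smoothing mentioned above can be illustrated as a simple exponential average. This is only a sketch of the idea (the smoothing parameter value and the quantity being smoothed are assumptions; the patent text does not fix them here):

```python
def smooth(prev, current, alpha=0.9):
    """Frame-by-frame smoothing of a quantity (e.g. the energy ratio)
    between the previous frame and the current frame.
    alpha: smoothing parameter in [0, 1]; value assumed for illustration."""
    return alpha * prev + (1 - alpha) * current
```

A larger `alpha` weights the previous frame more heavily, which reduces frame-to-frame fluctuation of the output speech at the cost of slower adaptation.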
  • an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
  • a first calculation unit is configured to calculate a product of spatial filtering parameters and original frequency domain signals corresponding to at least two sound source directions, respectively, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the subband frequency.
  • the at least two sound source directions include a target sound source direction and at least one non-target sound source direction;
  • a second calculation unit configured to separately calculate the enhanced speech pointed by the target sound source
  • a third calculation unit configured to calculate an energy ratio based on the sum of the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source;
  • a fourth calculation unit is configured to calculate a product of the original frequency domain signal pointed by the target sound source, a corresponding enhanced speech pointed by the target sound source, and the energy ratio, and output a speech corresponding to the product.
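A minimal sketch of one plausible form of the energy ratio computed by the third calculation unit. The exact normalization is not spelled out in this extract, so the formula below (target energy over the sum of target and non-target energies) is an assumption:

```python
import numpy as np

def energy_ratio(target_subband, nontarget_subbands, eps=1e-12):
    """Per-subband energy ratio between the target-direction beam output
    and the outputs of all beams.
    Assumed form: E_target / (E_target + sum of E_nontarget); eps avoids
    division by zero. Both inputs are complex frequency-domain samples."""
    e_t = np.abs(target_subband) ** 2
    e_n = sum(np.abs(s) ** 2 for s in nontarget_subbands)
    return e_t / (e_t + e_n + eps)
```

A ratio near 1 indicates the energy in that subband comes mostly from the target direction; a ratio near 0 indicates it comes mostly from interfering directions.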
  • an embodiment of the present invention provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the method according to the first aspect of the embodiments of the present invention and / or the method according to the third aspect of the embodiments of the present invention.
  • an embodiment of the present invention provides an electronic device, where the electronic device includes a processor, a memory, and a bus; the processor and the memory communicate with each other through the bus; the memory stores program instructions that are executed by the processor to implement the method according to the first aspect of the embodiments of the present invention and / or the method according to the third aspect of the embodiments of the present invention and / or the method according to the fifth aspect of the embodiments of the present invention.
  • the beamforming output of the target sound source is obtained by calculating the product of the spatial filtering parameter and the original frequency domain signal corresponding to the target sound source direction, and the signal-to-noise ratio of that output is improved by performing noise reduction on the non-target sound source directions. Therefore, it is possible to ensure that the sound in the target spatial direction is not distorted while effectively suppressing sound from other spatial directions, thereby improving the signal-to-noise ratio of the sound in the target spatial direction.
  • FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a microphone array according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another microphone array according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a final voice output pointed by a target sound source according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of still another multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention.
  • the beamforming method of the sound source in this embodiment is shown in FIG. 1 and includes the following steps:
  • Step S110 acquiring spatial filtering parameters.
  • the spatial filtering parameters vary with angle and subband frequency.
  • the beamforming in a fixed spatial direction can be enhanced by using spatial filtering parameters to ensure that the sound in the pointing direction is substantially unchanged, and the sound in other directions will be suppressed to a certain extent.
  • the spatial filtering parameter in the embodiment of the present invention is a filter parameter in the frequency domain, and its purpose is to perform corresponding gain or suppression on the subband frequency of the signal of each frame.
  • the spatial filtering parameter in this embodiment is a matrix; it is pre-calculated by computer equipment and stored in the electronic device that executes the method described in the embodiments of the present invention, where it is used directly, thereby reducing the time consumed by beamforming.
  • the following embodiments take a beam pointing directly ahead at 90° as an example, that is, the sound source direction is 90°; it should be noted, however, that the beam direction is not limited to 90°. In practical applications, the sound source may point at any plane-wave angle from 0° to 180°, such as 30°, 60°, or 120°.
  • Step S120 The sound source direction corresponding to the spatial filtering parameter is determined.
  • Step S130 Acquire the original frequency domain signal corresponding to the sound source direction.
  • the sound source reaches the microphone array from different directions, causing different microphones to receive signals with different degrees of delay time.
  • the delay time can be used to locate the direction of the beam focus and determine the sound source direction that is consistent with the spatial filtering parameters (such as 90° directly ahead).
  • the microphone array is composed of a certain number of acoustic sensors (usually microphones), which are used to sample the spatial characteristics of the sound field.
  • the microphones may be arranged, for example, as 4 in a line with even spacing (as shown in FIG. 2), 6 in a line with even spacing, 8 in a circle with even spacing (as shown in FIG. 3), or 12 or 14 evenly spaced in a circle, rectangle, crescent, etc.
  • the number and arrangement of the microphones are not limited in the embodiments of the present invention; for convenience of description, the four-microphone linear array shown in FIG. 2 is taken as an example, but it should be clear that this is not a specific limitation on the microphone array.
  • the distance between the microphones should be set neither too large nor too small; an unsuitable spacing introduces errors into the focus positioning of the sound source.
  • in practical applications, the equal spacing between microphones can be set between 30 mm and 80 mm.
  • the delay time for the sound source to reach each microphone may be calculated from the physical structure of the microphone arrangement. Assume the microphone spacing d, the sound propagation speed c, and the pointing angle θ of the sound source (that is, the angle of the direction in which sound is to be received and focused, such as 90° directly in front) have been determined.
  • taking the first microphone Mic1 as the reference (τ_0 = 0), the delay time to the second microphone Mic2 is τ_1 = d·sin(θ)/c, the delay time to the third microphone Mic3 is τ_2 = 2·d·sin(θ)/c, and in general τ_k = k·d·sin(θ)/c for the (k+1)-th microphone.
  • the above calculation method of the delay time is suitable for linearly and equally spaced microphone arrays; for other microphone distributions or non-equal spacing, the calculation method may differ.
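As an illustrative sketch of the delay calculation above (assuming a uniform linear array with the first microphone taken as the zero-delay reference; function and variable names are hypothetical, not from the patent):

```python
import math

def mic_delays(n_mics, d, theta_deg, c=343.0):
    """Delay of a plane wave from pointing angle theta to each microphone
    of a uniform linear array, relative to the first microphone:
    tau_k = k * d * sin(theta) / c, for k = 0 .. n_mics - 1."""
    theta = math.radians(theta_deg)
    return [k * d * math.sin(theta) / c for k in range(n_mics)]

# Example: 4-microphone linear array, 50 mm spacing, source pointing at 90 degrees.
delays = mic_delays(4, 0.05, 90.0)
```

With the 30 mm to 80 mm spacing mentioned above, the inter-microphone delays stay on the order of a few hundred microseconds, which is what the subband phase terms of the signal vector function encode.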
  • the signal vector function is constructed according to the delay time of each microphone array, and the sound source direction is calculated according to the signal vector function and the delay time.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is: g(θ, ω) = [e^{-j·ω·τ_0}, e^{-j·ω·τ_1}, …, e^{-j·ω·τ_(N-1)}]^T, where θ is the direction angle of sound receiving and focusing, j is the imaginary unit, ω = 2·π·f, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone (0 when the first microphone is taken as the reference), N is the number of microphones, and τ_(N-1) is the delay time from the sound source to the Nth microphone. Therefore, the sound source direction can be calculated according to the signal vector function and the delay time corresponding to each microphone.
  • the matrix corresponding to the subband frequencies corresponding to the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies corresponding to the sound source, the above-mentioned signal vector function, and the delay time.
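The signal vector function described above can be illustrated as follows; this is a sketch built from the per-microphone delays, and the function names are my own, not the patent's:

```python
import numpy as np

def steering_vector(freqs, delays):
    """Signal (steering) vector g(theta, omega) per subband.
    Element k at frequency f is exp(-j * 2*pi*f * tau_k), where tau_k is
    the delay from the sound source to microphone k+1.
    freqs: subband frequencies in Hz; delays: per-microphone delays in s.
    Returns an array of shape (n_freqs, n_mics)."""
    freqs = np.asarray(freqs)[:, None]    # (n_freqs, 1)
    delays = np.asarray(delays)[None, :]  # (1, n_mics)
    return np.exp(-1j * 2 * np.pi * freqs * delays)
```

Each row is the vector g(θ, ω) for one subband; with a zero delay to the reference microphone, the first element is always 1.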
  • step S110 may be performed first and then step S120, or steps S110 and S120 may be performed simultaneously; this is not limited in the embodiments of the present invention.
  • Step S140 Calculate a product of the obtained spatial filtering parameter and the original frequency domain signal of the sound source to obtain a beamforming output pointed by the sound source.
  • the spatial filtering parameters and the original frequency domain signal are both matrices; the two matrices are multiplied, and the product performs beamforming in a manner that suppresses the original frequency domain signals corresponding to non-target sound sources other than the signal in the sound source direction, so that sound signals in the fixed direction are not distorted while sound signals in other directions are suppressed.
  • an electronic device obtains a spatial filtering parameter, and the spatial filtering parameter is different with different angles and subband frequencies; determining a sound source direction corresponding to the spatial filtering parameter, and acquiring the sound The source points to the corresponding original frequency domain signal; the product of the spatial filtering parameter and the original frequency domain signal is calculated, and the product is used to suppress the frequency domain signals other than the original frequency domain signal pointed by the sound source.
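The product of the spatial filtering parameter matrix and the original frequency domain signal described above can be sketched per subband as Y[f] = W[f]^H · X[f]. A minimal illustration (names are assumptions):

```python
import numpy as np

def apply_spatial_filter(W, X):
    """Beamforming output per subband: Y[f] = W[f]^H X[f].
    W: precomputed spatial filtering parameters, shape (n_freqs, n_mics)
    X: original frequency-domain signal of one frame, same shape.
    Returns the single-channel beamformed spectrum, shape (n_freqs,)."""
    return np.einsum('fm,fm->f', np.conj(W), X)
```

For example, with simple delay-and-sum weights W = g / N and a signal X arriving exactly from the look direction (so X equals the steering vector g), the output has unit gain, which is the "sound in the pointing direction is unchanged" property the text describes.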
  • the present invention can not only save beamforming time by presetting the spatial filtering parameters in advance, but also ensure that sound signals in a fixed direction are not distorted.
  • computer equipment is used to pre-calculate the spatial filtering parameters corresponding to any plane-wave angle from 0° to 180°, so that the corresponding spatial filtering parameters can be obtained directly when beamforming the sound source.
  • FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention.
  • calculating the spatial filtering parameters specifically includes the following steps:
  • Step S1 Calculate the delay time for the sound source to reach the microphone array.
  • the sound source reaches the microphone array from different directions, resulting in different degrees of delay time for signals received by different microphones.
  • the delay time can be used to locate the direction of the beam focus and determine the sound source direction that is consistent with the spatial filtering parameters (such as 90° directly ahead).
  • the calculation of the delay time from the sound source to the microphone array may specifically include, but is not limited to, the following steps: determine the microphone spacing d, the sound propagation speed c, and the pointing angle θ of the sound source (i.e., the angle of the direction in which sound is to be received and focused, such as 90° directly in front).
  • the above-mentioned delay time is calculated according to the determined microphone distance d, the sound propagation speed c, and the angle ⁇ at which the sound source is pointed. For specific methods, refer to step S120, and details are not described herein again.
  • step S2 a signal vector function is constructed according to the delay time of each microphone array, and the sound source direction is calculated according to the signal vector function and the delay time.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is: g(θ, ω) = [e^{-j·ω·τ_0}, e^{-j·ω·τ_1}, …, e^{-j·ω·τ_(N-1)}]^T, where θ is the direction angle of sound receiving and focusing, j is the imaginary unit, ω = 2·π·f, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone (0 when the first microphone is taken as the reference), N is the number of microphones, and τ_(N-1) is the delay time from the sound source to the Nth microphone. Therefore, the sound source direction can be calculated according to the signal vector function and the delay time corresponding to each microphone.
  • the matrix corresponding to the subband frequencies of the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies of the sound source, the above-mentioned signal vector function, and the delay time.
  • Step S3 Calculate a spatial filtering parameter when the loss function approaches a minimum value according to a preset first limitation condition and a second limitation condition.
  • the loss function is constructed according to the spatial filtering parameters and the signal vector function.
  • the preset first limiting condition is a white noise gain limitation.
  • W f ( ⁇ ) is the spatial filtering parameter
  • T is the transpose operation
  • H is the conjugate transpose
  • 2 * ⁇ * f
  • f is the matrix corresponding to all subband frequencies
  • is the direction angle of the sound and focus .
  • g ( ⁇ , ⁇ ) is a signal vector function.
  • is a gain limit of white noise.
  • the embodiment of the present invention does not limit the specific value of ⁇ .
  • the preset second limitation condition is that the product of the spatial filtering parameter and the signal vector function is a first preset value.
  • the first preset value is 1.
  • the spatial filtering parameters and the signal vector function are both matrices, and in general, the matrix of the signal vector function hardly changes.
  • the spatial conditions of beamforming are limited.
  • the first restriction condition and the second restriction condition must be satisfied at the same time.
  • it may also include satisfying a third limiting condition.
  • the third limiting condition is: determining the convexity of the loss function.
  • here R_nn is the covariance matrix of the noise, g(θ, ω) is the signal vector function, and H is the conjugate transpose.
  • the loss function constructed according to the spatial filtering parameter and the signal vector function is: J(W_f(θ)) = W_f(θ)^H · R_nn · W_f(θ); since the covariance matrix R_nn is positive semi-definite, this loss function is convex.
  • the loss function makes the final response at each angle θ: b_hat(θ) = W_f(θ)^H · g(θ, ω).
  • under the above limitation conditions, the spatial filtering parameter when the loss function approaches its minimum value can be calculated, for example, in the closed form W_f(θ) = R_nn^{-1} · g(θ, ω) / (g(θ, ω)^H · R_nn^{-1} · g(θ, ω)).
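Reading steps S1 to S3 as a minimum-variance design under the unit-response constraint (the second limitation condition), with diagonal loading as a common practical stand-in for the white noise gain limit (the first limitation condition), a Python sketch follows. This is one conventional closed form, an interpretation on my part rather than the patent's stated derivation:

```python
import numpy as np

def robust_mvdr_weights(R_nn, g, delta=1e-3):
    """Minimize W^H R_nn W subject to W^H g = 1.
    R_nn: noise covariance matrix, shape (n_mics, n_mics)
    g: signal vector for one subband, shape (n_mics,)
    delta: diagonal loading, a practical surrogate for the white noise
    gain limitation (value assumed, not from the patent)."""
    R = R_nn + delta * np.eye(R_nn.shape[0])
    Ri_g = np.linalg.solve(R, g)      # R^{-1} g
    return Ri_g / (g.conj() @ Ri_g)   # closed-form constrained minimizer
```

With an identity noise covariance and a two-microphone signal vector of ones, the weights reduce to a simple average, and the unit-response constraint W^H g = 1 holds by construction.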
  • FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 5, the multi-beam beamforming method in this embodiment includes the following steps:
  • Step S210 Calculate the beamforming output corresponding to the target sound source direction.
  • beamforming is performed for at least two sound source directions, forming multi-beam beamforming.
  • it should be noted that each sound source may point at any plane-wave angle from 0° to 180°.
  • the at least two sound source directions described in the embodiment of the present invention include a target sound source and at least one other sound source direction.
  • the following embodiments use the beam directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total) as an example for explanation.
  • the target sound source is directed at 90 °.
  • the sound source directions may also be other angles, such as 53° or 80°, and the target sound source may also be 60°; this is not limited.
  • each sound source direction needs to be determined through a microphone array. The microphone array is composed of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field. In practical applications, the microphones may be arranged, for example, as 4 in a line with even spacing (as shown in FIG. 2), 6 in a line with even spacing, 8 in a circle with even spacing (as shown in FIG. 3), or 12 or 14 evenly spaced in a circle, rectangle, crescent, etc.
  • the embodiments of the present invention do not limit the number and arrangement of the microphone array; for convenience of description, the microphone array shown in FIG. 3 is taken as an example, but it should be clear that this is not a specific limitation on the microphone array.
  • the distance between the microphones should be set neither too large nor too small; an unsuitable spacing introduces errors into the focus positioning of the sound source.
  • in practical applications, the equal spacing between microphones can be set between 30 mm and 80 mm.
  • Step S220 A noise parameter is calculated through a blocking matrix, for example by using a GSC (Generalized Sidelobe Cancellation) structure.
  • the blocking matrix is used to characterize the frequency response of the sound signal.
  • the purpose of calculating the noise parameter is to reduce the noise of the sound pointed by the non-target sound source.
  • the beam directions are 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total), and the target sound source is 90°, so the noise parameter is used to perform noise reduction on the sound sources at 0°, 30°, 60°, 120°, 150°, and 180°.
  • Step S230 Perform noise reduction, according to the noise parameter, on the signals in the non-target sound source directions other than the beamforming output of the target sound source.
  • the signals in the non-target sound source directions from step S220 are filtered, that is, the noise parameter is used to reduce the noise of the signals in the non-target directions. In this way, the sound of the target sound source is not distorted, and the interference of other sound sources is reduced.
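As a hedged sketch of the GSC idea referenced above (a fixed-beamformer output minus adaptively filtered noise references produced by a blocking matrix), the update rule, step size, and all names below are assumptions for illustration, not the patent's formulas:

```python
import numpy as np

def gsc_step(d, x, B, w, mu=0.1, eps=1e-8):
    """One adaptive (NLMS-style) step of a generalized sidelobe canceller.
    d: fixed-beamformer (target-direction) output sample
    x: microphone snapshot, shape (n_mics,)
    B: blocking matrix, shape (n_mics, n_mics - 1), whose columns are
       orthogonal to the target steering vector, so u contains no target
    w: adaptive noise-cancellation weights, shape (n_mics - 1,)
    Returns the denoised output sample and the updated weights."""
    u = B.conj().T @ x                  # noise references (target blocked)
    y = d - np.vdot(w, u)               # output after noise cancellation
    w = w + mu * np.conj(y) * u / (np.vdot(u, u).real + eps)
    return y, w
```

When the snapshot contains only the target signal, the blocking matrix zeroes the references and the output equals the fixed-beamformer output, which matches the "target sound is not distorted" property described above.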
  • the multi-beam beamforming method provided in the embodiments of the present invention calculates the beamforming output corresponding to the target sound source direction, calculates a noise parameter through a blocking matrix, and performs noise reduction, according to the noise parameter, on the signals in the non-target sound source directions other than the beamforming output of the target sound source. In this way, the sound in the target sound source direction is not distorted, the sound in other directions is denoised, and interference from other directions is effectively suppressed.
  • step S210 When step S210 is performed to calculate the target sound source pointing to the corresponding beamforming output, the following methods may be adopted, for example: acquiring spatial filtering parameters, and determining the target sound source corresponding to the spatial filtering parameters, and obtaining the target sound source Pointing to the corresponding original frequency domain signal; calculating the product of the spatial filtering parameter and the target sound source pointing to the corresponding original frequency domain signal to obtain the beamforming pointed by the target sound source.
  • the spatial filtering parameter according to the embodiment of the present invention is a filter parameter in the frequency domain; its purpose is to apply a corresponding gain or suppression to the subband frequencies of each frame's signal.
  • the spatial filtering parameter described in the embodiments of the present invention is a matrix.
  • the spatial filtering parameters are calculated by computer equipment in advance, and the obtained parameters are stored in the electronic device that executes the method according to the embodiments of the present invention, where they are used directly, thereby reducing the time consumed by beamforming.
• The spatial filtering parameters W f (θ) corresponding to the target sound source direction are determined by locating, through the delay time, the direction in which the beam is focused; that is, the spatial filtering parameters W f (θ) corresponding to the target sound source direction are thereby determined.
• The following method can be adopted, but is not limited to: the delay time for the sound source to reach each microphone can be calculated from the physical structure of the microphone arrangement. Assume the microphone spacing d, the sound propagation speed c, and the angle θ of the sound source direction (that is, the angle of the direction in which sound is to be received and focused, such as 90° for directly in front) are determined.
• The delay time is τ1 = 2 · d · sin(θ) / c, where τ1 denotes the delay time from the sound source to the second microphone Mic2.
• The above delay-time calculation applies to linearly, equally spaced microphone arrays; for other microphone distributions or non-equally spaced arrays, the calculation may differ from the above.
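The delay-time calculation above can be sketched in Python. This is a minimal illustration assuming a uniform linear array with the first microphone as the zero-delay reference and the τ = d · sin(θ)/c form used in the text; the function name is hypothetical.

```python
import math

def mic_delays(d, c, theta_deg, n_mics):
    """Delay of each microphone relative to the first, for a uniform linear array.

    d: microphone spacing (m); c: sound propagation speed (m/s);
    theta_deg: sound source direction (degrees); n_mics: number of microphones.
    Each successive microphone adds one extra path difference d * sin(theta).
    """
    theta = math.radians(theta_deg)
    return [k * d * math.sin(theta) / c for k in range(n_mics)]

# 4 microphones spaced 50 mm apart, source at 30 degrees:
delays = mic_delays(d=0.05, c=343.0, theta_deg=30.0, n_mics=4)
```

With a 50 mm spacing (within the 30–80 mm range suggested later in the text), the adjacent-microphone delay at 30° is d · sin(30°)/c ≈ 73 µs.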
• The signal vector function is constructed according to the delay time of each microphone in the array, and the sound source direction is calculated according to the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
• The signal vector function is:
• d(θ) = [e^(−j·2πf·τ0), e^(−j·2πf·τ1), …, e^(−j·2πf·τ(N−1))]^T
• where θ is the direction angle of sound receiving and focusing; j denotes the phase term at a given time, with angular frequency ω = 2·π·f; f is the matrix corresponding to all subband frequencies; τ0 is the delay time from the sound source to the first microphone; N is the number of microphones; and τ(N−1) is the delay time from the sound source to the Nth microphone. Therefore, the sound source direction can be calculated according to the signal vector function and the delay time corresponding to each microphone.
  • the matrix corresponding to the subband frequencies corresponding to the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies corresponding to the sound source, the above-mentioned signal vector function, and the delay time.
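The signal vector construction described above can be sketched as follows, assuming the standard steering-vector form exp(−j·2π·f·τk) evaluated over all subband frequencies at once; the function name and the example frequency grid are illustrative assumptions.

```python
import numpy as np

def steering_vector(delays, freqs):
    """Signal vector with entries exp(-j * 2*pi*f * tau_k): one row per
    microphone, one column per subband frequency, so a single matrix covers
    all subband frequencies at once."""
    tau = np.asarray(delays, dtype=float)[:, None]   # (N, 1) delay times
    f = np.asarray(freqs, dtype=float)[None, :]      # (1, F) subband frequencies
    return np.exp(-1j * 2.0 * np.pi * f * tau)

# 4 microphones, 512 subbands up to 8 kHz (illustrative grid):
v = steering_vector([0.0, 1e-4, 2e-4, 3e-4], np.linspace(0.0, 8000.0, 512))
```

The first row is all ones because τ0 = 0 (the first microphone is the reference), and every entry has unit magnitude since the vector carries only phase.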
• The noise parameter is calculated through the blocking matrix. The following method can be adopted, but is not limited to: calculating the frequency response of the sound signal reaching each microphone in sequence, constructing the blocking matrix based on the frequency response, and calculating the noise parameter according to the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions.
  • the purpose of calculating the noise parameter is to reduce the noise of the sound pointed by the non-target sound source.
• The noise parameter is calculated from the blocking matrix H(e^jω) and the original frequency-domain signal Z(t, e^jω) corresponding to the non-target sound source directions as U(t, e^jω) = H(e^jω) · Z(t, e^jω).
  • t represents the input time of each frame signal.
• In step S220, the signal in the non-target sound source directions is filtered to obtain the noise parameter U(t, e^jω); the signal in the target sound source direction is then denoised with it, so that the target sound source is not distorted while the interference of non-target sound sources is reduced.
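A minimal sketch of this step: the blocking matrix is applied to the frame of original frequency-domain signals to obtain the noise parameter U(t, e^jω). The adjacent-difference H below is a common textbook blocking matrix that cancels a delay-aligned target direction; it is an assumption for illustration, not necessarily the matrix constructed in the patent.

```python
import numpy as np

def noise_reference(H, Z):
    """Noise parameter U(t) = H @ Z(t): the blocking matrix H removes the
    target direction from the frame of original frequency-domain signals Z
    (M microphones x F subbands)."""
    return H @ Z

# Adjacent-difference blocking matrix for a 4-microphone array, assuming the
# channels have already been delay-aligned to the target direction:
H = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])

# A target-only frame (identical across the aligned channels) is fully blocked:
Z_target = np.ones((4, 512), dtype=complex)
U = noise_reference(H, Z_target)   # all zeros: no target leaks into U
```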
• In step S230, noise reduction is performed, according to the noise parameter U(t, e^jω), on the signals of sound source directions other than the target's beamforming output. The following method can be adopted, but is not limited to: calculating multi-channel optimal filtering parameters through a multi-channel filtering algorithm and an iterative algorithm, and denoising the signals of the other sound source directions according to the beamforming output of the target sound source, the optimal filtering parameters, and the noise parameter.
  • the embodiment of the present invention is described by taking a multi-channel filtering algorithm as a multi-channel Wiener filtering as an example.
• The optimal filtering parameter G is calculated by using a multi-channel Wiener filter and an NLMS (Normalized Least Mean Square) iterative method.
• FIG. 6 shows a schematic diagram of the final voice output for the target sound source direction according to an embodiment of the present invention, where Y(ω, θ) in FIG. 6 is expressed as Y FBF (t, e^jω), and G(t, e^jω) · U(t, e^jω) is expressed as Y NC (t, e^jω).
• This can further ensure that the sound in the target sound source direction is not distorted and further suppress the interference from non-target sound source directions.
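The combination of the fixed beamformer output Y FBF, the noise parameter U, and an NLMS-adapted filter G can be sketched per subband as follows. The update rule, step size, and variable names are a standard NLMS formulation assumed for illustration; they are not claimed to be the patent's exact recursion.

```python
import numpy as np

def nlms_step(y_fbf, U, G, mu=0.5, eps=1e-8):
    """One per-subband NLMS update of the noise canceller.

    y_fbf: fixed-beamformer output, shape (F,); U: noise references, shape (C, F);
    G: canceller coefficients, shape (C, F).
    Output y = Y_FBF - Y_NC, with Y_NC = sum_c conj(G) * U; G is then adapted
    to shrink the residual |y|^2.
    """
    y = y_fbf - np.sum(np.conj(G) * U, axis=0)          # Y = Y_FBF - Y_NC
    norm = np.sum(np.abs(U) ** 2, axis=0) + eps          # per-subband power
    G = G + mu * U * np.conj(y)[None, :] / norm[None, :]  # normalized LMS step
    return y, G

# Noise-only check: y_fbf is pure leaked noise, so y should shrink toward 0.
U = np.array([[1.0 + 1.0j, 2.0 - 1.0j, -1.0 + 0.5j, 0.5 + 2.0j]])
y_fbf = 0.7 * U[0]
G = np.zeros_like(U)
for _ in range(50):
    y, G = nlms_step(y_fbf, U, G)
```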
  • FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 7, the multi-beam beamforming method in this embodiment includes the following steps:
• Step S310 Calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of the at least two sound source directions, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the frequency of the subband.
• The at least two sound source directions include a target sound source direction and at least one non-target sound source direction.
• The spatial filtering parameter described in this embodiment is a filter parameter in the frequency domain; its purpose is to apply a corresponding gain to the subband frequencies of each frame of the signal.
  • the spatial filtering parameters described in the embodiments of the present invention are a matrix.
• The spatial filtering parameters are obtained through calculation by a computer device. After the results are obtained, the spatial filtering parameters are stored in the electronic device of the embodiments of the present invention for direct use, which shortens the time consumed by beamforming.
  • the method described in steps S1 to S3 in FIG. 4 may be used to calculate the spatial filtering parameters, and details are not described herein again.
• In this embodiment, beams are directed at at least two sound source directions, which constitutes multi-beam beamforming.
• A sound source direction may be any plane-wave angle from 0° to 180°.
  • the at least two sound source directions described in the embodiment of the present invention include a target sound source and at least one other sound source direction.
• The following embodiments use the beam directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° as an example for description, with the target sound source direction at 90°.
• This is only an example; the method is not limited to these beam directions. For instance, a beam may also point to 53° or 80°, and the target sound source direction may be 60°, etc., which is not specifically limited.
• Each sound source direction needs to be determined through a microphone array. Specifically, the microphone array is composed of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field. In practical applications, the microphones may be, for example, 4 uniformly distributed in a line at equal intervals (as shown in FIG. 2), 6 uniformly distributed in a line at equal intervals, 8 uniformly distributed in a circle at equal intervals (as shown in FIG. 3), or 12 or 14 uniformly distributed at equal intervals in a circle, rectangle, crescent, and the like.
  • the specific embodiment of the present invention does not limit the number and arrangement of microphone arrays. However, for the convenience of description, the embodiments of the present invention will be described later using the microphone array style and quantity in FIG. 2 as an example, but it should be clear that this description manner does not specifically limit the microphone array.
• The distance between microphones cannot be set too large or too small; if the spacing is unsuitable, there will be an error in focusing on and locating the sound source. For example, the equal spacing between microphones can be set to less than 80 mm and greater than 30 mm.
  • Step S320 Calculate the enhanced speech pointed by the target sound source.
  • the microphone array 2 in FIG. 2 is used as an example.
• The 7 segments of sound are subjected to Fourier transform to obtain 7 matrices of size 4 × 512, where 4 represents the number of microphones and 512 represents that the spectrum corresponding to each direction is decomposed into 512 subbands.
• The purpose of this step is to perform filtering from the perspective of the subbands and to determine, on each subband, the proportion of the target sound source direction relative to all directions.
• the spectrum corresponding to the target sound source direction (90°) corresponds to Ω1: 4 × 512 subbands;
• the spectrum corresponding to the 0° sound source direction corresponds to Ω2: 4 × 512 subbands;
• the spectrum corresponding to the 30° sound source direction corresponds to Ω3: 4 × 512 subbands;
• the spectrum corresponding to the 60° sound source direction corresponds to Ω4: 4 × 512 subbands;
• the spectrum corresponding to the 120° sound source direction corresponds to Ω5: 4 × 512 subbands;
• the spectrum corresponding to the 150° sound source direction corresponds to Ω6: 4 × 512 subbands;
• the spectrum corresponding to the 180° sound source direction corresponds to Ω7: 4 × 512 subbands.
• In one implementation, the ratio gain corresponding to the target sound source direction is calculated as Ω1 / (Ω1 + Ω2 + Ω3 + Ω4 + Ω5 + Ω6 + Ω7); in another implementation, it is calculated as Ω1 / (Ω2 + Ω3 + Ω4 + Ω5 + Ω6 + Ω7).
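Both ratio-gain variants above can be sketched as follows, treating each Ω as a per-direction row of subband powers (already merged across microphones); the function name and array layout are illustrative assumptions.

```python
import numpy as np

def ratio_gain(powers, target_idx, include_target=True):
    """Per-subband ratio gain for the target direction.

    powers: D x K array of subband powers, one row per direction
    (Omega_1 .. Omega_D). include_target selects between the two variants:
    Omega_1 / (Omega_1 + ... + Omega_7)  or  Omega_1 / (Omega_2 + ... + Omega_7).
    """
    target = powers[target_idx]
    denom = powers.sum(axis=0)
    if not include_target:
        denom = denom - target
    return target / np.maximum(denom, 1e-12)   # guard against division by zero

powers = np.ones((7, 512))                     # 7 directions, 512 subbands
g_all = ratio_gain(powers, target_idx=0)                        # -> 1/7 everywhere
g_others = ratio_gain(powers, target_idx=0, include_target=False)  # -> 1/6
```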
  • Step S330 Calculate an energy ratio based on the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source.
  • multiple subbands of the current frame spectrum decomposition are combined, and the energy of the combined subbands is obtained.
  • the current frame includes a target sound source and a non-target sound source.
  • the 512 subbands corresponding to the target sound source are combined first, and the combined subband energy is determined.
• Next, the sum of the energy of all subbands in the 6 non-target sound source directions (or in all 7 directions, including the target sound source) is calculated; this energy sum is a matrix.
• The energy ratio is then calculated from the sum of the subband energies corresponding to the target sound source and the energy of all subbands in the 6 directions (or 7 directions, including the target sound source).
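The energy-ratio computation can be sketched as follows, with subband energies stored as a directions-by-subbands matrix; the names and layout are illustrative assumptions.

```python
import numpy as np

def energy_ratio(band_powers, target_idx, include_target=True):
    """Energy ratio = combined subband energy of the target direction divided
    by the combined subband energy of the other directions (or of all
    directions, when include_target is True).

    band_powers: D x K matrix of subband energies, D directions x K subbands.
    """
    per_direction = band_powers.sum(axis=1)   # merge the subbands per direction
    target = per_direction[target_idx]
    total = per_direction.sum()
    denom = total if include_target else total - target
    return target / denom

# Uniform energy over 7 directions and 512 subbands -> ratio 1/7:
ratio = energy_ratio(np.ones((7, 512)), target_idx=3)
```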
• Step S340 Calculate the product of the original frequency-domain signal of the target sound source direction, the corresponding enhanced speech, and the energy ratio, so as to reduce the noise of the non-target sound source directions, and output the speech corresponding to the product.
• This beamforming can ensure that the sound in the target sound source direction is not distorted while suppressing the noise generated in the non-target sound source directions.
• In summary, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to the at least two sound source directions; the product of the enhanced speech, the energy ratio, and the original frequency-domain signal of the target sound source direction is then calculated, and the speech corresponding to the product is output, thereby achieving noise reduction for non-target sound sources while ensuring that the sound in the target sound source direction is not distorted.
  • FIG. 8 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention.
  • an embodiment of the present invention also provides another method for multi-beam beamforming.
  • the method for multi-beam beamforming in this embodiment includes the following steps:
  • Step S410 Calculate a product of the spatial filtering parameter and the original frequency domain signals corresponding to the at least two sound source directions, respectively, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the frequency of the subband.
  • At least two sound source directions include a target sound source and at least one non-target sound source direction.
• The spatial filtering parameters W f (θ) corresponding to each of the at least two sound source directions are determined by locating, through the delay time, the direction in which each beam is focused; that is, the spatial filtering parameters W f (θ) corresponding to the target sound source direction are thereby determined.
• The delay time is τ1 = 2 · d · sin(θ) / c, where τ1 denotes the delay time from the sound source to the second microphone Mic2.
• The above delay-time calculation applies to linearly, equally spaced microphone arrays; for other microphone distributions or non-equally spaced arrays, the calculation may differ from the above.
• The signal vector function is constructed according to the delay time of each microphone in the array, and the sound source direction is calculated according to the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
• The signal vector function is:
• d(θ) = [e^(−j·2πf·τ0), e^(−j·2πf·τ1), …, e^(−j·2πf·τ(N−1))]^T
• where θ is the direction angle of sound receiving and focusing; j denotes the phase term at a given time, with angular frequency ω = 2·π·f; f is the matrix corresponding to all subband frequencies; τ0 is the delay time from the sound source to the first microphone; N is the number of microphones; and τ(N−1) is the delay time from the sound source to the Nth microphone. Therefore, the sound source direction can be calculated according to the signal vector function and the delay time corresponding to each microphone.
  • the matrix corresponding to the subband frequencies corresponding to the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies corresponding to the sound source, the above-mentioned signal vector function, and the delay time.
• Single-beam forming is performed for each of the beam directions calculated by the above method: 0°, 30°, 60°, 90°, 120°, 150°, and 180° (7 directions in total).
• 7 matrices of size 4 × 512 are obtained, where 4 represents the number of microphones and 512 represents that the spectrum corresponding to each direction is decomposed into 512 subbands.
  • Step S420 Calculate the enhanced speech pointed by the target sound source.
  • the following methods are used to calculate the enhanced speech pointed by the target sound source, including:
• Taking each subband as a unit, calculate the ratio gain between the energy of the target sound source direction and the energy sum of all sound source directions; then calculate the product of the first product B(ω, θ) and the ratio gain to obtain the enhanced speech, where:
• the first product is the product between the original frequency-domain signal of the target sound source direction and the spatial filtering parameters.
• In essence, the 4 microphone channels are merged to obtain 7 matrices of size 1 × 512. The energy sum of all sound source directions is obtained and recorded as the Spectrum power of other directions; the energy of the target sound source direction is obtained and recorded as the Spectrum power of target directions. The ratio of the Spectrum power of target directions to the Spectrum power of other directions is calculated to obtain the ratio gain, Gain-mask.
  • Step S430 Calculate an energy ratio based on the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source.
• The energies of all subbands in the current frame are combined, and the energy sum of all subbands in the current frame is calculated; the ratio between the energy of the subbands corresponding to the target sound source direction and the energy of all subbands in the at least one non-target sound source direction is calculated to obtain the energy ratio.
• Alternatively, the ratio between the energy of the subbands corresponding to the target sound source direction and the energy sum of all subbands in the current frame is calculated to obtain the energy ratio.
  • the current frame contains all subbands in the direction of the 7 sound sources.
  • the energy corresponding to all subbands in the current frame is combined.
  • all the subbands pointed by each sound source are combined to obtain the spectrum corresponding to the different directions.
• This yields a 7 × 1 matrix, where 7 is the number of sound source directions and 1 is the combined subband (spectrum).
• The subbands corresponding to the different directions are then combined to obtain a 1 × 1 matrix; that is, the energy sum of all subbands is obtained from this matrix and denoted Energy of each bin direction.
• Step S440 Perform frame-by-frame smoothing processing on the current frame and the previous frame through the smoothing parameters.
• The purpose of the smoothing process is to enable a smooth transition of speech between two consecutive frames. Therefore, when smoothing the current frame and the previous frame frame by frame using the smoothing parameters, the following manners can be adopted, but are not limited to:
• Set the smoothing parameter of the current frame so that the sum of the smoothing parameter of the current frame and the smoothing parameter of the previous frame is the second preset value.
  • the second preset value is 1.
  • the frame-by-frame smoothing process is performed on the sound source in the current frame according to the sum of the second product and the third product.
  • the smoothing parameter ⁇ is an empirical value
  • the smoothing parameter ⁇ of the current frame can be set to 0.8
  • the present invention does not limit this.
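The smoothing recursion described above can be sketched as follows, with the current frame's parameter α = 0.8 and the previous frame's parameter 1 − α, so that the two parameters sum to the second preset value of 1; the function name is illustrative.

```python
def smooth_gain(prev_gain, cur_gain, alpha=0.8):
    """Frame-by-frame smoothing of the ratio gain.

    alpha is the current frame's smoothing parameter; the previous frame's
    parameter is 1 - alpha, so the two sum to 1 (the second preset value).
    The result is the sum of the 'second product' (1 - alpha) * prev_gain and
    the 'third product' alpha * cur_gain.
    """
    return (1.0 - alpha) * prev_gain + alpha * cur_gain

# Previous frame's gain 0.2, current frame's gain 1.0:
smoothed = smooth_gain(prev_gain=0.2, cur_gain=1.0)   # 0.2*0.2 + 0.8*1.0 = 0.84
```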
• Step S450 Calculate the product of the enhanced speech, the energy ratio, and the original frequency-domain signal corresponding to the target sound source direction, and output the speech corresponding to the product according to the smoothing result.
• In summary, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to the at least two sound source directions. The product of the enhanced speech, the energy ratio, and the original frequency-domain signal of the target sound source direction is calculated, the current frame and the previous frame are smoothed through the smoothing parameters, and the speech corresponding to the product is output according to the smoothing result, which further reduces the noise of non-target sound sources and further ensures that the sound in the target sound source direction is not distorted.
  • another embodiment of the present invention further provides a voice processing apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
• This device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • another embodiment of the present invention further provides a beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
• This device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention.
  • the beamforming apparatus 9 of this embodiment includes a first obtaining unit 91, a determining unit 92, a second obtaining unit 93, and a first calculating unit 94.
  • the first obtaining unit 91 is configured to obtain spatial filtering parameters, and the spatial filtering parameters are different according to different angles and subband frequencies.
  • the determining unit 92 is configured to determine a sound source corresponding to the spatial filtering parameter obtained by the first obtaining unit 91.
• the second obtaining unit 93 is configured to obtain the original frequency-domain signal corresponding to the sound source direction determined by the determining unit 92.
  • the first calculation unit 94 is configured to calculate a product of the spatial filtering parameter and the original frequency domain signal, and the product is used to suppress other frequency domain signals except the original frequency domain signal pointed by the sound source.
  • the beamforming device 9 further includes:
• the second calculation unit 95 is configured to calculate the spatial filtering parameters before the first obtaining unit 91 obtains the spatial filtering parameters.
  • the second calculation unit 95 includes:
  • the first calculation module 951 is configured to calculate a delay time when the sound source reaches the microphone array.
  • a building module 952 is used to build a signal vector function.
  • a second calculation module 953 is configured to calculate a sound source direction according to the signal vector function constructed by the construction module 952 and the delay time calculated by the first calculation module 951.
  • the first setting module 954 is configured to set a first limiting condition, where the first limiting condition is a white noise gain limitation.
  • the second setting module 955 is configured to set a second limitation condition, where the second limitation condition is that a product of the spatial filtering parameter and the signal vector function is 1.
  • a construction module 956 is configured to construct a loss function according to the spatial filtering parameter and the signal vector function.
• a third calculation module 957 is configured to calculate the spatial filtering parameters that drive the loss function toward its minimum, according to the first restriction condition set by the first setting module 954 and the second restriction condition set by the second setting module 955.
  • the first calculation module 951 includes:
  • the first determining sub-module 951a is configured to determine a distance between microphones in the microphone array, and a speed at which a sound source propagates sound.
  • the second determining sub-module 951b is configured to determine an angle pointed by the sound source.
• a calculation sub-module 951c is configured to calculate the delay time according to the distance between the microphones, the speed, and the angle.
  • the second calculation module 953 includes:
  • a determining sub-module 953a is configured to determine a matrix corresponding to all sub-band frequencies.
  • a calculation sub-module 953b is configured to calculate a sound source direction according to the matrices corresponding to all the sub-band frequencies, the signal vector function, and the delay time determined by the determination sub-module.
  • the spatial filtering parameter is a matrix.
  • the sound source is directed to an arbitrary angle of 0 ° -180 ° of a plane wave.
• Since the beamforming device described in this embodiment is a device that can execute the beamforming method of the embodiment of the present invention, based on the beamforming method described in the embodiment of the present invention, those skilled in the art can understand the specific implementations of the beamforming device of this embodiment and its various variations; therefore, how the beamforming device implements the beamforming method of the embodiment of the present invention is not described in detail here. Any device used by a person skilled in the art to implement the beamforming method of the embodiment of the present invention falls within the protection scope of the present application.
  • another embodiment of the present invention further provides a multi-beam beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
• This device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • the multi-beam beamforming apparatus 11 of this embodiment includes a first calculation unit 111, a second calculation unit 112, and a noise reduction unit 113.
• the first calculation unit 111 is configured to calculate the beamforming output corresponding to the target sound source direction.
  • the second calculation unit 112 is configured to calculate a noise parameter by using a blocking matrix.
• the noise reduction unit 113 is configured to perform noise reduction, according to the noise parameter calculated by the second calculation unit 112, on the signals of the non-target sound source directions other than the beamforming output calculated by the first calculation unit 111.
  • the first calculation unit 111 includes:
  • the first obtaining module 1111 is configured to obtain spatial filtering parameters.
  • a determining module 1112 is configured to determine a target sound source corresponding to the spatial filtering parameter obtained by the first obtaining module 1111.
• the second acquisition module 1113 is configured to acquire the original frequency-domain signal corresponding to the target sound source direction determined by the determining module 1112.
  • a calculation module 1114 is configured to calculate a product of the spatial filtering parameter and the original frequency domain signal corresponding to the target sound source pointing to obtain the beamforming pointed by the target sound source.
  • the second calculation unit 112 includes:
  • the first calculation module 1121 is configured to calculate a frequency response of the sound signal reaching the microphone in order.
  • a construction module 1122 is configured to construct the blocking matrix according to the frequency response calculated by the first calculation module.
  • a second calculation module 1123 is configured to calculate the noise parameter according to the blocking matrix constructed by the construction module and the other sound sources pointing to corresponding original frequency domain signals.
  • the noise reduction unit 113 includes:
  • a calculation module 1131 is configured to calculate a multi-channel optimal filtering parameter by using a multi-channel filtering algorithm and an iterative algorithm.
• the noise reduction module 1132 is configured to perform noise reduction, according to the beamforming output of the target sound source, the optimal filtering parameters, and the noise parameter, on the signals of the sound source directions other than the corresponding beamforming output.
• Since the multi-beam beamforming apparatus described in this embodiment is an apparatus that can execute the multi-beam beamforming method of the embodiment of the present invention, based on the multi-beam beamforming method described in the embodiment of the present invention, those skilled in the art can understand the specific implementations of the multi-beam beamforming apparatus of this embodiment and its various variations; therefore, how the apparatus implements the multi-beam beamforming method of the embodiment of the present invention is not described in detail here. Any device used by a person skilled in the art to implement the multi-beam beamforming method of the embodiment of the present invention falls within the scope of the present application.
  • another embodiment of the present invention further provides a multi-beam beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
• This device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 13 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • the multi-beam beamforming apparatus 13 in this embodiment includes a first calculation unit 131, a second calculation unit 132, a third calculation unit 133, and a fourth calculation unit 134.
  • the first calculation unit 131 is configured to calculate a product of the spatial filtering parameter and the original frequency domain signals corresponding to the at least two sound source directions respectively to obtain multi-beam beamforming.
  • the spatial filtering parameter varies with the angle of the sound source and the subband frequency.
• The sound source directions include a target sound source direction and at least one non-target sound source direction.
  • the second calculation unit 132 is configured to separately calculate the enhanced speech pointed by the target sound source.
  • the third calculation unit 133 is configured to calculate an energy ratio according to the sum of the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source.
  • a fourth calculation unit 134 is configured to calculate a product of the original frequency domain signal pointed by the target sound source and the enhanced speech and energy ratio corresponding to the target sound source direction, and output the speech corresponding to the product.
  • the multi-beam beamforming device 13 further includes:
• a processing unit 135, configured to perform frame-by-frame smoothing processing on the current frame and the previous frame through the smoothing parameters, before the fourth calculation unit 134 calculates the product of the original frequency-domain signal of the target sound source direction and the corresponding enhanced speech and energy ratio.
  • the first calculation unit 131 includes:
  • the first obtaining module 1311 is configured to obtain spatial filtering parameters.
  • a determining module 1312 is configured to determine at least two sound source directions respectively corresponding to the spatial filtering parameters obtained by the first obtaining module 1311.
  • a second acquisition module 1313 is configured to acquire the original frequency-domain signals corresponding to the at least two sound source directions determined by the determining module 1312.
  • a calculation module 1314 is configured to calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to the different sound source directions.
  • the second calculation unit 132 includes:
  • the first calculation module 1321 is configured to calculate, subband by subband, the ratio gain between the energy in the target sound source direction and the summed energy of all sound source directions.
  • a second calculation module 1322 is configured to calculate the product of a first product and the ratio gain to obtain the enhanced speech, where the first product is the product of the original frequency-domain signal in the target sound source direction and the spatial filtering parameter.
  • the third calculation unit 133 includes:
  • a combining module 1331 is configured to combine the energy corresponding to all subbands in the current frame.
  • the first calculation module 1332 is configured to calculate energy sums of all subbands in the current frame.
  • a second calculation module 1333 is configured to calculate the ratio between the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction, to obtain the energy ratio.
  • the processing unit 135 includes:
  • a setting module 1351 is used to set the smoothing parameters of the current frame so that the sum of the smoothing parameters of the current frame and the smoothing parameters of the previous frame is 1.
  • a calculation module 1352 is configured to calculate a product of a ratio gain of a previous frame and a corresponding smoothing parameter to obtain a second product, and calculate a product of a smoothing parameter of the current frame and the ratio gain to obtain a third product.
  • the processing module 1353 is configured to perform frame-by-frame smoothing of the current frame according to the sum of the second product and the third product.
  • the fourth calculation unit 134 is further configured to calculate the product of the enhanced speech corresponding to the target sound source direction, the energy ratio, and the original frequency-domain signal in the target sound source direction, and to output the speech corresponding to that product according to the smoothing result.
  • the apparatus for multi-beam beamforming calculates the products of a spatial filtering parameter and the original frequency-domain signals corresponding to at least two sound source directions to obtain multi-beam beamforming.
  • the spatial filtering parameter varies with the angle of the sound source and with the subband frequency.
  • the at least two sound source directions include a target sound source direction and at least one other sound source direction; the enhanced speech in the target sound source direction is calculated; an energy ratio is calculated from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one other sound source direction; and the product of the original frequency-domain signal in the target sound source direction, the corresponding enhanced speech, and the energy ratio is calculated, and the speech corresponding to that product is output.
  • the embodiment of the present invention can thus ensure that the sound in the target sound source direction is not distorted while effectively suppressing interference from other sound directions.
  • the multi-beam beamforming apparatus described in this embodiment is an apparatus that can execute the multi-beam beamforming method in the embodiments of the present invention. Based on the multi-beam beamforming method described in the embodiments, those skilled in the art can understand the specific implementations of the multi-beam beamforming apparatus of this embodiment and its various variations, so how the apparatus implements the multi-beam beamforming method is not described in detail here.
  • any apparatus used by a person skilled in the art to implement the multi-beam beamforming method in the embodiments of the present invention falls within the scope of the present application.
  • Each of the foregoing devices includes a processor and a memory.
  • Each unit in the device is stored in the memory as a program unit, and the processor executes the program unit stored in the memory to implement a corresponding function.
  • the processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory.
  • one or more kernels may be provided; by adjusting kernel parameters to implement the above method, the sound in the target spatial direction is kept undistorted while sound from other spatial directions is effectively suppressed.
  • Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash RAM.
  • Memory includes at least one storage chip.
  • An embodiment of the present invention provides a storage medium on which a program is stored, and when the program is executed by a processor, the foregoing voice processing method is implemented.
  • An embodiment of the present invention provides a processor, where the processor is configured to run a program, and when the program runs, the foregoing voice processing method is performed.
  • FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 15, the electronic device 17 includes:
  • at least one processor 151 and at least one memory 152;
  • the processor 151 and the memory 152 communicate with each other through the bus 153;
  • the processor 151 is configured to call program instructions in the memory 152 to execute any one of the foregoing methods.
  • the electronic device herein may be a server, a PC, a tablet (PAD), a mobile phone, a smart TV, or another smart device that includes microphones.
  • the electronic device of the embodiment of the present invention obtains the beamforming output in the target sound source direction by calculating the product of a spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction, and improves the signal-to-noise ratio of that beamforming output by performing noise reduction on the non-target sound source directions. It is therefore possible to ensure that the sound in the target spatial direction is not distorted and to effectively suppress sound from other spatial directions, thereby improving the signal-to-noise ratio of the sound in the target spatial direction.
  • An embodiment of the present invention further provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to execute any one of the foregoing voice processing methods.
  • This application also provides a computer program product that, when executed on a data processing device, implements the functions of any of the above-mentioned speech processing methods.
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
  • Computer-readable media includes permanent and non-persistent, removable and non-removable media.
  • Information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Geometry (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

Embodiments of the present invention disclose a beamforming method, a multi-beam beamforming method, an apparatus, and an electronic device. A beamforming output in the direction of a target sound source is obtained by calculating the product of a spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction, and the signal-to-noise ratio of that beamforming output is improved by performing noise reduction on the non-target sound source directions. It is thereby possible to ensure that the sound in the target spatial direction is not distorted and to effectively suppress sound from other spatial directions, improving the signal-to-noise ratio of the sound in the target spatial direction.

Description

A beamforming method, multi-beam beamforming method, apparatus, and electronic device
This application claims priority to the Chinese patent applications No. 2018104970698, entitled "Multi-beam beamforming method, apparatus, and electronic device", filed on May 22, 2018; No. 2018104964502, entitled "Multi-beam beamforming method, apparatus, and electronic device", filed on May 22, 2018; and No. 2018104964485, entitled "Beamforming method, apparatus, and electronic device", filed on May 22, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of speech processing, and in particular to a beamforming method, a multi-beam beamforming method, an apparatus, and an electronic device.
Background
With the rapid spread of smart terminal technology, users place ever higher demands on the functionality and intelligence of smart terminals; making smart terminals more intelligent and specialized has become one of the current research directions.
For example, essentially all smart terminals come with a recording function, and recording mostly uses beamforming. Beamforming is a signal processing technique for sensor arrays (e.g., microphone arrays), used for directional signal reception and for appropriate signal processing of the received sound signals. Beamforming allows a microphone assembly to receive sound signals so as to process the electrical signals selectively; for example, sound information emitted from one sound source is processed differently from sound information emitted from a different sound source.
At present, speech processing is usually performed by fusing a time-domain filter with the computation of beamforming driving weights in the frequency domain, but this does not reduce unwanted ambient noise.
Summary of the Invention
In view of this, the embodiments of the present application provide a beamforming method, a multi-beam beamforming method, an apparatus, and an electronic device, so as to ensure that the sound in the target spatial direction is not distorted and to effectively suppress sound from other spatial directions, thereby improving the signal-to-noise ratio of the sound in the target spatial direction.
In a first aspect, an embodiment of the present invention provides a beamforming method, including:
obtaining a spatial filtering parameter, the spatial filtering parameter varying with angle and with subband frequency; determining the sound source direction corresponding to the spatial filtering parameter, and obtaining the original frequency-domain signal corresponding to the sound source direction;
calculating the product of the spatial filtering parameter and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal in the sound source direction.
Further, before obtaining the spatial filtering parameter, the method further includes:
calculating the spatial filtering parameter.
Further, calculating the spatial filtering parameter includes:
calculating the delay time for the sound source to reach the microphone array;
constructing a signal vector function from the delay time, and calculating the sound source direction from the signal vector function and the delay time;
calculating, according to a preset first constraint and a preset second constraint, the spatial filtering parameter at which a loss function tends toward its minimum, the loss function being constructed from the spatial filtering parameter and the signal vector function;
wherein the first constraint is specifically a white noise gain constraint, and the second constraint is specifically that the product of the spatial filtering parameter and the signal vector function equals a first preset value.
Further, calculating the delay time for the sound source to reach the microphone array includes:
determining the spacing between the microphones in the microphone array and the speed at which the sound source's sound propagates;
determining the angle of the sound source direction;
calculating the delay time from the spacing between the microphones, the speed of sound propagation, and the angle of the sound source direction.
Further, calculating the sound source direction from the signal vector function and the delay time includes:
determining the matrix corresponding to all subband frequencies;
calculating the sound source direction from the matrix corresponding to all subband frequencies, the signal vector function, and the delay time.
Further, the spatial filtering parameter is a matrix.
Further, the sound source direction may be any plane-wave angle from 0° to 180°.
In a second aspect, an embodiment of the present invention provides a beamforming apparatus, including:
a first obtaining unit, configured to obtain a spatial filtering parameter, the spatial filtering parameter varying with angle and with subband frequency;
a determining unit, configured to determine the sound source direction corresponding to the spatial filtering parameter obtained by the first obtaining unit;
a second obtaining unit, configured to obtain the original frequency-domain signal corresponding to the sound source direction determined by the determining unit;
a first calculation unit, configured to calculate the product of the spatial filtering parameter and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal in the sound source direction.
In a third aspect, an embodiment of the present invention provides a multi-beam beamforming method, including:
calculating the beamforming output corresponding to the target sound source direction;
calculating a noise parameter from a blocking matrix;
performing, according to the noise parameter, noise reduction on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
Further, calculating the beamforming output corresponding to the target sound source direction includes:
obtaining a spatial filtering parameter and determining the target sound source direction corresponding to the spatial filtering parameter;
obtaining the original frequency-domain signal corresponding to the target sound source direction;
calculating the product of the spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction, to obtain the beamforming output in the target sound source direction.
Further, calculating the noise parameter from the blocking matrix includes:
calculating the frequency responses of the sound signal reaching the microphones in turn;
constructing the blocking matrix from the frequency responses;
calculating the noise parameter from the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions.
Further, performing noise reduction according to the noise parameter on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction includes:
calculating multichannel optimal filtering parameters through a multichannel filtering algorithm and an iterative algorithm;
performing, according to the beamforming output of the target sound source, the multichannel optimal filtering parameters, and the noise parameter, noise reduction on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
In a fourth aspect, an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
a first calculation unit, configured to calculate the beamforming output corresponding to the target sound source direction;
a second calculation unit, configured to calculate a noise parameter through a blocking matrix;
a noise reduction unit, configured to perform, according to the noise parameter calculated by the second calculation unit, noise reduction on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction calculated by the first calculation unit.
Further, the first calculation unit includes:
a first obtaining module, configured to obtain a spatial filtering parameter;
a determining module, configured to determine the target sound source direction corresponding to the spatial filtering parameter obtained by the first obtaining module;
a second obtaining module, configured to obtain the original frequency-domain signal corresponding to the target sound source direction obtained by the first obtaining module;
a calculation module, configured to calculate the product of the spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction, to obtain the beamforming output in the target sound source direction.
Further, the second calculation unit includes:
a first calculation module, configured to calculate the frequency responses of the sound signal reaching the microphones in turn;
a construction module, configured to construct the blocking matrix from the frequency responses calculated by the first calculation module;
a second calculation module, configured to calculate the noise parameter from the blocking matrix constructed by the construction module and the original frequency-domain signals corresponding to the non-target sound source directions.
Further, the noise reduction unit includes:
a calculation module, configured to calculate multichannel optimal filtering parameters through a multichannel filtering algorithm and an iterative algorithm;
a noise reduction module, configured to perform, according to the beamforming output of the target sound source, the multichannel optimal filtering parameters, and the noise parameter, noise reduction on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
In a fifth aspect, the present invention provides a multi-beam beamforming method, including:
calculating the products of a spatial filtering parameter and the original frequency-domain signals corresponding to each of at least two sound source directions to obtain multi-beam beamforming, the spatial filtering parameter varying with the angle of the sound source and with the subband frequency, and the at least two sound source directions including one target sound source and at least one non-target sound source direction;
calculating the enhanced speech in the target sound source direction;
calculating an energy ratio from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction;
calculating the product of the original frequency-domain signal in the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product.
Further, before calculating the product of the original frequency-domain signal in the target sound source direction, the corresponding enhanced speech, and the energy ratio, the method further includes:
performing frame-by-frame smoothing of the current frame with the previous frame through a smoothing parameter.
Further, calculating the products of the spatial filtering parameter and the original frequency-domain signals corresponding to each of the at least two sound source directions to obtain multi-beam beamforming includes:
obtaining the spatial filtering parameter and determining the at least two sound source directions respectively corresponding to the spatial filtering parameter;
obtaining the original frequency-domain signals corresponding to the at least two sound source directions;
calculating the products of the spatial filtering parameter and the original frequency-domain signals corresponding to the at least two sound source directions.
Further, calculating the enhanced speech in the target sound source direction includes:
calculating, subband by subband, the ratio gain between the energy in the target sound source direction and the summed energy of all sound source directions;
calculating the product of a first product and the ratio gain to obtain the enhanced speech, where the first product is the product of the original frequency-domain signal in the target sound source direction and the spatial filtering parameter.
Further, calculating the energy ratio from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction includes:
merging the energies corresponding to all subbands in the current frame and calculating the summed energy of all subbands of the current frame;
calculating the ratio between the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction, to obtain the energy ratio.
Further, performing frame-by-frame smoothing of the current frame with the previous frame through the smoothing parameter includes:
setting the smoothing parameter of the current frame so that the smoothing parameter of the current frame and the smoothing parameter of the previous frame sum to a second preset value;
calculating the product of the previous frame's ratio gain and the previous frame's smoothing parameter to obtain a second product;
calculating the product of the current frame's ratio gain and the current frame's smoothing parameter to obtain a third product;
performing frame-by-frame smoothing of the current frame according to the sum of the second product and the third product.
Further, calculating the product of the original frequency-domain signal in the target sound source direction, the corresponding enhanced speech, and the energy ratio, and outputting the speech corresponding to the product includes:
calculating the product of the original frequency-domain signal in the target sound source direction, the corresponding enhanced speech, and the energy ratio, and outputting the speech corresponding to the product according to the smoothing result.
In a sixth aspect, an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
a first calculation unit, configured to calculate the products of a spatial filtering parameter and the original frequency-domain signals corresponding to each of at least two sound source directions to obtain multi-beam beamforming, the spatial filtering parameter varying with the angle of the sound source and with the subband frequency, and the at least two sound source directions including one target sound source and at least one non-target sound source direction;
a second calculation unit, configured to calculate the enhanced speech in the target sound source direction;
a third calculation unit, configured to calculate an energy ratio from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction;
a fourth calculation unit, configured to calculate the product of the original frequency-domain signal in the target sound source direction, the corresponding enhanced speech, and the energy ratio, and to output the speech corresponding to the product.
In a seventh aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, the program being executed by a processor to implement the method of the first aspect and/or the method of the third aspect and/or the method of the fifth aspect of the embodiments of the present invention.
In an eighth aspect, an embodiment of the present invention provides an electronic device including a processor, a memory, and a bus; the processor and the memory communicate with each other through the bus; the memory stores program instructions, which are executed by the processor to implement the method of the first aspect and/or the method of the third aspect and/or the method of the fifth aspect of the embodiments of the present invention.
The embodiments of the present invention obtain the beamforming output in the target sound source direction by calculating the product of a spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction, and improve the signal-to-noise ratio of that beamforming output by performing noise reduction on the non-target sound source directions. It is thereby possible to ensure that the sound in the target spatial direction is not distorted and to effectively suppress sound from other spatial directions, improving the signal-to-noise ratio of the sound in the target spatial direction.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the embodiments of the present application. Throughout the drawings, the same reference signs denote the same components. In the drawings:
FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a microphone array according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another microphone array according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for calculating a spatial filtering parameter according to an embodiment of the present invention;
FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the final speech output in the target sound source direction according to an embodiment of the present invention;
FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention;
FIG. 8 is a flowchart of yet another multi-beam beamforming method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention. As shown in FIG. 1, the sound source beamforming method of this embodiment includes the following steps:
Step S110: obtain a spatial filtering parameter, the spatial filtering parameter varying with angle and with subband frequency.
In this embodiment, the spatial filtering parameter can be used to enhance the beamforming of a fixed spatial direction (the sound source direction), ensuring that the sound in the pointed direction remains roughly unchanged while sound from other directions is suppressed to some extent.
The spatial filtering parameter in the embodiments of the present invention is a filter parameter in the frequency domain, whose purpose is to apply a corresponding gain or suppression to each frame of the signal at each subband frequency. In an optional implementation, the spatial filtering parameter of this embodiment is a matrix computed by a computer device; the obtained spatial filtering parameter is stored in the electronic device that executes the method of the embodiments of the present invention for direct use, thereby shortening the time consumed by beamforming.
For ease of description, the subsequent embodiments take a beam pointing 90° straight ahead as an example, i.e., the sound source direction is 90° straight ahead. It should be noted, however, that this does not limit the beam to 90°; in practical applications the sound source direction may be any plane-wave angle from 0° to 180°, such as 30°, 60°, or 120°.
Step S120: determine the sound source direction corresponding to the spatial filtering parameter.
Step S130: obtain the original frequency-domain signal corresponding to the sound source direction.
The sound source reaches the microphone array from different directions, so different microphones receive the signal with different delay times. The direction of beam focusing can be located through these delay times, and the sound source direction consistent with the spatial filtering parameter (e.g., 90° straight ahead) can be determined.
The microphone array consists of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field. In practical applications, the microphones may be arranged as, for example, 4 in a uniformly spaced line (as shown in FIG. 2), 6 in a uniformly spaced line, 8 uniformly spaced on a circle (as shown in FIG. 3), or 12 or 14 uniformly spaced on a circle, rectangle, or crescent; the embodiments of the present invention do not limit the number or arrangement of the microphones. For ease of description, however, the subsequent description takes the linear array 2 of four microphones shown in FIG. 2 as an example; it should be clear that this manner of description does not specifically limit the microphone array.
In practical applications, given the characteristics of sound waves, the spacing between microphones should be set neither too large nor too small when laying out the array; an unsuitable spacing introduces errors into the focused localization of the sound source. In general, the uniform spacing between microphones may be set to less than 80 mm and greater than 30 mm.
In this embodiment, when locating the beam-focusing direction through the delay times, the following method may be used, but is not limited thereto: calculate, from the physical geometry of the microphone arrangement, the delay time for the sound source to reach each microphone. Assume the microphone spacing d, the speed of sound propagation c, and the angle Ω of the sound source direction (i.e., the direction angle in which sound is to be picked up and focused, e.g., 90° straight ahead) have been determined. In the microphone array, the microphone reached first is chosen as the reference (Mic1 in FIG. 2): the delay time of the first microphone Mic1 is tau_0 = 0; the delay time of the second microphone Mic2 is tau_1 = d*sin(Ω)/c; the delay time of the third microphone Mic3 is tau_2 = 2*d*sin(Ω)/c; and the delay time of the fourth microphone Mic4 is tau_3 = 3*d*sin(Ω)/c. Taking a sound source direction angle Ω of 90° as an example, the first microphone Mic1 is usually the reference microphone, so its delay time is 0, and tau_1 refers to the delay time of the sound field to the second microphone Mic2. The above delay-time calculation applies to linear, uniformly spaced microphone arrays; the calculation may differ for other microphone layouts and non-uniform spacings.
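A minimal sketch of the delay-time computation above for a uniform linear array, with Mic1 as the zero-delay reference (the function name `mic_delays` and the default speed of sound 343 m/s are illustrative assumptions, not from the original):

```python
import math

def mic_delays(n_mics, spacing_m, angle_deg, c=343.0):
    """Delay of the incoming plane wave at each microphone of a uniform
    linear array, relative to the first (reference) microphone Mic1."""
    theta = math.radians(angle_deg)
    # Each further microphone adds one spacing's worth of extra path
    # length d*sin(angle), i.e. tau_m = m * d * sin(angle) / c.
    return [m * spacing_m * math.sin(theta) / c for m in range(n_mics)]

# 4 microphones, 50 mm apart, beam steered to 90 degrees
delays = mic_delays(4, 0.05, 90.0)
```

With this convention the reference microphone has zero delay and the delays grow linearly along the array.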
A signal vector function is constructed from the delay times of the microphones in the array, and the sound source direction is calculated from the signal vector function and the delay times. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:

g(ω, Ω) = [e^(−jωτ_0), e^(−jωτ_1), …, e^(−jωτ_(N−1))]^T

where Ω is the direction angle of sound pickup and focusing, j is the phase factor at a given instant, ω = 2πf, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone, N is the number of microphones, and τ_(N−1) is the delay time from the sound source to the N-th microphone. The sound source direction can thus be calculated from the signal vector function and the delay times corresponding to the microphones. Optionally, first determine the matrix corresponding to the subband frequencies of the sound source, and calculate the target sound source direction from that matrix, the signal vector function, and the delay times.
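The signal vector (steering) function can be sketched in NumPy, with one row per microphone and one column per subband frequency (the function name `steering_vector` and the shapes are illustrative assumptions):

```python
import numpy as np

def steering_vector(freqs_hz, delays_s):
    """g(omega, Omega): entries exp(-j * omega * tau_m) with omega = 2*pi*f.
    Returns an (n_mics, n_bins) complex matrix."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz)      # (n_bins,)
    tau = np.asarray(delays_s)[:, None]             # (n_mics, 1)
    return np.exp(-1j * tau * omega[None, :])

# 4 microphones with delays 0, 0.1 ms, 0.2 ms, 0.3 ms at two subband frequencies
g = steering_vector([1000.0, 2000.0], [0.0, 1e-4, 2e-4, 3e-4])
```

The reference microphone (zero delay) contributes a row of ones, and each other row carries the phase rotation implied by its delay.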
In practical applications, to facilitate subsequent use of the sound, the sound signal is first converted by the Fourier transform from a time-domain signal, which is difficult to process, into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be represented as an infinite superposition of sine-wave signals of different frequencies; the Fourier transform algorithm built on this principle uses the directly measured original signal to compute, by accumulation, the frequency, amplitude, and phase of the different sine-wave components of that signal. The specific implementation of the Fourier transform is not described in detail here.
It should be noted that no execution order is imposed between step S110 and step S120; in practical applications, step S110 may be executed first and then step S120, or the two steps may be executed synchronously, which the embodiments of the present invention do not limit.
Step S140: calculate the product of the obtained spatial filtering parameter and the original frequency-domain signal of the sound source, to obtain the beamforming output in that sound source direction. The product performs beamforming in a manner that suppresses the original frequency-domain signals corresponding to non-target sound sources other than the original frequency-domain signal in the sound source direction.
The spatial filtering parameter and the original frequency-domain signal are both matrices; multiplying the two matrices yields a product that performs beamforming in a manner that suppresses the original frequency-domain signals corresponding to non-target sound sources, so that the sound signal in the fixed direction is not distorted while sound signals from other directions are suppressed.
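The matrix product of step S140 amounts to applying the per-subband filter weights to the microphone spectra; a sketch of one common convention (the conjugate-weight inner product per bin is an assumption of this sketch, not necessarily the patent's exact formulation):

```python
import numpy as np

def beamform(W_f, Z):
    """Fixed beamformer output per subband: Y[k] = W_f[:, k]^H @ Z[:, k].
    W_f and Z are (n_mics, n_bins) complex matrices (weights / mic spectra)."""
    return np.einsum('mk,mk->k', np.conj(W_f), Z)

# With uniform weights 1/4 and identical mic spectra, each bin sums coherently.
W = np.full((4, 3), 0.25, dtype=complex)
Z = np.ones((4, 3), dtype=complex)
Y = beamform(W, Z)
```

Signals arriving from the steered direction add in phase, while signals from other directions add with mismatched phases and are attenuated.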
In the beamforming method provided by the embodiments of the present invention, the electronic device obtains a spatial filtering parameter that varies with angle and with subband frequency; determines the sound source direction corresponding to the spatial filtering parameter and obtains the original frequency-domain signal corresponding to that direction; and calculates the product of the spatial filtering parameter and the original frequency-domain signal, the product performing beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal in the sound source direction. Compared with the prior art, the present invention not only saves beamforming time by presetting the spatial filtering parameter in advance, but also keeps the sound signal in the fixed direction undistorted.
In this embodiment, the spatial filtering parameters corresponding to any plane-wave angle from 0° to 180° are precomputed by a computer device, so that the corresponding spatial filtering parameter can be obtained when beamforming is performed on a sound source.
FIG. 4 is a flowchart of a method for calculating a spatial filtering parameter according to an embodiment of the present invention. In an optional implementation, as shown in FIG. 4, calculating the spatial filtering parameter specifically includes the following steps:
Step S1: calculate the delay time for the sound source to reach the microphone array. The sound source reaches the microphone array from different directions, so different microphones receive the signal with different delay times; the direction of beam focusing can be located through these delay times, and the sound source direction consistent with the spatial filtering parameter (e.g., 90° straight ahead) can be determined.
In this embodiment, calculating the delay time for the sound source to reach the microphone array may specifically adopt, but is not limited to, the following steps: determine the microphone spacing d, the speed of sound propagation c, and the angle Ω of the sound source direction (i.e., the direction angle in which sound is to be picked up and focused, e.g., 90° straight ahead), and calculate the delay times from d, c, and Ω. For the specific method, refer to step S120; it is not repeated here.
Step S2: construct a signal vector function from the delay times of the microphones in the array, and calculate the sound source direction from the signal vector function and the delay times. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:

g(ω, Ω) = [e^(−jωτ_0), e^(−jωτ_1), …, e^(−jωτ_(N−1))]^T

where Ω is the direction angle of sound pickup and focusing, j is the phase factor at a given instant, ω = 2πf, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone, N is the number of microphones, and τ_(N−1) is the delay time from the sound source to the N-th microphone. The sound source direction can thus be calculated from the signal vector function and the delay times corresponding to the microphones. Optionally, first determine the matrix corresponding to the subband frequencies of the sound source, and calculate the target sound source direction from that matrix, the signal vector function, and the delay times. For the detailed explanation, refer to step S120; it is not repeated here.
Step S3: calculate, according to the preset first constraint and second constraint, the spatial filtering parameter at which the loss function tends toward its minimum, the loss function being constructed from the spatial filtering parameter and the signal vector function.
In an optional implementation, the preset first constraint is a white noise gain constraint [formula image in original], in which W_f(ω) is the spatial filtering parameter, T denotes transposition, H denotes conjugate transposition, ω = 2πf, f is the matrix corresponding to all subband frequencies, Ω is the direction angle of sound pickup and focusing, and g(ω, Ω) is the signal vector function. γ is the white noise gain limit; optionally, the gain limit is gamma_db = −20 dB with γ = exp(gamma_db/10), and the embodiments of the present invention do not limit the specific value of γ.
In an optional implementation, the preset second constraint is that the product of the spatial filtering parameter and the signal vector function equals a first preset value. Preferably, the first preset value is 1; that is, the second constraint is W_f(ω) · g(ω, Ω) = 1. The spatial filtering parameter and the signal vector function are both matrices, and in general the matrix of the signal vector function hardly changes.
The embodiments of the present invention constrain the spatial conditions of beamforming. In the specific implementation, the first constraint and the second constraint must be satisfied simultaneously. Optionally, besides these two constraints, a third constraint may also be satisfied: determining the convexity of the loss function [formula image in original], in which R_nn is the noise covariance matrix, g(ω, Ω) is the signal vector function, and H denotes conjugate transposition.
The loss function b_hat constructed from the spatial filtering parameter and the signal vector function [formula image in original] ultimately yields the response at each angle Ω:

response(ω, Ω) = W_f^H(ω) g(ω, Ω)

According to the first constraint and the second constraint, the spatial filtering parameter at which the loss function tends toward its minimum is calculated [formula image in original].
When calculating the spatial filtering parameter at which the loss function tends toward its minimum, equations must also be established with the first, second, and third constraints, and the spatial filtering parameter is solved by mathematical equation solving; the equation-solving algorithms are not described in detail here.
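A per-subband sketch of the constrained minimization in step S3 as a distortionless-response solve; the diagonal loading term standing in for the white noise gain limit is an assumption of this sketch, not the patent's exact formulation:

```python
import numpy as np

def solve_filter(g, R_nn, loading=1e-2):
    """Minimize W^H R W subject to W^H g = 1 (the second constraint);
    diagonal loading is a common way to enforce a white-noise-gain limit."""
    R = R_nn + loading * np.eye(len(g))
    Ri_g = np.linalg.solve(R, g)
    return Ri_g / (np.conj(g) @ Ri_g)

g = np.ones(4, dtype=complex)               # steering vector for one subband
W = solve_filter(g, np.eye(4, dtype=complex))
```

For an isotropic (identity) noise covariance the solution reduces to uniform delay-and-sum weights, and the distortionless condition W^H g = 1 holds exactly.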
FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 5, the multi-beam beamforming method of this embodiment includes the following steps:
Step S210: calculate the beamforming output corresponding to the target sound source direction.
The sound angle sources for beamforming in the embodiments of the present invention are at least two sound source directions, constituting multi-beam beamforming. In practical applications, the sound source direction may be any plane-wave angle from 0° to 180°. It should be noted that the at least two sound source directions described in the embodiments include one target sound source and at least one other sound source direction. For ease of description, the subsequent description takes beams pointing in the directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total) as an example, with the target sound source pointing at 90°. It should be noted, however, that this does not limit the beams to the above angles; they may also point at, e.g., 53° or 80°, and the target sound source may also be 60°, without specific limitation.
The product of the original frequency-domain signal corresponding to each sound source direction and the spatial filtering parameter is calculated separately to obtain each single-beam beamforming result; the result is also a matrix, represented as a spectrum. When calculating these products, each sound source direction needs to be determined through the microphone array. The microphone array consists of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field; in practical applications the microphones may be arranged as, for example, 4 in a uniformly spaced line (as shown in FIG. 2), 6 in a uniformly spaced line, 8 uniformly spaced on a circle (as shown in FIG. 3), or 12 or 14 uniformly spaced on a circle, rectangle, or crescent; the embodiments of the present invention do not limit the number or arrangement of the microphones. For ease of description, the subsequent description takes the linear array of four microphones shown in FIG. 2 as an example; this manner of description does not specifically limit the microphone array.
In practical applications, given the characteristics of sound waves, the spacing between microphones should be set neither too large nor too small; an unsuitable spacing introduces errors into the focused localization of the sound source. In general, the uniform spacing between microphones may be set to less than 80 mm and greater than 30 mm.
As another implementation of the embodiments of the present invention, when calculating the beamforming output corresponding to the target sound source direction, algorithms for computing single-direction beamforming such as GSC (Generalized Sidelobe Cancellation) may also be used; the embodiments do not limit the single-direction beamforming algorithm.
Step S220: calculate a noise parameter from a blocking matrix, where the blocking matrix characterizes the frequency responses of the sound signal. The purpose of calculating the noise parameter is to reduce the noise of sounds in non-target sound source directions. For example, with beams pointing at 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions) and the target sound source pointing at 90°, the noise parameter is used to reduce the noise of the sounds in the directions 0°, 30°, 60°, 120°, 150°, and 180°.
Step S230: perform, according to the noise parameter, noise reduction on the signals in non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
In the specific implementation, the signals in the non-target sound source directions from step S220 are filtered out of the beamforming output signal in the target sound source direction calculated in step S210; that is, the noise parameter is used to reduce the noise of the signals in the non-target directions. This both ensures that the sound in the target sound source direction is not distorted and reduces interference from the other sound source directions.
In the multi-beam beamforming method provided by the embodiments of the present invention, the beamforming output corresponding to the target sound source direction is calculated; a noise parameter is calculated through a blocking matrix; and the signals in sound source directions other than the beamforming output corresponding to the target sound source direction are noise-reduced according to the noise parameter. Compared with the prior art, the embodiments of the present invention can ensure that the sound in the target sound source direction is not distorted while reducing the noise of sounds in other directions, effectively suppressing interference from other sound directions.
Further, as a further expansion and refinement of the above embodiment, the specific implementation of each step is described in turn below.
When executing step S210 to calculate the beamforming output corresponding to the target sound source direction, the following method may be used, but is not limited thereto: obtain a spatial filtering parameter and determine the target sound source direction corresponding to it; obtain the original frequency-domain signal corresponding to the target sound source direction; and calculate the product of the spatial filtering parameter and that original frequency-domain signal, to obtain the beamforming in the target sound source direction.
The spatial filtering parameter described in the embodiments of the present invention is a filter parameter in the frequency domain, whose purpose is to apply a corresponding gain to each frame of the signal at each subband frequency. In practical applications, the spatial filtering parameter is a matrix computed by a computer device and stored in the electronic device that executes the method of the embodiments of the present invention for direct use, thereby shortening the time consumed by beamforming.
Obtain the spatial filtering parameter W_f(ω), determine the target sound source direction corresponding to W_f(ω), and obtain the original frequency-domain signal corresponding to the target sound source direction; then calculate the products of W_f(ω) with the original frequency-domain signals corresponding to the different sound source directions.
In this embodiment, determining the target sound source direction corresponding to W_f(ω), i.e., locating the beam-focusing direction through the delay times, may use the following method, but is not limited thereto: calculate, from the physical geometry of the microphone arrangement, the delay time for the sound source to reach each microphone. Assume the microphone spacing d, the speed of sound propagation c, and the angle Ω of the sound source direction (i.e., the direction angle in which sound is to be picked up and focused, e.g., 90° straight ahead) have been determined. In the microphone array, the microphone reached first is chosen as the reference (Mic1 in FIG. 2): the delay time of the first microphone Mic1 is tau_0 = 0; the delay time of the second microphone Mic2 is tau_1 = d*sin(Ω)/c; the delay time of the third microphone Mic3 is tau_2 = 2*d*sin(Ω)/c; and the delay time of the fourth microphone Mic4 is tau_3 = 3*d*sin(Ω)/c. The first microphone Mic1 is usually the reference microphone, so its delay time is 0, and tau_1 refers to the delay time of the sound field to the second microphone Mic2. The above delay-time calculation applies to linear, uniformly spaced microphone arrays; the calculation may differ for other microphone layouts and non-uniform spacings.
A signal vector function is constructed from the delay times of the microphones in the array, and the sound source direction is calculated from the signal vector function and the delay times. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:

g(ω, Ω) = [e^(−jωτ_0), e^(−jωτ_1), …, e^(−jωτ_(N−1))]^T

where Ω is the direction angle of sound pickup and focusing, j is the phase factor at a given instant, ω = 2πf, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone, N is the number of microphones, and τ_(N−1) is the delay time from the sound source to the N-th microphone. The sound source direction can thus be calculated from the signal vector function and the delay times corresponding to the microphones. Optionally, first determine the matrix corresponding to the subband frequencies of the sound source, and calculate the target sound source direction from that matrix, the signal vector function, and the delay times.
In practical applications, to facilitate subsequent use of the sound, the sound signal is first converted by the Fourier transform from a time-domain signal, which is difficult to process, into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be represented as an infinite superposition of sine-wave signals of different frequencies; the Fourier transform algorithm built on this principle uses the directly measured original signal to compute, by accumulation, the frequency, amplitude, and phase of the different sine-wave components of that signal. The specific implementation of the Fourier transform is not described in detail here.
Further, the spatial filtering parameter W_f(ω) and the original frequency-domain signal Z(t, e^jω) are both matrices; multiplying the two matrices gives Y(ω, Ω) = W_f(ω) Z(t, e^jω). The product Y(ω, Ω) performs beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal in the target sound source direction, so that the sound signal in the fixed direction is not distorted.
When executing step S220 to calculate the noise parameter through the blocking matrix, the following method may be used, but is not limited thereto: calculate the frequency responses of the sound signal reaching the microphones in turn, construct the blocking matrix from those frequency responses, and calculate the noise parameter from the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions. The purpose of calculating the noise parameter is to reduce the noise of sounds in non-target sound source directions.
In an optional implementation, first calculate the frequency response of the sound signal reaching the first microphone, A_1(e^jω), the frequency response reaching the second microphone, A_2(e^jω), …, and the frequency response reaching the M-th microphone, A_M(e^jω), where A characterizes a microphone's frequency response function.
The blocking matrix is constructed from the above frequency responses [formula image in original].
After the blocking matrix H(e^jω) is constructed, the noise parameter is calculated from H(e^jω) and the original frequency-domain signals Z(t, e^jω) corresponding to the non-target sound source directions:

U(t, e^jω) = H(t, e^jω) Z(t, e^jω)

where t characterizes the input time of each frame of the signal.
In the specific implementation, the signals in the non-target sound source directions from step S220 are filtered out of the beamforming output signal in the target sound source direction calculated in step S210; that is, the noise parameter U(t, e^jω) is used to reduce the noise of the signals in the non-target directions, which both ensures that the sound in the target sound source direction is not distorted and reduces interference from the non-target sound source directions.
In practical applications, the propagating sound signal will contain some relatively stable, weak noise from fans, air conditioners, and the like. To reduce this noise, when step S230 performs noise reduction according to the noise parameter U(t, e^jω) on the signals in sound source directions other than the beamforming output corresponding to the target sound source direction, the following method may be used, but is not limited thereto: calculate multichannel optimal filtering parameters through a multichannel filtering algorithm and an iterative algorithm; then, from the beamforming output of the target sound source, the optimal filtering parameters, and the noise parameter, reduce the noise of the signals in the other sound source directions.
The embodiments of the present invention are described taking multichannel Wiener filtering as the multichannel filtering algorithm. To minimize the impact on the energy of the output in the target sound source direction, the optimal filtering parameter G(t, e^jω) is computed through multichannel Wiener filtering and NLMS iteration (Normalized Least Mean Square adaptive filtering), further filtering out the stable background noise. Computing the optimal filtering parameter G(t, e^jω) requires minimizing E{||Y(t, e^jω) − G(t, e^jω) U(t, e^jω)||²}, which yields the optimal filtering parameter G(t, e^jω).
After the optimal filtering parameter G(t, e^jω) and the noise parameter U(t, e^jω) are computed, the final speech output in the target sound source direction is:

Y = Y(ω, Ω) − G(t, e^jω) · U(t, e^jω)

To facilitate understanding of the final speech output, FIG. 6 shows a schematic diagram of the final speech output in the target sound source direction according to an embodiment of the present invention, in which Y(ω, Ω) is denoted Y_FBF(t, e^jω) and G(t, e^jω) · U(t, e^jω) is denoted Y_NC(t, e^jω).
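The final subtraction Y = Y_FBF − G·U, with G adapted by NLMS toward minimizing E{||Y_FBF − G·U||²}, can be sketched per subband as follows (the step size, initialization, and function name `nlms_step` are illustrative assumptions):

```python
import numpy as np

def nlms_step(G, U, Y_fbf, mu=0.1, eps=1e-8):
    """One NLMS iteration of the noise-cancellation filter G per subband,
    returning the updated G and the cancelled output Y = Y_fbf - G * U."""
    err = Y_fbf - G * U
    G = G + mu * np.conj(U) * err / (np.abs(U) ** 2 + eps)
    return G, Y_fbf - G * U

# Toy stationary case: the noise reference U fully explains Y_fbf,
# so G converges toward Y_fbf / U and the output is driven toward zero.
Y_fbf = np.array([1.0 + 0.0j, 0.5 + 0.0j])
U = np.ones(2, dtype=complex)
G = np.zeros(2, dtype=complex)
for _ in range(200):
    G, Y = nlms_step(G, U, Y_fbf)
```

In the real system Y_FBF also contains the undistorted target speech, which is uncorrelated with U, so only the noise component is cancelled.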
By calculating the beamforming output corresponding to the target sound source direction and reducing the noise of the signals in the non-target sound source directions according to the noise parameter, this embodiment can further ensure that the sound in the target sound source direction is not distorted and further suppress interference from the non-target sound source directions.
FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 7, the multi-beam beamforming method of this embodiment includes the following steps:
Step S310: calculate the products of the spatial filtering parameter and the original frequency-domain signals corresponding to each of at least two sound source directions, to obtain multi-beam beamforming. The spatial filtering parameter varies with the angle of the sound source and with the subband frequency, and the at least two sound source directions include one target sound source and at least one non-target sound source direction.
The spatial filtering parameter described in this embodiment is a filter parameter in the frequency domain, whose purpose is to apply a corresponding gain to each frame of the signal at each subband frequency. In practical applications, the spatial filtering parameter is a matrix computed by a computer device; after the result is computed, the spatial filtering parameter is stored in the electronic device described in the embodiments of the present invention for direct use, shortening the time consumed by beamforming. In an optional implementation, this embodiment may calculate the spatial filtering parameter using the method of steps S1 to S3 in FIG. 4, which is not repeated here.
The sound angle sources for beamforming in this embodiment are at least two sound source directions, constituting multi-beam beamforming. In practical applications, the sound source direction may be any plane-wave angle from 0° to 180°. The at least two sound source directions include one target sound source and at least one other sound source direction. For ease of description, the subsequent description takes beams pointing in the directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total) as an example, with the target sound source pointing at 90°; this does not limit the beams to the above angles, which may also point at, e.g., 53° or 80°, and the target sound source may also be 60°, without specific limitation.
The product of the original frequency-domain signal corresponding to each sound source direction and the spatial filtering parameter is calculated separately to obtain each single-beam beamforming result; the result is also a matrix, represented as a spectrum. When calculating these products, each sound source direction needs to be determined through the microphone array. The microphone array consists of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field; in practical applications the microphones may be arranged as, for example, 4 in a uniformly spaced line (as shown in FIG. 2), 6 in a uniformly spaced line, 8 uniformly spaced on a circle (as shown in FIG. 3), or 12 or 14 uniformly spaced on a circle, rectangle, or crescent; the embodiments of the present invention do not limit the number or arrangement of the microphones. For ease of description, the subsequent description takes the microphone array style and number of FIG. 2 as an example; this manner of description does not specifically limit the microphone array.
In practical applications, given the characteristics of sound waves, the spacing between microphones should be set neither too large nor too small; an unsuitable spacing introduces errors into the focused localization of the sound source. In general, the uniform spacing between microphones may be set to less than 80 mm and greater than 30 mm.
Step S320: calculate the enhanced speech in the target sound source direction.
This embodiment takes the microphone array 2 of FIG. 2 as an example. After sound from the seven directions is acquired and the seven sound segments undergo the Fourier transform, seven 4×512 matrices are obtained, where 4 represents the number of microphones and 512 represents decomposing the spectrum corresponding to each direction into 512 subbands. The purpose of this step is to filter from the subband perspective, determining the proportion that the target sound source occupies on each of its subbands.
Assume the target sound source direction is 90°. The spectrum corresponding to the target sound source corresponds to α1: 4×512 subbands; the spectrum of the 0° direction corresponds to α2: 4×512 subbands; the 30° direction to α3: 4×512 subbands; the 60° direction to α4: 4×512 subbands; the 120° direction to α5: 4×512 subbands; the 150° direction to α6: 4×512 subbands; and the 180° direction to α7: 4×512 subbands. In one implementation, the ratio gain corresponding to the target sound source direction is calculated as α1/(α1+α2+α3+α4+α5+α6+α7); in another implementation, it is calculated as α1/(α2+α3+α4+α5+α6+α7). After the ratio gain of the target sound source is obtained, the enhanced speech in the target sound source direction is obtained from the ratio gain and the multi-beam beamforming output calculated in step S310 (i.e., the products of the spatial filtering parameter and the original frequency-domain signals of the at least two sound source directions). Optionally, the product of the first product and the target sound source's ratio gain is calculated, where the first product is the product of the original frequency-domain signal in the target sound source direction and the spatial filtering parameter.
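The per-subband ratio gain can be sketched on per-direction subband powers, with both variants from the text (the function name `ratio_gain` and the small-denominator guard are assumptions of this sketch):

```python
import numpy as np

def ratio_gain(powers, target_idx, include_target=True):
    """Per-subband gain of the target direction: target power over the power
    summed across all directions (or, in the alternative form, across only
    the other directions).  powers: (n_dirs, n_bins) non-negative powers."""
    p = np.asarray(powers, dtype=float)
    denom = p.sum(axis=0) if include_target else p.sum(axis=0) - p[target_idx]
    return p[target_idx] / np.maximum(denom, 1e-12)

# 7 directions, 4 bins; the target direction (index 3, i.e. 90 degrees)
# carries half the total power in every bin.
powers = np.ones((7, 4))
powers[3] = 6.0
gm = ratio_gain(powers, target_idx=3)
```

Bins dominated by the target direction get a gain near 1, bins dominated by other directions a gain near 0, which is what makes the mask suppress off-axis energy.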
Step S330: calculate an energy ratio from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction.
In an optional implementation, the multiple subbands of the current frame's spectral decomposition are merged, and the energy of the merged subbands is obtained, where the current frame includes the target sound source and the non-target sound sources. In the specific implementation, the 512 subbands corresponding to the target sound source are merged first, and the merged subband energy is determined. Next, the 512 subbands of each of the other six sound source directions (or of all seven, including the target sound source) are merged in turn, and the subband energy of each merged direction is determined. Finally, the summed energy of all subbands of the six sound source directions (or of all seven, including the target sound source) is calculated; this energy sum is a matrix.
The energy ratio is calculated from the subband energy corresponding to the target sound source and the summed energy of all subbands of the six sound source directions (or of all seven, including the target sound source).
Step S340: calculate the product of the original frequency-domain signal in the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, so as to reduce the noise of the non-target sound source directions, and output the speech corresponding to the product.
Obtain the original frequency-domain signal corresponding to the target sound source direction, and calculate the product of that original frequency-domain signal, the enhanced speech obtained in step S320, and the energy ratio calculated in step S330. The beamforming obtained from this product can ensure that the sound in the target sound source direction is not distorted while suppressing noise produced in the non-target sound source directions.
In the multi-beam beamforming method provided by the embodiments of the present invention, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameter and the original frequency-domain signals corresponding to the at least two sound source directions; the product of the enhanced speech in the target sound source direction, the energy ratio, and the original frequency-domain signal in the target sound source direction is calculated, and the speech corresponding to the product is output, thereby achieving noise reduction for the non-target sound sources and ensuring that the sound in the target sound source direction is not distorted.
FIG. 8 is a flowchart of yet another multi-beam beamforming method according to an embodiment of the present invention. As a refinement and expansion of the above embodiment, an embodiment of the present invention further provides another multi-beam beamforming method; as shown in FIG. 8, the multi-beam beamforming method of this embodiment includes the following steps:
Step S410: calculate the products of the spatial filtering parameter and the original frequency-domain signals corresponding to each of at least two sound source directions, to obtain multi-beam beamforming. The spatial filtering parameter varies with the angle of the sound source and with the subband frequency, and the at least two sound source directions include one target sound source and at least one non-target sound source direction.
When calculating the products of the spatial filtering parameter W_f(ω) and the original frequency-domain signals corresponding to each of the at least two sound source directions to obtain multi-beam beamforming, the following method may be used, but is not limited thereto:
obtain the spatial filtering parameter W_f(ω), determine the sound source directions respectively corresponding to W_f(ω), and obtain the original frequency-domain signal corresponding to each sound source direction; then calculate the products of W_f(ω) with the original frequency-domain signals corresponding to the different sound source directions.
In this embodiment, determining the at least two sound source directions corresponding to W_f(ω), i.e., locating the beam-focusing directions through the delay times, may use the following method, but is not limited thereto: calculate, from the physical geometry of the microphone arrangement, the delay time for the sound source to reach each microphone. Assume the microphone spacing d, the speed of sound propagation c, and the angle Ω of the sound source direction (i.e., the direction angle in which sound is to be picked up and focused, e.g., 90° straight ahead) have been determined. In the microphone array, the microphone reached first is chosen as the reference (Mic1 in FIG. 2): the delay time of the first microphone Mic1 is tau_0 = 0; the delay time of the second microphone Mic2 is tau_1 = d*sin(Ω)/c; the delay time of the third microphone Mic3 is tau_2 = 2*d*sin(Ω)/c; and the delay time of the fourth microphone Mic4 is tau_3 = 3*d*sin(Ω)/c. The first microphone Mic1 is usually the reference microphone, so its delay time is 0, and tau_1 refers to the delay time of the sound field to the second microphone Mic2. The above delay-time calculation applies to linear, uniformly spaced microphone arrays; the calculation may differ for other microphone layouts and non-uniform spacings.
A signal vector function is constructed from the delay times of the microphones in the array, and the sound source direction is calculated from the signal vector function and the delay times. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:

g(ω, Ω) = [e^(−jωτ_0), e^(−jωτ_1), …, e^(−jωτ_(N−1))]^T

where Ω is the direction angle of sound pickup and focusing, j is the phase factor at a given instant, ω = 2πf, f is the matrix corresponding to all subband frequencies, τ_0 is the delay time from the sound source to the first microphone, N is the number of microphones, and τ_(N−1) is the delay time from the sound source to the N-th microphone. The sound source direction can thus be calculated from the signal vector function and the delay times corresponding to the microphones. Optionally, first determine the matrix corresponding to the subband frequencies of the sound source, and calculate the target sound source direction from that matrix, the signal vector function, and the delay times.
In practical applications, to facilitate subsequent use of the sound, the sound signal is first converted by the Fourier transform from a time-domain signal, which is difficult to process, into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be represented as an infinite superposition of sine-wave signals of different frequencies; the Fourier transform algorithm built on this principle uses the directly measured original signal to compute, by accumulation, the frequency, amplitude, and phase of the different sine-wave components of that signal. The specific implementation of the Fourier transform is not described in detail here.
Further, the spatial filtering parameter W_f(ω) and the original frequency-domain signal Z(t, e^jω) are both matrices; multiplying them gives Y(ω, Ω) = W_f(ω) Z(t, e^jω). The product Y(ω, Ω) performs beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal in the target sound source direction, keeping the fixed-direction sound signal undistorted while suppressing sound signals from other directions.
In this embodiment, assume seven sound source directions (including one 90° target sound source direction) and four microphones acquiring sound (the microphone array shown in FIG. 2); single-beam beamforming in the directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total) is computed by the above method, giving seven 4×512 matrices, where 4 represents the number of microphones and 512 represents decomposing the spectrum corresponding to each direction into 512 subbands.
Step S420: calculate the enhanced speech in the target sound source direction.
In practical applications, the enhanced speech in the target sound source direction is calculated as follows:
subband by subband, calculate the ratio gain between the energy in the target sound source direction and the summed energy of all sound source directions; then calculate the product of the first product B(ω, Ω) and the ratio gain to obtain the enhanced speech, where the first product is the product of the original frequency-domain signal in the target sound source direction and the spatial filtering parameter.
Computing the summed energy of all sound source directions essentially merges the four microphones, giving seven 1×512 matrices; the summed energy of all directions is denoted "Spectrum power of other directions". The energy in the target sound source direction is then obtained, denoted "Spectrum power of target directions", and the ratio of "Spectrum power of target directions" to "Spectrum power of other directions" gives the ratio gain, Gain-mask.
The product of the first product B(ω, Ω) and the ratio gain Gain-mask is then computed to obtain the enhanced speech: Gain-mask-frame = B(ω, Ω) * Gain-mask.
Step S430: calculate an energy ratio from the subband energy corresponding to the target sound source and the summed energy of all subbands in the at least one non-target sound source direction.
Specifically: merge the energies corresponding to all subbands in the current frame and calculate the summed energy of all subbands of the current frame; then calculate the ratio between the subband energy corresponding to the target sound source and the summed energy of all subbands in the non-target sound source directions, obtaining the energy ratio. Alternatively, calculate the ratio between the target sound source's subband energy and the summed energy of all subbands in the current frame, obtaining the energy ratio.
The current frame contains all subbands of the seven sound source directions; the energies corresponding to all subbands in the current frame are merged. First, all subbands of each sound source direction are merged, giving the spectrum corresponding to each direction as a 7×1 matrix, where 7 is the seven sound source directions and 1 is the merged subband (spectrum). Second, all subbands across the different directions are merged into a 1×1 matrix, from which the summed energy of all subbands is obtained, denoted "Energy of each bin in all directions". Third, the subband energy corresponding to the target sound source is obtained, denoted "Energy of each bin in target directions". Finally, the ratio of the target sound source's subband energy to the summed energy of all subbands in the non-target sound source directions (the summed energy of all subbands of all sound source directions in the current frame) is calculated, obtaining the energy ratio, denoted Gain-mask-frame-bin.
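The frame-level energy ratio of step S430 can be sketched as follows (the function name `energy_ratio`, the array shapes, and the zero-denominator guard are assumptions of this sketch):

```python
import numpy as np

def energy_ratio(powers, target_idx):
    """Energy of the target direction's merged subbands over the summed
    energy of every subband of the non-target directions."""
    p = np.asarray(powers, dtype=float)
    target_energy = p[target_idx].sum()
    other_energy = np.delete(p, target_idx, axis=0).sum()
    return target_energy / max(other_energy, 1e-12)

# 7 directions x 512 subbands; the 90-degree target (index 3) is 3x stronger.
powers = np.ones((7, 512))
powers[3] *= 3.0
er = energy_ratio(powers, 3)
```

The ratio rises when a frame's energy is concentrated in the target direction, so it scales down frames dominated by interference.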
Step S440: perform frame-by-frame smoothing of the current frame with the previous frame through a smoothing parameter.
In the embodiments of the present invention, the purpose of the smoothing is to allow the speech of two consecutive frames to transition smoothly. Therefore, when performing frame-by-frame smoothing of the current frame with the previous frame through the smoothing parameter, the following implementation may be used, but is not limited thereto:
set the smoothing parameter of the current frame so that the smoothing parameter of the current frame and the smoothing parameter of the previous frame sum to a second preset value; preferably, the second preset value is 1. Calculate the product of the previous frame's ratio gain and the previous frame's smoothing parameter to obtain a second product, and the product of the ratio gain and the current frame's smoothing parameter to obtain a third product. Smooth the sound sources of the current frame, frame by frame, according to the sum of the second product and the third product.
In an optional implementation, the smoothing parameter γ is an empirical value; the current frame's smoothing parameter γ may be set to 0.8, so that the previous frame's smoothing parameter is (1 − γ) = 0.2; the embodiments of the present invention do not limit this. The current frame's ratio gain can thus be obtained to smooth the sound sources of the current frame, frame by frame. Assume the ratio gain of the previous frame is Previous Gain; then the current frame's ratio gain is Current Gain = Previous Gain * (1 − γ) + γ * Gain-mask = Previous Gain * (1 − γ) + γ * Spectrum power of target directions / Spectrum power of other directions.
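The frame-by-frame smoothing, with the two smoothing parameters summing to 1 (γ = 0.8 for the current frame, as in the text), is a simple convex blend:

```python
def smooth_gain(prev_gain, cur_gain, gamma=0.8):
    """Current Gain = Previous Gain * (1 - gamma) + gamma * cur_gain;
    the two coefficients sum to 1, easing gain changes in over frames."""
    return prev_gain * (1.0 - gamma) + gamma * cur_gain

gain = 0.0
for _ in range(3):
    gain = smooth_gain(gain, 1.0)   # a sudden unit gain is eased in gradually
```

A step change in the mask is thus spread over a few frames instead of switching abruptly, which avoids audible discontinuities between consecutive frames.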
Step S450: calculate the product of the enhanced speech corresponding to the target sound source direction, the energy ratio, and the original frequency-domain signal in the target sound source direction, and output the speech corresponding to the product according to the smoothing result.
This embodiment obtains multi-beam beamforming by calculating the products of the spatial filtering parameter and the original frequency-domain signals corresponding to the at least two sound source directions; calculates the product of the enhanced speech in the target sound source direction, the energy ratio, and the original frequency-domain signal in the target sound source direction; smooths the current frame with the previous frame, frame by frame, through the smoothing parameter; and outputs the speech corresponding to the product according to the smoothing result, further reducing the noise of the non-target sound sources and further ensuring that the sound in the target sound source direction is not distorted.
Further, as an implementation of the method shown in FIG. 1 above, another embodiment of the present invention also provides a beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the apparatus of this embodiment can correspondingly implement the entire content of the foregoing method embodiment.
图9是本发明实施例的一种波束成形装置的示意图。图10是本发明实施例的另一种波束成形装置的示意图。如图9所示,本实施例的波束成形装置9包括第一获取单元91、确定单元92、第二获取单元93和第一计算单元94。
其中,第一获取单元91用于获取空间滤波参数,所述空间滤波参数随角度和子带频率的不同而不同。确定单元92用于确定所述第一获取单元91获取的所述空间滤 波参数对应的声音源指向。第二获取单元93用于获取所述确定单元92确定的所述声音源指向对应的原始频域信号。第一计算单元94用于计算所述空间滤波参数及所述原始频域信号的乘积,所述乘积用于对除声音源指向的原始频域信号之外的其他频域信号进行抑制。
Further, as shown in FIG. 10, the beamforming apparatus 9 further includes:
a second calculating unit 95, configured to calculate the spatial filter parameters before the first obtaining unit 91 obtains them.
Further, as shown in FIG. 10, the second calculating unit 95 includes:
a first calculating module 951, configured to calculate the delay time for the sound source to reach the microphone array; a constructing module 952, configured to construct a signal vector function; a second calculating module 953, configured to calculate the sound source direction according to the signal vector function constructed by the constructing module 952 and the delay time calculated by the first calculating module 951; a first setting module 954, configured to set a first constraint, the first constraint being a white noise gain constraint; a second setting module 955, configured to set a second constraint, the second constraint being that the product of the spatial filter parameters and the signal vector function equals 1; a constructing module 956, configured to construct a loss function from the spatial filter parameters and the signal vector function; and a third calculating module 957, configured to calculate, under the first constraint set by the first setting module 954 and the second constraint set by the second setting module 955, the spatial filter parameters at which the loss function tends to its minimum.
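The constrained minimization carried out by modules 954-957 resembles a minimum-variance (MVDR-style) design: minimize output power subject to the distortionless constraint that the filter times the signal vector equals 1, with the white noise gain constraint commonly handled via diagonal loading. The sketch below makes that reading concrete; the closed-form solution and the `loading` term are assumptions about the design, not the patent's stated formula:

```python
import numpy as np

def spatial_filter(steering, noise_cov, loading=1e-3):
    """Minimum-variance spatial filter under the two constraints of the
    text: w^H d = 1 (distortionless toward the steering vector d) and a
    white-noise-gain limit, approximated here by diagonal loading."""
    m = steering.shape[0]
    r = noise_cov + loading * np.eye(m)        # white-noise-gain control
    r_inv_d = np.linalg.solve(r, steering)     # R^{-1} d without explicit inverse
    w = r_inv_d / (steering.conj() @ r_inv_d)  # enforce w^H d = 1
    return w
```

With an identity noise covariance this reduces to a simple delay-and-sum weight, which is a useful sanity check on the distortionless constraint.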
Further, as shown in FIG. 10, the first calculating module 951 includes:
a first determining sub-module 951a, configured to determine the spacing between microphones in the microphone array and the speed at which the sound source propagates sound; a second determining sub-module 951b, configured to determine the angle of the sound source direction; and a calculating sub-module 951c, configured to calculate the delay time from the microphone spacing, the speed, and the angle.
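For a uniform linear array, the delay computed by sub-module 951c from spacing, speed, and angle is typically tau = d*cos(theta)/c. A minimal sketch under that assumption (the angle convention, measured from the array axis, and the default speed of sound are assumptions):

```python
import math

def mic_delay(spacing_m, angle_deg, speed=343.0):
    """Inter-microphone delay for a plane wave on a uniform linear array:
    tau = d * cos(theta) / c, with theta measured from the array axis."""
    return spacing_m * math.cos(math.radians(angle_deg)) / speed
```

At broadside (90 degrees) the wavefront hits all microphones simultaneously and the delay is zero; at endfire (0 degrees) it is the full spacing divided by the speed of sound.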
Further, as shown in FIG. 10, the second calculating module 953 includes:
a determining sub-module 953a, configured to determine the matrices corresponding to all sub-band frequencies; and a calculating sub-module 953b, configured to calculate the sound source direction from the matrices corresponding to all sub-band frequencies determined by the determining sub-module, the signal vector function, and the delay time.
Further, the spatial filter parameters form a matrix.
Further, the sound source direction may be any plane-wave angle from 0° to 180°.
Since the beamforming apparatus introduced in this embodiment is an apparatus capable of executing the beamforming method in the embodiments of the present invention, those skilled in the art can, based on the beamforming method introduced in the embodiments of the present invention, understand the specific implementation of the beamforming apparatus of this embodiment and its various variations; therefore, how the beamforming apparatus implements the beamforming method of the embodiments of the present invention is not described in detail here. Any apparatus used by those skilled in the art to implement the beamforming method in the embodiments of the present invention falls within the scope intended to be protected by the present application.
Further, as an implementation of the method shown in FIG. 5 above, another embodiment of the present invention provides a multi-beam beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the foregoing method embodiment are not repeated one by one here, but it should be clear that the apparatus of this embodiment can correspondingly implement all of the content of the foregoing method embodiment.
FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention. FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention. As shown in FIG. 11, the multi-beam beamforming apparatus 11 of this embodiment includes a first calculating unit 111, a second calculating unit 112, and a noise reduction unit 113.
The first calculating unit 111 is configured to calculate the beamforming output corresponding to the target sound source direction. The second calculating unit 112 is configured to calculate noise parameters through a blocking matrix. The noise reduction unit 113 is configured to denoise, according to the noise parameters calculated by the second calculating unit 112, the signals of non-target sound source directions other than the beamforming output of the target sound source direction calculated by the first calculating unit 111.
Further, as shown in FIG. 12, the first calculating unit 111 includes:
a first obtaining module 1111, configured to obtain spatial filter parameters;
a determining module 1112, configured to determine the target sound source direction corresponding to the spatial filter parameters obtained by the first obtaining module 1111;
a second obtaining module 1113, configured to obtain the original frequency-domain signal corresponding to the target sound source direction;
and a calculating module 1114, configured to calculate the product of the spatial filter parameters and the original frequency-domain signal corresponding to the target sound source direction, obtaining the beamforming of the target sound source direction.
Further, as shown in FIG. 12, the second calculating unit 112 includes:
a first calculating module 1121, configured to calculate the frequency responses of the sound signal arriving at the microphones in sequence;
a constructing module 1122, configured to construct the blocking matrix from the frequency responses calculated by the first calculating module;
and a second calculating module 1123, configured to calculate the noise parameters from the blocking matrix constructed by the constructing module and the original frequency-domain signals corresponding to the other sound source directions.
Further, as shown in FIG. 12, the noise reduction unit 113 includes:
a calculating module 1131, configured to calculate multi-channel optimal filter parameters through a multi-channel filtering algorithm and an iterative algorithm;
and a noise reduction module 1132, configured to denoise, according to the beamforming output of the target sound source, the optimal filter parameters, and the noise parameters, the signals of other sound source directions outside the beamforming output corresponding to the target sound source direction.
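The blocking matrix plus adaptive noise cancellation described here matches the classic generalized sidelobe canceller (GSC) structure: the blocking matrix removes the aligned target, leaving noise references that are filtered and subtracted from the fixed beam. The following is a minimal time-domain sketch under that reading; the pairwise-difference blocking matrix and fixed noise weights are simplifying assumptions (the patent works with frequency responses and iteratively optimized multi-channel filters):

```python
import numpy as np

def blocking_matrix(m):
    """Pairwise-difference blocking matrix: each row subtracts adjacent
    channels, cancelling the (time-aligned) target so that only noise
    leaks through as the GSC noise reference."""
    b = np.zeros((m - 1, m))
    for i in range(m - 1):
        b[i, i], b[i, i + 1] = 1.0, -1.0
    return b

def gsc_output(fixed_beam, channels, w_noise):
    """Subtract the weighted noise references from the fixed beam output.

    channels: (m, n) array of m aligned microphone signals of length n.
    w_noise:  (m-1,) weights applied to the noise references.
    """
    noise_refs = blocking_matrix(channels.shape[0]) @ channels  # (m-1, n)
    return fixed_beam - w_noise @ noise_refs

# aligned target on all channels -> the blocking matrix cancels it fully
channels = np.ones((4, 10))
fixed_beam = np.arange(10.0)
output = gsc_output(fixed_beam, channels, w_noise=np.ones(3))
```

When the target is identical on every channel, the noise references are zero and the fixed beam passes through untouched, which is exactly the distortionless behavior the structure is designed for.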
Since the multi-beam beamforming apparatus introduced in this embodiment is an apparatus capable of executing the multi-beam beamforming method in the embodiments of the present invention, those skilled in the art can, based on the multi-beam beamforming method introduced in the embodiments of the present invention, understand the specific implementation of the multi-beam beamforming apparatus of this embodiment and its various variations; therefore, how the multi-beam beamforming apparatus implements that method is not described in detail here. Any apparatus used by those skilled in the art to implement the multi-beam beamforming method in the embodiments of the present invention falls within the scope intended to be protected by the present application.
Further, as an implementation of the method shown in FIG. 7 above, another embodiment of the present invention provides a multi-beam beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the foregoing method embodiment are not repeated one by one here, but it should be clear that the apparatus of this embodiment can correspondingly implement all of the content of the foregoing method embodiment.
FIG. 13 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention. FIG. 14 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention. As shown in FIG. 13, the multi-beam beamforming apparatus 13 of this embodiment includes a first calculating unit 131, a second calculating unit 132, a third calculating unit 133, and a fourth calculating unit 134.
The first calculating unit 131 is configured to calculate the products of spatial filter parameters and the original frequency-domain signals respectively corresponding to at least two sound source directions, obtaining multi-beam beamforming; the spatial filter parameters vary with the angle and sub-band frequency of the sound source, and the sound source directions include one target sound source direction and at least one non-target sound source direction.
The second calculating unit 132 is configured to calculate the enhanced speech of the target sound source direction.
The third calculating unit 133 is configured to calculate an energy ratio from the sub-band energy of the target sound source and the energy sum of all sub-bands of at least one non-target sound source direction.
The fourth calculating unit 134 is configured to calculate the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and to output the speech corresponding to the product.
Further, as shown in FIG. 14, the multi-beam beamforming apparatus 13 further includes:
a processing unit 135, configured to smooth the current frame against the previous frame, frame by frame, using a smoothing parameter before the fourth calculating unit 134 calculates the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio.
Further, as shown in FIG. 14, the first calculating unit 131 includes:
a first obtaining module 1311, configured to obtain spatial filter parameters;
a determining module 1312, configured to determine the at least two sound source directions respectively corresponding to the spatial filter parameters obtained by the first obtaining module 1311;
a second obtaining module 1313, configured to obtain the original frequency-domain signals respectively corresponding to the at least two sound source directions determined by the determining module;
and a calculating module 1314, configured to calculate the products of the spatial filter parameters and the original frequency-domain signals corresponding to the different sound source directions.
Further, as shown in FIG. 14, the second calculating unit 132 includes:
a first calculating module 1321, configured to calculate, for each sub-band, the ratio gain between the energy of the target sound source direction and the energy sum of all sound source directions;
and a second calculating module 1322, configured to calculate the product of a first product and the ratio gain to obtain the enhanced speech, wherein the first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filter parameters.
Further, as shown in FIG. 14, the third calculating unit 133 includes:
a merging module 1331, configured to merge the energies of all sub-bands in the current frame;
a first calculating module 1332, configured to calculate the energy sum of all sub-bands of the current frame;
and a second calculating module 1333, configured to calculate the ratio of the sub-band energy of the target sound source to the energy sum of all sub-bands of at least one non-target sound source direction, obtaining the energy ratio.
Further, as shown in FIG. 14, the processing unit 135 includes:
a setting module 1351, configured to set the smoothing parameter of the current frame such that the sum of the current frame's smoothing parameter and the previous frame's smoothing parameter is 1;
a calculating module 1352, configured to calculate the product of the previous frame's ratio gain and its corresponding smoothing parameter to obtain a second product, and the product of the current frame's smoothing parameter and the ratio gain to obtain a third product;
and a processing module 1353, configured to smooth the current frame, frame by frame, according to the sum of the second product and the third product.
Further, the fourth calculating unit 134 is further configured to calculate the product of the enhanced speech corresponding to the target sound source direction, the energy ratio, and the original frequency-domain signal of the target sound source direction, and to output the speech corresponding to the product according to the smoothing result.
The multi-beam beamforming apparatus provided by this embodiment of the present invention calculates the products of spatial filter parameters and the original frequency-domain signals respectively corresponding to at least two sound source directions, obtaining multi-beam beamforming, wherein the spatial filter parameters vary with the angle and sub-band frequency of the sound source and the at least two sound source directions include one target sound source direction and at least one other sound source direction; calculates the enhanced speech of the target sound source direction; calculates an energy ratio from the sub-band energy of the target sound source and the energy sum of all sub-bands of at least one other sound source direction; and calculates the product of the original frequency-domain signal of the target sound source direction, the enhanced speech, and the energy ratio, outputting the speech corresponding to the product. Compared with the prior art, this embodiment of the present invention ensures that the sound in the target sound source direction is not distorted while effectively suppressing interference from other sound directions.
Since the multi-beam beamforming apparatus introduced in this embodiment is an apparatus capable of executing the multi-beam beamforming method in the embodiments of the present invention, those skilled in the art can, based on that method, understand the specific implementation of this apparatus and its various variations; how the apparatus implements the method is therefore not described in detail here. Any apparatus used by those skilled in the art to implement the multi-beam beamforming method in the embodiments of the present invention falls within the scope intended to be protected by the present application.
Each of the above apparatuses includes a processor and a memory. The units described above are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions. The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided; by adjusting kernel parameters when implementing the above methods, the sound of the target spatial direction is kept undistorted while the sound of other spatial directions is effectively suppressed. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the above speech processing method when executed by a processor.
An embodiment of the present invention provides a processor configured to run a program, wherein the program executes the above speech processing method when run.
FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 15, the electronic device 15 includes:
at least one processor 151;
and at least one memory 152 and a bus 153 connected to the processor 151; wherein
the processor 151 and the memory 152 communicate with each other through the bus 153;
and the processor 151 is configured to call the program instructions in the memory 152 to execute any embodiment of the above method.
The electronic device herein may be a server, a PC, a PAD, a mobile phone, a smart TV, or any other smart device containing a microphone.
The electronic device provided by this embodiment of the present invention obtains the beamforming output of the target sound source direction by calculating the product of spatial filter parameters and the original frequency-domain signal corresponding to the target sound source direction, and improves the signal-to-noise ratio of that beamforming output by denoising the non-target sound source directions. This ensures that the sound of the target spatial direction is not distorted and effectively suppresses the sound of other spatial directions, thereby improving the signal-to-noise ratio of the sound in the target spatial direction.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute any of the above speech processing methods.
The present application further provides a computer program product which, when executed on a data processing device, implements the functions of any of the above speech processing methods.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (26)

  1. A beamforming method, characterized by comprising:
    obtaining spatial filter parameters, the spatial filter parameters varying with angle and sub-band frequency; determining the sound source direction corresponding to the spatial filter parameters, and obtaining the original frequency-domain signal corresponding to the sound source direction;
    calculating the product of the spatial filter parameters and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal of the sound source direction.
  2. The method according to claim 1, characterized in that, before obtaining the spatial filter parameters, the method further comprises:
    calculating the spatial filter parameters.
  3. The method according to claim 2, characterized in that calculating the spatial filter parameters comprises:
    calculating the delay time for the sound source to reach a microphone array;
    constructing a signal vector function according to the delay time, and calculating the sound source direction according to the signal vector function and the delay time;
    calculating, according to a preset first constraint and a preset second constraint, the spatial filter parameters at which a loss function tends to its minimum, the loss function being constructed from the spatial filter parameters and the signal vector function;
    wherein the first constraint is specifically a white noise gain constraint, and the second constraint is specifically that the product of the spatial filter parameters and the signal vector function equals a first preset value.
  4. The method according to claim 3, characterized in that calculating the delay time for the sound source to reach the microphone array comprises:
    determining the spacing between microphones in the microphone array and the speed at which the sound source propagates sound;
    determining the angle of the sound source direction;
    calculating the delay time from the microphone spacing, the speed at which the sound source propagates sound, and the angle of the sound source direction.
  5. The method according to claim 3, characterized in that calculating the sound source direction according to the signal vector function and the delay time comprises:
    determining the matrices corresponding to all sub-band frequencies;
    calculating the sound source direction from the matrices corresponding to all sub-band frequencies, the signal vector function, and the delay time.
  6. The method according to any one of claims 1-5, characterized in that the spatial filter parameters form a matrix.
  7. The method according to any one of claims 1-5, characterized in that the sound source direction is any plane-wave angle from 0° to 180°.
  8. A beamforming apparatus, characterized by comprising:
    a first obtaining unit, configured to obtain spatial filter parameters, the spatial filter parameters varying with angle and sub-band frequency;
    a determining unit, configured to determine the sound source direction corresponding to the spatial filter parameters obtained by the first obtaining unit;
    a second obtaining unit, configured to obtain the original frequency-domain signal corresponding to the sound source direction determined by the determining unit;
    a first calculating unit, configured to calculate the product of the spatial filter parameters and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal of the sound source direction.
  9. A multi-beam beamforming method, characterized by comprising:
    calculating the beamforming output corresponding to a target sound source direction;
    calculating noise parameters according to a blocking matrix;
    denoising, according to the noise parameters, the signals of non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
  10. The method according to claim 9, characterized in that calculating the beamforming output corresponding to the target sound source direction comprises:
    obtaining spatial filter parameters, and determining the target sound source direction corresponding to the spatial filter parameters;
    obtaining the original frequency-domain signal corresponding to the target sound source direction;
    calculating the product of the spatial filter parameters and the original frequency-domain signal corresponding to the target sound source direction, obtaining the beamforming output of the target sound source direction.
  11. The method according to claim 10, characterized in that calculating the noise parameters according to the blocking matrix comprises:
    calculating the frequency responses of the sound signal arriving at the microphones in sequence;
    constructing the blocking matrix from the frequency responses;
    calculating the noise parameters from the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions.
  12. The method according to claim 11, characterized in that denoising, according to the noise parameters, the signals of non-target sound source directions other than the beamforming output corresponding to the target sound source direction comprises:
    calculating multi-channel optimal filter parameters through a multi-channel filtering algorithm and an iterative algorithm;
    denoising, according to the beamforming output of the target sound source, the multi-channel optimal filter parameters, and the noise parameters, the signals of non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
  13. A multi-beam beamforming apparatus, characterized by comprising:
    a first calculating unit, configured to calculate the beamforming output corresponding to a target sound source direction;
    a second calculating unit, configured to calculate noise parameters through a blocking matrix;
    a noise reduction unit, configured to denoise, according to the noise parameters calculated by the second calculating unit, the signals of non-target sound source directions other than the beamforming output of the target sound source direction calculated by the first calculating unit.
  14. The apparatus according to claim 13, characterized in that the first calculating unit comprises:
    a first obtaining module, configured to obtain spatial filter parameters;
    a determining module, configured to determine the target sound source direction corresponding to the spatial filter parameters obtained by the first obtaining module;
    a second obtaining module, configured to obtain the original frequency-domain signal corresponding to the target sound source direction;
    a calculating module, configured to calculate the product of the spatial filter parameters and the original frequency-domain signal corresponding to the target sound source direction, obtaining the beamforming output of the target sound source direction.
  15. The apparatus according to claim 14, characterized in that the second calculating unit comprises:
    a first calculating module, configured to calculate the frequency responses of the sound signal arriving at the microphones in sequence;
    a constructing module, configured to construct the blocking matrix from the frequency responses calculated by the first calculating module;
    a second calculating module, configured to calculate the noise parameters from the blocking matrix constructed by the constructing module and the original frequency-domain signals corresponding to the non-target sound source directions.
  16. The apparatus according to claim 14, characterized in that the noise reduction unit comprises:
    a calculating module, configured to calculate multi-channel optimal filter parameters through a multi-channel filtering algorithm and an iterative algorithm;
    a noise reduction module, configured to denoise, according to the beamforming output of the target sound source, the multi-channel optimal filter parameters, and the noise parameters, the signals of non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
  17. A multi-beam beamforming method, characterized by comprising:
    calculating the products of spatial filter parameters and the original frequency-domain signals respectively corresponding to at least two sound source directions, obtaining multi-beam beamforming, wherein the spatial filter parameters vary with the angle and sub-band frequency of the sound source, and the at least two sound source directions include one target sound source direction and at least one non-target sound source direction;
    calculating the enhanced speech of the target sound source direction;
    calculating an energy ratio from the sub-band energy of the target sound source and the energy sum of all sub-bands of at least one non-target sound source direction;
    calculating the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product.
  18. The method according to claim 17, characterized in that, before calculating the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, the method further comprises:
    smoothing the current frame against the previous frame, frame by frame, using a smoothing parameter.
  19. The method according to claim 18, characterized in that calculating the products of the spatial filter parameters and the original frequency-domain signals respectively corresponding to the at least two sound source directions to obtain multi-beam beamforming comprises:
    obtaining spatial filter parameters, and determining the at least two sound source directions respectively corresponding to the spatial filter parameters;
    obtaining the original frequency-domain signals respectively corresponding to the at least two sound source directions;
    calculating the products of the spatial filter parameters and the original frequency-domain signals corresponding to the at least two sound source directions.
  20. The method according to claim 19, characterized in that calculating the enhanced speech of the target sound source direction comprises:
    calculating, for each sub-band, the ratio gain between the energy of the target sound source direction and the energy sum of all sound source directions;
    calculating the product of a first product and the ratio gain to obtain the enhanced speech, wherein the first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filter parameters.
  21. The method according to claim 20, characterized in that calculating the energy ratio from the sub-band energy of the target sound source and the energy sum of all sub-bands of at least one non-target sound source direction comprises:
    merging the energies of all sub-bands in the current frame, and calculating the energy sum of all sub-bands of the current frame;
    calculating the ratio of the sub-band energy of the target sound source to the energy sum of all sub-bands of at least one non-target sound source direction, obtaining the energy ratio.
  22. The method according to claim 21, characterized in that smoothing the current frame against the previous frame, frame by frame, using a smoothing parameter comprises:
    setting the smoothing parameter of the current frame such that the sum of the current frame's smoothing parameter and the previous frame's smoothing parameter is a second preset value;
    calculating the product of the previous frame's ratio gain and the previous frame's smoothing parameter to obtain a second product;
    calculating the product of the current frame's ratio gain and the current frame's smoothing parameter to obtain a third product;
    smoothing the current frame, frame by frame, according to the sum of the second product and the third product.
  23. The method according to any one of claims 18-22, characterized in that calculating the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio and outputting the speech corresponding to the product comprises:
    calculating the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product according to the smoothing result.
  24. A multi-beam beamforming apparatus, characterized by comprising:
    a first calculating unit, configured to calculate the products of spatial filter parameters and the original frequency-domain signals respectively corresponding to at least two sound source directions, obtaining multi-beam beamforming, wherein the spatial filter parameters vary with the angle and sub-band frequency of the sound source, and the at least two sound source directions include one target sound source direction and at least one non-target sound source direction;
    a second calculating unit, configured to calculate the enhanced speech of the target sound source direction;
    a third calculating unit, configured to calculate an energy ratio from the sub-band energy of the target sound source and the energy sum of all sub-bands of at least one non-target sound source direction;
    a fourth calculating unit, configured to calculate the product of the original frequency-domain signal of the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and to output the speech corresponding to the product.
  25. A storage medium having a computer program stored thereon, characterized in that the program is executed by a processor to implement the method according to any one of claims 1-7 and/or the method according to any one of claims 9-12 and/or the method according to any one of claims 17-23.
  26. An electronic device, characterized in that the electronic device comprises a processor, a memory, and a bus; the processor and the memory communicate with each other through the bus; the memory is configured to store program instructions, the program instructions being executed by the processor to implement the method according to any one of claims 1-7 and/or the method according to any one of claims 9-12 and/or the method according to any one of claims 17-23.
PCT/CN2019/087621 2018-05-22 2019-05-20 Beamforming method, multi-beamforming method, apparatus and electronic device WO2019223650A1 (zh)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN201810497069.8 2018-05-22
CN201810497069.8A CN108717495A (zh) 2018-05-22 2018-05-22 Multi-beam beamforming method and apparatus, and electronic device
CN201810496450.2 2018-05-22
CN201810496448.5 2018-05-22
CN201810496450.2A CN108831498B (zh) 2018-05-22 2018-05-22 Multi-beam beamforming method and apparatus, and electronic device
CN201810496448.5A CN108551625A (zh) 2018-05-22 2018-05-22 Beamforming method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2019223650A1 true WO2019223650A1 (zh) 2019-11-28

Family

ID=68617121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087621 WO2019223650A1 (zh) 2018-05-22 2019-05-20 一种波束成形方法、多波束成形方法、装置及电子设备

Country Status (1)

Country Link
WO (1) WO2019223650A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070127736A1 (en) * 2003-06-30 2007-06-07 Markus Christoph Handsfree system for use in a vehicle
CN101369427A (zh) * 2007-08-13 2009-02-18 Harman Becker Automotive Systems GmbH Noise reduction by combined beamforming and post-filtering
CN106023996A (zh) * 2016-06-12 2016-10-12 Hangzhou Dianzi University Sound recognition method based on cross-shaped acoustic array broadband beamforming
CN108551625A (zh) * 2018-05-22 2018-09-18 Mobvoi Information Technology Co., Ltd. Beamforming method and apparatus, and electronic device
CN108717495A (zh) * 2018-05-22 2018-10-30 Mobvoi Information Technology Co., Ltd. Multi-beam beamforming method and apparatus, and electronic device
CN108831498A (zh) * 2018-05-22 2018-11-16 Mobvoi Information Technology Co., Ltd. Multi-beam beamforming method and apparatus, and electronic device



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19807611; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2021))
122 Ep: pct application non-entry in european phase (Ref document number: 19807611; Country of ref document: EP; Kind code of ref document: A1)