CN105719657A - Human voice extracting method and device based on microphone

Info

Publication number: CN105719657A
Application number: CN201610098307.9A
Authority: CN
Inventors: 肖观送; 黄锦昌
Original assignee: Huizhou Desay SV Automotive Co Ltd
Current assignee: Huizhou Desay SV Automotive Co Ltd
Priority date: 2016-02-23
Filing date: 2016-02-23
Publication date: 2016-06-29

Abstract

The invention discloses a microphone-based human voice extraction method and device. The method uses a human voice extraction device having at least one microphone; the acquisition system further comprises an audio signal processor for processing the sound signals obtained by the microphone, and a speech recognition core. The method comprises: performing analog-to-digital conversion on at least one channel of the acquired sound signal to obtain an original sound signal; analyzing and collecting statistics on each time-frequency point of the sound signal, and extracting a preliminary human voice signal according to user voice features obtained in advance by a voice pre-extraction method; and then extracting the human voice signal by phase inversion and addition. The human voice signal is sampled and quantized and then compared with an acoustic model, obtained by the system, that carries the user's voice features. In this way the user's voice signal is extracted, the extracted human voice signal is purer, and the user's voice can be extracted to the greatest extent. In addition, because each person's voice features differ, voices from people nearby can also be filtered out on this basis.

Description

Human voice extraction method and device based on a single microphone

Technical field

The present invention relates to the field of sound processing, and in particular to a human voice extraction method and device based on a single microphone.

Background art

At present, the usual noise reduction scheme in speech recognition adds an independent noise reduction module, which generally uses dual-microphone active noise reduction: the noise signal from the secondary microphone is phase-inverted and added to the noise signal in the main microphone, thereby suppressing the noise. However, this scheme requires an independent noise reduction module and two microphones, so its cost is relatively high. Installing two microphones also imposes certain requirements, which adds to installation complexity. Moreover, it is difficult to distinguish the actual user when several people are speaking, which leads to a low recognition rate. Finally, the module developer has to develop complex algorithms to ensure that the sound signals entering through the two microphones stay consistent in timing during processing.

Summary of the invention

The object of the present invention is to overcome the defects of the background art described above by providing a human voice extraction method and device based on a single microphone.

A human voice extraction method based on a single microphone uses a human voice extraction device having at least one microphone. The acquisition system further includes an audio signal processor for processing the sound signal obtained by the microphone, and a speech recognition core. The audio signal processor extracts the human voice through the following steps:

S10, performing analog-to-digital conversion on at least one channel of the acquired sound signal to obtain an original sound signal;

S20, analyzing and collecting statistics on each time-frequency point of the sound signal, and extracting a preliminary human voice signal according to user voice features obtained in advance by a voice pre-extraction method;

S30, inverting the phase of the preliminary human voice signal and adding it to the original sound signal to obtain a noise signal;

S40, inverting the phase of the noise signal and adding it to the original sound signal to obtain the final human voice signal.

The voice pre-extraction method is a speech feature parameter extraction method carried out in a low-noise environment.

Further, the method also includes:

S50, applying signal gain processing to the final human voice signal;

S60, sending the gain-processed final human voice signal to the speech recognition core.
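
As an illustration of the analog-to-digital conversion in step S10, the following minimal Python sketch samples a test tone and quantizes it uniformly. The 16-bit depth, the 16 kHz sampling rate, and the helper names `quantize` and `sample_sine` are assumptions made for this example; the text does not specify the converter's parameters.

```python
import math

def quantize(samples, bits=16):
    """Uniformly quantize samples in [-1.0, 1.0] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1            # 32767 for 16 bits (assumed depth)
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))            # clip to the converter's input range
        codes.append(int(round(s * max_code)))
    return codes

def sample_sine(freq_hz, sample_rate_hz, n):
    """Sample a unit-amplitude sine wave, standing in for the analog microphone signal."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]

# A 1 kHz tone sampled at 16 kHz plays the role of the original sound signal.
original = quantize(sample_sine(1000.0, 16000.0, 16))
```

Values outside the assumed input range are clipped rather than wrapped, which is how a saturating converter would behave.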

The feature parameter extraction method comprises the following steps:

S201, applying anti-aliasing filtering to the sound signal;

S202, performing analog-to-digital conversion on the signal obtained in step S201;

S203, high-pass filtering the signal obtained in step S202;

S204, dividing the signal obtained in step S203 into frames;

S205, windowing each frame of data obtained in step S204 with a Hamming window;

S206, transforming the signal obtained in step S205 to the frequency domain;

S207, applying triangular window filtering to the signal obtained in step S206;

S208, taking the logarithm of the signal obtained in step S207;

S209, applying a discrete cosine transform to the signal obtained in step S208;

S210, applying spectral weighting to the signal obtained in step S209;

S211, applying cepstral mean subtraction to the signal obtained in step S210;

S212, adding to the signal obtained in step S211 differential parameters that characterize the dynamic characteristics of speech, to obtain the user voice features.
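
Steps S203 to S206 can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: the pre-emphasis coefficient 0.97, the frame length and hop, and all function names are assumptions, and a direct DFT stands in for a fast Fourier transform to keep the example free of dependencies.

```python
import cmath
import math

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (S203)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def split_frames(x, frame_len, hop):
    """Cut the signal into overlapping frames (S204); a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(n_points):
    """Hamming window coefficients (S205)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def power_spectrum(frame):
    """Power spectrum via a direct DFT (S206); an FFT yields the same values faster."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def front_end(x, frame_len=64, hop=32):
    """Run S203 to S206 and return one power spectrum per frame."""
    window = hamming(frame_len)
    frames = split_frames(pre_emphasis(x), frame_len, hop)
    return [power_spectrum([s * w for s, w in zip(f, window)]) for f in frames]
```

Overlapping frames (here a hop of half the frame length) are a common choice because the window attenuates samples near the frame edges.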

Preferably, the human voice extraction device uses a single microphone.

In addition, the present invention provides a single-microphone human voice extraction device based on the above method. It includes a microphone, an audio signal processor connected to the microphone, and a speech recognition core for recognizing speech. The audio signal processor includes a module for performing analog-to-digital conversion on the acquired sound signal, a module for analyzing and collecting statistics on each time-frequency point of the sound signal, a module for carrying out the voice pre-extraction method in advance, and a module for inverting and/or adding sound signals.

Preferably, the audio signal processor also includes a module for applying gain processing to the sound signal.

The present invention samples and quantizes the human voice signal, compares it with the acoustic model carrying the user's voice features that the system has obtained, extracts the user's voice signal, and then extracts the human voice signal again from the signal in which the noise signal has been filtered out. Because the signal has passed through one stage of noise suppression, the extracted human voice signal is purer and the user's voice can be extracted to the greatest extent. Since each person's voice features differ, this property can also be used to filter out sounds made by people nearby.

Brief description of the drawings

Fig. 1 is a flow chart of the human voice extraction method of the present invention.

Fig. 2 is a flow chart of the steps of the feature parameter extraction method of the present invention.

Fig. 3 is a schematic diagram of the architecture of the single-microphone human voice extraction device of the present invention.

Detailed description of the invention

The single-microphone human voice extraction method and device of the present invention are further described below with reference to the accompanying drawings.

A human voice extraction method based on a single microphone uses a human voice extraction device having one microphone. The acquisition system further includes an audio signal processor for processing the sound signal obtained by the microphone, and a speech recognition core. As shown in Fig. 1, the audio signal processor extracts the human voice through the following steps:

S10, performing analog-to-digital conversion on the acquired single-channel sound signal, converting the original analog sound signal into a digital signal and thereby obtaining the original sound signal to be processed.

S20, analyzing and collecting statistics on each time-frequency point of the sound signal. In this process, each time-frequency point is compared against the user's voice features, the parts that match the voice features are identified, and the preliminary human voice signal is finally extracted. The user's voice features are obtained in advance, under low-noise or noise-free conditions, by the speech feature parameter extraction method.

S30, inverting the phase of the preliminary human voice signal and adding it to the original sound signal. At this point the human voice in the original sound signal has been preliminarily removed, yielding a noise signal.

S40, inverting the phase of the noise signal and adding it to the original sound signal. The noise in the original sound signal is thereby filtered out, yielding the final human voice signal.

S50, to improve the speech recognition rate, signal gain processing can optionally be applied to the final human voice signal.
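
The data flow of steps S30 to S50 can be sketched in Python as below. The preliminary human voice signal is taken as given, since its extraction in S20 depends on the user voice features; the function names and the gain value are hypothetical.

```python
def invert(signal):
    """Phase inversion: negate every sample."""
    return [-s for s in signal]

def add(a, b):
    """Sample-wise addition of two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [x + y for x, y in zip(a, b)]

def apply_gain(signal, gain):
    """Scale every sample by a constant gain factor."""
    return [gain * s for s in signal]

def extract_voice(original, preliminary_voice, gain=1.0):
    """Return (noise, final_voice) following the order of steps S30, S40 and S50."""
    noise = add(invert(preliminary_voice), original)   # S30: remove the voice estimate
    final_voice = add(invert(noise), original)          # S40: remove the noise estimate
    return noise, apply_gain(final_voice, gain)         # S50: optional gain
```

With sample-aligned inputs and exact arithmetic, the two inversions cancel, so `final_voice` reproduces the preliminary estimate scaled by the gain; the sketch is meant only to show the order of operations.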

After the voice signal enters the system through the single microphone, ambient noise can be suppressed efficiently. Because only one microphone is used, the whole process runs on the same timing, which ensures that the inverted signal and the original signal are time-aligned when they are added, so a single microphone achieves the effect of dual-microphone processing. A single microphone also saves cost and is simple to install.
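
The importance of consistent timing can be illustrated with a small experiment: adding an inverted copy of a signal cancels it completely only when the two are sample-aligned. The test tone, the three-sample delay, and the helper `residual_energy` are assumptions chosen for this sketch.

```python
import math

def residual_energy(original, estimate, delay):
    """Energy left after adding an inverted copy of `estimate`, delayed by `delay` samples."""
    total = 0.0
    for n in range(len(original)):
        # Samples shifted in from outside the estimate are treated as silence.
        delayed = estimate[n - delay] if 0 <= n - delay < len(estimate) else 0.0
        total += (original[n] - delayed) ** 2
    return total

tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
aligned = residual_energy(tone, tone, delay=0)      # perfect cancellation
misaligned = residual_energy(tone, tone, delay=3)   # timing skew leaves a residue
```

Here `aligned` is exactly zero, while `misaligned` is clearly positive, which is the kind of residue a dual-microphone design has to prevent by synchronizing its two inputs.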

In a preferred embodiment, as shown in Fig. 2, the feature parameter extraction method comprises the following steps:

S201, applying anti-aliasing filtering to the sound signal; an anti-aliasing filter can be used to reduce the aliased frequency components.

S202, performing analog-to-digital conversion on the signal obtained in step S201, converting the analog speech signal into a digital signal for ease of processing.

S203, high-pass filtering the signal obtained in step S202, that is, applying pre-emphasis to the data. Passing the signal through a high-pass filter flattens its spectrum and makes it less susceptible to finite word length effects.

S204, dividing the signal obtained in step S203 into frames. Because speech is short-term stationary, it can be divided into frames, which simplifies subsequent processing of the signal.

S205, windowing each frame of data obtained in step S204 with a Hamming window, which reduces the influence of the Gibbs effect.

S206, transforming the signal obtained in step S205 to the frequency domain; in a preferred embodiment, a fast Fourier transform can be used.

S207, applying triangular window filtering to the signal obtained in step S206. Specifically, triangular window filters can be used to filter the power spectrum of the signal. Each triangular filter covers approximately one critical bandwidth of the human ear, which simulates the ear's masking effect.

S208, taking the logarithm of the signal obtained in step S207, which gives a result similar to a homomorphic transformation.

S209, applying a discrete cosine transform to the signal obtained in step S208, which removes the correlation between the dimensions of the signal and maps the signal to a lower-dimensional space.

S210, applying spectral weighting to the signal obtained in step S209. The low-order cepstral parameters are easily affected by speaker characteristics, channel characteristics and the like, while the high-order parameters have low discriminative power, so spectral weighting is needed to suppress the low-order and high-order parameters.

S211, applying cepstral mean subtraction to the signal obtained in step S210, which effectively reduces the influence of the speech input channel on the feature parameters.

S212, adding to the signal obtained in step S211 differential parameters that characterize the dynamic characteristics of speech, which can improve the recognition performance of the system and finally yields the user voice features.
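
Steps S207 to S212 can be sketched in Python as follows, taking per-frame power spectra as input. Several choices here are assumptions not stated in the text: the triangular filters are spaced on the mel scale (a common approximation of the ear's critical bands), the spectral weighting uses a raised-sine lifter, the differential parameters are first-order central differences, and all function names and default sizes are illustrative.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale (S207)."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]            # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))       # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))       # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_filterbank(power_spec, bank, floor=1e-10):
    """Filterbank energies followed by the logarithm (S207, S208); floor avoids log(0)."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_spec)), floor))
            for row in bank]

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients (S209)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def lifter(cepstrum):
    """Raised-sine weighting that de-emphasizes low and high orders (S210)."""
    size = len(cepstrum)
    return [c * (1.0 + (size / 2.0) * math.sin(math.pi * k / size))
            for k, c in enumerate(cepstrum)]

def cepstral_mean_subtract(frames):
    """Subtract the per-coefficient mean taken over all frames (S211)."""
    means = [sum(f[k] for f in frames) / len(frames) for k in range(len(frames[0]))]
    return [[c - m for c, m in zip(f, means)] for f in frames]

def add_deltas(frames):
    """Append central differences between neighbouring frames (S212)."""
    out = []
    for t, f in enumerate(frames):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, len(frames) - 1)]
        out.append(f + [(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def back_end(power_frames, sample_rate, n_filters=20, n_coeffs=12):
    """Run S207 to S212 and return one feature vector per frame."""
    bank = triangular_filterbank(n_filters, len(power_frames[0]), sample_rate)
    ceps = [lifter(dct2(log_filterbank(p, bank), n_coeffs)) for p in power_frames]
    return add_deltas(cepstral_mean_subtract(ceps))
```

Each output vector holds the static coefficients followed by the same number of differential coefficients, so the defaults above give 24 values per frame.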

In addition, the present invention provides a single-microphone human voice extraction device based on the above method. It includes a microphone, an audio signal processor connected to the microphone, and a speech recognition core for recognizing speech. As shown in Fig. 3, the audio signal processor contains the following modules:

An analog-to-digital converter converts the analog signal into a digital signal. The digital signal enters the voice feature extraction module, where the user's preliminary human voice signal is extracted, and then enters the first inverter, which produces the inverted preliminary human voice signal. A first adder adds this inverted signal to the original sound signal to obtain the noise signal. The noise signal is fed into the second inverter, which produces the inverted noise signal. Finally, a second adder adds the inverted noise signal to the original sound signal, extracting the final human voice signal. To make recognition easier for the speech recognition core, the human voice signal is preferably passed through an amplifier for signal gain before it is sent to the speech recognition core.

The present invention is highly versatile: once the developer has designed the noise reduction architecture, the system completes noise suppression and human voice extraction on its own. In actual tests, we collected noisy signals that had not been processed by the system and noisy signals that had, and compared them. Even with high ambient noise in a real vehicle, the system still output sound signals with an average signal-to-noise ratio above 100 dB, and the recognition rate rose from an initial 30% to 90%, fully meeting the requirements of in-vehicle voice recognition control.
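
The text does not state how the signal-to-noise ratio was measured. For reference, the conventional definition in decibels is ten times the base-10 logarithm of the ratio of signal power to noise power, sketched below with a hypothetical helper `snr_db`.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = sum(s * s for s in signal) / len(signal)   # mean signal power
    p_noise = sum(n * n for n in noise) / len(noise)      # mean noise power
    return 10.0 * math.log10(p_signal / p_noise)
```

Under this definition, a noise amplitude one tenth of the signal amplitude corresponds to about 20 dB.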

The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to these embodiments. Within the knowledge possessed by a person of ordinary skill in the art, various changes can be made without departing from the concept of the present invention.

Claims

1. A human voice extraction method based on a single microphone, characterized in that it uses a human voice extraction device having at least one microphone, the acquisition system further includes an audio signal processor for processing the sound signal obtained by the microphone and a speech recognition core, and the audio signal processor extracts the human voice through the following steps:

2. The human voice extraction method according to claim 1, characterized in that it further includes:

S50, applying signal gain processing to the final human voice signal;

3. The human voice extraction method according to claim 1, characterized in that the feature parameter extraction method comprises the following steps:

S201, applying anti-aliasing filtering to the sound signal;

S202, performing analog-to-digital conversion on the signal obtained in step S201;

S203, high-pass filtering the signal obtained in step S202;

S204, dividing the signal obtained in step S203 into frames;

S206, transforming the signal obtained in step S205 to the frequency domain;

S207, applying triangular window filtering to the signal obtained in step S206;

S208, taking the logarithm of the signal obtained in step S207;

S209, applying a discrete cosine transform to the signal obtained in step S208;

S210, applying spectral weighting to the signal obtained in step S209;

S211, applying cepstral mean subtraction to the signal obtained in step S210;

4. The human voice extraction method according to any one of claims 1 to 3, characterized in that the human voice extraction device uses a single microphone.

5. A single-microphone human voice extraction device based on the human voice extraction method of claim 1, including a microphone, an audio signal processor connected to the microphone, and a speech recognition core for recognizing speech, characterized in that the audio signal processor includes a module for performing analog-to-digital conversion on the acquired sound signal, a module for analyzing and collecting statistics on each time-frequency point of the sound signal, a module for carrying out the voice pre-extraction method in advance, and a module for inverting and/or adding sound signals.

6. The single-microphone human voice extraction device according to claim 5, characterized in that the audio signal processor further includes a module for applying gain processing to the sound signal.