CN105989834A - Voice recognition apparatus and voice recognition method - Google Patents
Voice recognition apparatus and voice recognition method
- Publication number
- CN105989834A (application number CN201510059977.5A)
- Authority
- CN
- China
- Prior art keywords
- energy
- signal
- raw tone
- sampled signal
- consonant
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The invention provides a voice recognition apparatus and a voice recognition method. The voice recognition apparatus and the voice recognition method can determine whether an original voice sampling signal corresponding to a target voice frame is noise according to at least one of a ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, a ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and a ratio between the second consonant frequency range signal energy and the original voice sampling signal energy. The voice recognition apparatus and the voice recognition method can effectively recognize whether a voice signal is a consonant signal.
Description
Technical field
The invention relates to a recognition apparatus, and in particular to a voice recognition apparatus and a voice recognition method.
Background
Hearing-impaired people often cannot clearly perceive higher-frequency voice signals, such as consonant signals, although they can hear low-frequency voice signals clearly. Existing consonant signal detection processes the signal in the frequency domain and falls into two main types: non-real-time consonant signal detection and real-time consonant signal detection. Non-real-time consonant signal detection relies mainly on energy and zero-crossing rate. Real-time consonant signal detection determines whether a voice signal is a consonant signal mainly according to whether the ratio of the high-frequency signal to the total energy is greater than a fixed value and whether the ratio of the low-frequency signal to the total energy is less than a fixed value. Although existing detection methods can distinguish consonant signals from noise, their accuracy still falls short of practical requirements.
Summary of the invention
The present invention provides a voice recognition apparatus and a voice recognition method that can effectively recognize whether a voice signal is a consonant signal.
The voice recognition apparatus of the present invention includes a bandpass filtering unit and a processing unit. The bandpass filtering unit performs bandpass filtering of a first consonant frequency range and a second consonant frequency range on a voice signal to produce a first bandpass filtered signal and a second bandpass filtered signal, respectively. The processing unit is coupled to the bandpass filtering unit and divides the voice signal, the first bandpass filtered signal and the second bandpass filtered signal into multiple voice frames, where each voice frame includes N sampled signals and N is a positive integer. The processing unit also calculates the energy of the sampled signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy and a second consonant frequency range signal energy. It then determines whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy.
In an embodiment of the invention, the processing unit determines whether the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
In an embodiment of the invention, the processing unit also calculates an energy weighted average over multiple voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a noise signal energy weighted average, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.
In an embodiment of the invention, the weight assigned to each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
In an embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy and the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset sum value.
In an embodiment of the invention, the processing unit also calculates a weighted average of the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy over multiple voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a first consonant energy ratio weighted average, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average.
In an embodiment of the invention, the weight assigned to this ratio for each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
In an embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset ratio.
In an embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
In an embodiment of the invention, the processing unit also calculates a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the original voice sampling signal, and calculates the average zero-crossing rates of the original voice sampling signal over the target voice frame and multiple voice frames preceding the target voice frame, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate. The processing unit determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates. The first, second and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.
In an embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
The voice recognition method of the present invention includes the following steps. Bandpass filtering of a first consonant frequency range and a second consonant frequency range is performed on a voice signal to produce a first bandpass filtered signal and a second bandpass filtered signal, respectively. The voice signal, the first bandpass filtered signal and the second bandpass filtered signal are divided into multiple voice frames, where each voice frame includes N sampled signals and N is a positive integer. The energy of the sampled signals in a target voice frame is calculated to obtain an original voice sampling signal energy, a first consonant frequency range signal energy and a second consonant frequency range signal energy. Whether the original voice sampling signal corresponding to the target voice frame is noise is determined according to the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy.
In an embodiment of the invention, the voice recognition method also includes the following steps. It is determined whether the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
In an embodiment of the invention, the voice recognition method also includes the following steps. An energy weighted average is calculated over multiple voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a noise signal energy weighted average. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.
In an embodiment of the invention, the weight assigned to each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
In an embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy and the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset sum value.
In an embodiment of the invention, the voice recognition method also includes the following steps. A weighted average of the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy is calculated over multiple voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a first consonant energy ratio weighted average. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average.
In an embodiment of the invention, the weight assigned to this ratio for each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
In an embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset ratio.
In an embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
In an embodiment of the invention, the voice recognition method also includes the following steps. A first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the original voice sampling signal are calculated, and the average zero-crossing rates of the original voice sampling signal over the target voice frame and up to N voice frames preceding the target voice frame are calculated, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate, where N is a positive integer. The first, second and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates.
In an embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
Based on the above, embodiments of the invention determine whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy. This reduces the cases in which an original voice sampling signal is misjudged as a consonant signal and thereby improves the recognition accuracy for consonant signals.
To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of a voice recognition apparatus according to an embodiment of the invention;
Fig. 2A~2B are schematic flowcharts of a voice recognition method according to an embodiment of the invention;
Fig. 3A~3B are schematic flowcharts of a voice recognition method according to another embodiment of the invention.
Description of reference numerals:
102: bandpass filtering unit;
104: processing unit;
S1: voice signal;
S2: first bandpass filtered signal;
S3: second bandpass filtered signal;
S202~S230, S302: steps of the voice recognition method.
Detailed description of the invention
Fig. 1 is a schematic diagram of a voice recognition apparatus according to an embodiment of the invention; please refer to Fig. 1. The voice recognition apparatus includes a bandpass filtering unit 102 and a processing unit 104, and the bandpass filtering unit 102 is coupled to the processing unit 104. The bandpass filtering unit 102 may be implemented, for example, with bandpass filters, and the processing unit 104 may be implemented, for example, with a CPU, but the invention is not limited thereto. The bandpass filtering unit 102 performs bandpass filtering of the first consonant frequency range and the second consonant frequency range on the voice signal S1 to produce the first bandpass filtered signal S2 and the second bandpass filtered signal S3, respectively. In this embodiment, the first consonant frequency range and the second consonant frequency range are 2 kHz~4 kHz and 4 kHz~10 kHz, respectively, but the invention is not limited thereto.
The processing unit 104 samples the voice signal S1, the first bandpass filtered signal S2 and the second bandpass filtered signal S3, and divides them into multiple voice frames, where each voice frame may include N sampled signals of the voice signal S1, N sampled signals of the first bandpass filtered signal S2 and N sampled signals of the second bandpass filtered signal S3. The processing unit 104 also calculates the energy of the sampled signals in each voice frame to obtain the original voice sampling signal energy, the first consonant frequency range signal energy and the second consonant frequency range signal energy, which correspond respectively to the energy of the sampled signals of the voice signal S1, of the first bandpass filtered signal S2 and of the second bandpass filtered signal S3 in that voice frame. Once these three energies are obtained, the processing unit 104 can determine whether the original voice sampling signal corresponding to each voice frame is noise according to the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy.
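The framing and energy steps described above can be sketched roughly as follows. This is only an illustrative sketch: the function names are hypothetical, the inputs are assumed to be already sampled (and, for S2 and S3, already bandpass filtered), and energy is assumed to be the sum of squared samples, which the text does not state explicitly.

```python
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], n: int) -> List[List[float]]:
    """Split a sampled signal into consecutive frames of n samples.

    Assumption: frames do not overlap and an incomplete tail is dropped.
    """
    return [list(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


def frame_energy(frame: Sequence[float]) -> float:
    """Assumed energy measure: sum of squared sample values."""
    return sum(x * x for x in frame)


def frame_energies(s1: Sequence[float], s2: Sequence[float],
                   s3: Sequence[float], n: int) -> List[Tuple[float, float, float]]:
    """Return per-frame (E, EB1, EB2) for S1, S2 and S3 respectively."""
    return [
        (frame_energy(a), frame_energy(b), frame_energy(c))
        for a, b, c in zip(split_frames(s1, n), split_frames(s2, n), split_frames(s3, n))
    ]


def energy_ratios(e: float, eb1: float, eb2: float) -> Tuple[float, float, float]:
    """Return (EB1/EB2, EB1/E, EB2/E).

    Assumption: a zero denominator yields 0.0 instead of raising an error.
    """
    def div(a: float, b: float) -> float:
        return a / b if b else 0.0

    return div(eb1, eb2), div(eb1, e), div(eb2, e)
```

The three ratios returned by `energy_ratios` are the quantities that the noise decision in the following paragraphs compares against preset ranges.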
Specifically, the processing unit 104 may determine whether the ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy each fall within their corresponding preset ratio ranges. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
For example, the processing unit 104 may determine whether the original voice sampling signal corresponding to a target voice frame (for example the m-th voice frame, where m is a positive integer) is noise using the following formulas:
where EB1_m is the first consonant frequency range signal energy, EB2_m is the second consonant frequency range signal energy, and E_m is the original voice sampling signal energy. When formulas (1), (2) and (3) are all satisfied, the processing unit 104 determines that the original voice sampling signal of the m-th voice frame is a noise signal.
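As a rough illustration of this three-ratio noise test, the sketch below checks each ratio against a range. The range bounds are placeholder values chosen only for the example, since the actual ranges of formulas (1) to (3) are not reproduced here; inclusive bounds and the handling of zero energies are likewise assumptions.

```python
from typing import Tuple

Range = Tuple[float, float]

# Hypothetical placeholder ranges, not values taken from the patent.
EB1_OVER_EB2_RANGE: Range = (0.5, 2.0)
EB1_OVER_E_RANGE: Range = (0.0, 0.3)
EB2_OVER_E_RANGE: Range = (0.0, 0.3)


def in_range(value: float, bounds: Range) -> bool:
    """Inclusive range test (inclusiveness is an assumption)."""
    low, high = bounds
    return low <= value <= high


def is_noise_frame(e: float, eb1: float, eb2: float,
                   r12: Range = EB1_OVER_EB2_RANGE,
                   r1: Range = EB1_OVER_E_RANGE,
                   r2: Range = EB2_OVER_E_RANGE) -> bool:
    """Return True only when all three energy ratios fall in their ranges."""
    if e <= 0 or eb2 <= 0:
        # Ratios are undefined; assumption: such a frame is not flagged here.
        return False
    return (in_range(eb1 / eb2, r12)
            and in_range(eb1 / e, r1)
            and in_range(eb2 / e, r2))
```

Requiring all three ratios to fall in range mirrors the statement that the frame is noise only when formulas (1), (2) and (3) are all satisfied.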
After determining that the original voice sampling signal of the target voice frame is a noise signal, the processing unit 104 also calculates an energy weighted average over multiple voice frames preceding the target voice frame whose original voice sampling signals were determined to be noise signals, to obtain the noise signal energy weighted average, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.
For example, the noise signal energy weighted average may be obtained as the energy weighted average of 3 voice frames preceding the target voice frame whose original voice sampling signals were determined to be noise signals. Suppose that, before the m-th voice frame, the three voice frames most recently determined to be noise are the (m-10)-th, (m-12)-th and (m-20)-th voice frames. The noise signal energy weighted average AK_m corresponding to the m-th voice frame can then be expressed by the following formula:
where E_{m-10}, E_{m-12} and E_{m-20} are the original voice sampling signal energies of the (m-10)-th, (m-12)-th and (m-20)-th voice frames, and a0, a1 and a2 are the weights corresponding to the (m-10)-th, (m-12)-th and (m-20)-th voice frames, respectively. The weights a0, a1 and a2 may be fixed or variable. For example, the weight of each voice frame whose original voice sampling signal was determined to be a noise signal may vary with the interval between that voice frame and the target voice frame; in this embodiment, the weights a0, a1 and a2 may vary with the interval between the corresponding voice frame and the m-th voice frame. When the noise signal energy weighted average AK_m satisfies the following formula, the original voice sampling signal corresponding to the m-th voice frame can be determined to be a consonant signal:
E_m > AK_m (5)
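A minimal sketch of the comparison in formula (5) is given below. Two points are assumptions rather than statements from the text: the weighted average is normalized by the sum of the weights, and the distance-dependent weights follow a hypothetical exponential decay (the text only says that the weights may vary with the interval).

```python
from typing import List, Sequence


def distance_weights(noise_indices: Sequence[int], m: int, decay: float = 0.9) -> List[float]:
    """Hypothetical weighting: frames further from frame m get smaller weights."""
    return [decay ** (m - i) for i in noise_indices]


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean, normalized by the sum of the weights (an assumption)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def exceeds_noise_energy(e_m: float, noise_energies: Sequence[float],
                         weights: Sequence[float]) -> bool:
    """Formula (5): E_m > AK_m, with AK_m computed from earlier noise frames."""
    return e_m > weighted_average(noise_energies, weights)
```

For the example in the text, `noise_indices` would hold m-10, m-12 and m-20, and `noise_energies` the corresponding values E_{m-10}, E_{m-12} and E_{m-20}.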
In addition, the processing unit may calculate a weighted average of the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy over multiple voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain the first consonant energy ratio weighted average, and determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average. For example, the first consonant energy ratio weighted average may be obtained as the weighted average of this ratio over 3 voice frames preceding the target voice frame whose original voice sampling signals were determined to be noise signals. Suppose that, before the m-th voice frame, the three voice frames most recently determined to be noise are the (m-10)-th, (m-12)-th and (m-20)-th voice frames. The first consonant energy ratio weighted average AF_m corresponding to the m-th voice frame can then be expressed by the following formula:
where EB1_{m-10}, EB1_{m-12} and EB1_{m-20} are the first consonant frequency range signal energies of the (m-10)-th, (m-12)-th and (m-20)-th voice frames, E_{m-10}, E_{m-12} and E_{m-20} are the original voice sampling signal energies of those voice frames, and c0, c1 and c2 are the weights corresponding to the (m-10)-th, (m-12)-th and (m-20)-th voice frames, respectively. The weights c0, c1 and c2 may be fixed or variable. For example, the weight of the ratio for each voice frame whose original voice sampling signal was determined to be a noise signal may vary with the interval between that voice frame and the target voice frame; in this embodiment, the weights c0, c1 and c2 may vary with the interval between the corresponding voice frame and the m-th voice frame. When the first consonant energy ratio weighted average AF_m satisfies the following condition, the original voice sampling signal corresponding to the m-th voice frame can be determined to be a consonant signal:
EB1_m / E_m < AF_m
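The bookkeeping behind AF_m can be sketched as a small history of the most recent noise frames. This is a hedged illustration only: the class and function names are hypothetical, the history length of 3 follows the example in the text, and the exponential decay weights and the normalization by the weight sum are assumptions.

```python
from collections import deque
from typing import Optional


class NoiseHistory:
    """Keeps the most recent frames that were judged to be noise."""

    def __init__(self, size: int = 3) -> None:
        # Each entry is (frame index, E, EB1); old entries drop out automatically.
        self._frames = deque(maxlen=size)

    def add(self, index: int, e: float, eb1: float) -> None:
        if e > 0:  # assumption: skip frames whose ratio EB1/E is undefined
            self._frames.append((index, e, eb1))

    def ratio_average(self, m: int, decay: float = 0.9) -> Optional[float]:
        """Weighted average of EB1/E over the stored frames (AF_m).

        The decay-based weights and the normalization are assumptions.
        """
        weights = [decay ** (m - i) for i, _, _ in self._frames]
        total = sum(weights)
        if total == 0:
            return None  # no noise frame recorded yet
        ratios = [eb1 / e for _, e, eb1 in self._frames]
        return sum(w * r for w, r in zip(weights, ratios)) / total


def first_ratio_below_average(e_m: float, eb1_m: float, af_m: Optional[float]) -> bool:
    """Condition from the text: EB1_m / E_m is less than AF_m."""
    if af_m is None or e_m <= 0:
        return False
    return eb1_m / e_m < af_m
```

A `deque` with `maxlen` keeps only the newest entries, which matches the idea of using the three frames most recently judged to be noise.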
Additionally, the processing unit 104 may determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy and the ratio between the first consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset sum value. For example, for the m-th voice frame, this determination can be expressed as:
EB2_m / E_m + EB1_m / E_m ≥ 1
In this embodiment the preset sum value is 1, but the invention is not limited thereto; the preset sum value may also be adjusted to other values according to the actual situation.
The processing unit 104 may also determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio between the second consonant frequency range signal energy and the original voice sampling signal energy is greater than or equal to a preset ratio. For example, for the m-th voice frame, this determination can be expressed as:
EB2_m / E_m ≥ 0.8
In this embodiment the preset ratio is 0.8, but the invention is not limited thereto; in some embodiments the preset ratio may take other values, as shown below:
EB2_m / E_m ≥ 0.35
In formula (7), the preset ratio is 0.35.
In addition, the processing unit 104 may determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit. For example, for the m-th voice frame, this determination can be expressed by the following formula:
E_m ≥ 50 (11)
In this embodiment the lower limit is 50, but the invention is not limited thereto; in some embodiments the lower limit may also be adjusted according to the actual situation.
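The three fixed-threshold checks above (ratio sum, second-band ratio and energy lower limit) can be sketched together as follows. The thresholds 1, 0.8, 0.35 and 50 come from the text; the function name, the dictionary keys and the handling of a non-positive E_m are assumptions made for the example.

```python
from typing import Dict

PRESET_SUM = 1.0            # preset sum value in this embodiment
PRESET_RATIO = 0.8          # 0.35 is used in some other embodiments
ENERGY_LOWER_LIMIT = 50.0   # lower limit in this embodiment


def threshold_checks(e: float, eb1: float, eb2: float,
                     preset_sum: float = PRESET_SUM,
                     preset_ratio: float = PRESET_RATIO,
                     lower_limit: float = ENERGY_LOWER_LIMIT) -> Dict[str, bool]:
    """Evaluate the three fixed-threshold conditions for one frame."""
    if e <= 0:
        # Ratios are undefined; assumption: every check fails for such a frame.
        return {"ratio_sum": False, "second_ratio": False, "energy_floor": False}
    return {
        "ratio_sum": eb2 / e + eb1 / e >= preset_sum,
        "second_ratio": eb2 / e >= preset_ratio,
        "energy_floor": e >= lower_limit,
    }
```

Returning the individual results rather than a single boolean leaves the caller free to combine them in the different ways the embodiments describe.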
Because the energy of a consonant signal can vary, its lower-energy portions may be mistaken for noise. To avoid this, in addition to determining from energy whether the original voice sampling signal is a consonant signal as described above, the processing unit 104 may also make the determination from the zero-crossing rate. The processing unit 104 may calculate a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the original voice sampling signal, and calculate the average zero-crossing rates of the original voice sampling signal over the target voice frame and multiple voice frames preceding it, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate. It then determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates. The first, second and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.
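The three crossing counts and their averages over several frames can be sketched as below. This is an assumption-laden illustration: a crossing is counted whenever two consecutive samples lie on opposite sides of the level (samples equal to the level are treated as being above it), the average is a plain mean over the supplied frames, and all names are hypothetical.

```python
from typing import Sequence, Tuple


def crossing_count(frame: Sequence[float], level: float) -> int:
    """Count how often consecutive samples lie on opposite sides of level."""
    sides = [x >= level for x in frame]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def average_crossing_rates(frames: Sequence[Sequence[float]],
                           levels: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Mean crossing count per level over the target frame and earlier frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    return tuple(
        sum(crossing_count(f, level) for f in frames) / len(frames)
        for level in levels
    )


def zero_crossing_check(frames: Sequence[Sequence[float]],
                        levels: Tuple[float, float, float],
                        preset_averages: Tuple[float, float, float]) -> bool:
    """True when every average rate reaches its preset average rate."""
    first, second, third = levels
    if not third < second < first:
        raise ValueError("levels must satisfy third < second < first")
    rates = average_crossing_rates(frames, levels)
    return all(r >= p for r, p in zip(rates, preset_averages))
```

With levels such as (0.5, 0.0, -0.5), a low-amplitude frame still crosses the middle level but not the outer ones, which shows why three levels carry more information than a single zero-crossing count.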
For the m-th voice frame, the original zero-crossing rate can be expressed as follows:
where N is a positive integer representing the number of sampling signals in the m-th voice frame, mL is an amplitude threshold, and the signal being counted is the original voice sampling signal in the m-th voice frame. The processing unit 104 can judge whether the original voice sampling signal is a consonant signal according to whether this zero-crossing rate is greater than or equal to a preset zero-crossing rate, for example according to the following formula:
The preset zero-crossing rate is not limited to 22; in some embodiments its value can be adjusted according to the actual situation. In addition, the processing unit 104 can judge whether the original voice sampling signal is a consonant signal according to a zero-crossing rate that additionally incorporates an energy condition of the original voice sampling signal. This zero-crossing rate can be expressed as follows:
in which one term can be expressed by the following formula:
In this embodiment, the value of αx is 0.5, but it is not limited thereto; in some embodiments its value can be adjusted according to the actual situation. By adjusting the reference level used to calculate the zero-crossing rate in this way, whether the original voice sampling signal is a consonant signal can be judged more accurately. The processing unit 104 can also judge whether the original voice sampling signal is a consonant signal according to the average zero-crossing rate of multiple voice frames. For example, for the m-th voice frame, the judgment can be based on the average of its zero-crossing rate and those of the two most recent voice frames (namely the (m-1)-th and (m-2)-th voice frames), and the judgment formula can be as follows:
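As a rough illustration of the zero-crossing measures discussed in this passage, the following Python sketch counts how many times the samples of one frame cross a given level and averages that count over a frame and its two predecessors. The formulas of the embodiment are not shown in this text, so the crossing rule, the function names (`level_crossings`, `three_frame_average`, `meets_rate_condition`), and the sample data are assumptions; only the example threshold of 22 and the three-frame window come from the description.

```python
PRESET_ZERO_CROSSING_RATE = 22  # example value from the text; adjustable in practice


def level_crossings(frame, level=0.0):
    """Count how often consecutive samples lie on opposite sides of `level`.

    Assumption: a sample exactly equal to `level` is treated as lying below it.
    """
    count = 0
    previous_side = None
    for sample in frame:
        side = sample > level
        if previous_side is not None and side != previous_side:
            count += 1
        previous_side = side
    return count


def three_frame_average(rates, m):
    """Average the rate of frame m with those of frames m-1 and m-2."""
    if m < 2:
        raise ValueError("frame m needs two preceding frames")
    return (rates[m] + rates[m - 1] + rates[m - 2]) / 3.0


def meets_rate_condition(frame, level=0.0, preset=PRESET_ZERO_CROSSING_RATE):
    """True if the crossing count of the frame reaches the preset rate."""
    return level_crossings(frame, level) >= preset
```

Calling `level_crossings` with three different levels on the same frame gives the kind of first, second, and third zero-crossing rates that the description refers to.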
As described in the above embodiments, the processing unit 104 can judge whether the original voice sampling signal is a consonant signal according to at least one of the energy and the zero-crossing rate; that is, the processing unit 104 can combine at least one of the conditions of the above formulas to judge whether the original voice sampling signal corresponding to the target voice frame is a consonant signal. For example, the processing unit 104 can judge whether formulas (5), (7), (9), (11), (13), (18), (19), and (20) are all satisfied at the same time, and if so, judge that the original voice sampling signal corresponding to the target voice frame is a consonant signal. As another example, the processing unit 104 can judge whether formulas (5), (8), (10), (11), (13), (18), (19), and (20) are all satisfied at the same time, and if so, judge that the original voice sampling signal corresponding to the target voice frame is a consonant signal.
Fig. 2A~2B illustrates a flowchart of a voice recognition method according to an embodiment of the invention. Referring to Fig. 2A~2B, and as can be seen from the above embodiments, the voice recognition method of the voice recognition apparatus can include the following steps. First, band-pass filtering of a first consonant frequency range and a second consonant frequency range is performed on the voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively (step S202). Next, the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal are divided into multiple voice frames (step S204), where each voice frame includes N sampling signals and N is a positive integer. Then, the energy of the sampling signals in the target voice frame is calculated to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy (step S206). Afterwards, whether the original voice sampling signal corresponding to the target voice frame is noise is judged according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy (step S208). For example, it can be judged whether these three ratios each fall within a corresponding preset ratio range; if all three do, the original voice sampling signal of the target voice frame is a noise signal.
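Steps S204 to S208 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the band-pass filtering of step S202 is taken as already done, frame energy is assumed to be the sum of squared samples (the text does not give the energy formula here), and the helper names and the ratio ranges passed in are hypothetical.

```python
def split_into_frames(samples, n):
    """Split a sample sequence into consecutive frames of n samples (S204).

    Assumption: an incomplete trailing frame is dropped.
    """
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def frame_energy(frame):
    """Frame energy, assumed here to be the sum of squared samples (S206)."""
    return sum(s * s for s in frame)


def is_noise_frame(e_orig, e_c1, e_c2, ranges):
    """Noise check of S208: all three energy ratios must fall in their ranges.

    `ranges` maps 'c1_c2', 'c1_orig', 'c2_orig' to (low, high) tuples; the
    actual preset ranges are not specified in this text.
    """
    if e_orig <= 0 or e_c2 <= 0:
        return False  # assumption: undefined ratios are not classified as noise
    ratios = {
        "c1_c2": e_c1 / e_c2,
        "c1_orig": e_c1 / e_orig,
        "c2_orig": e_c2 / e_orig,
    }
    return all(low <= ratios[key] <= high for key, (low, high) in ranges.items())
```

In use, the same framing would be applied to the voice signal and to both band-pass filtered signals so that the three energies of one target frame can be compared.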
Afterwards, the weighted average of the energies of multiple voice frames whose original voice sampling signals were previously judged to be noise signals is calculated to obtain a noise signal energy weighted average (step S210). It is then judged whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average (step S212), where the weight of each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval length between that voice frame and the target voice frame. If the original voice sampling signal energy corresponding to the target voice frame is not greater than the noise signal energy weighted average, the original voice sampling signal corresponding to the target voice frame is judged not to be a consonant signal (step S214). Conversely, if the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average, the weighted average of the ratios of the first consonant frequency range signal energy to the original voice sampling signal energy is calculated over multiple voice frames whose original voice sampling signals were previously judged to be noise signals, to obtain a first consonant energy ratio weighted average (step S216). It is then judged whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average (step S218), where the weight of each such ratio varies with the interval length between the corresponding noise voice frame and the target voice frame.
If the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is not less than the first consonant energy ratio weighted average, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if this ratio is less than the first consonant energy ratio weighted average, it is next judged whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio (step S220). If this ratio is not greater than or equal to the preset ratio, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if it is greater than or equal to the preset ratio, it is judged whether the original voice sampling signal energy is greater than or equal to a lower limit (step S222). If the original voice sampling signal energy is not greater than or equal to the lower limit, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214).
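The weighted averages of steps S210 and S216 can be sketched as below. The text states only that each weight changes with the interval between a noise frame and the target frame; the exponential decay used here, the `decay` parameter, and the function name `distance_weighted_mean` are assumptions chosen for illustration.

```python
def distance_weighted_mean(values, frame_indices, target_index, decay=0.5):
    """Weighted mean of per-frame values from earlier noise frames.

    values        -- one value per noise frame (energy for S210, or the
                     first-consonant-to-original energy ratio for S216)
    frame_indices -- index of each noise frame, all before target_index
    decay         -- assumed factor: a frame at distance d gets weight
                     decay ** (d - 1), so nearer frames count more
    """
    if not values or len(values) != len(frame_indices):
        raise ValueError("need one frame index per value, and at least one value")
    if any(i >= target_index for i in frame_indices):
        raise ValueError("noise frames must precede the target frame")
    weights = [decay ** (target_index - i - 1) for i in frame_indices]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

The same helper serves both steps: pass frame energies to obtain the noise signal energy weighted average, or pass energy ratios to obtain the first consonant energy ratio weighted average.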
Conversely, if the original voice sampling signal energy is greater than or equal to the lower limit, the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate of the original voice sampling signal are calculated, and the average zero-crossing rates of the original voice sampling signals of the target voice frame and multiple voice frames before it are calculated to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate (step S224). The first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are the numbers of times that the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. It is then judged whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates (step S226). If they are not all greater than or equal to their corresponding preset average zero-crossing rates, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if all three are greater than or equal to their corresponding preset average zero-crossing rates, it is next judged whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate (step S228). If the second zero-crossing rate is not greater than or equal to the preset zero-crossing rate, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if the second zero-crossing rate is greater than or equal to the preset zero-crossing rate, the original voice sampling signal corresponding to the target voice frame is a consonant signal (step S230).
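The decision chain from step S212 to step S230 can be summarized as a sequence of early exits. The sketch below assumes that all features and thresholds have already been computed; the class names, field names, and numeric values are hypothetical, and the comparison directions follow the step descriptions (a frame must pass every check to be judged a consonant).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FrameFeatures:
    energy: float                         # original voice sampling signal energy
    c1_ratio: float                       # first consonant energy / original energy
    c2_ratio: float                       # second consonant energy / original energy
    avg_zcr: Tuple[float, float, float]   # first, second, third average zero-crossing rates
    zcr2: float                           # second zero-crossing rate of the target frame


@dataclass
class Thresholds:
    noise_energy_mean: float              # noise signal energy weighted average (S210)
    c1_ratio_noise_mean: float            # first consonant energy ratio weighted average (S216)
    preset_ratio: float
    energy_lower_limit: float
    preset_avg_zcr: Tuple[float, float, float]
    preset_zcr: float


def is_consonant(f: FrameFeatures, t: Thresholds) -> bool:
    """Return True only if the frame passes every check from S212 to S228."""
    if not f.energy > t.noise_energy_mean:          # S212
        return False
    if not f.c1_ratio < t.c1_ratio_noise_mean:      # S218
        return False
    if not f.c2_ratio >= t.preset_ratio:            # S220
        return False
    if not f.energy >= t.energy_lower_limit:        # S222
        return False
    if not all(a >= p for a, p in zip(f.avg_zcr, t.preset_avg_zcr)):  # S226
        return False
    return f.zcr2 >= t.preset_zcr                   # S228 -> S230 or S214
```

Ordering the checks this way means the zero-crossing rates of step S224 only need to be computed for frames that survive the cheaper energy tests.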
Fig. 3A~3B illustrates a flowchart of a voice recognition method according to an embodiment of the invention. Referring to Fig. 3A~3B, this embodiment differs from the embodiment of Fig. 2A~2B in that, after step S212 judges that the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average, it is next judged whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value (step S302). If the sum is not greater than or equal to the preset sum value, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if the sum is greater than or equal to the preset sum value, the method proceeds directly to step S220 to judge whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to the preset ratio, and the subsequent steps of the voice recognition method are carried out as in the embodiment of Fig. 2A~2B.
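The variant of Fig. 3A~3B can be sketched as a small change to the front of the decision chain: after the energy check of step S212, the sum check of step S302 leads straight to step S220. The function name `fig3_front_end` and all numeric values in any call are assumptions for illustration only.

```python
def fig3_front_end(energy, noise_energy_mean, c1_ratio, c2_ratio,
                   preset_sum, preset_ratio):
    """Return True if a frame survives S212, S302, and S220 in the Fig. 3 variant.

    c1_ratio and c2_ratio are the first and second consonant frequency range
    signal energies divided by the original voice sampling signal energy.
    """
    if not energy > noise_energy_mean:            # S212
        return False
    if not (c1_ratio + c2_ratio) >= preset_sum:   # S302
        return False
    return c2_ratio >= preset_ratio               # S220
```

A frame that returns True here would then continue with the lower-limit and zero-crossing checks, as in the Fig. 2A~2B embodiment.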
In summary, embodiments of the invention can combine at least one of the conditions of the above formulas to judge whether the original voice sampling signal corresponding to the target voice frame is a consonant signal, thereby improving the accuracy of consonant signal recognition. For example, whether the original voice sampling signal corresponding to the target voice frame is noise can be judged according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy. This reduces cases in which the original voice sampling signal is misjudged as a consonant signal, and in turn improves the recognition accuracy of consonant signals.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.
Claims (22)
1. A voice recognition apparatus, characterized by comprising:
a band-pass filtering unit that performs band-pass filtering of a first consonant frequency range and a second consonant frequency range on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively; and
a processing unit, coupled to the band-pass filtering unit, that divides the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal into multiple voice frames, each voice frame including N sampling signals, N being a positive integer, calculates the energy of the sampling signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy, and judges whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
2. The voice recognition apparatus according to claim 1, characterized in that the processing unit further judges whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range, and if these three ratios each fall within the corresponding preset ratio range, the original voice sampling signal of the target voice frame is a noise signal.
3. The voice recognition apparatus according to claim 1, characterized in that the processing unit further calculates the weighted average of the energies of multiple voice frames whose original voice sampling signals were previously judged to be noise signals, to obtain a noise signal energy weighted average, and judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.
4. The voice recognition apparatus according to claim 3, characterized in that the weight of each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval length between that voice frame and the target voice frame.
5. The voice recognition apparatus according to claim 3, characterized in that the processing unit further judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value.
6. The voice recognition apparatus according to claim 5, characterized in that the processing unit further calculates the weighted average of the ratios of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to multiple voice frames whose original voice sampling signals were previously judged to be noise signals, to obtain a first consonant energy ratio weighted average, and judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average.
7. The voice recognition apparatus according to claim 6, characterized in that the weight of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval length between that voice frame and the target voice frame.
8. The voice recognition apparatus according to claim 6, characterized in that the processing unit further judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
9. The voice recognition apparatus according to claim 8, characterized in that the processing unit further judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
10. The voice recognition apparatus according to claim 9, characterized in that the processing unit further calculates a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, calculates the average zero-crossing rates of the original voice sampling signals of the target voice frame and multiple voice frames before the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, and judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates, wherein the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are the numbers of times that the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, and the second preset value is less than the first preset value and greater than the third preset value.
11. The voice recognition apparatus according to claim 10, characterized in that the processing unit further judges whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
12. A voice recognition method, characterized by comprising:
performing band-pass filtering of a first consonant frequency range and a second consonant frequency range on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively;
dividing the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal into multiple voice frames, each voice frame including N sampling signals, N being a positive integer;
calculating the energy of the sampling signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy; and
judging whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
13. The voice recognition method according to claim 12, characterized by further comprising:
judging whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range; and
if these three ratios each fall within the corresponding preset ratio range, determining that the original voice sampling signal of the target voice frame is a noise signal.
14. The voice recognition method according to claim 12, characterized by further comprising:
calculating the weighted average of the energies of multiple voice frames whose original voice sampling signals were previously judged to be noise signals, to obtain a noise signal energy weighted average; and
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.
15. The voice recognition method according to claim 14, characterized in that the weight of each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval length between that voice frame and the target voice frame.
16. The voice recognition method according to claim 14, characterized by further comprising:
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value.
17. The voice recognition method according to claim 16, characterized by further comprising:
calculating the weighted average of the ratios of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to multiple voice frames whose original voice sampling signals were previously judged to be noise signals, to obtain a first consonant energy ratio weighted average; and
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted average.
18. The voice recognition method according to claim 17, characterized in that the weight of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to each original voice sampling signal judged to be a noise signal varies with the interval length between the voice frame of that original voice sampling signal and the target voice frame.
19. The voice recognition method according to claim 17, characterized by further comprising:
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
20. The voice recognition method according to claim 19, characterized by further comprising:
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
21. The voice recognition method according to claim 20, characterized by further comprising:
calculating a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, and calculating the average zero-crossing rates of the original voice sampling signals of the target voice frame and the N voice frames before the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, wherein N is a positive integer, the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are the numbers of times that the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, and the second preset value is less than the first preset value and greater than the third preset value; and
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates.
22. The voice recognition method according to claim 21, characterized by further comprising:
judging whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510059977.5A CN105989834B (en) | 2015-02-05 | 2015-02-05 | Voice recognition device and voice recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510059977.5A CN105989834B (en) | 2015-02-05 | 2015-02-05 | Voice recognition device and voice recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105989834A (en) | 2016-10-05 |
CN105989834B (en) | 2019-12-24 |
Family
ID=57036196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510059977.5A Active CN105989834B (en) | 2015-02-05 | 2015-02-05 | Voice recognition device and voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105989834B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1189664A (en) * | 1997-01-29 | 1998-08-05 | 合泰半导体股份有限公司 | Sub-voice discrimination method of voice coding |
KR100322731B1 (en) * | 1995-11-27 | 2002-06-20 | 윤종용 | Voice recognition method and method of normalizing time of voice pattern adapted therefor |
CN1431650A (en) * | 2003-02-21 | 2003-07-23 | 清华大学 | Antinoise voice recognition method based on weighted local energy |
CN1598927A (en) * | 2004-08-31 | 2005-03-23 | 四川微迪数字技术有限公司 | Chinese voice signal process method for digital deaf-aid |
US20070225972A1 (en) * | 2006-03-18 | 2007-09-27 | Samsung Electronics Co., Ltd. | Speech signal classification system and method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108461090A (en) * | 2017-02-21 | 2018-08-28 | 宏碁股份有限公司 | Speech signal processing device and audio signal processing method |
CN108461090B (en) * | 2017-02-21 | 2021-07-06 | 宏碁股份有限公司 | Speech signal processing apparatus and speech signal processing method |
CN108648760A (en) * | 2018-04-17 | 2018-10-12 | 四川长虹电器股份有限公司 | Real-time sound-groove identification System and method for |
CN108922558A (en) * | 2018-08-20 | 2018-11-30 | 广东小天才科技有限公司 | Voice processing method, voice processing device and mobile terminal |
CN113936694A (en) * | 2021-12-17 | 2022-01-14 | 珠海普林芯驰科技有限公司 | Real-time human voice detection method, computer device and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105989834B (en) | 2019-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |