CN105989834A - Voice recognition apparatus and voice recognition method - Google Patents

Voice recognition apparatus and voice recognition method

Info

Publication number
CN105989834A
Authority
CN
China
Prior art keywords
energy
signal
raw tone
sampled signal
consonant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510059977.5A
Other languages
Chinese (zh)
Other versions
CN105989834B (en)
Inventor
杜博仁
张嘉仁
曾凯盟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc filed Critical Acer Inc
Priority to CN201510059977.5A priority Critical patent/CN105989834B/en
Publication of CN105989834A publication Critical patent/CN105989834A/en
Application granted granted Critical
Publication of CN105989834B publication Critical patent/CN105989834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice recognition apparatus and a voice recognition method. The voice recognition apparatus and the voice recognition method can determine whether an original voice sampling signal corresponding to a target voice frame is noise according to at least one of a ratio between the first consonant frequency range signal energy and the second consonant frequency range signal energy, a ratio between the first consonant frequency range signal energy and the original voice sampling signal energy, and a ratio between the second consonant frequency range signal energy and the original voice sampling signal energy. The voice recognition apparatus and the voice recognition method can effectively recognize whether a voice signal is a consonant signal.

Description

Voice recognition apparatus and voice recognition method
Technical field
The invention relates to a recognition apparatus, and in particular to a voice recognition apparatus and a voice recognition method.
Background art
Hearing-impaired people often cannot clearly perceive higher-frequency voice signals, such as consonant signals, although they can clearly hear low-frequency voice signals. Existing consonant detection performs signal processing in the frequency domain and mainly takes two forms: non-real-time consonant detection and real-time consonant detection. Non-real-time consonant detection relies mainly on energy and zero-crossing rate. Real-time consonant detection decides whether a voice signal is a consonant signal mainly according to whether the ratio of the high-frequency signal to the total energy is greater than a fixed value and whether the ratio of the low-frequency signal to the total energy is less than a fixed value. Although existing consonant detection can distinguish consonant signals from noise, its accuracy still cannot meet practical needs.
Summary of the invention
The present invention provides a voice recognition apparatus and a voice recognition method that can effectively recognize whether a voice signal is a consonant signal.
The voice recognition apparatus of the present invention includes a band-pass filtering unit and a processing unit. The band-pass filtering unit performs band-pass filtering of a first consonant frequency range and of a second consonant frequency range on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively. The processing unit is coupled to the band-pass filtering unit and divides the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal into a plurality of voice frames, where each voice frame includes N sampled signals and N is a positive integer. The processing unit also calculates the energy of the sampled signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy. It then determines whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
In one embodiment of the invention, the processing unit determines whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
In one embodiment of the invention, the processing unit also calculates a weighted average of the energies of a plurality of previous voice frames whose original voice sampling signals were judged to be noise signals, to obtain a noise signal energy weighted average. It then determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy of the target voice frame is greater than the noise signal energy weighted average.
In one embodiment of the invention, the weight assigned to each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval between that voice frame and the target voice frame.
In one embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value.
In one embodiment of the invention, the processing unit also calculates, over a plurality of previous voice frames whose original voice sampling signals were judged to be noise signals, a weighted average of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, to obtain a first consonant energy ratio weighted average. It then determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of the target voice frame is less than the first consonant energy ratio weighted average.
In one embodiment of the invention, the weight assigned to the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval between that voice frame and the target voice frame.
In one embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
In one embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
In one embodiment of the invention, the processing unit also calculates a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, and averages each of them over the target voice frame and a plurality of voice frames preceding the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate. It then determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether each of the three average zero-crossing rates is greater than or equal to its corresponding preset average zero-crossing rate. The first, second, and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.
In one embodiment of the invention, the processing unit also determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
The voice recognition method of the present invention includes the following steps. Band-pass filtering of a first consonant frequency range and of a second consonant frequency range is performed on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively. The voice signal, the first band-pass filtered signal, and the second band-pass filtered signal are divided into a plurality of voice frames, where each voice frame includes N sampled signals and N is a positive integer. The energy of the sampled signals in a target voice frame is calculated to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy. Whether the original voice sampling signal corresponding to the target voice frame is noise is determined according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
In one embodiment of the invention, the voice recognition method also includes the following steps. It is determined whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
In one embodiment of the invention, the voice recognition method also includes the following steps. A weighted average of the energies of a plurality of previous voice frames whose original voice sampling signals were judged to be noise signals is calculated to obtain a noise signal energy weighted average. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether the original voice sampling signal energy of the target voice frame is greater than the noise signal energy weighted average.
In one embodiment of the invention, the weight assigned to each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval between that voice frame and the target voice frame.
In one embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value.
In one embodiment of the invention, the voice recognition method also includes the following steps. Over a plurality of previous voice frames whose original voice sampling signals were judged to be noise signals, a weighted average of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is calculated to obtain a first consonant energy ratio weighted average. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of the target voice frame is less than the first consonant energy ratio weighted average.
In one embodiment of the invention, the weight assigned to the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of each voice frame whose original voice sampling signal was judged to be a noise signal varies with the interval between that voice frame and the target voice frame.
In one embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
In one embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
In one embodiment of the invention, the voice recognition method also includes the following steps. A first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal are calculated, and each is averaged over the target voice frame and the N voice frames preceding the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, where N is a positive integer. The first, second, and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. Whether the original voice sampling signal corresponding to the target voice frame is a consonant signal is determined according to whether each of the three average zero-crossing rates is greater than or equal to its corresponding preset average zero-crossing rate.
In one embodiment of the invention, the voice recognition method also includes determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
In summary, embodiments of the invention determine whether the original voice sampling signal of the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy. This reduces the cases in which an original voice sampling signal is mistaken for a consonant signal and thereby improves the accuracy of consonant recognition.
To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of a voice recognition apparatus according to an embodiment of the invention;
Figs. 2A~2B are flowcharts of a voice recognition method according to an embodiment of the invention;
Figs. 3A~3B are flowcharts of a voice recognition method according to another embodiment of the invention.
Description of reference numerals:
102: band-pass filtering unit;
104: processing unit;
S1: voice signal;
S2: the first bandpass filtered signal;
S3: the second bandpass filtered signal;
S202~S230, S302: steps of the voice recognition method.
Detailed description of the invention
Fig. 1 is a schematic diagram of a voice recognition apparatus according to an embodiment of the invention. Referring to Fig. 1, the voice recognition apparatus includes a band-pass filtering unit 102 and a processing unit 104, and the band-pass filtering unit 102 is coupled to the processing unit 104. The band-pass filtering unit 102 can be implemented with, for example, a band-pass filter, and the processing unit 104 can be implemented with, for example, a central processing unit (CPU), but the invention is not limited thereto. The band-pass filtering unit 102 performs band-pass filtering of a first consonant frequency range and of a second consonant frequency range on a voice signal S1 to produce a first band-pass filtered signal S2 and a second band-pass filtered signal S3, respectively. In the present embodiment, the first consonant frequency range and the second consonant frequency range are 2kHz~4kHz and 4kHz~10kHz, respectively, but the invention is not limited thereto.
The processing unit 104 samples the voice signal S1, the first band-pass filtered signal S2, and the second band-pass filtered signal S3, and divides them into a plurality of voice frames. Each voice frame can include N sampled signals of the voice signal S1, N sampled signals of the first band-pass filtered signal S2, and N sampled signals of the second band-pass filtered signal S3. The processing unit 104 also calculates the energy of the sampled signals in each voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy, which correspond to the energy of the sampled signals of the voice signal S1, of the first band-pass filtered signal S2, and of the second band-pass filtered signal S3 in the voice frame, respectively. With these three energies, the processing unit 104 can determine whether the original voice sampling signal of each voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
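The framing and per-frame energy step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the energy is assumed to be the sum of squared samples (the text does not define it), and the band-pass filtering that produces S2 and S3 is assumed to have been done already.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped (an
    assumption; the text does not say how a partial last frame is handled).
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_energy(frame):
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


def frame_energies(s1, s2, s3, frame_len):
    """Return one (E_m, EB1_m, EB2_m) tuple per voice frame.

    s1: samples of the voice signal S1
    s2: samples of the first band-pass filtered signal S2
    s3: samples of the second band-pass filtered signal S3
    """
    frames = zip(split_into_frames(s1, frame_len),
                 split_into_frames(s2, frame_len),
                 split_into_frames(s3, frame_len))
    return [(frame_energy(a), frame_energy(b), frame_energy(c))
            for a, b, c in frames]
```

Each tuple holds the three energies that the ratio tests for one voice frame rely on.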
Specifically, the processing unit 104 determines whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original voice sampling signal of the target voice frame is a noise signal.
For example, the processing unit 104 can determine whether the original voice sampling signal of a target voice frame (for example, the m-th voice frame, where m is a positive integer) is noise using the following formulas:
$$0.7 < \frac{EB1_m}{EB2_m} < 1.3 \tag{1}$$
$$0.25 < \frac{EB2_m}{E_m} < 0.5 \tag{2}$$
$$0.25 < \frac{EB1_m}{E_m} < 0.5 \tag{3}$$
Here $EB1_m$ is the first consonant frequency range signal energy, $EB2_m$ is the second consonant frequency range signal energy, and $E_m$ is the original voice sampling signal energy. When formulas (1), (2), and (3) are all satisfied, the processing unit 104 judges the original voice sampling signal of the m-th voice frame to be a noise signal.
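As a hedged illustration, the three-ratio noise test of formulas (1) to (3) could be written as below. The function name and the handling of zero energies are assumptions; the thresholds are the ones given for this embodiment.

```python
def is_noise_frame(eb1, eb2, e):
    """Return True when formulas (1), (2) and (3) all hold for one frame.

    eb1: first consonant frequency range signal energy  (EB1_m)
    eb2: second consonant frequency range signal energy (EB2_m)
    e:   original voice sampling signal energy          (E_m)
    """
    # Assumption: a frame with zero energy in a denominator is not
    # classified as noise by this test (the text does not cover this case).
    if e <= 0 or eb2 <= 0:
        return False
    return (0.7 < eb1 / eb2 < 1.3          # formula (1)
            and 0.25 < eb2 / e < 0.5       # formula (2)
            and 0.25 < eb1 / e < 0.5)      # formula (3)
```

For instance, a frame with EB1 = 40, EB2 = 40, and E = 100 gives ratios 1.0, 0.4, and 0.4, so all three conditions hold.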
After determining whether the original voice sampling signal of the target voice frame is a noise signal, the processing unit 104 also calculates a weighted average of the energies of a plurality of voice frames before the target voice frame whose original voice sampling signals were judged to be noise signals, to obtain a noise signal energy weighted average. It then determines whether the original voice sampling signal of the target voice frame is a consonant signal according to whether the original voice sampling signal energy of the target voice frame is greater than the noise signal energy weighted average.
For example, the noise signal energy weighted average can be obtained from the energies of the 3 voice frames before the target voice frame whose original voice sampling signals were judged to be noise signals. Suppose that, before the m-th voice frame, the three voice frames most recently judged to be noise are the (m-10)-th, (m-12)-th, and (m-20)-th voice frames. The noise signal energy weighted average $AK_m$ for the m-th voice frame is then given by:
$$AK_m = \frac{a_0 \times E_{m-10} + a_1 \times E_{m-12} + a_2 \times E_{m-20}}{a_0 + a_1 + a_2} \tag{4}$$
Here $E_{m-10}$, $E_{m-12}$, and $E_{m-20}$ are the original voice sampling signal energies of the (m-10)-th, (m-12)-th, and (m-20)-th voice frames, and $a_0$, $a_1$, and $a_2$ are the weights of those three voice frames, respectively. The weights $a_0$, $a_1$, $a_2$ can be fixed or variable. For example, the weight of each voice frame whose original voice sampling signal was judged to be a noise signal can vary with the interval between that voice frame and the target voice frame; in the present embodiment, $a_0$, $a_1$, $a_2$ can vary with the interval between the corresponding voice frame and the m-th voice frame. When the noise signal energy weighted average $AK_m$ satisfies the following formula, the original voice sampling signal of the m-th voice frame can be judged to be a consonant signal:
$$E_m > AK_m \tag{5}$$
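A minimal sketch of formulas (4) and (5) follows. The function names are hypothetical, and the inverse-distance weighting in `distance_weights` is only one possible choice: the text says the weights may vary with the interval between frames but does not specify how.

```python
def weighted_average(values, weights):
    """Weighted average as in formula (4): sum(w_i * v_i) / sum(w_i)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def distance_weights(noise_frame_indices, m):
    """Hypothetical weighting: frames closer to frame m get larger weights.

    The inverse-distance rule used here is an assumption, not taken
    from the text.
    """
    return [1.0 / (m - k) for k in noise_frame_indices]


def exceeds_noise_energy(e_m, noise_energies, weights):
    """Formula (5): True when E_m is greater than the weighted average AK_m."""
    ak_m = weighted_average(noise_energies, weights)
    return e_m > ak_m
```

With the example frames m-10, m-12, and m-20, `distance_weights` would yield weights 1/10, 1/12, and 1/20, so the most recent noise frame contributes most to the average.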
In addition, the processing unit 104 can calculate, over a plurality of previous voice frames whose original voice sampling signals were judged to be noise signals, a weighted average of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, to obtain a first consonant energy ratio weighted average. It then determines whether the original voice sampling signal of the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of the target voice frame is less than the first consonant energy ratio weighted average. For example, the first consonant energy ratio weighted average can be obtained from the 3 voice frames before the target voice frame whose original voice sampling signals were judged to be noise signals. Suppose that, before the m-th voice frame, the three voice frames most recently judged to be noise are the (m-10)-th, (m-12)-th, and (m-20)-th voice frames. The first consonant energy ratio weighted average $AF_m$ for the m-th voice frame is then given by:
$$AF_m = \frac{c_0 \times \frac{EB1_{m-10}}{E_{m-10}} + c_1 \times \frac{EB1_{m-12}}{E_{m-12}} + c_2 \times \frac{EB1_{m-20}}{E_{m-20}}}{c_0 + c_1 + c_2} \tag{6}$$
Here $EB1_{m-10}$, $EB1_{m-12}$, and $EB1_{m-20}$ are the first consonant frequency range signal energies of the (m-10)-th, (m-12)-th, and (m-20)-th voice frames; $E_{m-10}$, $E_{m-12}$, and $E_{m-20}$ are the original voice sampling signal energies of those voice frames; and $c_0$, $c_1$, and $c_2$ are the weights of those voice frames, respectively. The weights $c_0$, $c_1$, $c_2$ can be fixed or variable. For example, the weight of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy of each voice frame judged to be a noise signal can vary with the interval between that voice frame and the target voice frame; in the present embodiment, $c_0$, $c_1$, $c_2$ can vary with the interval between the corresponding voice frame and the m-th voice frame. When the first consonant energy ratio weighted average $AF_m$ satisfies the following formula, the original voice sampling signal of the m-th voice frame can be judged to be a consonant signal:
$$\frac{EB1_m}{E_m} < AF_m \tag{7}$$
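The ratio-based counterpart, formulas (6) and (7), can be sketched in the same way. The function names are hypothetical, and the weights are passed in by the caller because the text leaves their values open.

```python
def first_consonant_ratio_average(eb1_history, e_history, weights):
    """Formula (6): weighted average of EB1/E over previous noise frames.

    eb1_history, e_history and weights are parallel lists, one entry per
    previous frame that was judged to be noise.
    """
    ratios = [eb1 / e for eb1, e in zip(eb1_history, e_history)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def below_noise_ratio(eb1_m, e_m, af_m):
    """Formula (7): True when EB1_m / E_m is less than AF_m."""
    return eb1_m / e_m < af_m
```

A frame passes this test when its share of energy in the first consonant frequency range is smaller than the share typically seen in recent noise frames.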
Additionally, the processing unit 104 can determine whether the original voice sampling signal of the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum value. For the m-th voice frame, for example, this test can be expressed as:
$$\frac{EB1_m}{E_m} + \frac{EB2_m}{E_m} \geq 1 \tag{8}$$
In the present embodiment the preset sum value is 1, but the invention is not limited thereto; the preset sum value can also be adjusted to other values as circumstances require.
The processing unit 104 can also determine whether the original voice sampling signal of the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio. For the m-th voice frame, for example, this test can be expressed as:
$$\frac{EB2_m}{E_m} \geq 0.8 \tag{9}$$
In the present embodiment the preset ratio is 0.8, but the invention is not limited thereto. In some embodiments the preset ratio can take other values, as shown below:
$$\frac{EB2_m}{E_m} \geq 0.35 \tag{10}$$
In formula (10), the preset ratio is 0.35.
In addition, the processing unit 104 can determine whether the original voice sampling signal of the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit. For the m-th voice frame, for example, this test can be expressed as:
$$E_m \geq 50 \tag{11}$$
In the present embodiment the lower limit is 50, but the invention is not limited thereto; in some embodiments the lower limit can also be adjusted as circumstances require.
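The energy-based conditions of formulas (8) to (11) can be sketched as independent predicates. The function names are hypothetical, the default thresholds are the values quoted for this embodiment, and the zero-energy guard is an assumption. How the conditions are combined into a final decision is defined by the flowcharts of Figs. 2A~2B and 3A~3B, which this passage does not reproduce, so no combination is shown.

```python
def band_sum_at_least(eb1, eb2, e, preset_sum=1.0):
    """Formula (8): EB1_m/E_m + EB2_m/E_m >= preset sum value."""
    if e <= 0:  # assumption: an empty frame fails the test
        return False
    return eb1 / e + eb2 / e >= preset_sum


def second_band_ratio_at_least(eb2, e, preset_ratio=0.8):
    """Formula (9), or formula (10) with preset_ratio=0.35."""
    if e <= 0:  # assumption: an empty frame fails the test
        return False
    return eb2 / e >= preset_ratio


def energy_at_least(e, lower_limit=50):
    """Formula (11): E_m >= lower limit."""
    return e >= lower_limit
```

Keeping each threshold as a parameter mirrors the text's remark that the preset values can be adjusted as circumstances require.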
Because the energy of a consonant signal can vary in magnitude, its lower-energy portions may be mistaken for noise. To avoid this, in addition to judging whether the original voice sampling signal is a consonant signal according to energy as described above, the processing unit 104 can also make the judgment according to zero-crossing rate. The processing unit 104 can calculate a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, and average each of them over the target voice frame and a plurality of voice frames preceding the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate. It then determines whether the original voice sampling signal of the target voice frame is a consonant signal according to whether each of the three average zero-crossing rates is greater than or equal to its corresponding preset average zero-crossing rate. The first, second, and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.
For the m-th voice frame, the original zero-crossing rate $Z_m^0$ can be expressed as:
$$Z_m^0 = \sum_{j=1}^{N-1} 0.5 \left\{ \operatorname{sgn}\left[\hat{x}_m(mL+j)\right] - \operatorname{sgn}\left[\hat{x}_m(mL+j-1)\right] \right\} \tag{12}$$
where N is a positive integer denoting the number of sampled signals in the m-th voice frame, mL is an amplitude threshold, and $\hat{x}_m$ is the original voice sampling signal in the m-th voice frame. The processing unit 104 may determine whether the original voice sampling signal is a consonant signal according to whether $Z_m^0$ is greater than or equal to a preset zero-crossing rate, for example according to:
$$Z_m^0 \geq 22 \tag{13}$$
The preset zero-crossing rate is not limited to 22; in some embodiments its value may be adjusted according to actual conditions. In addition, the processing unit 104 may determine whether the original voice sampling signal is a consonant signal according to zero-crossing rates $Z_m^+$ and $Z_m^-$ of the original voice sampling signal that incorporate an energy condition. $Z_m^+$ and $Z_m^-$ can be expressed as:
$$Z_m^+ = \sum_{j=1}^{N-1} 0.5 \left\{ \operatorname{sgn}\left[x_m^+(mL+j)\right] - \operatorname{sgn}\left[x_m^+(mL+j-1)\right] \right\} \tag{14}$$
$$Z_m^- = \sum_{j=1}^{N-1} 0.5 \left\{ \operatorname{sgn}\left[x_m^-(mL+j)\right] - \operatorname{sgn}\left[x_m^-(mL+j-1)\right] \right\} \tag{15}$$
where $x_m^+$ and $x_m^-$ can be expressed as:
$$x_m^+(j) = \hat{x}_m(j+mL) - \alpha_x F_m \tag{16}$$
$$x_m^-(j) = \hat{x}_m(j+mL) + \alpha_x F_m \tag{17}$$
In this embodiment the value of $\alpha_x$ is 0.5, but the invention is not limited thereto; in some embodiments its value may be adjusted according to actual conditions. By adjusting the reference level against which zero crossings are counted in this way, whether the original voice sampling signal is a consonant signal can be determined more accurately. The processing unit 104 may also use the average zero-crossing rate over multiple voice frames. For example, for the m-th voice frame, the determination can be based on the mean of its zero-crossing rate and those of the two most recent voice frames (the (m-1)-th and (m-2)-th voice frames), as follows:
$$\frac{Z_m^0 + Z_{m-1}^0 + Z_{m-2}^0}{3} \geq 34 \tag{18}$$
$$\frac{Z_m^+ + Z_{m-1}^+ + Z_{m-2}^+}{3} \geq 30 \tag{19}$$
$$\frac{Z_m^- + Z_{m-1}^- + Z_{m-2}^-}{3} \geq 30 \tag{20}$$
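The zero-crossing tests of formulas (12) to (20) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, a frame is passed as a plain list of samples, the quantity F_m is taken as a given input, and each term is counted in absolute value so that every sign change contributes one crossing (the formulas as printed show no absolute value).

```python
def sgn(value):
    """Sign function: 1 for non-negative samples, -1 otherwise (assumed convention)."""
    return 1 if value >= 0 else -1


def zero_crossings(frame, offset=0.0):
    """Count sign changes of (sample - offset) across one frame."""
    shifted = [s - offset for s in frame]
    return sum(
        0.5 * abs(sgn(shifted[j]) - sgn(shifted[j - 1]))
        for j in range(1, len(shifted))
    )


def three_rates(frame, f_m, alpha_x=0.5):
    """Return (Z0, Z+, Z-) for one frame, following formulas (12), (14), (15)."""
    z0 = zero_crossings(frame)                       # crossings of zero
    z_plus = zero_crossings(frame, alpha_x * f_m)    # x+ = x - alpha_x * F_m
    z_minus = zero_crossings(frame, -alpha_x * f_m)  # x- = x + alpha_x * F_m
    return z0, z_plus, z_minus


def passes_average_tests(history, thresholds=(34, 30, 30)):
    """Check formulas (18)-(20) on the most recent (Z0, Z+, Z-) tuples.

    history must hold at least one tuple; normally it holds three or more.
    """
    recent = history[-3:]
    averages = [sum(rates[k] for rates in recent) / len(recent) for k in range(3)]
    return all(avg >= limit for avg, limit in zip(averages, thresholds))
```

Shifting the samples by plus or minus alpha_x * F_m before counting means that small fluctuations around zero no longer register as crossings in Z+ and Z-, which is one way to read the "energy condition" mentioned above.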
As described in the above embodiments, the processing unit 104 may determine whether the original voice sampling signal is a consonant signal according to at least one of energy and zero-crossing rate. That is, the processing unit 104 may combine at least one of the conditions of the above formulas to determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal. For example, the processing unit 104 may determine whether formulas (5), (7), (9), (11), (13), (18), (19), and (20) are all satisfied at the same time, and if so, determine that the original voice sampling signal corresponding to the target voice frame is a consonant signal. As another example, the processing unit 104 may determine whether formulas (5), (8), (10), (11), (13), (18), (19), and (20) are all satisfied at the same time, and if so, determine that the original voice sampling signal corresponding to the target voice frame is a consonant signal.
FIGS. 2A and 2B are flowcharts of a voice recognition method according to an embodiment of the invention. Referring to FIGS. 2A and 2B, and as can be seen from the above embodiments, the voice recognition method of the voice recognition apparatus may include the following steps. First, band-pass filtering in a first consonant frequency range and a second consonant frequency range is performed on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively (step S202). Next, the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal are divided into multiple voice frames (step S204), each voice frame including N sampled signals, N being a positive integer. Then the energy of the sampled signals in the target voice frame is calculated to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy (step S206). After that, whether the original voice sampling signal corresponding to the target voice frame is noise is determined according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy (step S208). For example, it can be determined whether these three ratios each fall within a corresponding preset ratio range; if all three do, the original voice sampling signal of the target voice frame is a noise signal.
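Steps S204 to S208 can be sketched as follows. This is an illustrative sketch only: the band-pass filtering of step S202 is assumed to have been done already, energy is assumed to be the sum of squared samples, and the function names and the ratio ranges passed in are hypothetical placeholders rather than values from the embodiment.

```python
def split_frames(signal, n):
    """Step S204: split a sample list into consecutive frames of n samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def frame_energy(frame):
    """Step S206: energy of one frame, assumed here to be the sum of squares."""
    return sum(s * s for s in frame)


def is_noise(e_orig, e_band1, e_band2, ranges):
    """Step S208: noise if all three energy ratios fall in their preset ranges.

    ranges holds three (low, high) pairs, in the order
    band1/band2, band1/original, band2/original.
    """
    if e_orig == 0 or e_band2 == 0:
        return False  # ratios undefined; this sketch leaves the frame undecided
    ratios = (e_band1 / e_band2, e_band1 / e_orig, e_band2 / e_orig)
    return all(low <= r <= high for r, (low, high) in zip(ratios, ranges))
```

The same `split_frames` call would be applied to the voice signal and to both band-pass filtered signals so that frame m of each signal covers the same samples.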
Afterwards, an energy weighted mean is calculated over the voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a noise signal energy weighted mean (step S210). It is then determined whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted mean (step S212). Here, the weight of each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame. If the original voice sampling signal energy corresponding to the target voice frame is not greater than the noise signal energy weighted mean, the original voice sampling signal corresponding to the target voice frame is determined not to be a consonant signal (step S214). Conversely, if the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted mean, a weighted mean is calculated of the ratios of first consonant frequency range signal energy to original voice sampling signal energy for the original voice sampling signals previously determined to be noise signals, to obtain a first consonant energy ratio weighted mean (step S216). It is then determined whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted mean (step S218). Here, the weight of each such ratio varies with the interval between the corresponding noise voice frame and the target voice frame.
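The distance-dependent weighting behind steps S210 and S216 can be sketched as below. The geometric decay and all names are assumptions made for illustration; the text only states that the weight changes with the interval between a noise frame and the target frame.

```python
def weighted_noise_mean(noise_values, target_index, decay=0.5):
    """Weighted mean of values taken from earlier noise frames.

    noise_values maps frame index -> value (an energy for step S210,
    or an energy ratio for step S216). A frame d frames before the
    target gets weight decay ** d, so nearer frames count more
    (assumed form of the weighting).
    """
    weights = {
        i: decay ** (target_index - i)
        for i in noise_values
        if i < target_index
    }
    total = sum(weights.values())
    if total == 0:
        return None  # no earlier noise frame is available
    return sum(weights[i] * noise_values[i] for i in weights) / total
```

Because the same function accepts either energies or ratios, one helper can serve both the noise signal energy weighted mean and the first consonant energy ratio weighted mean.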
If the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is not less than the first consonant energy ratio weighted mean, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if this ratio is less than the first consonant energy ratio weighted mean, it is then determined whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio (step S220). If the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is not greater than or equal to the preset ratio, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if this ratio is greater than or equal to the preset ratio, it is determined whether the original voice sampling signal energy is greater than or equal to the lower limit (step S222). If the original voice sampling signal energy is not greater than or equal to the lower limit, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214).
Conversely, if the original voice sampling signal energy is greater than or equal to the lower limit, the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate of the original voice sampling signal are calculated, together with the average zero-crossing rates of the original voice sampling signal over the target voice frame and multiple voice frames preceding it, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate (step S224). The first, second, and third zero-crossing rates are the numbers of times the original voice sampling signal in the target voice frame crosses the first preset value, the second preset value, and the third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. It is then determined whether the first, second, and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates (step S226). If they are not all greater than or equal to their corresponding preset average zero-crossing rates, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if they are all greater than or equal to their corresponding preset average zero-crossing rates, it is determined whether the second zero-crossing rate is greater than or equal to the preset zero-crossing rate (step S228). If the second zero-crossing rate is not greater than or equal to the preset zero-crossing rate, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if the second zero-crossing rate is greater than or equal to the preset zero-crossing rate, the original voice sampling signal corresponding to the target voice frame is a consonant signal (step S230).
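The cascade of steps S212 to S230 can be summarized as a chain of early exits. The sketch below assumes the per-frame quantities have already been computed; the dictionary keys, the `Thresholds` container, and its default preset ratio are hypothetical, while 50, 22, 34, and 30 are the example values of formulas (11), (13), and (18) to (20). Step S218 is read here as requiring the first-band ratio to be below the noise average, and the second zero-crossing rate is read as the crossing count of the middle preset value; both readings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    preset_ratio: float = 0.1                    # hypothetical value for step S220
    lower_limit: float = 50.0                    # formula (11)
    preset_zcr: float = 22.0                     # formula (13)
    preset_avg_zcr: tuple = (34.0, 30.0, 30.0)   # formulas (18)-(20), order Z0, Z+, Z-


def is_consonant(f, t=Thresholds()):
    """Return True only if the frame passes every test from S212 to S228.

    f is a dict of precomputed features; f["avg_zcr"] is ordered (Z0, Z+, Z-).
    Any failed test corresponds to step S214 (not a consonant).
    """
    if not f["energy"] > f["noise_energy_mean"]:      # S212
        return False
    if not f["ratio1"] < f["noise_ratio1_mean"]:      # S218
        return False
    if not f["ratio2"] >= t.preset_ratio:             # S220
        return False
    if not f["energy"] >= t.lower_limit:              # S222
        return False
    if not all(a >= p for a, p in zip(f["avg_zcr"], t.preset_avg_zcr)):  # S226
        return False
    return f["zcr_second"] >= t.preset_zcr            # S228, then S230 or S214
```

Ordering the cheap energy comparisons before the zero-crossing tests mirrors the flowchart: most non-consonant frames are rejected before any zero-crossing rate has to be computed.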
FIGS. 3A and 3B are flowcharts of a voice recognition method according to an embodiment of the invention. Referring to FIGS. 3A and 3B, this embodiment differs from the embodiment of FIGS. 2A and 2B as follows. After determining in step S212 that the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted mean, this embodiment determines whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum (step S302). If the sum is not greater than or equal to the preset sum, the original voice sampling signal corresponding to the target voice frame is not a consonant signal (step S214). Conversely, if the sum is greater than or equal to the preset sum, the method proceeds directly to step S220 to determine whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to the preset ratio, and the subsequent steps of the voice recognition method continue as in the embodiment of FIGS. 2A and 2B.
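The extra test of step S302 reduces to a single comparison on the sum of the two band ratios. A minimal sketch follows, with a hypothetical function name and a placeholder value for the preset sum, which the text does not specify.

```python
def passes_sum_test(ratio1, ratio2, preset_sum=0.5):
    """Step S302: the two band-to-original energy ratios must sum to at least preset_sum.

    ratio1 is first-band energy / original energy, ratio2 is
    second-band energy / original energy. preset_sum is a placeholder.
    """
    return (ratio1 + ratio2) >= preset_sum
```

In this variant a frame that passes the sum test goes straight to the preset-ratio test of step S220, so the weighted ratio comparison of steps S216 and S218 is not performed.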
In summary, embodiments of the invention may combine at least one of the conditions of the above formulas to determine whether the original voice sampling signal corresponding to the target voice frame is a consonant signal, thereby improving the accuracy of consonant signal recognition. For example, whether the original voice sampling signal corresponding to the target voice frame is noise can be determined according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy. This reduces the cases in which an original voice sampling signal is mistaken for a consonant signal and thus improves the recognition accuracy of consonant signals.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims (22)

1. A voice recognition apparatus, characterized by comprising:
a band-pass filtering unit, which performs band-pass filtering in a first consonant frequency range and a second consonant frequency range on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively; and
a processing unit, coupled to the band-pass filtering unit, which divides the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal into a plurality of voice frames, each voice frame including N sampled signals, N being a positive integer; calculates the energy of the sampled signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy; and determines whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
2. The voice recognition apparatus according to claim 1, characterized in that the processing unit further determines whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range, and if these three ratios each fall within the corresponding preset ratio range, the original voice sampling signal of the target voice frame is a noise signal.
3. The voice recognition apparatus according to claim 1, characterized in that the processing unit further calculates an energy weighted mean of the voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a noise signal energy weighted mean, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted mean.
4. The voice recognition apparatus according to claim 3, characterized in that the weight of each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
5. The voice recognition apparatus according to claim 3, characterized in that the processing unit further determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum.
6. The voice recognition apparatus according to claim 5, characterized in that the processing unit further calculates a weighted mean of the ratios of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a first consonant energy ratio weighted mean, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted mean.
7. The voice recognition apparatus according to claim 6, characterized in that the weight of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
8. The voice recognition apparatus according to claim 6, characterized in that the processing unit further determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
9. The voice recognition apparatus according to claim 8, characterized in that the processing unit further determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
10. The voice recognition apparatus according to claim 9, characterized in that the processing unit further calculates a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, calculates the average zero-crossing rates of the original voice sampling signal over the target voice frame and a plurality of voice frames preceding the target voice frame to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, and determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to a corresponding preset average zero-crossing rate, wherein the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, and the second preset value is less than the first preset value and greater than the third preset value.
11. The voice recognition apparatus according to claim 10, characterized in that the processing unit further determines whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
12. A voice recognition method, characterized by comprising:
performing band-pass filtering in a first consonant frequency range and a second consonant frequency range on a voice signal to produce a first band-pass filtered signal and a second band-pass filtered signal, respectively;
dividing the voice signal, the first band-pass filtered signal, and the second band-pass filtered signal into a plurality of voice frames, each voice frame including N sampled signals, N being a positive integer;
calculating the energy of the sampled signals in a target voice frame to obtain an original voice sampling signal energy, a first consonant frequency range signal energy, and a second consonant frequency range signal energy; and
determining whether the original voice sampling signal corresponding to the target voice frame is noise according to the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy.
13. The voice recognition method according to claim 12, characterized by further comprising:
determining whether the ratio of the first consonant frequency range signal energy to the second consonant frequency range signal energy, the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy, and the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy each fall within a corresponding preset ratio range; and
if these three ratios each fall within the corresponding preset ratio range, determining that the original voice sampling signal of the target voice frame is a noise signal.
14. The voice recognition method according to claim 12, characterized by further comprising:
calculating an energy weighted mean of the voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a noise signal energy weighted mean; and
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy corresponding to the target voice frame is greater than the noise signal energy weighted mean.
15. The voice recognition method according to claim 14, characterized in that the weight of each voice frame whose original voice sampling signal was determined to be a noise signal varies with the interval between that voice frame and the target voice frame.
16. The voice recognition method according to claim 14, characterized by further comprising:
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the sum of the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy and the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset sum.
17. The voice recognition method according to claim 16, characterized by further comprising:
calculating a weighted mean of the ratios of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the voice frames whose original voice sampling signals were previously determined to be noise signals, to obtain a first consonant energy ratio weighted mean; and
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to the target voice frame is less than the first consonant energy ratio weighted mean.
18. The voice recognition method according to claim 17, characterized in that the weight of the ratio of the first consonant frequency range signal energy to the original voice sampling signal energy corresponding to each original voice sampling signal determined to be a noise signal varies with the interval between the voice frame of that original voice sampling signal and the target voice frame.
19. The voice recognition method according to claim 17, characterized by further comprising:
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the ratio of the second consonant frequency range signal energy to the original voice sampling signal energy is greater than or equal to a preset ratio.
20. The voice recognition method according to claim 19, characterized by further comprising:
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the original voice sampling signal energy is greater than or equal to a lower limit.
21. The voice recognition method according to claim 20, characterized by further comprising:
calculating a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original voice sampling signal, and calculating the average zero-crossing rates of the original voice sampling signal over the target voice frame and N voice frames preceding the target voice frame, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, wherein N is a positive integer, the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are the numbers of times the original voice sampling signal in the target voice frame crosses a first preset value, a second preset value, and a third preset value, respectively, and the second preset value is less than the first preset value and greater than the third preset value; and
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to a corresponding preset average zero-crossing rate.
22. The voice recognition method according to claim 21, characterized by further comprising:
determining whether the original voice sampling signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
CN201510059977.5A 2015-02-05 2015-02-05 Voice recognition device and voice recognition method Active CN105989834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510059977.5A CN105989834B (en) 2015-02-05 2015-02-05 Voice recognition device and voice recognition method

Publications (2)

Publication Number Publication Date
CN105989834A true CN105989834A (en) 2016-10-05
CN105989834B CN105989834B (en) 2019-12-24

Family

ID=57036196

Country Status (1)

Country Link
CN (1) CN105989834B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108461090A (en) * 2017-02-21 2018-08-28 宏碁股份有限公司 Speech signal processing device and audio signal processing method
CN108648760A (en) * 2018-04-17 2018-10-12 四川长虹电器股份有限公司 Real-time sound-groove identification System and method for
CN108922558A (en) * 2018-08-20 2018-11-30 广东小天才科技有限公司 Voice processing method, voice processing device and mobile terminal
CN113936694A (en) * 2021-12-17 2022-01-14 珠海普林芯驰科技有限公司 Real-time human voice detection method, computer device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1189664A (en) * 1997-01-29 1998-08-05 合泰半导体股份有限公司 Sub-voice discrimination method of voice coding
KR100322731B1 (en) * 1995-11-27 2002-06-20 윤종용 Voice recognition method and method of normalizing time of voice pattern adapted therefor
CN1431650A (en) * 2003-02-21 2003-07-23 清华大学 Antinoise voice recognition method based on weighted local energy
CN1598927A (en) * 2004-08-31 2005-03-23 四川微迪数字技术有限公司 Chinese voice signal process method for digital deaf-aid
US20070225972A1 (en) * 2006-03-18 2007-09-27 Samsung Electronics Co., Ltd. Speech signal classification system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100322731B1 (en) * 1995-11-27 2002-06-20 윤종용 Voice recognition method and method of normalizing time of voice pattern adapted therefor
CN1189664A (en) * 1997-01-29 1998-08-05 合泰半导体股份有限公司 Sub-voice discrimination method of voice coding
CN1431650A (en) * 2003-02-21 2003-07-23 清华大学 Antinoise voice recognition method based on weighted local energy
CN1598927A (en) * 2004-08-31 2005-03-23 四川微迪数字技术有限公司 Chinese voice signal process method for digital deaf-aid
US20070225972A1 (en) * 2006-03-18 2007-09-27 Samsung Electronics Co., Ltd. Speech signal classification system and method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108461090A (en) * 2017-02-21 2018-08-28 宏碁股份有限公司 Speech signal processing device and audio signal processing method
CN108461090B (en) * 2017-02-21 2021-07-06 宏碁股份有限公司 Speech signal processing apparatus and speech signal processing method
CN108648760A (en) * 2018-04-17 2018-10-12 四川长虹电器股份有限公司 Real-time voiceprint recognition system and method
CN108922558A (en) * 2018-08-20 2018-11-30 广东小天才科技有限公司 Voice processing method, voice processing device and mobile terminal
CN113936694A (en) * 2021-12-17 2022-01-14 珠海普林芯驰科技有限公司 Real-time human voice detection method, computer device and computer readable storage medium

Also Published As

Publication number Publication date
CN105989834B (en) 2019-12-24

Similar Documents

Publication Publication Date Title
CN103578468B (en) Method for adjusting a confidence threshold for voice recognition, and electronic device
EP2413313B1 (en) Method and device for audio signal classification
CN105989834A (en) Voice recognition apparatus and voice recognition method
KR20190045278A (en) A voice quality evaluation method and a voice quality evaluation apparatus
CN105321528A (en) Microphone array voice detection method and device
CN111429932A (en) Voice noise reduction method, device, equipment and medium
CN110070888A (en) Parkinson's audio recognition method based on convolutional neural networks
CN106504760B (en) Broadband ambient noise and speech separation detection system and method
CN106448696A (en) Adaptive high-pass filtering speech noise reduction method based on background noise estimation
JP2017027076A (en) Method and apparatus for detecting correctness of pitch period
US20160217787A1 (en) Speech recognition apparatus and speech recognition method
CN105916090A (en) Hearing aid system based on intelligent speech recognition technology
CN107690034A (en) Intelligent scene mode switching system and method based on environmental background sound
US9495973B2 (en) Speech recognition apparatus and speech recognition method
CN110211596A (en) Cetacean whistle signal detection method based on Mel sub-band spectral entropy
CN111599372B (en) Stable on-line multi-channel voice dereverberation method and system
CN111489763A (en) Adaptive method for speaker recognition in complex environment based on GMM model
CN113763966B (en) End-to-end text irrelevant voiceprint recognition method and system
CN116935894B (en) Micro-motor abnormal sound identification method and system based on time-frequency domain mutation characteristics
CN105989835A (en) Voice recognition apparatus and voice recognition method
CN104424954B (en) Noise estimation method and device
CN109377982A (en) Efficient voice acquisition method
CN108717851B (en) Voice recognition method and device
Fraile et al. MFCC-based remote pathology detection on speech transmitted through the telephone channel - impact of linear distortions: band limitation, frequency response and noise
CN116230018A (en) Synthetic voice quality evaluation method for voice synthesis system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant