CN103366740A

CN103366740A - Voice command recognition method and voice command recognition device

Info

Publication number: CN103366740A
Application number: CN2012100848242A
Authority: CN
Inventors: 袁媛
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2012-03-27
Filing date: 2012-03-27
Publication date: 2013-10-23
Anticipated expiration: 2032-03-27
Also published as: CN103366740B

Abstract

The invention discloses a voice command recognition method and a voice command recognition device and relates to the technical field of voice control. The method and the device can improve the voice recognition rate and make the operation more convenient. The method of the invention comprises the steps of receiving an audio signal, decomposing and filtering the audio signal according to an effective voice command feature to obtain a voice sample, and carrying out semantic recognition on the voice sample and determining a corresponding voice command. The voice command recognition method and the voice command recognition device are mainly used in the process of voice command recognition.

Description

Voice command recognition method and device

Technical field

The present invention relates to the technical field of voice control, and in particular to a voice command recognition method and device.

Background art

With the development of voice control technology, voice control has become widely used in people's daily life and work. Voice control is a control technology that takes human speech as the input command. In use, the user's voice is inevitably mixed with ambient noise, other people's voices and similar interference. How to filter out sound from unimportant sources and accurately recognize the voice commands of the important source has therefore become a major problem that voice-controlled devices need to solve. Accordingly, the accuracy of speech recognition and the user-friendliness of voice-controlled devices have become important topics in the industry.

In the prior art, a voice-controlled device can recognize only predetermined voices. For example, if the operator of the device is owner A, a large number of owner A's voice samples are recorded and stored as a standard command database, which serves as the basis for voice command recognition. When owner B tries to operate the device, owner B's sound frequency, timbre and other features differ from those of owner A, so even the same voice command cannot be recognized.

Therefore, in the course of implementing the above voice command recognition, the inventor found at least the following problems in the prior art: because the operator's pre-recorded voice samples serve as the basis for recognizing voice commands, the set of people who can operate the device is restricted, which results in a low speech recognition rate; and every operator must record a large standard command library before using the device, which increases the difficulty of operation and makes the device unfriendly to use.

Summary of the invention

Embodiments of the invention provide a voice command recognition method and device that can improve the speech recognition rate and make operation more convenient.

To achieve the above object, embodiments of the invention adopt the following technical solutions:

A voice command recognition method, comprising:

receiving an audio signal;

decomposing and filtering the audio signal according to effective voice command features to obtain a voice sample;

performing semantic recognition on the voice sample to determine the corresponding voice command.

A voice command recognition device, comprising:

an audio receiving unit, configured to receive an audio signal;

a sample extraction unit, configured to decompose and filter the audio signal according to effective voice command features to obtain a voice sample;

a command recognition unit, configured to perform semantic recognition on the voice sample to determine the corresponding voice command.

The voice command recognition method and device provided by the embodiments of the invention decompose and filter the received audio signal according to effective voice command features and then perform semantic recognition to determine the voice command. Compared with the existing technique of matching the received audio signal against the owner's pre-recorded voice samples, this does not restrict who can use the voice command recognition device, improves the recognition rate for voice commands, and removes the need to record a large number of voice samples in advance, so that operation is more convenient.

Description of drawings

To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the voice command recognition method in Embodiment 1 of the invention;

Fig. 2 is a flowchart of one voice command recognition method in Embodiment 2 of the invention;

Fig. 3 is a flowchart of another voice command recognition method in Embodiment 2 of the invention;

Fig. 4 is a schematic diagram of the composition of one voice command recognition device in Embodiment 3 of the invention;

Fig. 5 is a schematic diagram of the composition of another voice command recognition device in Embodiment 3 of the invention;

Fig. 6 is a schematic diagram of the composition of a further voice command recognition device in Embodiment 3 of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.

Embodiment 1

The embodiment of the invention provides a voice command recognition method. As shown in Fig. 1, the method may include:

101. Receive an audio signal.

The source of the audio signal is not limited to a specific user; the speaker may be an adult or a child, male or female. The voice command recognition method provided by the invention can receive and recognize voice commands spoken in human voices of various timbres. In special cases, for example when the voice command recognition device is not intended to be used by children, or when children's voices are not regarded as valid voice commands, the unwanted sound can be removed during the decomposition and filtering process.

In actual operation, the audio signal is received directly and the relevant filtering and recognition operations are performed on it, without recording a user's sound sample library in advance. This makes the voice command recognition device simpler and easier to use and improves the user experience.

102. Decompose and filter the audio signal according to effective voice command features to obtain a voice sample.

The effective voice command features can be set according to the needs of the actual application. For example, an audio signal with a relatively high frequency and a short duration can be regarded as a child's voice, and a low-frequency sound that persists throughout the whole audio signal can be regarded as environmental noise. Neither meets the effective voice command features, so these sound components of no interest can be filtered out to obtain an effective voice command that meets the requirements.

103. Perform semantic recognition on the voice sample to determine the corresponding voice command.

After the voice sample is obtained in step 102, semantic recognition can be performed on it to determine the corresponding voice command in either of two ways. In the first, the sound feature points of the voice sample are matched against the sound feature points corresponding to the voice commands in a voice command library, and the voice command that has the highest matching rate and reaches a specified matching rate is determined. In the second, the sound feature points of the voice sample are matched against the keyword feature points in the voice command library to determine the keywords that reach the specified matching rate, and the corresponding voice command is determined from those keywords.

The voice command recognition method provided by the embodiment of the invention decomposes and filters the received audio signal according to effective voice command features and then performs semantic recognition to determine the voice command. Compared with the existing technique of matching the received audio signal against the owner's pre-recorded voice samples, this does not restrict who can use the voice command recognition device, improves the recognition rate for voice commands, and removes the need to record a large number of voice samples in advance, so that operation is more convenient.

Embodiment 2

The embodiment of the invention provides a voice command recognition method. As shown in Fig. 2, the method may include:

201. Receive an audio signal.

202. Analyze the audio signal received within an audio receiving period, locate the starting points of human speech in the time-domain signal, and intercept the time-domain signal of effective human speech.

The duration of a complete audio receiving period may differ from the duration of a single voice command, and several human utterances or several voice commands may be received within one complete audio receiving period. The audio signal received within the audio receiving period can therefore be analyzed to locate the starting points of human speech in the time-domain signal and to intercept the time-domain signal of effective human speech.
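The interception of step 202 can be illustrated with a minimal sketch. The patent does not specify how the starting points of speech are located; the short-time energy criterion used here, the function name `find_speech_segments`, and the `frame_len` and `energy_threshold` parameters are all assumptions made for illustration.

```python
def find_speech_segments(samples, frame_len=4, energy_threshold=0.1):
    """Return (start, end) sample indices of runs of frames whose mean
    energy reaches the threshold. Hypothetical helper, not from the patent."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold and start is None:
            start = i * frame_len              # speech begins at this frame
        elif energy < energy_threshold and start is not None:
            segments.append((start, i * frame_len))  # speech ended before this frame
            start = None
    if start is not None:                      # speech runs to the end of the period
        segments.append((start, n_frames * frame_len))
    return segments


# One receiving period containing two separate bursts of sound.
period = [0.0] * 8 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4 + [0.0] * 4
print(find_speech_segments(period))
```

A real device would likely use longer frames and an adaptive threshold; the fixed values above only keep the example small.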

203. If time-domain signals of at least two effective human speech segments are intercepted within the audio receiving period, screen out the audio signal that meets the time-domain requirement according to the time-domain features of an effective voice command.

If more than one time-domain signal of effective human speech is intercepted within one audio receiving period, that is, the audio signal received within the period contains at least two time-domain signals, the time-domain signal that meets the time-domain requirement can be screened out according to the time-domain features of an effective voice command and used as the audio signal for subsequent processing. Specifically, if adult speech is taken as the effective voice command, the time-domain signals of adult speech can be preliminarily screened on the basis that children's voices have high-frequency characteristics and that children's utterances are somewhat shorter than adults'.
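As a rough illustration of the screening in step 203, the sketch below keeps only segments that are long enough and not dominated by high-frequency content. The patent names the two criteria (duration and high-frequency character) but not how to measure them; using the zero-crossing rate as a time-domain proxy for frequency, and the names `zero_crossing_rate`, `screen_segments`, `min_len` and `max_zcr`, are assumptions for this example.

```python
def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ.
    Used here as an assumed time-domain proxy for high-frequency content."""
    if len(segment) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(segment) - 1)


def screen_segments(segments, min_len=6, max_zcr=0.5):
    """Keep segments that are long enough and not too high-pitched.
    Both thresholds are hypothetical and would be tuned per application."""
    return [
        s for s in segments
        if len(s) >= min_len and zero_crossing_rate(s) <= max_zcr
    ]


adult_like = [1, 1, -1, -1, 1, 1, -1, -1]    # slower oscillation, longer
child_like = [1, -1, 1, -1, 1, -1, 1, -1]    # sign flips on every sample
too_short = [1, 1, -1]
print(screen_segments([adult_like, child_like, too_short]))
```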

204. Decompose the audio signal in the frequency domain and filter out the bands whose frequency is too high and/or too low.

After the time-domain analysis and filtering of steps 202 and 203, the filtered audio signal can be further analyzed and filtered in the frequency domain. Specifically, sound above a first frequency threshold can be filtered out as the noise of children's clamor, sound below a second frequency threshold can be filtered out as environmental noise, or both can be filtered out. The specific frequency thresholds and filtering criteria can be set according to the environment in which the voice command recognition device is actually used; the embodiment of the invention places no limitation on this.
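The frequency-domain filtering of step 204 can be sketched with a plain discrete Fourier transform: transform the signal, zero every bin above the first threshold or below the second, and transform back. This is only one possible realization, assumed for illustration; the function names and the `low_hz` and `high_hz` parameters are hypothetical, and the naive O(n²) transform is chosen to keep the example dependency-free rather than for efficiency.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short examples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Zero all frequency bins below low_hz or above high_hz.
    low_hz and high_hz stand in for the patent's second and first thresholds."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # For a real signal, bins k and n - k carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq < low_hz or freq > high_hz:
            spectrum[k] = 0
    return idft(spectrum)


# Mix of a 2 Hz hum, an 8 Hz "voice" tone and a 20 Hz squeal, sampled at 64 Hz.
rate = 64
mixed = [
    math.sin(2 * math.pi * 2 * t / rate)
    + math.sin(2 * math.pi * 8 * t / rate)
    + math.sin(2 * math.pi * 20 * t / rate)
    for t in range(rate)
]
voice_only = band_limit(mixed, rate, low_hz=4, high_hz=12)
```

With these toy frequencies, `voice_only` retains only the 8 Hz component; real speech bands and sample rates would of course be much higher.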

205. Perform independent component analysis on the frequency-domain-filtered audio signal and filter out noise to obtain a voice sample.

The audio signal obtained after the frequency-domain filtering of step 204 may still contain sounds from multiple sources. Independent component analysis can therefore be performed on it to filter out noise that does not meet the effective voice command features. Such noise may include, for example, background music, pet sounds and children's voices.

In one application scenario of the embodiment of the invention, the voice sample obtained after decomposition and filtering can be matched directly to determine the voice command. The specific method may include:

206. Match the sound feature points of the voice sample against the sound feature points corresponding to the voice commands in a voice command library.

The voice command library is configured in advance and may contain voice commands and the sound feature points corresponding to each voice command. The sound feature points of the voice sample are matched against the sound feature points corresponding to the voice commands in the library. If the matching rate reaches a specified matching rate, for example 75%, the corresponding voice command can be determined. If the matching rate is below the specified matching rate, the voice sample can be treated as invalid, and the voice command recognition process either exits or prompts the user to input again.

It can be understood that the specific value of the specified matching rate can be adjusted according to the sensitivity required for voice command recognition in the actual application; the embodiment of the invention places no limitation on this.
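A minimal sketch of the matching in step 206 follows. The patent does not define what a sound feature point is or how the matching rate is computed; here feature points are modeled as opaque labels and the matching rate as the fraction of a command's feature points found in the sample. The library contents, `matching_rate`, `candidates` and `required_rate` are illustrative assumptions.

```python
def matching_rate(sample_points, command_points):
    """Fraction of the command's feature points that also occur in the sample.
    This definition of the rate is an assumption, not taken from the patent."""
    command_set = set(command_points)
    if not command_set:
        return 0.0
    return len(set(sample_points) & command_set) / len(command_set)


def candidates(sample_points, library, required_rate=0.75):
    """Return {command: rate} for every command reaching the required rate."""
    result = {}
    for command, points in library.items():
        rate = matching_rate(sample_points, points)
        if rate >= required_rate:
            result[command] = rate
    return result


# Hypothetical pre-configured library: command -> feature point labels.
COMMAND_LIBRARY = {
    "next page": {"n1", "n2", "n3", "n4"},
    "pause": {"p1", "p2", "p3", "p4"},
}

print(candidates({"n1", "n2", "n3", "p1"}, COMMAND_LIBRARY))
print(candidates({"x"}, COMMAND_LIBRARY))  # empty: treat the sample as invalid
```

An empty result corresponds to the invalid-sample case, where the process exits or the user is prompted to speak again.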

207. Determine the voice command that has the highest matching rate and reaches the specified matching rate.

If there is one and only one voice sample that satisfies the specified matching rate, the corresponding voice command can be determined directly. If at least two voice samples satisfy the specified matching rate, the voice sample with the highest matching rate can be selected, and the voice command corresponding to that voice sample is determined.

Alternatively, the voice commands that reach the specified matching rate can be displayed so that the user can select the desired voice command or input again. Specifically, if at least two voice samples satisfy the specified matching rate, at least two corresponding voice commands can be determined and presented, so that the user can select the operation corresponding to the desired voice command or choose to re-input the voice command.
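The two behaviors of step 207 (pick the best match automatically, or present all qualifying commands to the user) can be sketched as a single decision function. The function name `resolve`, the `present_all` switch and the returned action labels are hypothetical; the input is assumed to be a mapping from each qualifying voice command to its matching rate.

```python
def resolve(qualifying, present_all=False):
    """Decide what to do with commands that reached the specified matching rate.

    qualifying: dict mapping command -> matching rate (already thresholded).
    Returns a (action, commands) pair; the action labels are illustrative.
    """
    if not qualifying:
        return ("retry", [])                   # nothing qualified: ask again
    ranked = sorted(qualifying, key=qualifying.get, reverse=True)
    if present_all and len(ranked) > 1:
        return ("choose", ranked)              # let the user pick or re-input
    return ("execute", ranked[:1])             # highest matching rate wins


print(resolve({"pause": 0.8, "next page": 0.9}))
print(resolve({"pause": 0.8, "next page": 0.9}, present_all=True))
print(resolve({}))
```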

208. Execute the operation corresponding to the voice command.

The operation corresponding to a voice command can be set according to the device actually being controlled. For example, the operation corresponding to "next page" may be turning a page in a PPT or an e-book; voice commands such as "start", "pause" and "exit" may correspond to the relevant control operations of an application.
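The mapping from a recognized voice command to a device-specific operation in step 208 is naturally expressed as a dispatch table. The sketch below is illustrative only; `make_dispatcher` and the handler shown are hypothetical names, and the logged string merely stands in for a real page-turn action.

```python
def make_dispatcher(handlers):
    """Build an execute(command) function from a command -> callable table."""
    def execute(command):
        handler = handlers.get(command)
        if handler is None:
            return False        # no operation configured for this command
        handler()
        return True
    return execute


# Example table for a slide viewer; the action just records what it would do.
log = []
execute = make_dispatcher({
    "next page": lambda: log.append("turn to next page"),
})
execute("next page")
execute("unknown command")
print(log)
```

Keeping the table separate from the recognizer lets the same recognition pipeline control different devices by swapping in a different set of handlers.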

In another application scenario of the embodiment of the invention, the voice sample obtained after decomposition and filtering can be matched to obtain the corresponding keywords, from which the corresponding voice command is determined. As shown in Fig. 3, steps 206 and 207 above can be replaced with the following steps:

209. Match the sound feature points of the voice sample against the keyword feature points in the voice command library to determine the keywords that reach the specified matching rate.

The voice command library is configured in advance and may contain voice commands, the keywords corresponding to each voice command, and the keyword feature points. The sound feature points of the voice sample are matched against the keyword feature points in the library. If the matching rate reaches the specified matching rate, for example 75%, the corresponding keyword can be determined. If no matching rate between the sound feature points of the voice sample and the keyword feature points in the library reaches the specified matching rate, the voice sample can be treated as invalid, and the voice command recognition process either exits or prompts the user to input again.

210. Determine the corresponding voice command according to the keywords.

If matching yields one keyword whose matching rate meets the requirement, the voice command can be determined from that successfully matched keyword. If matching yields several keywords whose matching rates meet the requirement, the corresponding voice command can be determined from the successfully matched keywords taken together.
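One way to determine a voice command from one or several matched keywords, as in step 210, is to look for the command whose keyword set is fully covered by the matched keywords, preferring the most specific one. The patent does not say how multiple keywords are combined; this rule, the function `command_from_keywords` and the example table are assumptions.

```python
def command_from_keywords(matched_keywords, keyword_commands):
    """Pick the command whose keyword set is fully contained in the matched
    keywords, preferring the largest (most specific) set.

    keyword_commands: dict mapping frozenset of keywords -> command.
    Returns None when no command is covered.
    """
    matched = set(matched_keywords)
    best = None
    for keywords, command in keyword_commands.items():
        covered = keywords <= matched
        if covered and (best is None or len(keywords) > len(best[0])):
            best = (keywords, command)
    return best[1] if best else None


# Hypothetical keyword table.
KEYWORD_COMMANDS = {
    frozenset({"next"}): "next item",
    frozenset({"next", "page"}): "next page",
    frozenset({"pause"}): "pause",
}

print(command_from_keywords(["next", "page"], KEYWORD_COMMANDS))
print(command_from_keywords(["volume"], KEYWORD_COMMANDS))
```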

Alternatively, the voice commands related to the keywords can be displayed so that the user can select the desired voice command or input again. Specifically, several corresponding voice commands can be determined from the keywords and presented, so that the user can select the operation corresponding to the desired voice command or choose to re-input the voice command.

Embodiment 3

The embodiment of the invention provides a voice command recognition device. As shown in Fig. 4, the device may include an audio receiving unit 31, a sample extraction unit 32 and a command recognition unit 33.

The audio receiving unit 31 is configured to receive an audio signal.

The sample extraction unit 32 is configured to decompose and filter the audio signal according to effective voice command features to obtain a voice sample.

The command recognition unit 33 is configured to perform semantic recognition on the voice sample to determine the corresponding voice command.
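The three-unit structure of Fig. 4 can be sketched as a class that chains three interchangeable callables, one per unit. The class name `VoiceCommandRecognizer` and the toy stand-ins passed to it are hypothetical; a real device would plug in actual capture, filtering and recognition components.

```python
class VoiceCommandRecognizer:
    """Chains an audio receiving unit, a sample extraction unit and a
    command recognition unit, each supplied as a callable."""

    def __init__(self, receive, extract, recognize):
        self.receive = receive        # () -> audio signal
        self.extract = extract        # audio signal -> voice sample
        self.recognize = recognize    # voice sample -> command or None

    def run_once(self):
        signal = self.receive()
        sample = self.extract(signal)
        return self.recognize(sample)


# Toy stand-ins for the three units.
recognizer = VoiceCommandRecognizer(
    receive=lambda: [0.0, 0.9, 0.8, 0.0],
    extract=lambda signal: [x for x in signal if abs(x) > 0.5],
    recognize=lambda sample: "pause" if len(sample) == 2 else None,
)
print(recognizer.run_once())
```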

Further, as shown in Fig. 5, the voice command recognition device may also include a time-domain interception unit 34.

The time-domain interception unit 34 is configured to, after the audio receiving unit 31 receives the audio signal, analyze the audio signal received within a receiving period, locate the starting points of human speech in the time-domain signal, and intercept the time-domain signal of effective human speech.

Correspondingly, the sample extraction unit 32 may also be configured to decompose and filter the time-domain signal of the effective human speech according to effective voice command features to obtain a voice sample.

Further, the voice command recognition device may also include a time-domain screening unit 35.

The time-domain screening unit 35 is configured to, after the time-domain interception unit 34 intercepts the time-domain signal of effective human speech, and when time-domain signals of at least two effective human speech segments have been intercepted within one audio receiving period, screen out the audio signal that meets the time-domain requirement according to the time-domain features of an effective voice command.

Further, the time-domain screening unit 35 is specifically also configured to preliminarily screen the time-domain signals of adult speech on the basis that children's voices have high-frequency characteristics and that children's utterances are somewhat shorter than adults'.

Further, the sample extraction unit 32 may include a first filtering module 321 and a second filtering module 322.

The first filtering module 321 is configured to decompose the audio signal in the frequency domain and filter out the bands whose frequency is too high and/or too low.

The second filtering module 322 is configured to perform independent component analysis on the frequency-domain-filtered audio signal and filter out noise to obtain a voice sample.

The noise includes background music, pet sounds and children's voices.

In one application scenario of the embodiment of the invention, the command recognition unit 33 may include a first matching module 331 and a first determination module 332.

The first matching module 331 is configured to match the sound feature points of the voice sample against the sound feature points corresponding to the voice commands in the voice command library.

The first determination module 332 is configured to determine the voice command that has the highest matching rate and reaches the specified matching rate, or to display the voice commands that reach the specified matching rate so that the user can select the desired voice command or input again.

As shown in Fig. 6, in another application scenario of the embodiment of the invention, the command recognition unit 33 may include a second matching module 333 and a second determination module 334.

The second matching module 333 is configured to match the sound feature points of the voice sample against the keyword feature points in the voice command library to determine the keywords that reach the specified matching rate.

The second determination module 334 is configured to determine the corresponding voice command according to the keywords, or to display the voice commands related to the keywords so that the user can select the desired voice command or input again.

Further, the voice command recognition device may also include an execution unit 36.

The execution unit 36 is configured to execute the operation corresponding to the voice command after the command recognition unit 33 determines the corresponding voice command.

The voice command recognition device provided by the embodiment of the invention decomposes and filters the received audio signal according to effective voice command features and then performs semantic recognition to determine the voice command. Compared with the existing technique of matching the received audio signal against the owner's pre-recorded voice samples, this does not restrict who can use the voice command recognition device, improves the recognition rate for voice commands, and removes the need to record a large number of voice samples in advance, so that operation is more convenient.

From the above description of the embodiments, a person skilled in the art can clearly understand that the invention can be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, hard disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the invention.

The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall be covered by the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of protection of the claims.

Claims

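1. A voice command recognition method, characterized by comprising:

receiving an audio signal;

decomposing and filtering the audio signal according to effective voice command features to obtain a voice sample;

performing semantic recognition on the voice sample to determine the corresponding voice command.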

2. The voice command recognition method according to claim 1, characterized in that, after the receiving of the audio signal, the method further comprises:

analyzing the audio signal received within an audio receiving period, locating the starting points of human speech in the time-domain signal, and intercepting the time-domain signal of effective human speech;

correspondingly, the decomposing and filtering of the audio signal according to effective voice command features is specifically: decomposing and filtering the time-domain signal of the effective human speech according to the effective voice command features.

3. The voice command recognition method according to claim 2, characterized in that,

after the intercepting of the time-domain signal of effective human speech, the method further comprises:

if time-domain signals of at least two effective human speech segments are intercepted within the audio receiving period, screening out the audio signal that meets the time-domain requirement according to the time-domain features of an effective voice command.

4. The voice command recognition method according to claim 3, characterized in that the screening out of the audio signal that meets the time-domain requirement according to the time-domain features of an effective voice command comprises:

preliminarily screening the time-domain signals of adult speech on the basis that children's voices have high-frequency characteristics and that children's utterances are somewhat shorter than adults'.

5. The voice command recognition method according to claim 1, characterized in that

the decomposing and filtering of the audio signal according to effective voice command features to obtain a voice sample comprises:

decomposing the audio signal in the frequency domain and filtering out the bands whose frequency is too high and/or too low;

performing independent component analysis on the frequency-domain-filtered audio signal and filtering out noise to obtain a voice sample.

6. The voice command recognition method according to claim 5, characterized in that the noise comprises: background music, pet sounds and children's voices.

7. The voice command recognition method according to claim 1, characterized in that

the performing of semantic recognition on the voice sample to determine the corresponding voice command comprises:

matching the sound feature points of the voice sample against the sound feature points corresponding to the voice commands in a voice command library;

determining the voice command that has the highest matching rate and reaches a specified matching rate, or displaying the voice commands that reach the specified matching rate so that the user can select the desired voice command or input again.

8. The voice command recognition method according to claim 1, characterized in that

the performing of semantic recognition on the voice sample to determine the corresponding voice command comprises:

matching the sound feature points of the voice sample against the keyword feature points in a voice command library to determine the keywords that reach a specified matching rate;

determining the corresponding voice command according to the keywords, or displaying the voice commands related to the keywords so that the user can select the desired voice command or input again.

9. The voice command recognition method according to any one of claims 1 to 8, characterized in that,

after the corresponding voice command is determined, the method further comprises:

executing the operation corresponding to the voice command.

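10. A voice command recognition device, characterized by comprising:

an audio receiving unit, configured to receive an audio signal;

a sample extraction unit, configured to decompose and filter the audio signal according to effective voice command features to obtain a voice sample;

a command recognition unit, configured to perform semantic recognition on the voice sample to determine the corresponding voice command.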

11. The voice command recognition device according to claim 10, characterized by further comprising:

a time-domain interception unit, configured to, after the audio receiving unit receives the audio signal, analyze the audio signal received within an audio receiving period, locate the starting points of human speech in the time-domain signal, and intercept the time-domain signal of effective human speech;

correspondingly, the sample extraction unit is further configured to decompose and filter the time-domain signal of the effective human speech according to effective voice command features to obtain a voice sample.

12. The voice command recognition device according to claim 11, characterized by further comprising:

a time-domain screening unit, configured to, after the time-domain interception unit intercepts the time-domain signal of effective human speech, and when time-domain signals of at least two effective human speech segments have been intercepted within the audio receiving period, screen out the audio signal that meets the time-domain requirement according to the time-domain features of an effective voice command.

13. The voice command recognition device according to claim 12, characterized in that the time-domain screening unit is specifically further configured to preliminarily screen the time-domain signals of adult speech on the basis that children's voices have high-frequency characteristics and that children's utterances are somewhat shorter than adults'.

14. The voice command recognition device according to claim 11, characterized in that

the sample extraction unit comprises:

a first filtering module, configured to decompose the audio signal in the frequency domain and filter out the bands whose frequency is too high and/or too low;

a second filtering module, configured to perform independent component analysis on the frequency-domain-filtered audio signal and filter out noise to obtain a voice sample.

15. The voice command recognition device according to claim 14, characterized in that the noise comprises: background music, pet sounds and children's voices.

16. The voice command recognition device according to claim 11, characterized in that

the command recognition unit comprises:

a first matching module, configured to match the sound feature points of the voice sample against the sound feature points corresponding to the voice commands in a voice command library;

a first determination module, configured to determine the voice command that has the highest matching rate and reaches a specified matching rate, or to display the voice commands that reach the specified matching rate so that the user can select the desired voice command or input again.

17. The voice command recognition device according to claim 11, characterized in that

the command recognition unit comprises:

a second matching module, configured to match the sound feature points of the voice sample against the keyword feature points in a voice command library to determine the keywords that reach a specified matching rate;

a second determination module, configured to determine the corresponding voice command according to the keywords, or to display the voice commands related to the keywords so that the user can select the desired voice command or input again.

18. The voice command recognition device according to any one of claims 11 to 17, characterized by further comprising:

an execution unit, configured to execute the operation corresponding to the voice command after the command recognition unit determines the corresponding voice command.