CN109599107A

CN109599107A - Speech recognition method, apparatus, and computer storage medium

Info

Publication number: CN109599107A
Application number: CN201811496744.1A
Authority: CN
Inventors: 刘健军; 王慧君; 秦萍
Original assignee: Gree Electric Appliances Inc of Zhuhai
Current assignee: Gree Electric Appliances Inc of Zhuhai
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2019-04-09

Abstract

The invention discloses a speech recognition method, apparatus, and computer storage medium, to solve the technical problem in the prior art that the speech recognition rate is low. The method includes: obtaining the voice with which a user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user; determining an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user; and removing the effective noise source from the voice to obtain a denoised voice, and controlling the smart home device according to the denoised voice.

Description

Speech recognition method, apparatus, and computer storage medium

Technical field

The present invention relates to the field of smart homes, and in particular to a speech recognition method, apparatus, and computer storage medium.

Background art

In the smart home field, voice interaction is widely used as a new mode of human-machine interaction with smart home devices.

For example, users can control smart speakers, smart TVs, smart air conditioners, and similar devices by voice.

However, a user who is controlling a smart home device by voice may be doing something else at the same time, such as running, cooking, or blow-drying hair. The user speech collected by the smart home device is then inevitably mixed with additional noise, which lowers the device's recognition rate for the user's speech.

In view of this, how to effectively improve the speech recognition rate of smart home devices has become a technical problem that urgently needs to be solved.

Summary of the invention

The present invention provides a speech recognition method, apparatus, and computer storage medium, to solve the technical problem in the prior art that the speech recognition rate is low.

In a first aspect, to solve the above technical problem, an embodiment of the present invention provides a speech recognition method applied to a smart home device. The technical solution of the method is as follows:

Obtain the voice with which the user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user;

Determine an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user;

Remove the effective noise source from the voice to obtain a denoised voice, and control the smart home device according to the denoised voice.

When the voice with which the user controls the smart home device is obtained, the current scene image information is also collected by the image capture device, and the sign data of the user is collected as well. According to the current scene image information and the sign data, the noise source in the current scene that directly interferes with the voice uttered by the user is determined as the effective noise source. The effective noise source is then removed from the voice to obtain a denoised voice, and the smart home device is controlled according to the denoised voice. The smart home device can thus perform targeted noise reduction on the user's speech based on the effective noise source determined for the current scene, which improves the speech recognition rate.

Optionally, determining the effective noise source according to the current scene image information and the sign data includes:

According to the current scene image information, obtaining from a database the set of noise sources present in the user scene corresponding to the current scene image information;

According to the sign data range corresponding to each user state in the database, determining the current user state corresponding to the sign data;

According to the current user state, obtaining the effective noise source from the noise source set.

Optionally, obtaining the effective noise source from the noise source set according to the current user state includes:

According to the current user state, judging whether to remove from the noise source set the noise source corresponding to user noise produced by the user, to obtain a judgment result, where the user noise is sound produced when the user moves;

If the judgment result is that the noise source corresponding to the user noise needs to be removed, removing that noise source from the noise source set to obtain a new noise source set;

Removing from the new noise source set the noise sources whose interference value is less than a preset threshold, to obtain the effective noise source.

Optionally, removing the effective noise source from the voice to obtain the denoised voice includes:

Applying phase inversion to the audio signal corresponding to the effective noise source and superimposing the result on the audio signal corresponding to the voice, to obtain the denoised voice.

Optionally, the sign data is collected by a wearable device; the wearable device includes any one or a combination of a motion sensor, a biosensor, an environmental sensor, a galvanic skin response sensor, a heart rate sensor, and a barometer.

In a second aspect, an embodiment of the present invention provides an apparatus for speech recognition, including:

An acquiring unit, configured to obtain the voice with which the user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user;

A determination unit, configured to determine an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user;

A culling unit, configured to remove the effective noise source from the voice to obtain a denoised voice, and to control the smart home device according to the denoised voice.

Optionally, the determination unit is specifically used for:

Optionally, the determination unit is also used to:

Optionally, the culling unit is specifically used for:

Optionally, the sign data is collected by a wearable device; the wearable device includes any one or a combination of a motion sensor, a biosensor, an environmental sensor, a galvanic skin response sensor, a heart rate sensor, and a barometer.

In a third aspect, an embodiment of the present invention further provides an apparatus for speech recognition, including:

At least one processor, and

A memory connected to the at least one processor;

Wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the method described in the first aspect by executing the instructions stored in the memory.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, wherein:

The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the method described in the first aspect.

Through the technical solutions in one or more of the above embodiments, the embodiments of the present invention have at least the following technical effects:

In the embodiments provided by the present invention, when the voice with which the user controls the smart home device is obtained, the current scene image information is also collected by the image capture device, and the sign data of the user is collected as well. According to the current scene image information and the sign data, the noise source in the current scene that directly interferes with the voice uttered by the user is determined as the effective noise source. The effective noise source is then removed from the voice to obtain a denoised voice, and the smart home device is controlled according to the denoised voice. The smart home device can thus perform targeted noise reduction on the user's speech based on the effective noise source determined for the current scene, which improves the speech recognition rate.

Brief description of the drawings

Fig. 1 is a flowchart of a speech recognition method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a smart home device performing speech recognition in a running scene, provided by an embodiment of the present invention;

Fig. 3 is a structural schematic diagram of a speech recognition apparatus provided by an embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention provide a speech recognition method, apparatus, and computer storage medium, to solve the technical problem in the prior art that the speech recognition rate is low.

To solve the above technical problem, the general idea of the technical solutions in the embodiments of the present application is as follows:

A speech recognition method is provided, including: obtaining the voice with which the user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user; determining an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user; and removing the effective noise source from the voice to obtain a denoised voice, and controlling the smart home device according to the denoised voice.

In the above solution, when the voice with which the user controls the smart home device is obtained, the current scene image information is also collected by the image capture device, and the sign data of the user is collected as well. According to the current scene image information and the sign data, the noise source in the current scene that directly interferes with the voice uttered by the user is determined as the effective noise source. The effective noise source is then removed from the voice to obtain a denoised voice, and the smart home device is controlled according to the denoised voice. The smart home device can thus perform targeted noise reduction on the user's speech based on the effective noise source determined for the current scene, which improves the speech recognition rate.

For a better understanding of the above technical solutions, the technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are detailed explanations of the technical solutions of the present invention rather than limitations on them, and that, where no conflict arises, the embodiments and the technical features in the embodiments may be combined with one another.

Referring to Fig. 1, an embodiment of the present invention provides a speech recognition method applied to a smart home device. The processing flow of the method is as follows.

Step 101: Obtain the voice for controlling the smart home device, the current scene image information collected by the image capture device, and the sign data of the user.

With the wide use of speech recognition technology in smart home devices, users no longer have to look for a controller or operate control buttons on the device when they want to control a smart home device, and can control it at any time while at home. For example, an air conditioner that used to be controlled with a remote control can now be controlled by voice.

In the embodiments provided by the present invention, the image capture device may be a camera, a CCD sensor, a video camera, or the like. It may be a component of the smart home device, an external image capture device, or the camera of a smartphone. An external image capture device may communicate with the smart home device in a wired or wireless manner, which is not specifically limited here.

Optionally, the sign data of the user is collected by a wearable device.

For example, when the voice with which the user controls the smart home device is obtained, the smart home device sends the wearable device an instruction to collect the user's sign data; or, when the image capture device collects the current scene image information, the image capture device sends the wearable device an instruction to collect the user's sign data; or the wearable device periodically reports the user's sign data to the smart home device or the image capture device.

Specifically, the wearable device may be any one or a combination of a smart bracelet, a motion sensor, a biosensor, an environmental sensor, a galvanic skin response sensor, a heart rate sensor, and a barometer. The wearable device may communicate with the smart home device over a wireless network such as WLAN, Bluetooth, or Zigbee, and transfer the measured sign data to the smart home device or the image capture device. The specific manner of communication between the wearable device and the smart home device is not limited here.

For example, referring to Fig. 2, when a user runs on a treadmill, the user generates more heat as the running time increases and wants to turn on the air conditioner, but is unwilling to stop and operate the remote control. The user can then issue a voice control command to the air conditioner while running, for example by saying "Power on, set to 26 °C" to the air conditioner.

At this point, the smart home device (the air conditioner) obtains the voice with which the user controls it, collects the current scene image information (a running scene) through the image capture device, and also collects the user's sign data (such as heart rate and movement speed) through the wearable device (such as a smart bracelet).

After the voice for controlling the smart home device, the current scene image information collected by the image capture device, and the sign data of the user have been obtained, step 102 can be performed.

Step 102: Determine the effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user.

Specifically, first, according to the current scene image information, the set of noise sources present in the user scene corresponding to the current scene image information is obtained from the database; then, according to the sign data range corresponding to each user state in the database, the current user state corresponding to the sign data is determined; finally, the effective noise source is obtained from the noise source set according to the current user state.

For example, still taking the example in Fig. 2, the smart home device can compare the features of the scene image information in the database with the features in the collected current scene image information. In a running scene, for instance, the user's body leans at a certain angle, the arms swing, the feet leave the ground alternately with a rhythm matching the arm swing, and a treadmill is present. When the similarity between the features in the current scene image information and the features of the running scene image information in the database reaches a set threshold (such as 90%), it can be determined that the current scene image information corresponds to running scene image information. It can then be determined from the database that the noise sources corresponding to the running scene image information are the machine noise source produced when the treadmill operates, the friction noise source produced when the user moves on the treadmill, and the panting noise source of the user. The set of noise sources present in the user scene corresponding to the current scene image information is thus determined from the database as: the machine noise source, the friction noise source, and the panting noise source.
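The scene lookup described above can be sketched as follows. This is a minimal illustration only: the database layout (SCENE_DB), the feature names, the set-overlap similarity measure, and the function names are all assumptions made for the example, since the text does not specify how image features are extracted or compared.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical database: for each scene, the reference image features and the
# noise sources known to be present in that scene.
SCENE_DB: Dict[str, dict] = {
    "running": {
        "features": {"body_tilted", "arms_swinging",
                     "feet_alternately_off_ground", "treadmill_present"},
        "noise_sources": ["machine", "friction", "panting"],
    },
    "cooking": {
        "features": {"stove_present", "pan_in_hand", "range_hood_present"},
        "noise_sources": ["range_hood", "frying"],
    },
}


def feature_similarity(observed: Set[str], reference: Set[str]) -> float:
    """Fraction of the reference features found among the observed features."""
    if not reference:
        return 0.0
    return len(observed & reference) / len(reference)


def lookup_noise_sources(observed: Set[str],
                         threshold: float = 0.9) -> Tuple[Optional[str], List[str]]:
    """Return the best-matching scene and its noise source set.

    A scene matches only if its similarity reaches the threshold
    (0.9 mirrors the 90% example in the text).
    """
    best_scene, best_score = None, 0.0
    for name, entry in SCENE_DB.items():
        score = feature_similarity(observed, entry["features"])
        if score > best_score:
            best_scene, best_score = name, score
    if best_scene is None or best_score < threshold:
        return None, []
    return best_scene, list(SCENE_DB[best_scene]["noise_sources"])
```

With all four running features observed, the lookup returns the running scene and its three noise sources; with only two of them, the similarity stays below the threshold and no scene is matched.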

Next, according to the sign data range corresponding to each user state in the database (such as a heart rate range), the current user state corresponding to the sign data (here, heart rate data) that the air conditioner obtains through the smart bracelet is determined. For example, in the database, a heart rate of 80-100 corresponds to the static or slow-walking state, a heart rate of 101-120 corresponds to the jogging state, and a heart rate greater than 121 corresponds to the fast-running state. If the sign data obtained through the smart bracelet is a heart rate of 90, the user's current state is the static or slow-walking state; if the measured heart rate is 130, the user's current state is the fast-running state.
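The mapping from heart rate to user state can be sketched as a small range table. The ranges are the example values given in the text; the state names, the function name, and the choice to return None for heart rates that fall outside every listed range (below 80, or exactly 121) are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

# (lower bound inclusive, upper bound inclusive or None if open-ended, state)
HEART_RATE_RANGES: List[Tuple[int, Optional[int], str]] = [
    (80, 100, "static_or_slow_walking"),
    (101, 120, "jogging"),
    # "greater than 121" is read here as 122 and above for integer heart rates.
    (122, None, "fast_running"),
]


def classify_user_state(heart_rate: int) -> Optional[str]:
    """Return the user state whose heart rate range contains the reading."""
    for low, high, state in HEART_RATE_RANGES:
        if heart_rate >= low and (high is None or heart_rate <= high):
            return state
    return None  # reading not covered by any range in the example table
```

For the two readings used in the text, classify_user_state(90) gives the static or slow-walking state and classify_user_state(130) gives the fast-running state.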

After the current state of the user is determined, the effective noise source can be obtained from the noise source set according to the current user state in the following way:

First, according to the current user state, judge whether to remove the user noise produced by the user from the noise source set, to obtain a judgment result, where the user noise is sound produced when the user moves.

Second, if the judgment result is to remove the noise source corresponding to the user noise, remove that noise source from the noise source set to obtain a new noise source set.

Finally, remove from the new noise source set the noise sources whose interference value is less than the preset threshold, to obtain the effective noise source.

For example, if the current state is the static or slow-walking state, the panting sound has very little effect on the voice uttered by the user and can be ignored, so the panting noise source can be removed from the noise source set, giving a new noise source set consisting of the machine noise source and the friction noise source. The noise sources whose interference value is less than the set threshold (for example, the friction noise source) are then removed from the new noise source set, and the effective noise source obtained is the machine noise source. If the current state is the fast-running state, the panting noise source has a considerable effect on the voice uttered by the user and cannot be ignored, so the new noise source set consists of the machine noise source, the friction noise source, and the panting noise source. The noise sources whose interference value is less than the set threshold (such as the panting noise source) are then removed, and the effective noise sources obtained are the machine noise source and the friction noise source.
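The three selection steps can be sketched as one small function. The interference values, the threshold, the state names, and the rule that only the static or slow-walking state removes user noise are illustrative assumptions; the text does not say how interference values are measured or which other states remove user noise.

```python
from typing import Dict, Iterable, List


def select_effective_noise_sources(
    noise_sources: List[str],
    user_state: str,
    interference: Dict[str, float],
    threshold: float,
    user_noise: Iterable[str] = ("panting",),
    states_ignoring_user_noise: Iterable[str] = ("static_or_slow_walking",),
) -> List[str]:
    """Reduce a scene's noise source set to its effective noise sources."""
    user_noise = set(user_noise)
    # Step 1: judge from the user state whether user noise should be removed.
    remove_user_noise = user_state in set(states_ignoring_user_noise)
    # Step 2: build the new noise source set.
    candidates = [s for s in noise_sources
                  if not (remove_user_noise and s in user_noise)]
    # Step 3: remove sources whose interference value is below the threshold.
    # A source with no recorded interference value is treated as 0.0 (assumption).
    return [s for s in candidates if interference.get(s, 0.0) >= threshold]


sources = ["machine", "friction", "panting"]

# Static or slow walking: panting is dropped in step 2, friction in step 3.
static_result = select_effective_noise_sources(
    sources, "static_or_slow_walking",
    interference={"machine": 0.8, "friction": 0.2}, threshold=0.5)

# Fast running: panting stays in the new set, then falls below the threshold.
running_result = select_effective_noise_sources(
    sources, "fast_running",
    interference={"machine": 0.8, "friction": 0.6, "panting": 0.3}, threshold=0.5)
```

The made-up interference values are chosen so that the two calls reproduce the outcomes of the example in the text: the machine noise source alone in the first case, and the machine and friction noise sources in the second.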

After the effective noise source is obtained, step 103 can be performed.

Step 103: Remove the effective noise source from the voice to obtain the denoised voice, and control the smart home device according to the denoised voice.

Specifically, phase inversion is applied to the audio signal corresponding to the effective noise source, and the result is superimposed on the audio signal corresponding to the voice, to obtain the denoised voice.

For example, still taking the example in Fig. 2, the effective noise sources are the machine noise source and the friction noise source. After phase inversion is applied to the audio signals corresponding to the machine noise source and the friction noise source, they are superimposed on the voice with which the user controls the smart home device, so that the machine noise and friction noise in that voice are cancelled, giving the voice with the machine noise source and the friction noise source removed (the denoised voice). The smart home device is then controlled according to the denoised voice to power on and adjust the temperature (to 26 °C).
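Phase inversion followed by superposition can be sketched on sampled signals as below. The sketch assumes that an audio signal for each effective noise source is already available as a list of samples, time-aligned with the recorded voice and of the same length; the text does not describe how those noise signals are obtained, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def invert_phase(signal: Sequence[float]) -> List[float]:
    """Phase inversion: flip the sign of every sample."""
    return [-x for x in signal]


def superimpose(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Sample-wise sum of two equally long signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [x + y for x, y in zip(a, b)]


def remove_noise(recorded: Sequence[float],
                 noise_signals: Sequence[Sequence[float]]) -> List[float]:
    """Superimpose the inverted signal of each effective noise source."""
    denoised = list(recorded)
    for noise in noise_signals:
        denoised = superimpose(denoised, invert_phase(noise))
    return denoised


# Synthetic check: a "voice" tone mixed with two noise tones.
n = 200
voice = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
machine = [0.5 * math.sin(2 * math.pi * 50 * t / n) for t in range(n)]
friction = [0.2 * math.sin(2 * math.pi * 33 * t / n) for t in range(n)]
recorded = [v + m + f for v, m, f in zip(voice, machine, friction)]

denoised = remove_noise(recorded, [machine, friction])
max_error = max(abs(d - v) for d, v in zip(denoised, voice))
```

In this idealized setting the noise is cancelled up to floating-point rounding. With real recordings, the noise estimate would not match the noise in the recording exactly, so the cancellation would only be partial.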

Based on the same inventive concept, an embodiment of the present invention provides an apparatus for speech recognition. For the specific implementation of the speech recognition method by the apparatus, refer to the description of the method embodiments; repeated content is not described again. Referring to Fig. 3, the apparatus includes:

An acquiring unit 301, configured to obtain the voice with which the user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user;

A determination unit 302, configured to determine an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user;

A culling unit 303, configured to remove the effective noise source from the voice to obtain a denoised voice, and to control the smart home device according to the denoised voice.

Optionally, the determination unit 302 is specifically used for:

Optionally, the determination unit 302 is also used to:

Optionally, the culling unit 303 is specifically used for:

Optionally, the sign data is acquired particular by wearable device；

The wearable device includes motion sensor, biosensor, environmental sensor, skin electric transducer, heart rate biography The combination of any one or more of sensor, barometer.
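The division of the apparatus into an acquiring unit, a determination unit, and a culling unit can be sketched as three pluggable callables wired in sequence. The class name, field names, and type signatures are assumptions for illustration, and the units are deliberately left abstract so that any acquisition, determination, or removal logic can be supplied.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Signal = List[float]


@dataclass
class SpeechRecognitionApparatus:
    # Unit 301: returns (voice signal, scene image information, sign data).
    acquire: Callable[[], Tuple[Signal, Any, Any]]
    # Unit 302: maps scene image information and sign data to the audio
    # signals of the effective noise sources.
    determine: Callable[[Any, Any], List[Signal]]
    # Unit 303: removes the effective noise sources from the voice.
    cull: Callable[[Signal, List[Signal]], Signal]

    def denoised_voice(self) -> Signal:
        """Run the three units in order and return the denoised voice."""
        voice, scene_image, sign_data = self.acquire()
        noise_signals = self.determine(scene_image, sign_data)
        return self.cull(voice, noise_signals)


# Stub units standing in for real acquisition, determination, and removal.
apparatus = SpeechRecognitionApparatus(
    acquire=lambda: ([1.5, 2.5], "running_scene", {"heart_rate": 90}),
    determine=lambda scene, signs: [[0.5, 0.5]],
    cull=lambda voice, noises: [v - sum(n[i] for n in noises)
                                for i, v in enumerate(voice)],
)
result = apparatus.denoised_voice()
```

Passing the units in as callables keeps the sketch independent of any particular camera, wearable, or audio pipeline, none of which the text fixes.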

Based on the same inventive concept, an embodiment of the present invention provides an apparatus for speech recognition, including: at least one processor, and

A memory connected to the at least one processor;

Wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the speech recognition method described above by executing the instructions stored in the memory.

Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, wherein:

The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the speech recognition method described above.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A speech recognition method, applied to a smart home device, characterized by comprising:

Obtaining the voice with which a user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user;

Determining an effective noise source according to the current scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user; and removing the effective noise source from the voice to obtain a denoised voice, and controlling the smart home device according to the denoised voice.

2. The method according to claim 1, characterized in that determining the effective noise source according to the current scene image information and the sign data comprises:

According to the current scene image information, obtaining from a database the set of noise sources present in the user scene corresponding to the current scene image information;

According to the sign data range corresponding to each user state in the database, determining the current user state corresponding to the sign data;

3. The method according to claim 2, characterized in that obtaining the effective noise source from the noise source set according to the current user state comprises:

According to the current user state, judging whether to remove from the noise source set the noise source corresponding to user noise produced by the user, to obtain a judgment result, where the user noise is sound produced when the user moves;

If the judgment result is to remove the noise source corresponding to the user noise, removing that noise source from the noise source set to obtain a new noise source set;

Removing from the new noise source set the noise sources whose interference value is less than a preset threshold, to obtain the effective noise source.

4. The method according to any one of claims 1-3, characterized in that removing the effective noise source from the voice to obtain the denoised voice comprises:

Applying phase inversion to the audio signal corresponding to the effective noise source and superimposing the result on the audio signal corresponding to the voice, to obtain the denoised voice.

5. The method according to claim 4, characterized in that the sign data is collected by a wearable device; the wearable device includes any one or a combination of a motion sensor, a biosensor, an environmental sensor, a galvanic skin response sensor, a heart rate sensor, and a barometer.

6. A speech recognition apparatus, applied to a smart home device, characterized by comprising:

An acquiring unit, configured to obtain the voice with which a user controls the smart home device, the current scene image information collected by an image capture device, and the sign data of the user;

A determination unit, configured to determine an effective noise source according to the scene image information and the sign data, where the effective noise source is a noise source in the current scene that directly interferes with the voice uttered by the user;

A culling unit, configured to remove the effective noise source from the voice to obtain a denoised voice, and to control the smart home device according to the denoised voice.

7. The apparatus according to claim 6, characterized in that the determination unit is specifically configured to:

According to the current scene image information, obtain from a database the set of noise sources present in the user scene corresponding to the current scene image information;

8. The apparatus according to claim 7, characterized in that the determination unit is further configured to:

If the judgment result is that the noise source corresponding to the user noise needs to be removed, remove that noise source from the noise source set to obtain a new noise source set;

9. A speech recognition apparatus, characterized by comprising:

At least one processor, and

A memory connected to the at least one processor;

Wherein the memory stores instructions executable by the at least one processor, and the at least one processor performs the method according to any one of claims 1 to 5 by executing the instructions stored in the memory.

10. A computer-readable storage medium, characterized in that:

The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.