CN113611305A - Voice control method, system, device and medium in autonomous learning home scene - Google Patents

Voice control method, system, device and medium in autonomous learning home scene

Info

Publication number
CN113611305A
Authority
CN
China
Prior art keywords
user
voice
control logic
intention
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111037587.XA
Other languages
Chinese (zh)
Inventor
张泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Shanghai Intelligent Technology Co Ltd
Priority to CN202111037587.XA
Publication of CN113611305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice control method, system, device and medium for an autonomous-learning home scenario. The method comprises: receiving a user voice command sent by a sound pickup device together with information about the space in which the device is located; recognizing the voice command and querying a corpus for user-configured intent control logic, and executing that logic directly if it exists; otherwise, querying a pre-stored intent library for intent control logic that matches both the recognition result and the pickup device's spatial information. If such logic is found, it is executed directly and the user is asked whether the execution was correct; if the user's feedback confirms correct execution, the logic is recorded in the corpus as user-configured intent control logic matching that recognition result. The invention solves the problem that, in a smart-home scenario, a machine cannot accurately judge the user's intention when the voice command is ambiguous.

Description

Voice control method, system, device and medium in autonomous learning home scene
Technical Field
The invention relates to the technical field of smart homes, and in particular to a voice control method, system, device and medium for an autonomous-learning home scenario.
Background
After a smart-home voice control system obtains a user's voice command, the speech is first converted into text by ASR; the text is then segmented into words to extract the [space] information and [device] information contained in the command.
If the user's voice command contains no [space] information, the common industry practice is either to control all devices of that type in the user's home by default, or to ask the user, through a second clarification round, which device in which [space] should be controlled. Yet when a user says "turn on the light" in different spaces, the real intent is not the same.
For example, when a user says "turn on the light" (with no explicit spatial information) just after entering the home, the real intention is to turn on the lights in the hallway and the living room; when the same user enters the master bedroom, or is lying in bed, and says it, the real intention is to turn on the bedroom light only.
Current NLU and NLP technologies cannot accurately distinguish the different intentions behind the same ambiguous command uttered in different scenes.
Disclosure of Invention
In view of the above problems, the present invention provides a voice control method, system, device and computer storage medium for an autonomous-learning home scenario, which solve the problem that a machine cannot accurately determine the user's intention when the user's voice command is ambiguous in a smart-home scenario.
To achieve this technical effect, the invention adopts the following technical scheme:
In one aspect, the invention provides a voice control method for an autonomous-learning home scenario, the method comprising:
the first step: receiving a user voice command sent by a sound pickup device, together with information about the space in which the device is located;
the second step: performing voice recognition on the voice command and querying a pre-stored corpus for user-configured intent control logic matching the recognition result; if such logic exists, executing it directly and ending the process, the intent control logic comprising the target device type, the space and the target action matching the recognition result; if not, executing the third step;
the third step: querying a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information; if such logic exists, executing it directly, asking the user whether the execution was correct, and entering the fourth step; if not, executing the fifth step;
the fourth step: obtaining the user's feedback on whether the intent control logic was executed correctly; if it was, recording the logic in the corpus as user-configured intent control logic matching that user's recognition result; if not, executing the fifth step;
the fifth step: reminding the user that the voice command cannot be executed.
Preferably, the speech recognition processing includes at least one of the following modes: voiceprint recognition, ASR recognition + word segmentation processing, ASR recognition + NLU understanding.
Preferably, the result of the speech recognition process includes at least one of the following corpora: user voiceprint information, type or name of the target device, space in which the target device is located, and actions required to be performed by the target device.
Preferably, the second step further comprises: judging in advance, based on the voice recognition result, whether the voice command meets the requirements for querying the corpus; if so, performing the subsequent query steps, and if not, executing the fifth step directly.
In another aspect, the invention provides a voice control system for an autonomous-learning home scenario, the system comprising:
an acquisition module, configured to receive a user voice command sent by a sound pickup device together with information about the space in which the device is located;
a recognition module, configured to perform voice recognition on the voice command;
a first matching module, configured to query a pre-stored corpus for user-configured intent control logic matching the recognition result;
a first execution module, configured to execute the user-configured intent control logic found by the first matching module;
a second matching module, configured to query a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information;
a second execution module, configured to execute the system-default intent control logic found by the second matching module;
a confirmation module, configured to confirm with the user, after the second execution module has executed the system-default intent control logic, whether the execution was correct, and to record correctly executed intent control logic in the corpus as user-configured intent control logic matching that user's recognition result;
and a reminding module, configured to remind the user that the voice command cannot be executed.
Preferably, the recognition module comprises at least one of the following: a voiceprint recognition module, an ASR recognition plus word segmentation module, and an ASR recognition plus NLU understanding module.
Preferably, the result of the speech processing by the recognition module includes at least one of the following corpora: user voiceprint information, type or name of the target device, space in which the target device is located, and actions required to be performed by the target device.
Preferably, the system further includes a pre-judging module, configured to judge in advance, from the recognition result produced by the recognition module, whether the voice command meets the requirements for querying the corpus, outputting the result to the first matching module if it does and to the reminding module if it does not.
In still another aspect, the present invention provides a voice control device in an autonomous learning home scenario, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice control method as described above when executing the program.
In another aspect, the present invention also provides a computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the voice control method as described above.
Compared with the prior art, the invention has the following beneficial effects:
Starting from real-life scenes and conversational habits, the machine is helped, through user feedback, to better understand the real intention behind ambiguous commands uttered in different spaces. Meanwhile, considering each person's differences in language habits and spatial layout, a set of NLU-learned logic is stored from the historical data of each user's conversations with the machine, achieving a personalized effect for every user.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a voice control method for smart home based on user habits according to an embodiment of the present invention.
Fig. 2 is a block diagram of a structure of a voice control system for smart homes based on user habits according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The terms in the examples of the present invention are explained as follows:
ASR: automatic Speech Recognition (Automatic Speech Recognition technology) is a technology for converting human Speech into text.
NLP: natural Language Processing is a technology for communicating with a computer using Natural Language. The research uses the electronic computer to simulate the human language communication process, so that the computer can understand and use the natural language of human society, such as Chinese and English, to realize the natural language communication between human and machine, to replace part of mental labor, including the processing of information inquiry, question answering, document extraction, compilation and all the information about natural language.
The invention provides a voice control method for an autonomous-learning home scenario, applied to a smart-home ecosystem. The ecosystem comprises a smart home application (APP), a voice acquisition device and a number of smart-home devices; the voice acquisition device communicates with the smart home application wirelessly or by wire, as do the smart-home devices. When a user wants to control a smart-home device by voice, the user speaks; the voice acquisition device picks up the user's speech and forwards it to the smart home application.
The embodiments of the invention provide a voice control method and system for an autonomous-learning home scenario, the system being used to implement the method. It can be understood that the system is the smart home application (APP) described above, and that the voice acquisition device in these embodiments is a sound pickup device (a smart speaker, a voice assistant, or the like).
Referring to fig. 1, an embodiment of the present invention provides a voice control method in an autonomous learning home scenario, including the following steps:
the first step is as follows: receiving a user voice command sent by pickup equipment and the information of the space where the pickup equipment is located;
it should be noted that the sound pickup apparatus may be an intelligent sound box, a voice assistant, and the like, and after the sound pickup apparatus logs in the system, information corresponding to the sound pickup apparatus, including ID information, location space information, and the like, is automatically stored on the system, so that when the system acquires a voice command of the sound pickup apparatus, the system can automatically recognize the location space information of the sound pickup apparatus, and in a broad sense, the system acquires the voice command and the location space information of the sound pickup apparatus at the same time. The user voice command, for example, the user says "light on", "air conditioner on", "television on", and among them "light", "air conditioner", "television" should be a smart home device with smart control capability.
The second step: performing voice recognition on the voice command and querying a pre-stored corpus for user-configured intent control logic matching the recognition result. If such logic exists, it is executed directly and the process ends; the user-configured intent control logic comprises the target device type, the space and the target action matching the recognition result. If not, the third step is executed.
The target device type is the type of the target smart-home device, such as an air conditioner, a lamp or a television; the target action is the action the voice command asks the target device to perform, such as turning on or off (a lamp) or raising or lowering (the air-conditioner temperature).
Specifically, in this step, the speech recognition process may include at least one of the following ways: voiceprint recognition, ASR recognition + word segmentation processing, ASR recognition + NLU understanding.
To determine the user's identity, the smart-home device the user wants to control, and the action to be performed on that device, voiceprint recognition may be applied to the user's voice command; like a fingerprint, voiceprint information identifies a person. The user's voiceprint information in the voice command can be recognized according to a preset voice recognition algorithm.
The system recognizes the user's voiceprint and queries the pre-stored corpus for user-configured intent control logic matching that voiceprint. If the user has fully entered into the corpus the target device type, space and target action associated with the voiceprint, this information can be queried directly by voiceprint, and the user-configured intent control logic is completed by making the device of that type in that space perform the target action.
If any one of the target device type, space or target action associated with the voiceprint is missing from the corpus, the system can instead apply ASR recognition plus simple word segmentation, or ASR recognition plus NLU understanding, to extract the target device type, space and target action from the voice command, which likewise allows the user-configured intent control logic to be executed.
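The corpus lookup in this step can be pictured as a keyed query. The following minimal sketch assumes a corpus keyed by voiceprint ID and normalized utterance; IntentControlLogic, USER_CORPUS and query_user_corpus are hypothetical names, not the patent's API:

    from dataclasses import dataclass
    from typing import Dict, Optional, Tuple

    @dataclass
    class IntentControlLogic:
        device_type: str    # e.g. "light"
        space: str          # e.g. "living_room"
        target_action: str  # e.g. "turn_on"

    # Hypothetical user-configured corpus: (voiceprint id, utterance) -> logic.
    USER_CORPUS: Dict[Tuple[str, str], IntentControlLogic] = {
        ("user-01", "turn on the light"):
            IntentControlLogic("light", "living_room", "turn_on"),
    }

    def query_user_corpus(voiceprint: str,
                          utterance: str) -> Optional[IntentControlLogic]:
        """Second step: return the user's own intent control logic for this
        utterance if one has been configured, otherwise None."""
        return USER_CORPUS.get((voiceprint, utterance))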
In some cases the user's voice command lacks the spatial information of the target device: the user only says "turn on the light", and it is unclear whether this means "turn on the living-room light" or "turn on the bedroom light"; the user's habits give no answer either, i.e. the corpus contains no user-configured spatial information for the device matching this command. The method then proceeds to the third step.
Further, the method may also comprise: judging in advance, based on the voice recognition result, whether the voice command meets the requirements for querying the corpus; if so, performing the subsequent query steps, and if not, executing the fifth step directly. In this case an error is reported immediately for a voice command the corpus does not support, such as one issued by a user without authority, reminding the user to issue a correct voice command; the error can be reported through the pickup device.
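A pre-judgment of this kind might look like the sketch below, where the authorized-voiceprint set and the function name pre_judge are assumptions made for illustration:

    from typing import Optional

    AUTHORIZED_VOICEPRINTS = {"user-01"}  # hypothetical authorization list

    def pre_judge(voiceprint: str,
                  device_type: Optional[str],
                  action: Optional[str]) -> bool:
        """Reject commands that cannot meaningfully be queried in the corpus:
        unauthorized speakers, or no recognizable device type or action."""
        if voiceprint not in AUTHORIZED_VOICEPRINTS:
            return False  # e.g. a speaker without authority: report an error
        return device_type is not None and action is not None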
The third step: querying a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information; if such logic exists, executing it directly, asking the user whether the execution was correct, and entering the fourth step; if not, executing the fifth step.
the contents of the intent library can be found in table 1 below:
corpus Room where pickup equipment is located Intention (device executing action)
Turning on lamp Parlor Hallway lamp and living room lamp
Turning on lamp Principal and subordinate bed Bedroom lamp
When the user's voice command lacks the spatial information of the target device and contains only the device type and target action, for example when the user says "turn on the light" and the living room and the master bedroom of the same smart-home ecosystem both contain lights, it is hard to judge whether the real intention is "turn on the living-room light" or "turn on the bedroom light". The space the user is in can then be inferred from the spatial information of the pickup device, since a pickup device is generally configured to work only within one space, such as the living room or the master bedroom. When the voice command comes from the living-room pickup device, the user can be judged to be in the living room, and combining this with the device type and target action in the command yields the system-default intent control logic: when the pickup device's space is the living room, turn on the living-room lights (including the hallway lamp and the living-room lamp); when it is the master bedroom, turn on the bedroom lamp (see the sketch below).
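Encoded as data, Table 1 and the lookup it supports might be sketched as follows; the key structure and the names (INTENT_LIBRARY, query_intent_library) are assumptions made for illustration:

    from typing import Dict, List, Optional, Tuple

    # Hypothetical encoding of Table 1: (utterance, room of the pickup device)
    # mapped to the devices on which the action is executed.
    INTENT_LIBRARY: Dict[Tuple[str, str], List[str]] = {
        ("turn on the light", "living_room"): ["hallway_lamp", "living_room_lamp"],
        ("turn on the light", "master_bedroom"): ["bedroom_lamp"],
    }

    def query_intent_library(utterance: str, room: str) -> Optional[List[str]]:
        """Third step: resolve a command that lacks spatial information by
        using the room in which the pickup device is installed."""
        return INTENT_LIBRARY.get((utterance, room))

    # query_intent_library("turn on the light", "living_room")
    # -> ["hallway_lamp", "living_room_lamp"]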
In this way, the problem that a machine cannot accurately judge the user's intention when the voice command is ambiguous in a smart-home scenario can be solved.
Further, after the system-default intent control logic has been executed, the system may ask the user, through the pickup device: "Was that executed correctly?"
The fourth step: obtaining the user's feedback on whether the intent control logic was executed correctly; if it was, recording the logic in the corpus as user-configured intent control logic matching that user's recognition result; if not, executing the fifth step.
In this step, if the user answers that the execution was correct, the voice command and the system-default intent control logic are associated with each other and issued, under the user's account, as user-configured intent control logic stored in the corresponding corpus; the next time the same voice command arrives, that logic is executed directly and the user is not asked again whether the execution was correct. If the user answers that the execution was incorrect, the system-default intent control logic is not executed the next time.
the fifth step: reminding the user that the voice instruction can not be executed.
This step can be implemented by a sound pickup device, and reminds the user to set the type, space and target action of the target device which the voice command wants to control in the system customization (this step is not necessary and can be omitted).
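Putting the five steps together, an end-to-end dispatcher could be sketched as below; the callbacks (execute, ask_user, remind) and the lookup tables are illustrative assumptions rather than the patent's interfaces:

    from typing import Callable, Dict, Tuple

    def handle_command(voiceprint: str, utterance: str, room: str,
                       user_corpus: Dict[Tuple[str, str], object],
                       intent_library: Dict[Tuple[str, str], object],
                       execute: Callable[[object], None],
                       ask_user: Callable[[str], bool],
                       remind: Callable[[str], None]) -> None:
        # Second step: user-configured logic is executed directly, ending the flow.
        logic = user_corpus.get((voiceprint, utterance))
        if logic is not None:
            execute(logic)
            return
        # Third step: fall back to the system default keyed by the device's room.
        logic = intent_library.get((utterance, room))
        if logic is None:
            remind("The voice command cannot be executed.")  # fifth step
            return
        execute(logic)
        # Fourth step: confirm with the user and learn from the answer.
        if ask_user("Was that executed correctly?"):
            user_corpus[(voiceprint, utterance)] = logic
        else:
            remind("The voice command cannot be executed.")  # fifth step

The precedence mirrors the method: user-configured logic always wins, a system default is confirmed before it is learned, and everything else falls through to the reminder.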
Referring to fig. 2, an embodiment of the present invention provides a voice control system for an autonomous-learning home scenario, the system comprising:
an acquisition module 11, configured to receive a user voice command sent by a sound pickup device together with information about the space in which the device is located;
a recognition module 12, configured to perform voice recognition on the voice command;
a first matching module 13, configured to query a pre-stored corpus for user-configured intent control logic matching the recognition result;
a first execution module 14, configured to execute the user-configured intent control logic found by the first matching module 13;
a second matching module 15, configured to query a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information;
a second execution module 16, configured to execute the system-default intent control logic found by the second matching module;
a confirmation module 17, configured to confirm with the user, after the second execution module has executed the system-default intent control logic, whether the execution was correct, and to record correctly executed intent control logic in the corpus as user-configured intent control logic matching that user's recognition result;
and a reminding module 18, configured to remind the user that the voice command cannot be executed.
The recognition module 12 may specifically include at least one of the following: a voiceprint recognition module, an ASR recognition plus word segmentation module, and an ASR recognition plus NLU understanding module.
The result of the voice processing by the recognition module 12 may include at least one of the following corpora: the user's voiceprint information, the type or name of the target device, the space in which the target device is located, and the action the target device is required to perform.
The system may further include a pre-judging module 19, configured to judge in advance whether the voice command meets the requirements for querying the corpus; if it does, the result is output to the first matching module 13, and if it does not, it is output to the reminding module 18, which reports an error directly.
In addition, an embodiment of the present invention further provides a voice control device in an autonomous learning home scenario, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the methods of the foregoing embodiments when executing the program.
Furthermore, embodiments of the present invention also provide a computer storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the methods of the embodiments described above.
The invention starts from real-life scenes and conversational habits and, through user feedback, helps the machine better understand the real intention behind an ambiguous voice command uttered in different spaces. Meanwhile, considering each person's differences in language habits and spatial layout, a set of NLU-learned logic is stored from the historical data of each user's conversations with the machine, achieving a personalized effect for every user.
The above-described embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims. In addition, the parts not related to the invention are the same as, or can be realized by, the prior art.

Claims (10)

1. A voice control method for an autonomous-learning home scenario, characterized by comprising the following steps:
the first step: receiving a user voice command sent by a sound pickup device, together with information about the space in which the device is located;
the second step: performing voice recognition on the voice command and querying a pre-stored corpus for user-configured intent control logic matching the recognition result; if such logic exists, executing it directly and ending the process, the intent control logic comprising the target device type, the space and the target action matching the recognition result; if not, executing the third step;
the third step: querying a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information; if such logic exists, executing it directly, asking the user whether the execution was correct, and entering the fourth step; if not, executing the fifth step;
the fourth step: obtaining the user's feedback on whether the intent control logic was executed correctly; if it was, recording the logic in the corpus as user-configured intent control logic matching that user's recognition result; if not, executing the fifth step;
the fifth step: reminding the user that the voice command cannot be executed.
2. The voice control method for an autonomous-learning home scenario according to claim 1, wherein the voice recognition processing comprises at least one of the following modes: voiceprint recognition, ASR recognition + word segmentation, and ASR recognition + NLU understanding.
3. The voice control method for an autonomous-learning home scenario according to claim 1, wherein the result of the voice recognition processing comprises at least one of the following corpora: the user's voiceprint information, the type or name of the target device, the space in which the target device is located, and the action the target device is required to perform.
4. The voice control method for an autonomous-learning home scenario according to claim 1, wherein the second step further comprises: judging in advance, based on the voice recognition result, whether the voice command meets the requirements for querying the corpus; if so, performing the subsequent query steps, and if not, executing the fifth step directly.
5. A voice control system for an autonomous-learning home scenario, characterized by comprising:
an acquisition module, configured to receive a user voice command sent by a sound pickup device together with information about the space in which the device is located;
a recognition module, configured to perform voice recognition on the voice command;
a first matching module, configured to query a pre-stored corpus for user-configured intent control logic matching the recognition result;
a first execution module, configured to execute the user-configured intent control logic found by the first matching module;
a second matching module, configured to query a pre-stored intent library for system-default intent control logic matching both the recognition result and the pickup device's spatial information;
a second execution module, configured to execute the system-default intent control logic found by the second matching module;
a confirmation module, configured to confirm with the user, after the second execution module has executed the system-default intent control logic, whether the execution was correct, and to record correctly executed intent control logic in the corpus as user-configured intent control logic matching that user's recognition result;
and a reminding module, configured to remind the user that the voice command cannot be executed.
6. The voice control system for an autonomous-learning home scenario according to claim 5, wherein the recognition module comprises at least one of the following: a voiceprint recognition module, an ASR recognition plus word segmentation module, and an ASR recognition plus NLU understanding module.
7. The voice control system for an autonomous-learning home scenario according to claim 5, wherein the result of the voice processing by the recognition module comprises at least one of the following corpora: the user's voiceprint information, the type or name of the target device, the space in which the target device is located, and the action the target device is required to perform.
8. The voice control system for an autonomous-learning home scenario according to claim 5, further comprising a pre-judging module, configured to judge in advance, from the recognition result produced by the recognition module, whether the voice command meets the requirements for querying the corpus, and to output the result to the first matching module if it does and to the reminding module if it does not.
9. A voice control device for an autonomous-learning home scenario, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the voice control method according to any one of claims 1 to 4 when executing the program.
10. A computer storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the voice control method according to any one of claims 1 to 4.
CN202111037587.XA · Priority date 2021-09-06 · Filing date 2021-09-06 · Voice control method, system, device and medium in autonomous learning home scene · Pending · Published as CN113611305A

Priority Applications (1)

Application Number: CN202111037587.XA
Priority Date / Filing Date: 2021-09-06
Title: Voice control method, system, device and medium in autonomous learning home scene

Applications Claiming Priority (1)

Application Number: CN202111037587.XA
Priority Date / Filing Date: 2021-09-06
Title: Voice control method, system, device and medium in autonomous learning home scene

Publications (1)

Publication Number: CN113611305A
Publication Date: 2021-11-05

Family

ID=78310118

Family Applications (1)

Application Number: CN202111037587.XA
Title: Voice control method, system, device and medium in autonomous learning home scene
Status: Pending

Country Status (1)

Country: CN
Publication: CN113611305A

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220028373A1 (en) * 2020-07-24 2022-01-27 Comcast Cable Communications, Llc Systems and methods for training voice query models
CN115223556A (en) * 2022-06-15 2022-10-21 中国第一汽车股份有限公司 Self-feedback type vehicle voice control method and system
CN115346530A (en) * 2022-10-19 2022-11-15 亿咖通(北京)科技有限公司 Voice control method, device, equipment, medium, system and vehicle
CN117008493A (en) * 2023-09-26 2023-11-07 广州科宗智能科技有限公司 Gateway-free household control and regulation system based on intelligent sound control

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778946A (en) * 2014-01-10 2015-07-15 中国电信股份有限公司 Voice control method and system
CN104795065A (en) * 2015-04-30 2015-07-22 北京车音网科技有限公司 Method for increasing speech recognition rate and electronic device
CN107220292A (en) * 2017-04-25 2017-09-29 上海庆科信息技术有限公司 Intelligent dialogue device, reaction type intelligent sound control system and method
CN107705788A (en) * 2017-09-29 2018-02-16 上海与德通讯技术有限公司 The method of calibration and intelligent terminal of a kind of phonetic order
CN107919121A (en) * 2017-11-24 2018-04-17 江西科技师范大学 Control method, device, storage medium and the computer equipment of smart home device
CN110767225A (en) * 2019-10-24 2020-02-07 北京声智科技有限公司 Voice interaction method, device and system
CN111554286A (en) * 2020-04-26 2020-08-18 云知声智能科技股份有限公司 Method and equipment for controlling unmanned aerial vehicle based on voice
CN112053683A (en) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Voice instruction processing method, device and control system
CN112201233A (en) * 2020-09-01 2021-01-08 沈澈 Voice control method, system and device of intelligent household equipment and computer storage medium
CN112201257A (en) * 2020-09-29 2021-01-08 北京百度网讯科技有限公司 Information recommendation method and device based on voiceprint recognition, electronic equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778946A (en) * 2014-01-10 2015-07-15 中国电信股份有限公司 Voice control method and system
CN104795065A (en) * 2015-04-30 2015-07-22 北京车音网科技有限公司 Method for increasing speech recognition rate and electronic device
CN107220292A (en) * 2017-04-25 2017-09-29 上海庆科信息技术有限公司 Intelligent dialogue device, reaction type intelligent sound control system and method
CN107705788A (en) * 2017-09-29 2018-02-16 上海与德通讯技术有限公司 The method of calibration and intelligent terminal of a kind of phonetic order
CN107919121A (en) * 2017-11-24 2018-04-17 江西科技师范大学 Control method, device, storage medium and the computer equipment of smart home device
CN112053683A (en) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Voice instruction processing method, device and control system
CN110767225A (en) * 2019-10-24 2020-02-07 北京声智科技有限公司 Voice interaction method, device and system
CN111554286A (en) * 2020-04-26 2020-08-18 云知声智能科技股份有限公司 Method and equipment for controlling unmanned aerial vehicle based on voice
CN112201233A (en) * 2020-09-01 2021-01-08 沈澈 Voice control method, system and device of intelligent household equipment and computer storage medium
CN112201257A (en) * 2020-09-29 2021-01-08 北京百度网讯科技有限公司 Information recommendation method and device based on voiceprint recognition, electronic equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220028373A1 (en) * 2020-07-24 2022-01-27 Comcast Cable Communications, Llc Systems and methods for training voice query models
CN115223556A (en) * 2022-06-15 2022-10-21 中国第一汽车股份有限公司 Self-feedback type vehicle voice control method and system
CN115223556B (en) * 2022-06-15 2024-05-14 中国第一汽车股份有限公司 Self-feedback type vehicle voice control method and system
CN115346530A (en) * 2022-10-19 2022-11-15 亿咖通(北京)科技有限公司 Voice control method, device, equipment, medium, system and vehicle
CN117008493A (en) * 2023-09-26 2023-11-07 广州科宗智能科技有限公司 Gateway-free household control and regulation system based on intelligent sound control

Similar Documents

Publication Publication Date Title
CN113611305A (en) Voice control method, system, device and medium in autonomous learning home scene
CN107919121B (en) Control method and device of intelligent household equipment, storage medium and computer equipment
CN107908116B (en) Voice control method, intelligent home system, storage medium and computer equipment
CN108039988B (en) Equipment control processing method and device
CN107729433B (en) Audio processing method and device
CN109559742B (en) Voice control method, system, storage medium and computer equipment
CN108932947B (en) Voice control method and household appliance
CN113611306A (en) Intelligent household voice control method and system based on user habits and storage medium
CN111447124B (en) Intelligent household control method and intelligent control equipment based on biological feature recognition
CN112201233A (en) Voice control method, system and device of intelligent household equipment and computer storage medium
CN111508491A (en) Intelligent voice interaction equipment based on deep learning
CN110767225A (en) Voice interaction method, device and system
CN111933135A (en) Terminal control method and device, intelligent terminal and computer readable storage medium
CN111308904A (en) Intelligent home control method, main control device, sub-control device and storage medium
CN114020909A (en) Scene-based smart home control method, device, equipment and storage medium
CN116110112B (en) Self-adaptive adjustment method and device of intelligent switch based on face recognition
CN114639379A (en) Interaction method and device of intelligent electric appliance, computer equipment and medium
CN110719512A (en) Intelligent remote controller control method and device, intelligent remote controller and storage medium
CN116415590A (en) Intention recognition method and device based on multi-round query
CN110970019A (en) Control method and device of intelligent home system
CN109100942A (en) A kind of smart home control device and control method
CN114627859A (en) Method and system for recognizing electronic photo frame in offline semantic manner
CN116105307A (en) Air conditioner control method, device, electronic equipment and storage medium
CN113038256A (en) Audio output method of electronic equipment, smart television and readable storage medium
CN113962213A (en) Multi-turn dialog generation method, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination