CN109785836B - Interaction method and device - Google Patents

Interaction method and device

Info

Publication number
CN109785836B
CN109785836B CN201910079020.5A
Authority
CN
China
Prior art keywords
user
voice
wake
awakening
execute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910079020.5A
Other languages
Chinese (zh)
Other versions
CN109785836A (en)
Inventor
袁煜然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201910079020.5A priority Critical patent/CN109785836B/en
Publication of CN109785836A publication Critical patent/CN109785836A/en
Application granted granted Critical
Publication of CN109785836B publication Critical patent/CN109785836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application discloses an interaction method and device. In one embodiment, the method comprises: in response to acquiring a user's voice, judging whether to perform a wake-up operation using a wake-up judgment model based on reference information, the reference information comprising the text corresponding to the user's voice and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform a wake-up operation; and, in response to obtaining a judgment result, performing the operation associated with the judgment result. On the one hand, this avoids the cumbersome interaction in which the smart device can be woken up only by a voice containing its name in every interaction, simplifying the interaction flow. On the other hand, the smart device and the user can interact through speech in natural-language form.

Description

Interaction method and device
Technical Field
The present application relates to the field of computers, in particular to human-computer interaction, and more particularly to an interaction method and device.
Background
A smart device such as a smart speaker can interact with a user by voice: it analyzes the user's voice indicating the operation the user wants performed, determines that operation, and performs it. Currently, the user usually has to first speak a voice containing, for example, the name of the smart device in order to wake it up before the smart device performs the desired operation. On the one hand, this makes the interaction process cumbersome; on the other hand, the smart device cannot interact with the user through speech in natural-language form.
Disclosure of Invention
The embodiment of the application provides an interaction method and device.
In a first aspect, an embodiment of the present application provides an interaction method, the method comprising: in response to acquiring a user's voice, judging whether to perform a wake-up operation using a wake-up judgment model based on reference information, the reference information comprising the text corresponding to the user's voice and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform a wake-up operation; and, in response to obtaining a judgment result, performing the operation associated with the judgment result.
In a second aspect, an embodiment of the present application provides an interaction apparatus, comprising: a judging unit configured to judge, in response to acquiring the user's voice, whether to perform a wake-up operation using a wake-up judgment model based on reference information, the reference information comprising the text corresponding to the user's voice and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform a wake-up operation; and an executing unit configured to perform, in response to obtaining a judgment result, the operation associated with the judgment result.
According to the interaction method and device provided by the embodiments of the present application, whether to perform a wake-up operation is judged, in response to acquiring the user's voice, using a wake-up judgment model based on reference information comprising the text corresponding to the user's voice and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform one; in response to obtaining a judgment result, the operation associated with the judgment result is performed. On the one hand, this avoids the cumbersome interaction in which the smart device can be woken up only by a voice containing its name in every interaction, simplifying the interaction flow. On the other hand, the smart device and the user can interact through speech in natural-language form.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates an exemplary system architecture suitable for use in implementing embodiments of the present application;
FIG. 2 shows a flow diagram of one embodiment of an interaction method according to the present application;
FIG. 3 shows a schematic structural diagram of one embodiment of an interaction device according to the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates an exemplary system architecture suitable for use in implementing embodiments of the present application.
As shown in fig. 1, the system architecture may include a smart device 101, a network 102, and a server 103. The network 102 may be a wired network or a wireless network.
The smart device 101 exchanges data with the server 103 via the network 102. The smart device 101 may include, but is not limited to, smart speakers and intelligent interactive robots. A microphone on the smart device 101 can collect the voice of a user near the device, thereby acquiring the user's voice. The smart device 101 may recognize the voice to obtain the corresponding text and judge, using a wake-up judgment model running on the smart device 101, whether to perform a wake-up operation, i.e. whether to put the smart device 101 in an awakened state; when a judgment result is obtained, the smart device 101 performs the operation associated with it. Alternatively, the smart device 101 may send the acquired voice to the server 103, which recognizes it to obtain the corresponding text and judges, using a wake-up judgment model running on the server 103, whether to perform a wake-up operation; when a judgment result is obtained, the server 103 sends it to the smart device 101.
Referring to fig. 2, a flow diagram of one embodiment of an interaction method according to the present application is shown. The method comprises the following steps:
In step 201, in response to acquiring the user's voice, a wake-up judgment model is used to judge, based on the reference information, whether to perform a wake-up operation.
In this embodiment, the reference information includes: the text corresponding to the user's voice, and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform a wake-up operation.
In this embodiment, the time of judging whether to perform a wake-up operation is not limited to any particular moment; each moment at which such a judgment is made can be referred to as a time of judging whether to perform a wake-up operation.
In this embodiment, a state in which the smart device can perform operations the user needs it to perform may be referred to as the awakened state; when not awakened, the smart device is in a standby state. After acquiring the user's voice, the smart device can judge whether to perform a wake-up operation; when it determines that one should be performed, it performs the wake-up operation, placing itself in the awakened state, and then performs the operation the user needs it to perform.
In this embodiment, the neural network used to judge whether to perform a wake-up operation may be referred to as the wake-up judgment model. It is trained in advance on training samples. Each training sample includes: a training text, a duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform one, and the target output corresponding to the training text. The training text expresses a user need; the types of user needs may include, but are not limited to, needing the smart device to perform an operation and needing the smart device to query information.
For example, one training sample contains the training text "help me check the weather", which expresses the user's need to look up weather conditions, and its target output indicates that the smart device should perform the wake-up operation. As another example, a training sample may contain the training text "help me close the window", whose target output indicates that the smart device should not perform the wake-up operation.
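As a rough illustration of this sample layout (the field names and values below are hypothetical, not taken from the patent), such training samples could be represented as simple records pairing a transcript and a wake-gap duration with a target label:

```python
from dataclasses import dataclass

@dataclass
class WakeTrainingSample:
    """One training sample for the wake-up judgment model (hypothetical schema)."""
    text: str           # transcript of the training utterance
    gap_seconds: float  # duration since the device last performed a wake-up operation
    should_wake: bool   # target output: whether to perform the wake-up operation

samples = [
    # A clear request: wake even after a long idle gap (e.g. 1 hour).
    WakeTrainingSample("help me check the weather", 3600.0, True),
    # A request the device cannot fulfil: do not wake.
    WakeTrainingSample("help me close the window", 600.0, False),
]
```

The gap durations mirror the patent's later discussion: unambiguous requests are paired with long gaps, while ambiguous or unfulfillable utterances get shorter ones.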
In this embodiment, the neural network used to judge whether to perform a wake-up operation has initial network parameters when created. It can be trained on a number of training samples, iteratively adjusting the network parameters. After such training, the neural network, i.e. the wake-up judgment model, can judge whether to perform a wake-up operation, in other words whether to wake the smart device, based on the text corresponding to the user's voice and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform one.
During training, for each training sample, the training text can be encoded to obtain an encoded representation of the text, and the duration between the time the smart device last performed a wake-up operation and the time of judging can be encoded to obtain an encoded representation of the duration. The neural network then produces a prediction from the two encoded representations, indicating whether the smart device should perform the wake-up operation. From a loss function measuring the difference between the prediction and the target output, the adjustments to the network parameters can be computed and applied.
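A minimal sketch of this training idea, substituting a toy logistic model over two hand-crafted features (a cue-word score standing in for the text encoding, and the normalized wake gap standing in for the duration encoding) for the patent's neural network; all names and cue words here are assumptions for illustration:

```python
import math

REQUEST_WORDS = {"check", "weather", "play", "search"}  # assumed cue words

def encode(text: str, gap_seconds: float):
    # Crude stand-ins for the text encoding and the duration encoding.
    words = text.lower().split()
    cue = sum(w in REQUEST_WORDS for w in words) / max(len(words), 1)
    return [cue, min(gap_seconds / 3600.0, 1.0)]

def predict(w, x):
    # Logistic prediction: probability that the device should wake.
    z = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=500, lr=0.5):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, gap, target in samples:
            x = encode(text, gap)
            err = predict(w, x) - target  # gradient of the log loss
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            w[2] -= lr * err
    return w

data = [
    ("help me check the weather", 3600.0, 1.0),  # clear request: wake
    ("hmm whatever", 3600.0, 0.0),               # everyday speech: do not wake
]
weights = train(data)
```

A real implementation would replace `encode` with a learned text encoder, but the loop structure (encode, predict, compute loss, adjust parameters) matches the procedure described above.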
In this embodiment, the wake-up judgment model can judge whether to perform a wake-up operation, i.e. whether to wake the smart device, based on the text corresponding to the acquired voice and the duration between the time the smart device last performed a wake-up operation and the time of judging the acquired voice.
In this embodiment, the duration in a training sample between the time of the last wake-up operation and the time of judging whether to perform one may be set according to the confidence with which the sample's training text reflects a user need.
In this embodiment, when the training text clearly reflects a user need, such as needing the smart device to perform an operation or to query information, the confidence that the text reflects a user need is high; the duration in the training sample between the time the smart device last performed a wake-up operation and the time of judging may then be long, for example 1 hour, and the target output indicates that the smart device should perform the wake-up operation. When the training text only possibly expresses such a need, or corresponds to the user's everyday speech, the duration in the training sample may be short, for example 10 minutes, with the target output again indicating that the smart device should perform the wake-up operation.
In this embodiment, after being trained on such samples, the neural network, i.e. the wake-up judgment model, can judge whether to perform a wake-up operation based on the confidence with which the text corresponding to the acquired voice reflects a user need, together with the duration between the time the smart device last performed a wake-up operation and the time of judging.
In this embodiment, when the text corresponding to the user's voice clearly reflects a user need, such as needing the smart device to perform an operation or to query information, the judgment result may be to perform the wake-up operation even if the duration between the time the smart device last performed a wake-up operation and the time of judging is long, for example 1 hour, because the text itself clearly reflects the need. When the text only possibly reflects such a need, or corresponds to the user's everyday speech, the judgment result may be to perform the wake-up operation when that duration is short, for example 10 minutes, and not to perform it when the duration is long, for example 1 hour.
In step 202, the operation associated with the judgment result is performed.
In this embodiment, after the wake-up judgment model judges, from the text corresponding to the acquired voice and the duration between the time the smart device last performed a wake-up operation and the time of judging the acquired voice, whether to perform a wake-up operation, the operation associated with the judgment result may be performed once a result is obtained. When the result is to perform the wake-up operation, the associated operation may include performing the wake-up operation, placing the smart device in the awakened state, and then performing the operation associated with the acquired voice. When the result is not to perform the wake-up operation, the associated operation may include updating the value of an accumulated non-wake duration parameter, which records the time elapsed since the smart device last performed a wake-up operation. The next time the wake-up judgment model judges whether to perform a wake-up operation, this parameter value serves as the duration in the reference information between the time the smart device last performed a wake-up operation and the time of judging the acquired voice.
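The bookkeeping around the judgment result could look like the following sketch; `SmartDeviceState` and its methods are hypothetical, introduced only to illustrate how the accumulated non-wake duration feeds back into the reference information:

```python
import time

class SmartDeviceState:
    """Tracks the wake gap used as reference information (illustrative only)."""

    def __init__(self):
        self.last_wake_time = time.monotonic()
        self.awake = False

    def gap_seconds(self) -> float:
        # Accumulated duration since the last wake-up operation; this value is
        # supplied to the wake-up judgment model as part of the reference information.
        return time.monotonic() - self.last_wake_time

    def apply_judgment(self, should_wake: bool) -> None:
        if should_wake:
            # Perform the wake-up operation: reset the accumulated duration
            # and put the device in the awakened state.
            self.last_wake_time = time.monotonic()
            self.awake = True
        else:
            # Not waking: the accumulated non-wake duration simply keeps growing.
            self.awake = False
```

A monotonic clock is used deliberately, so the gap cannot go backwards if the wall clock is adjusted.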
In some optional implementations of this embodiment, when judging with the wake-up judgment model based on the text corresponding to the acquired voice and the duration between the time the smart device last performed a wake-up operation and the time of judging, the model may determine whether the text includes a wake word. When the model determines that the text includes a wake word, a judgment result indicating that the wake-up operation should be performed can be obtained.
For example, the smart device is a smart speaker named "Little A", and the wake word is its name. If the acquired voice is "Little A, check today's weather for me", it can be determined that the corresponding text contains the wake word, and a judgment result indicating that the wake-up operation should be performed can be obtained.
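A minimal sketch of this wake-word rule, assuming the device name "Little A" as the wake word (the function and constant names are hypothetical):

```python
WAKE_WORDS = ("little a",)  # e.g. the speaker's name; an assumption, not from the patent

def contains_wake_word(text: str) -> bool:
    """Short-circuit rule: any utterance containing a wake word wakes the device."""
    lowered = text.lower()
    return any(w in lowered for w in WAKE_WORDS)
```

In practice this check would run before (or inside) the learned model, since a wake word yields a judgment result immediately.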
In some optional implementations of this embodiment, when judging with the wake-up judgment model based on the text corresponding to the acquired voice and the duration between the time the smart device last performed a wake-up operation and the time of judging, the model may determine whether the text includes keywords associated with operations the smart device cannot perform. Such keywords may be preset; when the text is judged to include them, a judgment result indicating that the wake-up operation should not be performed is obtained.
For example, the acquired voice is "help me close the window". Since the smart device cannot close a window, the keywords associated with that unsupported operation include "close" and "window"; it can thus be determined that the text corresponding to the acquired voice includes keywords associated with an operation the smart device cannot perform, and a judgment result indicating that the wake-up operation should not be performed can be obtained.
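The complementary rule, checking for preset keywords tied to operations the device cannot perform, could be sketched as follows (the keyword set is an illustrative assumption):

```python
# Keywords associated with operations the device cannot perform (preset, illustrative).
UNSUPPORTED_KEYWORDS = {"close", "window"}

def matches_unsupported_operation(text: str) -> bool:
    """If the utterance mentions an operation the device cannot perform, do not wake."""
    words = set(text.lower().split())
    return UNSUPPORTED_KEYWORDS <= words  # all keywords of the unsupported operation present
```

Requiring all keywords of a pattern to appear (subset check) mirrors the example above, where both "close" and "window" are needed to identify the unsupported request.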
In some optional implementations of this embodiment, when the wake-up judgment model, judging from the text corresponding to the acquired voice and the duration between the time of the last wake-up operation and the time of judging, does not yield a judgment result, a guiding interactive sentence may be played. In response to a feedback voice indicating that the user requires the smart device to perform the operation associated with the acquired voice, annotation information for the acquired voice can be generated indicating that the wake-up operation should be performed. A training sample of the wake-up judgment model can then be generated, comprising the acquired voice and this annotation information, and used to continue training the model, with the acquired voice as the model's input and the annotation information as the target output.
For example, the acquired voice is "help me look up …", which queries information on some topic, but the wake-up judgment model was never trained on samples pairing such an information-seeking voice with an indication to perform the wake-up operation. The model therefore cannot judge from this voice whether to perform the wake-up operation and yields no result. At this point, the smart device may play a guiding voice such as "Excuse me, were you speaking to me?". The user's feedback voice "yes" indicates that the smart device is required to perform the operation associated with the acquired voice; the smart device may reply "OK". A training sample can then be generated, comprising the acquired voice and its annotation information, the annotation indicating that the wake-up operation should be performed.
In some optional implementations of this embodiment, when the wake-up judgment model, judging from the text corresponding to the acquired voice and the duration between the time of the last wake-up operation and the time of judging, does not yield a judgment result, a guiding interactive sentence may be played. In response to a feedback voice indicating that the user does not require the smart device to perform the operation associated with the acquired voice, annotation information for the acquired voice can be generated indicating that the wake-up operation should not be performed. A training sample of the wake-up judgment model can then be generated, comprising the acquired voice and this annotation information, and used to continue training the model, with the acquired voice as the model's input and the annotation information as the target output.
For example, the acquired voice is "the rice is ready", and the wake-up judgment model was never trained on samples pairing such everyday speech with an indication not to perform the wake-up operation, so no judgment result is obtained. The smart device may play a guiding voice such as "Excuse me, were you speaking to me?". The user's feedback voice "no" indicates that the smart device is not required to perform any operation associated with the acquired voice; the smart device may reply "OK". A training sample can then be generated, comprising the voice "the rice is ready" and its annotation information, the annotation indicating that the wake-up operation should not be performed.
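The two guided-feedback flows above could be condensed into a small helper that turns an unjudgeable utterance plus the user's yes/no feedback into a labeled training sample; the helper and its schema are assumptions, not the patent's API:

```python
def label_from_feedback(utterance: str, feedback: str) -> dict:
    """After playing a guiding sentence such as 'Excuse me, were you speaking to me?',
    convert the user's feedback into annotation information for a new training sample.
    (Hypothetical helper; the patent describes the flow, not this interface.)"""
    wants_action = feedback.strip().lower() in {"yes", "yeah", "yep"}
    return {"text": utterance, "should_wake": wants_action}

# The information-seeking utterance gets a positive label after "yes" feedback;
# the everyday remark gets a negative label after "no" feedback.
sample_pos = label_from_feedback("help me look up ...", "yes")
sample_neg = label_from_feedback("the rice is ready", "no")
```

Each generated sample can then be fed back into training, with the utterance as input and `should_wake` as the target output, as described above.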
Referring to fig. 3, as an implementation of the method shown in the above figures, the present application provides an embodiment of an interactive apparatus, which corresponds to the embodiment of the method shown in fig. 2. Specific implementations of operations that the respective units in the interaction apparatus are configured to perform may refer to the corresponding specific implementations of operations described in the method embodiments.
As shown in fig. 3, the interaction apparatus of this embodiment includes a judging unit 301 and an executing unit 302. The judging unit 301 is configured to judge, in response to acquiring the user's voice, whether to perform a wake-up operation using a wake-up judgment model based on reference information, the reference information including: the text corresponding to the user's voice, and the duration between the time the smart device last performed a wake-up operation and the time of judging whether to perform one. The executing unit 302 is configured to perform, in response to a judgment result being obtained, the operation associated with the judgment result.
In some optional implementations of this embodiment, the interaction apparatus further includes a first collection unit configured to: play a guiding interactive sentence in response to no judgment result being obtained, the sentence guiding the user to confirm whether the smart device is required to perform the operation associated with the acquired voice; in response to the user's feedback voice indicating that the smart device is required to perform that operation, generate annotation information for the acquired voice indicating that the smart device should perform the wake-up operation; and generate a training sample of the wake-up judgment model comprising the acquired voice and its annotation information.
In some optional implementations of this embodiment, the interaction apparatus further includes a second collection unit configured to: play a guiding interactive sentence in response to no judgment result being obtained, the sentence guiding the user to confirm whether the smart device is required to perform the operation associated with the acquired voice; in response to the user's feedback voice indicating that the smart device is not required to perform that operation, generate annotation information for the acquired voice indicating that the smart device should not perform the wake-up operation; and generate a training sample of the wake-up judgment model comprising the acquired voice and its annotation information.
In some optional implementations of this embodiment, the judging unit includes a first wake-up judgment subunit configured to: determine, using the wake-up judgment model, whether the text corresponding to the acquired voice includes a wake word; and, in response to determining that it does, obtain a judgment result indicating that the wake-up operation should be performed.
In some optional implementations of this embodiment, the judging unit includes a second wake-up judgment subunit configured to: determine, using the wake-up judgment model, whether the text corresponding to the voice of the user includes a keyword associated with an operation that the smart device cannot perform; and, in response to determining that the text includes such a keyword, obtain a judgment result indicating that the wake-up operation is not to be executed.
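As a rough illustration of the judgment logic described above (the wake-word check, the cannot-perform keyword check, and the use of the time of the last wake-up operation as reference information), the following sketch uses assumed word lists and an assumed 30-second session window; none of these values or names are taken from this application:

```python
# Illustrative sketch of the wake-up judgment subunits described above.
# Word lists, the session window, and the decision order are assumptions.

WAKE_WORDS = {"assistant"}           # hypothetical wake word(s)
CANNOT_PERFORM = {"drive", "cook"}   # keywords for ops the device cannot do

def judge_wake(text, last_wake_time, now, recent_window=30.0):
    """Return True (execute wake-up), False (do not execute),
    or None (no judgment result -> fall back to guided interaction)."""
    words = set(text.lower().split())
    if words & WAKE_WORDS:
        return True                  # wake word present -> wake up
    if words & CANNOT_PERFORM:
        return False                 # device cannot perform -> do not wake
    if now - last_wake_time <= recent_window:
        return True                  # recently awake -> treat as same session
    return None                      # undecided -> play guiding sentence
```

Checking the wake word before the cannot-perform keywords is a design choice of this sketch; the application itself presents the two checks as alternative implementations rather than an ordered pipeline.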
FIG. 4 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
As shown in FIG. 4, the computer system includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the computer system. The CPU 401, the ROM 402, and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406; an output section 407; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as necessary, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, the processes described in the embodiments of the present application may be implemented as computer programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising instructions for carrying out the method illustrated in the flowchart. The computer program may be downloaded and installed from a network through the communication section 409 and/or installed from the removable medium 411. When executed by the Central Processing Unit (CPU) 401, the computer program performs the above-described functions defined in the method of the present application.
The present application further provides an electronic device that may be configured with one or more processors and a memory for storing one or more programs. The one or more programs may include instructions for performing the operations described in the above embodiments, and, when executed by the one or more processors, cause the one or more processors to perform those operations.
The present application also provides a computer readable medium, which may be included in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the operations described in the above embodiments.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The above description presents only preferred embodiments of the present application and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to embodiments formed by the specific combination of the above features, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but are not limited to) features with similar functions disclosed in the present application.

Claims (10)

1. An interaction method, comprising:
in response to acquiring a voice of a user, judging, based on reference information and using a wake-up judgment model, whether to execute a wake-up operation, the reference information comprising: text corresponding to the voice of the user, the time at which the smart device last executed the wake-up operation, and the time at which the judgment of whether to execute the wake-up operation is made;
in response to obtaining a judgment result, executing an operation associated with the judgment result;
in response to no judgment result being obtained, playing a guiding interactive sentence, the guiding interactive sentence being used to guide the user to determine whether the smart device is required to execute the operation associated with the acquired voice of the user;
in response to the user's feedback voice indicating that the smart device is required to execute the operation associated with the acquired voice of the user, generating annotation information for the acquired voice of the user, the annotation information indicating that the smart device is to execute the wake-up operation; and
generating a training sample for the wake-up judgment model, the training sample comprising: the acquired voice of the user and the annotation information for the acquired voice of the user.
2. The method of claim 1, further comprising:
in response to no judgment result being obtained, playing a guiding interactive sentence, the guiding interactive sentence being used to guide the user to determine whether the smart device is required to execute the operation associated with the voice of the user;
in response to the user's feedback voice indicating that the operation associated with the voice of the user is not to be executed, generating annotation information for the acquired voice of the user, the annotation information indicating that the smart device is not to execute the wake-up operation; and
generating a training sample for the wake-up judgment model, the training sample comprising: the acquired voice of the user and the annotation information for the acquired voice of the user.
3. The method according to claim 1 or 2, wherein the judging, based on the reference information and using the wake-up judgment model, whether to execute the wake-up operation comprises:
determining, using the wake-up judgment model, whether the text corresponding to the voice of the user includes a wake-up word; and
in response to determining that the text corresponding to the voice of the user includes a wake-up word, obtaining a judgment result indicating that the wake-up operation is to be executed.
4. The method according to claim 1 or 2, wherein the judging, based on the reference information and using the wake-up judgment model, whether to execute the wake-up operation comprises:
determining, using the wake-up judgment model, whether the text corresponding to the voice of the user includes a keyword associated with an operation that the smart device cannot perform; and
in response to determining that the text corresponding to the voice of the user includes such a keyword, obtaining a judgment result indicating that the wake-up operation is not to be executed.
5. An interaction device, comprising:
a judging unit configured to judge, in response to acquiring a voice of a user, whether to execute a wake-up operation based on reference information and using a wake-up judgment model, the reference information comprising: text corresponding to the voice of the user, the time at which the smart device last executed the wake-up operation, and the time at which the judgment of whether to execute the wake-up operation is made;
an execution unit configured to execute, in response to obtaining a judgment result, an operation associated with the judgment result; and
a first collection unit configured to: play a guiding interactive sentence in response to no judgment result being obtained, the guiding interactive sentence being used to guide the user to determine whether the smart device is required to execute the operation associated with the acquired voice of the user; in response to the user's feedback voice indicating that the smart device is required to execute the operation associated with the acquired voice of the user, generate annotation information for the acquired voice of the user, the annotation information indicating that the smart device is to execute the wake-up operation; and generate a training sample for the wake-up judgment model, the training sample comprising: the acquired voice of the user and the annotation information for the acquired voice of the user.
6. The apparatus of claim 5, the apparatus further comprising:
a second collection unit configured to: play a guiding interactive sentence in response to no judgment result being obtained, the guiding interactive sentence being used to guide the user to determine whether the smart device is required to execute the operation associated with the voice of the user; in response to the user's feedback voice indicating that the operation associated with the voice of the user is not to be executed, generate annotation information for the acquired voice of the user, the annotation information indicating that the smart device is not to execute the wake-up operation; and generate a training sample for the wake-up judgment model, the training sample comprising: the acquired voice of the user and the annotation information for the acquired voice of the user.
7. The apparatus according to claim 5 or 6, wherein the judging unit comprises:
a first wake-up judgment subunit configured to: determine, using the wake-up judgment model, whether the text corresponding to the acquired voice of the user includes a wake-up word; and, in response to determining that the text includes a wake-up word, obtain a judgment result indicating that the wake-up operation is to be executed.
8. The apparatus according to claim 5 or 6, wherein the judging unit comprises:
a second wake-up judgment subunit configured to: determine, using the wake-up judgment model, whether the text corresponding to the voice of the user includes a keyword associated with an operation that the smart device cannot perform; and, in response to determining that the text includes such a keyword, obtain a judgment result indicating that the wake-up operation is not to be executed.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-4.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN201910079020.5A 2019-01-28 2019-01-28 Interaction method and device Active CN109785836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910079020.5A CN109785836B (en) 2019-01-28 2019-01-28 Interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910079020.5A CN109785836B (en) 2019-01-28 2019-01-28 Interaction method and device

Publications (2)

Publication Number Publication Date
CN109785836A CN109785836A (en) 2019-05-21
CN109785836B true CN109785836B (en) 2021-03-30

Family

ID=66502610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910079020.5A Active CN109785836B (en) 2019-01-28 2019-01-28 Interaction method and device

Country Status (1)

Country Link
CN (1) CN109785836B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665708B (en) * 2016-07-29 2021-06-08 科大讯飞股份有限公司 Intelligent voice interaction method and system
US10217453B2 (en) * 2016-10-14 2019-02-26 Soundhound, Inc. Virtual assistant configured by selection of wake-up phrase
KR101986354B1 (en) * 2017-05-19 2019-09-30 네이버 주식회사 Speech-controlled apparatus for preventing false detections of keyword and method of operating the same
CN108320742B (en) * 2018-01-31 2021-09-14 广东美的制冷设备有限公司 Voice interaction method, intelligent device and storage medium
CN109036411A (en) * 2018-09-05 2018-12-18 深圳市友杰智新科技有限公司 A kind of intelligent terminal interactive voice control method and device

Also Published As

Publication number Publication date
CN109785836A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN110838289B (en) Wake-up word detection method, device, equipment and medium based on artificial intelligence
CN107657017B (en) Method and apparatus for providing voice service
CN113327609B (en) Method and apparatus for speech recognition
EP3477635B1 (en) System and method for natural language processing
CN108899013B (en) Voice search method and device and voice recognition system
CN111651609A (en) Multi-turn dialogue method and system integrating knowledge graph and emotion supervision
KR20170103925A (en) Speech identification system and identification method of a kind of robot system
JP7365985B2 (en) Methods, devices, electronic devices, computer-readable storage media and computer programs for recognizing speech
CN112151015B (en) Keyword detection method, keyword detection device, electronic equipment and storage medium
CN112735418B (en) Voice interaction processing method, device, terminal and storage medium
EP3667660A1 (en) Information processing device and information processing method
CN111261151A (en) Voice processing method and device, electronic equipment and storage medium
CN113053390B (en) Text processing method and device based on voice recognition, electronic equipment and medium
KR20230107860A (en) Voice personalization and federation training using real noise
CN114330371A (en) Session intention identification method and device based on prompt learning and electronic equipment
CN110992940B (en) Voice interaction method, device, equipment and computer-readable storage medium
CN110889008B (en) Music recommendation method and device, computing device and storage medium
CN114242065A (en) Voice wake-up method and device and training method and device of voice wake-up module
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
CN112309396A (en) AI virtual robot state dynamic setting system
CN109785836B (en) Interaction method and device
CN112328308A (en) Method and device for recognizing text
CN111048068B (en) Voice wake-up method, device and system and electronic equipment
CN112150103B (en) Schedule setting method, schedule setting device and storage medium
CN112306560B (en) Method and apparatus for waking up an electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant