CN112331185A - Voice interaction method, system, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112331185A
CN112331185A (application CN202011248204.9A; granted publication CN112331185B)
Authority
CN
China
Prior art keywords
intention
voice interaction
matched
voice
record
Prior art date
Legal status
Granted
Application number
CN202011248204.9A
Other languages
Chinese (zh)
Other versions
CN112331185B (en)
Inventor
李禹慧
黄姿荣
李�瑞
吴伟
贾巨涛
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202011248204.9A priority Critical patent/CN112331185B/en
Publication of CN112331185A publication Critical patent/CN112331185A/en
Application granted granted Critical
Publication of CN112331185B publication Critical patent/CN112331185B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a voice interaction method, system, storage medium, and electronic device. The method comprises: acquiring voice information; parsing the voice information to obtain the intent it expresses and a first element corresponding to that intent, the first element matching one or more of the necessary parameters required to execute the intent; judging whether the intent is an explicit intent according to whether the total weight value of the necessary parameters matched by the first element is greater than or equal to a preset threshold; and, if the intent is explicit, executing it. With this scheme, the user's intent can be executed as soon as the weight of the matched necessary parameters reaches the preset threshold, without filling every necessary parameter one by one through multiple rounds of dialogue, which greatly reduces the number of voice-interaction turns and markedly improves the user experience.

Description

Voice interaction method, system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of voice interaction technologies, and in particular, to a voice interaction method, a voice interaction system, a storage medium, and an electronic device.
Background
Voice interaction technology is now widely used in smart-home systems to let users interact with and control the system. In practice, a user often cannot supply in a single utterance all of the necessary parameters required to execute an intent, so multiple rounds of dialogue are usually needed to obtain them one by one.
The pressing problem in current smart-home voice control is therefore that an interaction mode requiring a dialogue round for each parameter takes too many turns: it degrades the user experience, exhausts the user's patience, and often fails to reach the expected result. A voice interaction method is needed that completes the dialogue in as few turns as possible, so as to improve the user experience.
Disclosure of Invention
In view of the above problems in the prior art, the present application provides a voice interaction method, system, storage medium, and electronic device that complete a conversation in as few voice-interaction turns as possible.
In a first aspect, the present invention provides a voice interaction method, including:
acquiring voice information;
parsing the voice information to obtain an intent corresponding to the voice information and a first element corresponding to the intent, wherein the first element matches one or more of the necessary parameters required to execute the intent;
judging whether the intent is an explicit intent according to whether the weight value of the necessary parameters matched by the first element is greater than or equal to a preset threshold;
if the intent is an explicit intent, executing the intent.
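The first-aspect steps above can be sketched as a small weight check. This is a minimal illustration, not the patent's implementation: `is_explicit` is a hypothetical helper name, and the parameter names, weights, and threshold are taken from the electric-cooker examples later in the description.

```python
def is_explicit(matched_params, weights, threshold):
    """An intent is explicit when the necessary parameters matched by
    the parsed elements carry a total weight >= the preset threshold."""
    return sum(weights[p] for p in matched_params) >= threshold

# Weights and threshold from the electric-cooker examples (Examples 2 and 3).
WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5,
           "rice_type": 0.5, "taste": 2.5}
THRESHOLD = 6

print(is_explicit({"taste", "cooking_mode"}, WEIGHTS, THRESHOLD))  # 2.5 + 4.5 = 7 >= 6
print(is_explicit({"cooking_mode"}, WEIGHTS, THRESHOLD))           # 4.5 < 6
```

The check deliberately ignores which parameters are matched; only their combined weight matters, which is what lets a single utterance end the dialogue.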
In one embodiment, the method further comprises:
if the intent is not an explicit intent, performing supplementary voice interaction to obtain a second element, wherein the second element matches necessary parameters that were not matched by the first element;
making the sum of the weight values of the necessary parameters respectively matched by the second element and the first element greater than or equal to the preset threshold;
and then judging the intent to be an explicit intent and executing it.
With this embodiment, when the initially acquired voice information does not satisfy the execution condition, supplementary voice interaction collects the missing information needed to execute the intent, ensuring that the intent is eventually carried out.
In one embodiment, the method further comprises:
judging, according to the parsing result of the voice information, whether the history contains a target record matching the intent;
and if such a target record exists, performing one round of voice interaction to confirm whether to execute the intent according to the target record.
With this embodiment, matching the user's intent against the history quickly finds a record that fits the user's habits, so that, with the user's consent, the intent can be executed immediately according to the target record.
In one embodiment, if there is a target record matching the intention in the history, performing a round of voice interaction to confirm whether to execute the intention according to the target record, includes:
judging whether the target records contain a unique record with the highest frequency of occurrence;
if such a unique record exists, performing one round of voice interaction to confirm whether to execute the intent according to that record;
and if no unique most-frequent record exists, performing one round of voice interaction to confirm whether to execute the intent according to the most recent target record.
With this embodiment, the target records are weighted so that the user's most frequently used settings take priority, with recency as the tie-breaker, which better matches the user's habits.
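The frequency-then-recency selection in this embodiment can be sketched as a small helper. This is hypothetical: the patent does not specify a record format, so the sketch assumes each history record is a `(timestamp, settings)` pair.

```python
from collections import Counter

def pick_target_record(records):
    """records: (timestamp, settings) pairs already matched to the intent.
    Prefer the unique most-frequent settings; if the top frequency is
    tied, fall back to the settings of the most recent record."""
    if not records:
        return None
    freq = Counter(settings for _, settings in records)
    ranked = freq.most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]          # unique highest frequency
    return max(records)[1]           # tie: most recent by timestamp
```

With records `[(1, "soft"), (2, "soft"), (3, "hard")]` the helper returns `"soft"` (unique most frequent); with `[(1, "soft"), (2, "hard")]` the frequencies tie and it returns `"hard"`, the most recent.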
In one embodiment, performing supplementary voice interaction to obtain a second element matching the necessary parameters not matched by the first element comprises:
judging the weight values of the necessary parameters not matched by the first element;
and preferentially acquiring, through the supplementary voice interaction, the second element corresponding to the unmatched necessary parameter with the largest weight value.
With this embodiment, each round of supplementary voice interaction asks first for the unmatched necessary parameter with the largest weight. This satisfies the execution condition in as few supplementary rounds as possible and improves the user experience.
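This prioritization is just a sort of the unmatched parameters by descending weight. A short sketch, with illustrative names and the weights from the later cooker examples:

```python
def ask_order(weights, matched):
    """Sort the still-unmatched necessary parameters so that each
    supplementary round asks for the heaviest one first."""
    return sorted((p for p in weights if p not in matched),
                  key=weights.get, reverse=True)

WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5,
           "rice_type": 0.5, "taste": 2.5}
print(ask_order(WEIGHTS, {"cooking_mode"}))
# ['taste', 'reservation_time', 'rice_type']
```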
In one embodiment, at most two rounds of supplementary voice interaction are performed.
With this embodiment, capping supplementary voice interaction at two rounds ends the dialogue after at most two extra turns, greatly reducing the number of interaction rounds and improving the user experience.
In one embodiment, among the necessary parameters required to execute the intent, there are two necessary parameters whose combined weight values are greater than or equal to the preset threshold.
In one embodiment, the method further comprises:
when executing the intent, judging whether there are vacant necessary parameters not matched by any element;
and if so, filling the vacant necessary parameters and executing the intent.
With this embodiment, once the execution condition is satisfied, the voice interaction system fills the remaining vacant necessary parameters itself, avoiding unnecessary interaction with the user and improving the user experience.
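The vacancy-filling step might look like the following sketch. The helper name and defaults are hypothetical; the patent only specifies that habits from the history take precedence over conventional defaults.

```python
def fill_vacancies(matched_values, all_params, habit_defaults, conventional_defaults):
    """Fill necessary parameters left vacant after the intent becomes
    executable: prefer the user's habits from the history, then fall
    back to conventional defaults. The user is not asked again."""
    filled = dict(matched_values)
    for p in all_params:
        if p not in filled:
            filled[p] = habit_defaults.get(p, conventional_defaults[p])
    return filled
```

For instance, with `cooking_mode` and `taste` matched from speech, a habitual `rice_type` would come from the history while `reservation_time` falls back to the conventional default.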
In a second aspect, the present invention further provides a voice interaction system, including:
the voice acquisition module is used for acquiring voice information;
the parsing module is used for parsing the voice information and acquiring an intent corresponding to the voice information and a first element matching the necessary parameters corresponding to the intent;
the judging module is used for judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold or not;
an execution module to execute the intent when the intent is an explicit intent.
In a third aspect, the present invention further provides a storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for voice interaction is implemented.
In a fourth aspect, the present invention further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the computer program, when executed by the processor, implements the above-mentioned voice interaction method.
The features mentioned above can be combined in various suitable ways or replaced by equivalent features as long as the object of the invention is achieved.
Compared with the prior art, the voice interaction method, the voice interaction system, the storage medium and the electronic equipment provided by the invention at least have the following beneficial effects:
according to the voice interaction method, the voice interaction system, the storage medium and the electronic equipment, the preset threshold, the necessary parameters and the weight values of the necessary parameters are set in the corresponding application scene, the user intention can be executed as long as the weight values of the necessary parameters matched with the elements are larger than or equal to the preset threshold through the analysis of the voice information, and the necessary parameters are not required to be matched one by one through multiple rounds of conversation with the user, so that the number of rounds of voice interaction is greatly reduced, and the user experience in the voice interaction process is remarkably improved.
Drawings
The invention will be described in more detail hereinafter on the basis of embodiments and with reference to the accompanying drawings. Wherein:
fig. 1 shows a flow chart of the voice interaction method of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
Example one
This embodiment mainly illustrates the principle of the voice interaction method of the present invention.
As shown in fig. 1, the present invention provides a voice interaction method, which comprises the following steps:
step S1: and acquiring voice information.
Specifically, the voice information input by the user can be acquired through a microphone of the corresponding electronic device.
Step S2: analyzing the voice information, and acquiring an intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element is matched with necessary parameters required by executing the intention.
Specifically, the voice information input by the user expresses an intent corresponding to the target result the user wants. Executing the intent requires satisfying its necessary parameters, so the voice information is parsed to extract a first element that matches, and thereby satisfies, the corresponding necessary parameters.
Preferably, step S2 further includes:
step S21: judging whether a target record matched with the intention exists in the historical record or not according to the analysis result of the voice information;
step S22: if the target record matched with the intention exists in the historical record, a round of voice interaction is carried out to confirm whether the intention is executed according to the target record.
Specifically, based on the semantic analysis of the voice information and the first element matching the corresponding necessary parameters, the history is searched for a target record whose intent matches the current voice information. If a matching target record exists, the system asks the user by voice whether to execute according to it. If the user agrees, the remaining steps of the method are skipped and the intent is executed directly; if the user declines, the method continues with the subsequent steps.
Matching against the history lets the system quickly recognize the user's current intent from past interactions, so the dialogue can be completed and the intent executed within a single round.
Preferably, step S22 further includes:
step S221: judging whether only one record with the highest occurrence frequency exists in the target records;
step S222: if only one record with the highest occurrence frequency exists, performing a round of voice interaction to determine whether to execute the intention according to the record with the highest occurrence frequency;
step S223: if no unique most-frequent record exists, performing one round of voice interaction to confirm whether to execute the intent according to the most recent target record.
Specifically, when matching target records in the history, if there is only one target record, it can be treated as the unique most frequent record, and the user is asked to confirm it by voice. More often there is more than one target record, because the first element in the user's utterance may match only some of the necessary parameters; the target records are then filtered through steps S221 to S223.
First, it is judged whether the target records contain a unique record with the highest frequency of occurrence; such a record reflects the user's habits to some extent. If it exists, the user is asked by voice to confirm it. If it does not, for example because all target records, or several of them, occur equally often, the user is asked to confirm the most recent target record, i.e., the latest record when sorted by time.
During this confirmation, if the user agrees to execute according to the corresponding target record, the remaining steps of the method are skipped and the intent is executed directly; if the user declines, the method continues with the subsequent steps.
Step S3: and judging whether the intention is a clear intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold value or not.
Specifically, executing the intent expressed by the voice information requires satisfying its necessary parameters, which are satisfied by matching them against the first element in the voice information. There are usually several necessary parameters, and the first element typically satisfies only some of them. When the total weight value of the satisfied necessary parameters is greater than or equal to the preset threshold, the key information needed to execute the intent has been obtained: the intent is unambiguous and is an explicit intent that can be executed directly. If the total weight value is below the preset threshold, the intent is ambiguous, a fuzzy intent, and a second element must be acquired in step S5 to complete the necessary parameters.
Note that if the first element matches several necessary parameters, the weight value referred to here is the sum of the weight values of those parameters.
Preferably, the necessary parameters, the weight values of the necessary parameters, and the preset threshold are set according to an application scenario of the voice interaction.
Specifically, the application scenario denotes the field in which the voice interaction is applied. It may be a specific type of electronic device, such as an air conditioner, washing machine, electric cooker, or other household appliance with a voice interaction function, or a software system such as a smart-home voice system.
Also note that the necessary parameters should be understood as a predefined set of the kinds of key information required to execute an intent, fixed in advance for the application scenario. The intents executable in a given scenario fall within a certain range, since the system cannot execute arbitrary intents, and the necessary parameters cover all intents within that range. They therefore do not change with the user's input; only the elements in the voice information that match them, i.e., the specific key information, change.
Step S4: if the intent is a clear intent, then the intent is executed.
Specifically, the meaning of an explicit intent is unambiguous, so it can be executed directly.
Step S5 includes the following steps:
step S51: if the intention is not an explicit intention, a complementary voice interaction is performed to obtain a second element that matches the necessary parameters that did not match the first element.
Specifically, if the intent is not explicit, i.e., the weight value of the necessary parameters matched by the first element is below the preset threshold, the intent is fuzzy, and second elements must be acquired to fill the necessary parameters not matched by the first element. Depending on the gap between that weight value and the preset threshold, several second elements may need to be acquired through supplementary voice interaction, one per round.
Step S52: making the sum of the weight values of the necessary parameters respectively matched by the second elements and the first element greater than or equal to the preset threshold.
Specifically, as second elements are acquired round by round, as soon as the combined weight of the necessary parameters matched by the acquired second elements and by the first element reaches the preset threshold, execution of the intent begins directly. Even if some necessary parameters remain unmatched, no further round of supplementary voice interaction is performed.
Step S53: and judging the intention as a clear intention and executing the intention.
The purpose of step S5 is to acquire, through supplementary voice interaction, second elements that match and satisfy the necessary parameters not matched by the first element, until the combined weight of the necessary parameters matched by the second elements and the first element reaches the preset threshold; that is, once enough second elements have been acquired, the intent changes from a fuzzy intent into an executable explicit intent. Step S5 comprises at least one round, each round obtaining one second element; the number of rounds depends on the gap between the weight matched in step S2 and the preset threshold, and on the weights of the unmatched necessary parameters.
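Step S5 as a whole can be sketched as a loop. This is a hypothetical rendering: `ask(param)` stands in for one round of supplementary voice interaction, and the round cap is a parameter.

```python
def supplementary_rounds(matched, weights, threshold, ask, max_rounds=2):
    """Acquire second elements round by round, heaviest unmatched
    parameter first, stopping as soon as the matched weight reaches
    the threshold or the round cap is hit."""
    answers = {}
    for _ in range(max_rounds):
        if sum(weights[p] for p in matched) >= threshold:
            break                                  # intent is now explicit
        missing = sorted((p for p in weights if p not in matched),
                         key=weights.get, reverse=True)
        if not missing:
            break
        param = missing[0]
        answers[param] = ask(param)                # one round of dialogue
        matched = matched | {param}
    return matched, answers

WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5,
           "rice_type": 0.5, "taste": 2.5}
matched, answers = supplementary_rounds({"cooking_mode"}, WEIGHTS, 6,
                                        ask=lambda p: "a little softer")
print(matched, answers)   # asks only for taste, then stops
```

Starting from only `cooking_mode` (weight 4.5), a single round asking for `taste` (the heaviest missing parameter) lifts the total to 7 and ends the loop.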
Preferably, the complementary voice interaction is performed for at most two rounds.
Specifically, to improve the user's experience the number of interaction turns must be kept as low as possible, so supplementary voice interaction is capped at two rounds: after the user's voice information is first acquired, the second elements are collected within at most two supplementary rounds and the dialogue then ends, greatly improving the user experience.
Further, to guarantee that a dialogue can finish within two supplementary rounds, the preset threshold and the weight values must be chosen so that, among the necessary parameters required to execute an intent, there exist three parameters whose weights sum to at least the preset threshold, and those three must include the parameter with the smallest weight.
Specifically, in the worst case the voice information acquired in step S1 yields, after parsing in step S2, a first element matching only one necessary parameter, so second elements must be acquired in step S5. Since supplementary interaction is capped at two rounds and each round's second element matches one necessary parameter, step S5 can satisfy at most two more necessary parameters, giving at most three satisfied parameters across steps S2 and S5. The preset threshold and the weights must therefore be set so that the sum of the weights of three necessary parameters reaches the threshold. Moreover, because the parameter matched in step S2 depends on the user's utterance and may happen to be the one with the smallest weight, the three parameters whose weights sum to at least the threshold must include that smallest-weight parameter; otherwise this worst case could not finish within two rounds.
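The worst-case condition above can be checked mechanically. The sketch below makes the same assumptions as the text: the first utterance may match only the lightest parameter, and each supplementary round asks for the heaviest missing parameter. The function name is illustrative.

```python
def finishes_in_two_rounds(weights, threshold):
    """True if, even when the first element matches only the
    smallest-weight necessary parameter, two supplementary rounds
    (heaviest missing parameter first) reach the preset threshold."""
    w = sorted(weights.values())
    lightest = w[0]
    heaviest_two = sum(sorted(w[1:], reverse=True)[:2])
    return lightest + heaviest_two >= threshold

WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5,
           "rice_type": 0.5, "taste": 2.5}
print(finishes_in_two_rounds(WEIGHTS, 6))  # 0.5 + 4.5 + 2.5 = 7.5 >= 6
```

The cooker weights from the examples satisfy the condition; raising the threshold to 8 would break the two-round guarantee.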
Preferably, among the necessary parameters required to execute the intent, there are two necessary parameters whose combined weight values are greater than or equal to the preset threshold.
Specifically, step S5 is preferably capped at two supplementary rounds as described above; to improve the user experience further, it is preferable that step S5 can finish the dialogue in a single round, which requires an additional condition on the weights. Again considering the worst case, if the first element obtained in step S2 matches only one necessary parameter, and the second element obtained through one round of supplementary interaction in step S5 matches one more, then two necessary parameters are satisfied; the weights must therefore be set so that the sum of the weights of two necessary parameters reaches the preset threshold.
Thus, if the single parameter matched in step S2 is one of two parameters whose combined weight reaches the preset threshold, step S5 acquires the second element for the other parameter in one round of voice interaction, and the dialogue completes in a single supplementary round, further improving the user experience.
Preferably, in step S51, the method further includes:
step S511: judging the weight value of the necessary parameter which is not matched with the first element;
step S512: and preferentially acquiring the second element corresponding to the necessary parameter with the largest weight value through the supplementary voice interaction.
Specifically, the second element corresponding to the unmatched necessary parameter with the largest weight value is acquired first, so that step S51 reaches the condition, namely the combined weight of the necessary parameters matched by the second and first elements being at least the preset threshold, in as few rounds of voice interaction as possible.
Preferably, the voice interaction method of the present embodiment further includes:
when executing the intent, judging whether there are vacant necessary parameters not matched by any element;
if so, filling the vacant necessary parameters and executing the intent.
Specifically, even when the execution condition is satisfied, i.e., the weight of the parameters matched by the first element, or by the first and second elements together, reaches the preset threshold, there may still be vacant necessary parameters matched by no element. In principle all necessary parameters should be satisfied when the intent is executed; once the satisfied parameters carry enough weight, the core information needed for execution has been obtained, and the remaining vacant parameters can be filled adaptively by the voice interaction system itself, without involving the user.
Further, vacant necessary parameters are filled preferentially according to the user's habits in the history; if the history contains no matching habit, they are filled according to the first element, or the first and second elements, combined with common conventions.
Example two
The present embodiment takes an electric cooker as an example, and further describes the voice interaction method of the present invention through a voice interaction example.
The application scenario is a specific household appliance, an electric cooker. The smart cooker has the following functional parameters, which are set as the necessary parameters: reservation time, cooking mode, rice type, and taste. Weight values and a preset threshold are then set according to how strongly each necessary parameter affects the cooker's function: on a scale of 10, reservation time is 2, cooking mode is 4.5, rice type is 0.5, taste is 2.5, and the preset threshold is 6.
The user: i want to eat soft rice.
The voice information is obtained and analyzed, the intention of the user is obtained as cooking, and two first elements are obtained simultaneously: the method comprises the following steps that (1) the user feels soft, the user can cook rice, the user feels soft, the user feels cooked rice, the cooking mode corresponds to the cooking mode, the sum of the two necessary parameters is 7 and is larger than; meanwhile, the intention of cooking is carried out according to the 'soft' and 'rice' and combining the custom of the user and the conventional situation to fill the appointment time, the rice type and the rice type by oneself.
Preferably, a voice reply is generated for the user according to the user's voice information and the matched first elements.
Reply: Cooking slightly soft rice for you.
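The decision in this embodiment reduces to a weighted-sum check. The sketch below copies the weight values and threshold from this embodiment; the function and parameter names are illustrative assumptions.

```python
# Weight table from this embodiment: full scale 10, preset threshold 6.
WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5, "rice_type": 0.5, "taste": 2.5}
THRESHOLD = 6

def is_explicit_intent(matched_params):
    """An intent is explicit once the summed weight values of the necessary
    parameters matched by the utterance reach the preset threshold."""
    return sum(WEIGHTS[p] for p in matched_params) >= THRESHOLD

# "I want to eat soft rice" matches the taste ("soft") and the cooking mode ("rice"):
print(is_explicit_intent({"taste", "cooking_mode"}))  # 2.5 + 4.5 = 7 >= 6 -> True
# "I need to eat rice" matches only the cooking mode:
print(is_explicit_intent({"cooking_mode"}))           # 4.5 < 6 -> False
```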
Embodiment Three
The present embodiment takes an electric cooker as an example, and further describes the voice interaction method of the present invention through a voice interaction example.
The voice interaction application scene is a specific household appliance, namely an electric cooker. The intelligent electric cooker is known to have the following functional parameters, which are set as the necessary parameters: reservation time, cooking mode, rice type, and taste. Meanwhile, corresponding weight values and a preset threshold are set according to the degree to which each necessary parameter influences the function of the electric cooker: on a full scale of 10, the reservation time is 2, the cooking mode is 4.5, the rice type is 0.5, the taste is 2.5, and the preset threshold is 6.
The user: i need to eat rice.
The voice information is acquired and analyzed: the intention of the user is obtained as cooking, and one first element is obtained, "rice", which corresponds to the necessary parameter cooking mode. The weight value of this parameter is 4.5, which is smaller than the preset threshold 6, so the intention of the user is not an explicit intention but a vague intention, and the subsequent voice interaction step continues. Among the remaining necessary parameters other than the cooking mode, the taste has the largest weight value, so a second element matching the taste is preferentially acquired through one round of voice interaction.
Reply: What taste of rice would you like to eat?
The user: is a little softer.
The second element "soft" matching the taste is acquired; the sum of the weight values of the taste and the cooking mode is 7, which is greater than the preset threshold 6, so the intention of the user is converted from a vague intention into an executable explicit intention, and the dialogue ends. Meanwhile, the cooking intention is executed according to "soft" and "rice", with the reservation time and rice type filled in automatically by combining the user's habits and conventional defaults.
Preferably, a voice reply is generated for the user according to the user's voice information and the matched first and second elements.
Reply: Cooking slightly soft rice for you.
In addition, in the above example, if the user cannot answer the question about the taste, the next round of voice interaction continues, for example:
Reply: What taste of rice would you like to eat?
The user: is not known.
The necessary parameter taste is removed from consideration; among the remaining necessary parameters, the reservation time has the largest weight value, so a second element matching the reservation time is preferentially acquired through the next round of voice interaction.
Reply: How long would you like to reserve?
The user: XX hours.
The second element "XX hours" matching the reservation time is acquired; the reservation time and the cooking mode are both satisfied, and the sum of their weight values is 6.5, which is greater than the preset threshold 6, so the intention of the user is an explicit intention that can be executed directly, and the dialogue ends. Meanwhile, the cooking intention is executed according to "XX hours" and "rice", with the taste and rice type filled in automatically by combining the user's habits and conventional defaults.
Preferably, a voice reply is generated for the user according to the user's voice information and the matched first and second elements.
Reply: Cooking rice for you after XX hours.
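The question-selection rule used across these rounds — always ask about the vacant necessary parameter with the largest weight value, skipping parameters the user has declined — can be sketched as follows. The weight values are copied from this embodiment; all names are illustrative.

```python
WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5, "rice_type": 0.5, "taste": 2.5}

def next_question_parameter(matched, declined, weights=WEIGHTS):
    """Pick the vacant necessary parameter with the largest weight value,
    skipping any parameter the user has already declined to answer."""
    candidates = [p for p in weights if p not in matched and p not in declined]
    return max(candidates, key=lambda p: weights[p]) if candidates else None

# After "I need to eat rice", only the cooking mode (4.5) is matched:
print(next_question_parameter({"cooking_mode"}, set()))      # taste (2.5) is asked first
print(next_question_parameter({"cooking_mode"}, {"taste"}))  # then reservation time (2)
```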
Further, if during the interaction the user cannot answer, or the answered voice information cannot be matched to a necessary parameter, the next rounds of supplementary voice interaction proceed over the remaining necessary parameters in descending order of weight value, acquiring second elements and matching the corresponding necessary parameters until the execution condition of the intention is finally satisfied. If, due to an unexpected situation (the user cannot reply), the number of rounds of supplementary voice interaction exceeds the preset limit of two rounds, then depending on the actual situation, either the dialogue is ended, or the next round of supplementary voice interaction continues as long as the weight values of the remaining necessary parameters can still satisfy the execution condition of the intention. If the necessary parameters matched through voice interaction ultimately fail to satisfy the execution condition of the intention (the user cannot reply, or the reply cannot be recognized), a fallback reply is given to the user, for example:
Reply: Please think it over and tell me.
Further, if during the supplementary voice interaction the sum of the weight values of all remaining necessary parameters plus the weight values of the already satisfied necessary parameters can no longer reach the preset threshold, the dialogue may be ended in advance through the fallback operation, rather than continuing meaningless rounds of supplementary voice interaction.
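The early-termination test described above can be sketched as a feasibility check; the helper and its names are illustrative assumptions, with the weight values copied from this embodiment.

```python
WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5, "rice_type": 0.5, "taste": 2.5}

def can_still_reach_threshold(matched, declined, weights=WEIGHTS, threshold=6):
    """Return False when even matching every remaining answerable parameter
    could not raise the total weight to the preset threshold, so the
    dialogue can be ended early through the fallback operation."""
    satisfied = sum(weights[p] for p in matched)
    remaining = sum(w for p, w in weights.items()
                    if p not in matched and p not in declined)
    return satisfied + remaining >= threshold

# Only rice type (0.5) is matched; taste and cooking mode were declined.
# 0.5 + 2 (reservation time) = 2.5 < 6, so further rounds are pointless:
print(can_still_reach_threshold({"rice_type"}, {"taste", "cooking_mode"}))  # False
```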
Embodiment Four
The present embodiment takes an electric cooker as an example, and further describes the voice interaction method of the present invention through a voice interaction example.
The voice interaction application scene is a specific household appliance, namely an electric cooker. The intelligent electric cooker is known to have the following functional parameters, which are set as the necessary parameters: reservation time, cooking mode, rice type, and taste. Meanwhile, corresponding weight values and a preset threshold are set according to the degree to which each necessary parameter influences the function of the electric cooker: on a full scale of 10, the reservation time is 2, the cooking mode is 4.5, the rice type is 0.5, the taste is 2.5, and the preset threshold is 6.
The user: i want to eat soft rice.
The voice information is analyzed to acquire the corresponding intention "cooking" and the first elements "soft" and "rice". The history record is then searched to judge whether it contains a target record matching the intention and the first elements. If no target record exists, the subsequent steps of the voice interaction method continue. If target records exist, it is further judged whether there is a unique record with the highest frequency of occurrence among them.
If there is a unique record with the highest frequency of occurrence, the user is asked to confirm according to that record, for example:
Reply: Matched your most frequently used cooking record — reservation time: XX, cooking mode: XX, rice type: XX, taste: XX. Cook according to this record?
If there is no unique record with the highest frequency of occurrence, the user is asked to confirm according to the most recent record among the target records, for example:
Reply: Matched your most recently used cooking record — reservation time: XX, cooking mode: XX, rice type: XX, taste: XX. Cook according to this record?
If the user agrees, execution proceeds according to the corresponding target record and the subsequent steps of the voice interaction method are skipped; if the user disagrees, the method continues with the corresponding subsequent steps according to whether the intention of the user is an explicit intention. In the above example, after the user disagrees, the analysis result of the voice information shows that the necessary parameters corresponding to the first elements "soft" and "rice" are the taste and the cooking mode; the sum of their weight values is 7, which is greater than the preset threshold 6, so the intention of the user is an explicit intention and the subsequent supplementary voice interaction steps are skipped. Meanwhile, the cooking intention is executed according to "soft" and "rice", with the reservation time and rice type filled in automatically by combining the user's habits and conventional defaults.
In addition, regardless of whether the voice information first input by the user corresponds to an explicit intention or a vague intention, target records are first matched in the history record. For example:
the user: i need to eat rice.
Similarly, the voice information is analyzed, corresponding target records are matched in the history record according to the analysis result, and the corresponding steps are performed according to whether a target record exists. If no corresponding target record exists, or the user does not agree to execute according to the target record, the subsequent corresponding steps of the voice interaction method are entered according to the analysis result of the voice information. For the above example, the first element "rice" matching the cooking mode is analyzed; its weight value of 4.5 is smaller than the preset threshold 6, so the intention is not an explicit intention, and the subsequent interaction steps proceed with reference to the related steps in the third embodiment.
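The target-record selection of this embodiment — propose the unique most frequent record if one exists, otherwise the most recent — can be sketched as follows. Records are simplified to hashable values for illustration; a real system would compare full parameter dicts.

```python
from collections import Counter

def select_target_record(target_records):
    """target_records: hashable records matching the intent and first
    elements, ordered oldest to newest. Returns the record proposed to
    the user: the unique most frequent one if it exists, otherwise the
    most recent one."""
    if not target_records:
        return None
    top = Counter(target_records).most_common()
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]       # a unique most-frequent record exists
    return target_records[-1]  # tie on frequency: fall back to most recent

print(select_target_record(["soft-rice", "firm-rice", "soft-rice"]))  # "soft-rice"
print(select_target_record(["soft-rice", "firm-rice"]))               # "firm-rice"
```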
Embodiment Five
According to an embodiment of the present invention, there is also provided a voice interaction system, including:
the voice acquisition module is used for acquiring voice information;
the analysis module is used for analyzing the voice information and acquiring an intention corresponding to the voice information and a first element of a necessary parameter corresponding to the intention;
the judging module is used for judging whether the intention is a clear intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold or not;
and the execution module is used for executing the intention when the intention is an explicit intention.
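The four modules can be sketched as one minimal pipeline. The stub parser stands in for real speech recognition and semantic analysis, and all names are illustrative assumptions rather than the claimed system.

```python
WEIGHTS = {"reservation_time": 2, "cooking_mode": 4.5, "rice_type": 0.5, "taste": 2.5}

class VoiceInteractionSystem:
    """Minimal pipeline mirroring the four modules of this embodiment;
    the parser would be backed by real ASR/NLU in practice."""

    def __init__(self, weights, threshold, parser, executor):
        self.weights = weights
        self.threshold = threshold
        self.parse = parser      # voice info -> (intent, matched parameters)
        self.execute = executor  # runs an explicit intent

    def handle(self, voice_info):
        intent, matched = self.parse(voice_info)        # acquisition + analysis
        total = sum(self.weights[p] for p in matched)   # judgment module
        if total >= self.threshold:                     # explicit intent
            return self.execute(intent, matched)        # execution module
        return None  # vague intent: supplementary interaction would follow

system = VoiceInteractionSystem(
    WEIGHTS, 6,
    parser=lambda text: ("cooking", {"taste", "cooking_mode"}),  # stub NLU
    executor=lambda intent, matched: f"executing {intent}",
)
print(system.handle("I want to eat soft rice"))  # "executing cooking"
```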
Embodiment Six
According to an embodiment of the present invention, there is also provided a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for voice interaction in any one of the above embodiments is implemented.
Embodiment Seven
According to an embodiment of the present invention, there is also provided an electronic device, where the electronic device includes a memory and a processor, and the memory stores a computer program that can run on the processor, and when the computer program is executed by the processor, the electronic device implements the voice interaction method in any one of the above embodiments.
As explained in the above embodiments, in the voice interaction method, system, storage medium, and electronic device provided by the present invention, a preset threshold, necessary parameters, and weight values of the necessary parameters are set for the corresponding application scene. As long as the weight values of the necessary parameters matched to elements through analysis of the voice information are greater than or equal to the preset threshold, the user's intention can be executed without matching every necessary parameter one by one through multiple rounds of dialogue with the user. This greatly reduces the number of rounds of voice interaction and significantly improves the user experience of the voice interaction process.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (11)

1. A method of voice interaction, comprising:
acquiring voice information;
analyzing the voice information, and acquiring an intention corresponding to the voice information and a first element corresponding to the intention, wherein the first element is matched with necessary parameters required for executing the intention;
judging whether the intention is a clear intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold value or not;
if the intent is an explicit intent, then the intent is executed.
2. The voice interaction method of claim 1, further comprising:
if the intention is not an explicit intention, performing supplementary voice interaction to obtain a second element, wherein the second element is matched with the necessary parameters which are not matched with the first element;
the sum of the weighted values of the necessary parameters respectively corresponding to the second element and the first element is greater than or equal to the preset threshold;
and judging the intention as an explicit intention and executing the intention.
3. The method of voice interaction according to claim 1 or 2, wherein the method further comprises:
judging whether a target record matched with the intention exists in a historical record or not according to the analysis result of the voice information;
and if the target record matched with the intention exists in the historical record, performing a round of voice interaction to confirm whether the intention is executed according to the target record.
4. The method of claim 3, wherein if there is a target record matching the intention in the history, performing a round of voice interaction to determine whether to execute the intention according to the target record comprises:
judging whether only one record with the highest occurrence frequency exists in the target records;
if only one record with the highest occurrence frequency exists, performing a round of voice interaction to determine whether the intention is executed according to the record with the highest occurrence frequency;
and if the only record with the highest frequency of occurrence does not exist, performing a round of voice interaction to confirm whether the intention is executed according to the latest record in the target records.
5. The voice interaction method according to claim 2, wherein performing a supplementary voice interaction to obtain a second element that matches the necessary parameter that is not matched to the first element comprises:
judging the weight value of the necessary parameter which is not matched with the first element;
and preferentially acquiring the second element corresponding to the necessary parameter with the largest weight value through the supplementary voice interaction.
6. The method of claim 2, wherein the supplemental voice interaction is performed for at most two rounds.
7. The method according to claim 6, wherein, in the necessary parameters required for executing the intention, the sum of the weight values of two necessary parameters is greater than or equal to the preset threshold.
8. The method of voice interaction according to claim 1 or 2, wherein the method further comprises:
when executing the intention, judging whether there is a vacant necessary parameter that is not matched to any element;
if the vacant necessary parameter exists, filling the vacant necessary parameter and executing the intention.
9. A voice interaction system, comprising:
the voice acquisition module is used for acquiring voice information;
the analysis module is used for analyzing the voice information and acquiring an intention corresponding to the voice information and a first element of a necessary parameter corresponding to the intention;
the judging module is used for judging whether the intention is an explicit intention or not according to whether the weight value of the necessary parameter matched with the first element is larger than or equal to a preset threshold or not;
an execution module to execute the intent when the intent is an explicit intent.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the voice interaction method of any one of claims 1 to 8.
11. An electronic device, characterized in that the electronic device comprises a memory, a processor, a computer program being stored on the memory and being executable on the processor, the computer program, when executed by the processor, implementing the voice interaction method according to any one of claims 1 to 8.
CN202011248204.9A 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment Active CN112331185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011248204.9A CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011248204.9A CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112331185A true CN112331185A (en) 2021-02-05
CN112331185B CN112331185B (en) 2023-08-11

Family

ID=74317969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011248204.9A Active CN112331185B (en) 2020-11-10 2020-11-10 Voice interaction method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112331185B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847278A (en) * 2012-12-31 2017-06-13 威盛电子股份有限公司 System of selection and its mobile terminal apparatus and information system based on speech recognition
CN109657236A (en) * 2018-12-07 2019-04-19 腾讯科技(深圳)有限公司 Guidance information acquisition methods, device, electronic device and storage medium
US10418032B1 (en) * 2015-04-10 2019-09-17 Soundhound, Inc. System and methods for a virtual assistant to manage and use context in a natural language dialog
CN110516786A (en) * 2019-08-28 2019-11-29 出门问问(武汉)信息科技有限公司 A kind of dialogue management method and apparatus
CN111223485A (en) * 2019-12-19 2020-06-02 深圳壹账通智能科技有限公司 Intelligent interaction method and device, electronic equipment and storage medium
CN111680144A (en) * 2020-06-03 2020-09-18 湖北亿咖通科技有限公司 Method and system for multi-turn dialogue voice interaction, storage medium and electronic equipment


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113889076A (en) * 2021-09-13 2022-01-04 北京百度网讯科技有限公司 Speech recognition and coding/decoding method, device, electronic equipment and storage medium
CN113889076B (en) * 2021-09-13 2022-11-01 北京百度网讯科技有限公司 Speech recognition and coding/decoding method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112331185B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
JP6991251B2 (en) Voice user interface shortcuts for assistant applications
CN107644641B (en) Dialog scene recognition method, terminal and computer-readable storage medium
CN107452376B (en) Method for controlling cooking through voice
McCarthy et al. Experience-based critiquing: Reusing critiquing experiences to improve conversational recommendation
CN113485144B (en) Intelligent home control method and system based on Internet of things
CN110070857B (en) Model parameter adjusting method and device of voice awakening model and voice equipment
CN104473556A (en) Method and device for controlling cooking device and cooking device
CN108334606B (en) Voice interaction method and device for smart home and server
CN112331185A (en) Voice interaction method, system, storage medium and electronic equipment
CN110021299B (en) Voice interaction method, device, system and storage medium
CN112017754A (en) Menu recommendation method and device, range hood and storage medium
CN108903587A (en) Voice auxiliary cooking method, speech processing device and computer readable storage medium
JP2007299159A (en) Content retrieval device
CN113009839B (en) Scene recommendation method and device, storage medium and electronic equipment
CN111222553A (en) Training data processing method and device of machine learning model and computer equipment
CN112182189A (en) Conversation processing method and device, electronic equipment and storage medium
WO2018146923A1 (en) Distributed coordination system, apparatus behavior monitoring device, and appliance
CN117669717A (en) Knowledge enhancement-based large model question-answering method, device, equipment and medium
CN115809669B (en) Dialogue management method and electronic equipment
CN113597623A (en) Apparatus and method for determining cooking ability index
CN115079579A (en) Method and device for controlling intelligent voice equipment and intelligent voice equipment
CN111210232B (en) Data processing method and device and electronic equipment
CN113920995A (en) Processing method and device of voice engine, electronic equipment and storage medium
CN111667082A (en) Feedback method and apparatus, storage medium, and electronic apparatus
CN109544195B (en) Information processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant