CN107644641B - Dialog scene recognition method, terminal and computer-readable storage medium - Google Patents

Dialog scene recognition method, terminal and computer-readable storage medium

Info

Publication number
CN107644641B
Authority
CN
China
Prior art keywords
scene
information
user
candidate
scenes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710636464.5A
Other languages
Chinese (zh)
Other versions
CN107644641A (en)
Inventor
卢道和
郑德荣
张超
杨海军
钟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201710636464.5A
Publication of CN107644641A
Application granted
Publication of CN107644641B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a conversation scene recognition method, which comprises the following steps: receiving input user dialogue information; screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information; based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information; and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene. The invention also discloses a dialog scene recognition terminal and a computer readable storage medium. The invention realizes the accurate recognition of the scene in the conversation scene change process.

Description

Dialog scene recognition method, terminal and computer-readable storage medium
Technical Field
The present invention relates to the field of dialog scene recognition technologies, and in particular, to a dialog scene recognition method, a terminal, and a computer-readable storage medium.
Background
Automatic question answering is the task of using a computer to automatically answer questions posed by a user so as to meet the user's knowledge needs, and is an advanced form of information service. In recent years, with the rapid development of artificial intelligence, automatic question answering has become a closely watched research direction with broad prospects. It is regarded as one of the main tasks for verifying whether a machine has natural language understanding capability, and research on it helps advance related fields of artificial intelligence.
However, current automatic question-answering systems are not perfect and still face many specific problems and difficulties. Most existing intelligent robots are single-round dialog systems that do not consider data such as the dialog context or the user's historical dialog information, so many of the answers they give are disconnected and abrupt, which seriously affects the user experience. To address this, a multi-scene recognition method is provided: during a conversation, the method actively adapts to the user's dialog scene according to data such as the user's current input and historical dialog information, so that the conversation is smoother and more natural.
Disclosure of Invention
The main object of the invention is to provide a dialog scene recognition method, a recognition terminal and a computer-readable storage medium, so as to solve the technical problem that scene changes are difficult to recognize accurately during voice interaction between an intelligent robot and a human.
In order to achieve the above object, the present invention provides a dialog scene recognition method, including:
receiving input user dialogue information;
screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening the preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information;
based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;
and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.
Preferably, the step of screening preset candidate scenes by using a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information includes:
extracting keywords in the user dialogue information;
based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.
Preferably, the step of screening the preset candidate scenes by using a scene discrimination model based on the user dialogue information to obtain the second type of candidate scenes corresponding to the user dialogue information includes:
extracting characteristic information in the user dialogue information;
and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.
Preferably, the step of performing reinforcement learning processing on the user dialog information based on the first-class candidate scene and the second-class candidate scene to obtain an optimal dialog scene corresponding to the user dialog information includes:
and taking the user dialogue information as an action and the first-class candidate scenes and the second-class candidate scenes as states, and performing reinforcement learning processing on the user dialogue information so as to screen out the optimal dialogue scene from the first-class candidate scenes and the second-class candidate scenes.
In order to achieve the above object, the present invention further provides an identification terminal, including:
a memory storing a dialogue scene recognition program;
a processor configured to execute the dialog scene recognition program to perform the following operations:
receiving input user dialogue information;
screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening the preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information;
based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;
and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.
Optionally, the operation of screening preset candidate scenes by using a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information includes:
extracting keywords in the user dialogue information;
based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.
Optionally, the operation of screening the preset candidate scenes by using a scene discrimination model based on the user dialogue information to obtain a second type of candidate scenes corresponding to the user dialogue information includes:
extracting characteristic information in the user dialogue information;
and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.
Preferably, the operation of performing reinforcement learning processing on the user dialogue information based on the first-class candidate scenes and the second-class candidate scenes to obtain an optimal dialog scene corresponding to the user dialogue information includes:
and taking the user dialogue information as an action and the first-class candidate scenes and the second-class candidate scenes as states, and performing reinforcement learning processing on the user dialogue information so as to screen out the optimal dialogue scene from the first-class candidate scenes and the second-class candidate scenes.
To achieve the above object, the present invention further provides a computer-readable storage medium having a dialog scene recognition program stored thereon, which, when executed by a processor, implements the steps of the dialog scene recognition method according to any one of the above.
In the invention, when the dialog scene changes, the preset candidate scenes are first screened using the preset scene recognition rule and the scene discrimination model to obtain the corresponding candidate scenes; a reinforcement learning strategy is then used to screen these candidate scenes a second time to obtain the optimal dialog scene; finally, a suitable dialog result is selected according to the optimal dialog scene to reply to the user. The change of the current dialog scene is thereby recognized accurately, which further improves the user's human-computer interaction experience.
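The two-stage flow summarized above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: the function name `recognize_scene`, the callable parameters, and the toy scene names and values are hypothetical and do not come from the patent, which leaves the rule, the model and the value function unspecified.

```python
from typing import Callable, Iterable, List

def recognize_scene(
    utterance: str,
    current_scene: str,
    candidates: List[str],
    rule_screen: Callable[[str, List[str]], Iterable[str]],
    model_screen: Callable[[str, List[str]], Iterable[str]],
    value: Callable[[str, str], float],
) -> str:
    """Return the scene to use as the current dialog scene after this utterance."""
    # First screening: rule-based and model-based candidates.
    first_class = list(rule_screen(utterance, candidates))
    second_class = list(model_screen(utterance, candidates))
    # Merge both classes, preserving order and dropping duplicates.
    pool = list(dict.fromkeys(first_class + second_class))
    if not pool:
        return current_scene  # nothing matched: keep the current scene
    # Second screening: pick the candidate with the highest value.
    best = max(pool, key=lambda scene: value(scene, utterance))
    # Final comparison: switch only when the optimal scene differs.
    if best != current_scene:
        current_scene = best
    return current_scene

# Hypothetical stand-ins for the rule, the model and the value function.
SCENES = ["weather", "dining", "fitness"]

def toy_rule(text: str, scenes: List[str]) -> List[str]:
    return [s for s in scenes if s == "dining" and "eat" in text]

def toy_model(text: str, scenes: List[str]) -> List[str]:
    return [s for s in scenes if s in ("dining", "fitness")]

def toy_value(scene: str, text: str) -> float:
    return {"dining": 90.0, "fitness": 54.0}.get(scene, 0.0)

print(recognize_scene("what should I eat for dinner", "weather",
                      SCENES, toy_rule, toy_model, toy_value))
```

Passing the screening and value functions as parameters keeps the sketch neutral about how each stage is implemented.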
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a dialog scene recognition method for an intelligent robot according to the present invention;
FIG. 2 is a detailed flowchart of step S20 in FIG. 1;
FIG. 3 is a detailed flowchart of step S30 in FIG. 1;
FIG. 4 is a detailed flowchart of step S40 in FIG. 1.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of an embodiment of the dialog scene recognition method of an intelligent robot according to the present invention. In this embodiment, the dialog scene recognition method includes the following steps:
Step S10, receiving input user dialogue information;
In this embodiment, when a user converses with the intelligent robot, the intelligent robot receives the dialogue information input by the user through the voice interaction system. The intelligent robot may receive the user dialogue information by converting the sound into information or by using the speech recognition system within the system.
Sound conversion turns the sound into electrical signals and then into information that the intelligent robot can recognize; the robot gives corresponding feedback based on the information received.
When the speech recognition system is used, the user's tone, timbre and the like are recognized, and corresponding feedback is given according to the information corresponding to the tone and timbre.
Step S20, based on the user dialogue information, adopting a preset scene recognition rule to screen preset candidate scenes to obtain a first type of candidate scenes corresponding to the user dialogue information;
In this embodiment, when dialogue information input by the user is received, the preset candidate scenes are screened using the scene recognition rule to obtain the corresponding candidate scenes. The scene recognition rule is specifically a dialog rule set on the basis of human practical experience; this embodiment does not limit how the preset scene recognition rule is set.
For example, scene recognition may be performed by extracting keywords from the dialogue information. When the user inputs "Who are you, and how old are you this year?", the extracted keywords are "identity" and "age", and the scene recognition rule can map them to a basic self-introduction scene, so the corresponding answer may be a greeting such as "Hello, my name is Little White" together with the robot's age. As another example, when the user inputs "I want to query an air ticket", the extracted keyword is "query an air ticket", and the scene recognition rule maps it to the ticket-query scene, so the corresponding answer may be "Which day do you want to depart?".
As a further example, scene recognition may be performed through a direct correspondence between questions and answers. When the user inputs "Who is more handsome, you or me?", the answer directly corresponding to the scene recognition rule may be "Me, of course" or "I am more handsome". Likewise, when the user inputs "Good night", the robot may directly answer "Good night".
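The two kinds of rule described above (keyword-to-scene mapping and direct question-to-answer correspondence) could be held in simple lookup tables. The sketch below is a hypothetical illustration: the table contents, scene names and function names are assumptions, since the embodiment explicitly does not limit how the rules are set.

```python
from typing import Dict, List, Optional

# Hypothetical rule tables; the patent does not prescribe their contents.
KEYWORD_RULES: Dict[str, str] = {
    "who are you": "self_introduction",
    "how old": "self_introduction",
    "air ticket": "ticket_query",
}
DIRECT_ANSWERS: Dict[str, str] = {
    "good night": "Good night",
}

def rule_scenes(utterance: str) -> List[str]:
    """Return the scenes whose trigger phrase occurs in the utterance."""
    text = utterance.lower()
    hits = [scene for phrase, scene in KEYWORD_RULES.items() if phrase in text]
    return list(dict.fromkeys(hits))  # drop duplicates, keep order

def direct_answer(utterance: str) -> Optional[str]:
    """Return a fixed reply when the whole utterance matches a known question."""
    key = utterance.strip().lower().rstrip("!.?")
    return DIRECT_ANSWERS.get(key)

print(rule_scenes("Who are you, and how old are you this year?"))
print(direct_answer("Good night!"))
```

Because both tables are fixed, the same input always yields the same scenes and replies, which is the inflexibility the later embodiments contrast with the discrimination model.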
Step S30, based on the user dialogue information, adopting a scene discrimination model to screen the preset candidate scene, and obtaining a second type of candidate scene corresponding to the user dialogue information;
In this embodiment, the scene discrimination model is a machine learning algorithm. Specifically, relevant information is extracted and compared with historical information to determine whether the two are similar; if so, the similar scenes are grouped into one category. Many discriminant model algorithms exist, and this embodiment does not limit which one is used. One possible computation is to calculate the correlation between the feature information and the candidate scene information and determine the candidate scenes accordingly.
For example, when the user dialogue information is "I want to exercise", the user information is obtained, the user's intention to exercise is inferred from the dialogue information, and information about exercise-related questions raised in earlier conversations is retrieved from the dialog history. The relevant feature information then includes the understanding of the user's intention, the historical dialogue information and the user's preference information. The scenes are screened by the discriminant model according to this feature information, so the answer may be "Let's go to the x fitness club" or "Let's go for a run by the city moat" rather than "I don't want to exercise".
As another example, when the user dialogue information is "What should I eat for dinner?", the related scenes may include "go to a western restaurant for steak at dinner", "have Hunan cuisine for dinner", "make egg fried rice yourself for dinner" and "I am not eating dinner". All four scenes concern eating dinner, but the fourth differs from the first three in who is eating: the subject has changed from the user to the robot. So when the user asks "What should I eat for dinner?", the fourth scene clearly does not fit the situation.
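One way to picture "calculating the correlation of the feature information and the candidate scene information" is a cosine similarity between bags of terms. This is a stand-in chosen for illustration only: the patent does not name a specific model, and the feature terms, scene descriptions and `top_k` cutoff below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def model_screen(features: List[str],
                 scene_terms: Dict[str, List[str]],
                 top_k: int = 2) -> List[str]:
    """Keep the top_k scenes most correlated with the feature terms."""
    query = Counter(features)
    scored = [(cosine(query, Counter(terms)), name)
              for name, terms in scene_terms.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

# Hypothetical features combining intent, dialog history and preferences.
features = ["fitness", "run", "gym"]
scene_terms = {
    "fitness_club": ["fitness", "gym", "club"],
    "moat_run": ["run", "moat", "fitness"],
    "weather": ["rain", "sunny"],
}
print(model_screen(features, scene_terms))
```

A trained classifier could replace `cosine` without changing the surrounding screening logic.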
It should be noted that the order of executing steps S20 and S30 is not limited, for example, step S20 is executed first and then step S30 is executed, or step S30 is executed first and then step S20 is executed, or steps S20 and S30 are executed simultaneously.
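Since the two screenings are independent, the simultaneous option can be sketched with Python's standard `concurrent.futures` module. The helper name `screen_in_parallel` and the two toy screening functions are hypothetical; only the idea of running steps S20 and S30 at the same time comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Screen = Callable[[str], List[str]]

def screen_in_parallel(utterance: str,
                       rule_screen: Screen,
                       model_screen: Screen) -> Tuple[List[str], List[str]]:
    """Run the rule-based and model-based screenings concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rule_future = pool.submit(rule_screen, utterance)
        model_future = pool.submit(model_screen, utterance)
        # result() blocks until each screening has finished.
        return rule_future.result(), model_future.result()

def toy_rule_screen(text: str) -> List[str]:
    return ["dining"] if "eat" in text else []

def toy_model_screen(text: str) -> List[str]:
    return ["dining", "fitness"]

first_class, second_class = screen_in_parallel(
    "what should I eat", toy_rule_screen, toy_model_screen)
print(first_class, second_class)
```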
Step S40, based on the first type candidate scene and the second type candidate scene, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;
In this embodiment, reinforcement learning is an autonomous learning process in which the optimal action for reaching the target is selected through continuous learning. Reinforcement learning can be implemented in various ways: it involves a value function (state value function), a Q function (action value function), a policy and so on, and these are defined differently for different tasks, so this embodiment does not limit the implementation.
In this embodiment, the first-class candidate scenes obtained in step S20 and the second-class candidate scenes obtained in step S30 together form the candidate scene set for reinforcement learning. The user dialogue information is taken as the action, the candidate scenes are taken as the state set, and the candidate scenes are screened through reinforcement learning.
For example, when the user dialogue information is "What should I eat for dinner?", the first-class and second-class candidate scenes are obtained after the first screening by the scene recognition rule and the scene discrimination model. The candidate scenes may then include "go to a western restaurant for steak at dinner", "have Hunan cuisine for dinner", "make egg fried rice yourself for dinner", "cook dinner yourself" and so on.
When the user asks "What should I eat for dinner?", the intelligent robot knows the user's target, and the value function assigns a value to each scene. For example, "go to a western restaurant for steak at dinner" has a value of 54, "have Hunan cuisine for dinner" 63, "cook dinner yourself" 72, "eat dinner after six o'clock" 81 and "make egg fried rice yourself for dinner" 90. The values represent how close each scene is to the target (the full score of 100 means the target is reached), so the optimal answer may be "make egg fried rice yourself for dinner".
In this embodiment, the value function in reinforcement learning is used to score and rank the scenes, and the scene with the highest score is selected and output as the optimal scene.
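The scoring and ranking step can be illustrated with the example values given above. In this sketch the values are simply supplied as a dictionary; how a value function would actually compute them is not specified in the patent, and the scene labels are paraphrased.

```python
from typing import Dict, List, Tuple

def rank_scenes(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort scenes from highest to lowest value."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)

def best_scene(values: Dict[str, float]) -> str:
    """Return the scene with the highest value."""
    return rank_scenes(values)[0][0]

# Example values from the dinner scenario (full score is 100).
dinner_values = {
    "eat steak at a western restaurant": 54,
    "eat Hunan cuisine": 63,
    "cook dinner yourself": 72,
    "eat dinner after six o'clock": 81,
    "make egg fried rice yourself": 90,
}

for scene, score in rank_scenes(dinner_values):
    print(score, scene)
print("optimal:", best_scene(dinner_values))
```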
Step S50, judging whether the optimal dialog scene is the same as the current dialog scene, if not, taking the optimal dialog scene as the current dialog scene;
In this embodiment, once the optimal dialog scene has been screened out through reinforcement learning, it is compared with the current dialog scene. If the two are the same, the current dialog scene is left unchanged; if they differ, the optimal dialog scene is output as the new current dialog scene. For example, if the user's previous interaction with the robot was in a weather topic scene and, in the next interaction, an eating topic scene is screened out from the dialogue information, the dialog scene changes from the weather topic scene to the eating topic scene.
In this embodiment, the more suitable of the optimal dialog scene and the current dialog scene is output, and the corresponding operation is performed according to it.
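Step S50 amounts to a small piece of state that is replaced only when the optimal scene differs from the current one. A minimal sketch follows; the class name `SceneTracker` and its `update` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SceneTracker:
    """Holds the current dialog scene across turns."""
    current: Optional[str] = None

    def update(self, optimal: str) -> bool:
        """Adopt `optimal` if it differs; return True when the scene switched."""
        if optimal == self.current:
            return False  # same scene: nothing changes
        self.current = optimal
        return True

tracker = SceneTracker(current="weather")
switched = tracker.update("eating")
print(switched, tracker.current)
```

Returning a flag lets the caller decide whether any scene-specific setup needs to run after a switch.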
Embodiment two:
Referring to FIG. 2, FIG. 2 is a detailed flowchart of step S20 in FIG. 1. Based on the first embodiment, in this embodiment, step S20 further includes:
Step S201, extracting keywords in the user dialogue information;
Step S202, based on the keywords, selecting scenes related to the keywords from the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialogue information.
In this embodiment, when dialogue information input by the user is received, keywords are extracted from it; scenes with a higher degree of correlation with the extracted keywords are then screened out of the preset candidate scenes and taken as the first-class candidate scenes.
For example, when the user inputs "I want to eat some dishes", the keywords "eat" and "dishes" are extracted and matched against the preset candidate scenes to obtain the scenes with a high degree of correlation. The matched scenes may contain scene information about dishes available nearby, and the scenes most related to "eat" and "dishes" are taken as the first-class candidate scenes.
As another example, when the user inputs "I want to do fitness training", the keyword "fitness" can be clearly extracted, and a series of fitness-related scenes such as "which fitness activity", "where to work out" and "how to work out" appear among the candidate scenes. The intelligent robot may then output "Do you want to go running or go to the gym?", and the user may reply "I want to go running". At this point the dialog scene has been selected.
In this embodiment, the first-class candidate scenes are obtained by applying the recognition rule to the keywords extracted from the dialogue information. The matched dialog scenes are those that mention the keywords; they are preset dialog scene information and do not change from user to user.
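Steps S201 and S202 can be sketched as keyword extraction followed by an overlap count against each preset scene. The stop-word list, the scene keyword sets and the use of a plain overlap count as the "degree of correlation" are all assumptions made for illustration; the patent does not specify how keywords are extracted or matched.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop words; a real system would use a proper tokenizer.
STOPWORDS: Set[str] = {"i", "want", "to", "do", "go", "some", "the", "a"}

def extract_keywords(utterance: str) -> List[str]:
    """Step S201: lowercase, tokenize and drop stop words."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return [t for t in tokens if t not in STOPWORDS]

def first_class_candidates(utterance: str,
                           scene_keywords: Dict[str, Set[str]]) -> List[str]:
    """Step S202: rank preset scenes by how many keywords they share."""
    keywords = set(extract_keywords(utterance))
    overlap = {name: len(keywords & kws) for name, kws in scene_keywords.items()}
    ranked = sorted(overlap, key=lambda name: overlap[name], reverse=True)
    return [name for name in ranked if overlap[name] > 0]

preset_scenes = {
    "fitness_item": {"fitness", "item"},
    "fitness_place": {"fitness", "place"},
    "ticket_query": {"air", "ticket"},
}
print(extract_keywords("I want to do fitness training"))
print(first_class_candidates("I want to do fitness training", preset_scenes))
```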
Embodiment three:
Referring to FIG. 3, FIG. 3 is a detailed flowchart of step S30 in FIG. 1. Based on the first embodiment, in this embodiment, step S30 further includes:
Step S301, extracting feature information in the user dialogue information;
Step S302, based on the feature information, calculating the matching degree between the preset candidate scenes and the feature information through the scene discrimination model, and taking the scenes with a higher matching degree as the second-class candidate scenes corresponding to the user dialogue information.
In this embodiment, when dialogue information input by the user is received, feature information is extracted from it, and the matching degree between each preset candidate scene and the feature information is calculated; the scenes with a higher matching degree are taken as the second-class candidate scenes. The extracted feature information includes the user's intention, the relevance between the current dialogue information and the preceding dialogue, the user's preferences regarding this kind of question in the preset candidate scenes, and so on.
For example, when the user inputs "What should I eat at noon?", the user information is obtained and combined with similar questions the user has asked before. The feature information may then include the user's frequent consumption behavior, the places where the user often eats, the user's tastes and so on. A suitable place to eat is selected by comparing this feature information, and the reply may be "Let's eat at x restaurant" or "Cook at home".
As another example, when the user dialogue information is "I want to exercise", the discrimination model obtains the user information and infers from the dialogue information that the user wants to exercise. Beyond the information in the current dialogue alone, it also retrieves from the dialog history the set of exercise-related questions raised in earlier conversations, and extracts from that set the user's preferences, the forms of exercise the user cares about, the places the user frequents and so on.
The second and third embodiments show that the preset scene recognition rule, being a fixed rule that cannot change, lacks flexibility during recognition and can only reply with one or a few fixed answers to a given question; the preset discrimination model can identify the user's intention accurately and give a more appropriate answer, but it requires a certain amount of accumulated dialogue data. The preset candidate scenes are therefore first screened by both methods, selecting a suitable subset of scenes that narrows the range for the second, precise screening.
Embodiment four:
Referring to FIG. 4, FIG. 4 is a detailed flowchart of step S40 in FIG. 1. Based on the first embodiment, in this embodiment, step S40 further includes:
Step S401, receiving the first-class candidate scenes and the second-class candidate scenes, and taking the received candidate scenes as the state set for reinforcement learning;
Step S402, taking the dialogue information as the action for reinforcement learning, screening out the best state in the state set through reinforcement learning, and thereby determining the optimal dialog scene.
In this embodiment, after the first-class and second-class candidate scenes are received, each candidate scene is regarded as a state and the user dialogue information as an action. A value is calculated for each state by combining the value function or Q function in reinforcement learning with the current action, the calculated values are sorted, and the scene with the largest value is selected as the optimal state; the scene corresponding to the optimal state is the optimal dialog scene.
For example, when the user dialogue information is "I want to exercise", the set of first-class and second-class candidate scenes obtained from the rule and the scene discrimination model may include "Let's go to the x fitness club", "Let's go for a run by the city moat", "Let's find a fitness coach", "Let's get a gym membership card", "I don't want to exercise" and so on. The intelligent robot knows the user's target, and the value function assigns a value to each scene: for example, "I don't want to exercise" has a value of 54, "Let's go for a run by the city moat" 63, "Let's go to the x fitness club" 72, "Let's get a gym membership card" 81 and "Let's find a fitness coach" 90. The optimal answer may therefore be "Let's find a fitness coach".
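The selection in steps S401 and S402 can be pictured with a small Q table keyed by (state, action), where each candidate scene is a state and the user's utterance is the action. The table values below are the example numbers from the text; the `update` function, its reward signal and the learning rate `alpha` are assumptions added to show how such values might be adjusted over time, which the patent does not describe.

```python
from typing import Dict, Iterable, Tuple

QTable = Dict[Tuple[str, str], float]

def select_state(q: QTable, states: Iterable[str], action: str) -> str:
    """Pick the candidate scene (state) with the largest Q value for this action."""
    return max(states, key=lambda s: q.get((s, action), 0.0))

def update(q: QTable, state: str, action: str,
           reward: float, alpha: float = 0.1) -> None:
    """Hypothetical update: move Q(state, action) toward an observed reward."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward - old)

action = "I want to exercise"
states = [
    "I don't want to exercise",
    "go for a run by the city moat",
    "go to the x fitness club",
    "get a gym membership card",
    "find a fitness coach",
]
q_table: QTable = {(s, action): v for s, v in zip(states, [54, 63, 72, 81, 90])}

print(select_state(q_table, states, action))

# Assumed feedback: the user rejects the suggestion, so its value drops.
update(q_table, "find a fitness coach", action, reward=0.0, alpha=0.5)
print(select_state(q_table, states, action))
```

With the assumed negative feedback, the top scene's value falls from 90 to 45 and the next-highest scene is selected on the following turn.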
The invention also provides a dialog scene recognition terminal.
In an embodiment of the terminal of the present invention, the dialog scene recognition terminal includes:
a memory storing a dialogue scene recognition program; a processor configured to execute the dialog scene recognition program to perform the following operations:
receiving input user dialogue information;
screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening the preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information;
based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;
and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.
In this embodiment, when a user converses with the intelligent robot, the intelligent robot receives the dialogue information input by the user through the voice interaction system. The intelligent robot may receive the user dialogue information by converting the sound into information or by using the speech recognition system within the system.
Sound conversion turns the sound into electrical signals and then into information that the intelligent robot can recognize; the robot gives corresponding feedback based on the information received.
When the speech recognition system is used, the user's tone, timbre and the like are recognized, and corresponding feedback is given according to the information corresponding to the tone and timbre.
In this embodiment, when receiving dialog information input by a user, the preset scene information is screened by using a scene recognition rule to obtain a corresponding candidate scene. The scene recognition rule is specifically a dialog rule set based on practical experience of a person, and the setting of the preset scene recognition rule is not limited in the present embodiment.
For example, scene recognition may be performed by acquiring keywords in the dialog information. When the user inputs the dialog information "Who are you, and how old are you this year", the keywords obtained are "identity" and "age", and the scene recognition rule can map them to a basic self-introduction scene, so that the corresponding answer may be "Hello, my name is Little White", followed by the robot's age. For another example, when the user inputs the dialog information "I want to inquire about an air ticket", the dialog keyword obtained is "inquire about an air ticket", and the scene recognition rule maps it to an air ticket inquiry scene, so that the corresponding answer may be "Which day do you want to depart?".
For another example, scene recognition may be performed by a direct correspondence between questions and answers. When the user inputs the dialog information "Who is more handsome, you or me", the answer that the scene recognition rule directly associates with it may be "Me, of course" or "I am more handsome". For another example, when the user inputs the dialog information "Good night", the robot may directly answer "Good night".
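The two kinds of rule described above (keyword rules and direct question-answer correspondence) can be sketched as follows. This is only a minimal illustration: the rule tables `KEYWORD_RULES` and `DIRECT_ANSWERS`, the scene names, and the function `apply_scene_rules` are hypothetical and are not taken from the embodiment, which does not limit how the rules are set.

```python
# Hypothetical rule tables; the embodiment does not prescribe their content or format.
KEYWORD_RULES = {
    "air ticket": "air_ticket_inquiry",
    "who are you": "self_introduction",
    "how old": "self_introduction",
}
DIRECT_ANSWERS = {
    "good night": "Good night",
}


def apply_scene_rules(utterance: str) -> dict:
    """Return the scenes and any fixed reply triggered by hand-written rules."""
    text = utterance.lower().strip()
    # Question-answer correspondence: an exact match yields a fixed reply.
    direct_answer = DIRECT_ANSWERS.get(text)
    # Keyword rules: every keyword found in the text contributes its scene.
    scenes = sorted({scene for keyword, scene in KEYWORD_RULES.items() if keyword in text})
    return {"scenes": scenes, "direct_answer": direct_answer}


result = apply_scene_rules("I want to inquire about an air ticket")
```

Substring matching is used here only to keep the sketch short; a practical rule set would likely need word segmentation and synonym handling.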
In this embodiment, the scene discrimination model is a machine learning algorithm. Specifically, it extracts multiple pieces of relevant information, compares them with historical information to determine whether the two are similar, and, if so, groups similar scenes into one category. Many discriminant model algorithms exist, and this embodiment does not limit which type is used. One possible calculation is to compute the correlation between the feature information and the candidate scene information and to determine the candidate scenes from that correlation.
For example, when the user dialog information is "I want to exercise", the model learns from this information that the user wants to exercise, and retrieves from the historical dialog any earlier questions about exercise or fitness. When the relevant feature information includes the understood user intention, the historical dialog information, and the user preference information, the scenes are screened by the discriminant model according to this feature information. The answer may then be "Let's go to the x fitness club" or "Let's go to the city moat", rather than "I don't want to exercise".
For another example, when the user dialog information is "What should I eat for dinner", the related scenes may include "Have steak at a Western restaurant for dinner", "Have Hunan cuisine for dinner", "Make egg fried rice yourself for dinner", or "I am not eating dinner". All four scenes concern eating dinner, but the fourth differs from the first three in who is eating: in the first three it is the user, whereas in the fourth it is the robot. So when the user asks "What should I eat for dinner", the fourth scene clearly does not fit the situation.
In this embodiment, reinforcement learning is an autonomous learning process in which the optimal action for reaching a goal is selected through continual learning. Reinforcement learning can be implemented in various ways: the process may involve a value function (state value function), a Q function (action value function), or a policy, and each of these is defined differently for different tasks, so this embodiment does not limit the implementation.
In this embodiment, the first-class candidate scenes obtained in step S20 and the second-class candidate scenes obtained in step S30 together form the candidate scene set for reinforcement learning. The user dialog information is treated as the action, the candidate scenes are treated as the state set, and the candidate scenes are screened through reinforcement learning.
For example, when the user dialog information is "What should I eat for dinner", first-class and second-class candidate scenes are obtained after the first screening by the scene recognition rule and the scene discrimination model. The candidate scenes may then include "Have steak at a Western restaurant for dinner", "Have Hunan cuisine for dinner", "Make egg fried rice yourself for dinner", "Cook dinner yourself", and the like.
When the user asks "What should I eat for dinner", the intelligent robot knows the user's goal, and the value function assigns a value to each scene. For example, the value of "Have steak at a Western restaurant for dinner" is 54, the value of "Have Hunan cuisine for dinner" is 63, the value of "Cook dinner yourself" is 72, the value of "Have dinner after six o'clock" is 81, and the value of "Make egg fried rice yourself for dinner" is 90. The values indicate how close each scene is to the goal (the full score is 100, meaning the goal is reached), so the optimal answer may be "Make egg fried rice yourself for dinner".
In this embodiment, the value function in reinforcement learning assigns each scene a numerical value, the scenes are ranked by that score, and the highest-scoring scene is output as the optimal scene.
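The scoring and ranking step can be illustrated with the dinner example and the values given above. The following is a minimal sketch that assumes the values have already been produced by a value function; the function name `rank_scenes` and the dictionary layout are illustrative assumptions, not part of the embodiment.

```python
from typing import Dict, List, Tuple


def rank_scenes(scene_values: Dict[str, float]) -> Tuple[str, List[Tuple[str, float]]]:
    """Sort scenes by value, highest first, and return the best scene with the ranking."""
    if not scene_values:
        raise ValueError("no candidate scenes to rank")
    ranking = sorted(scene_values.items(), key=lambda item: item[1], reverse=True)
    return ranking[0][0], ranking


# Values copied from the dinner example; how they are computed is not specified here.
dinner_values = {
    "Have steak at a Western restaurant for dinner": 54,
    "Have Hunan cuisine for dinner": 63,
    "Cook dinner yourself": 72,
    "Have dinner after six o'clock": 81,
    "Make egg fried rice yourself for dinner": 90,
}
best_scene, ranking = rank_scenes(dinner_values)
# best_scene is "Make egg fried rice yourself for dinner"
```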
In this embodiment, once an optimal dialog scene has been screened out through reinforcement learning, it is compared with the current dialog scene. If the two are the same, the current dialog scene is kept; if they differ, the optimal dialog scene is output as the new current dialog scene. For example, if the user's previous interaction with the robot took place in a weather topic scene, and on the next interaction a dining topic scene is screened out from the dialog information, the dialog scene changes from the weather topic scene to the dining topic scene.
In this embodiment, whichever of the optimal dialog scene and the current dialog scene is most suitable is output, and the corresponding operation is performed according to it.
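The comparison between the optimal dialog scene and the current dialog scene amounts to a simple state update. A minimal sketch follows, using the weather and dining scenes from the example; the function name `update_current_scene` and the string scene identifiers are assumptions made for illustration.

```python
from typing import Tuple


def update_current_scene(current_scene: str, optimal_scene: str) -> Tuple[str, bool]:
    """Return the scene to use from now on and whether the scene changed."""
    if optimal_scene == current_scene:
        # Same scene: keep the current dialog scene unchanged.
        return current_scene, False
    # Different scene: the optimal scene becomes the current dialog scene.
    return optimal_scene, True


scene, changed = update_current_scene("weather", "dining")
# scene is "dining" and changed is True
```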
Further optionally, in an embodiment of the terminal of the present invention, the operation, performed by the processor, of screening preset candidate scenes by using a preset scene recognition rule based on the user dialog information to obtain first-class candidate scenes corresponding to the user dialog information includes:
extracting key words in the user dialogue information;
based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.
In this embodiment, when dialog information input by the user is received, keywords are extracted from it. Scenes with a higher degree of correlation with the keywords are then screened from the preset candidate scenes, and those scenes are used as the first-class candidate scenes.
For example, if the dialog information input by the user is "I want to eat dishes", the keywords "eat" and "dishes" are extracted when the input is received and are matched against the preset candidate scenes to obtain the scenes with a high degree of correlation. The matched scenes may include scene information about nearby brassica oleracea, and the scenes most closely related to "eat" and "brassica oleracea" are taken as the first-class candidate scenes.
For another example, if the dialog information input by the user is "I want to go work out", the keyword "fitness" can be clearly obtained from it. The candidate scenes then include a series of scenes related to "fitness", such as "what the fitness activity is", "where to work out", and "how to work out". For example, the intelligent robot may output "Do you want to go running or go to a gym?", and the user may then choose "I want to go running". At this point, the dialog scene has been selected.
In this embodiment, the first-class candidate scene information is obtained through the recognition rule: keywords are obtained from the dialog information, and the matched dialog scenes are those that mention the keywords. This dialog scene information is preset and does not vary from user to user.
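The keyword extraction and screening described above can be sketched as follows. This is a simplified illustration under stated assumptions: the stop-word list, the preset scenes and their keyword tags, the overlap-count relevance measure, and the names `extract_keywords` and `first_class_candidates` are all hypothetical. The embodiment itself does not specify how the degree of correlation is computed.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop words and preset scenes, introduced only for this sketch.
STOPWORDS = {"i", "want", "to", "go", "the", "a", "an", "is", "what"}
PRESET_SCENES: Dict[str, Set[str]] = {
    "fitness_item": {"fitness", "running", "gym"},
    "fitness_location": {"fitness", "location", "gym"},
    "weather": {"weather", "rain", "temperature"},
}


def extract_keywords(utterance: str) -> Set[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return {token for token in tokens if token not in STOPWORDS}


def first_class_candidates(utterance: str, preset_scenes: Dict[str, Set[str]],
                           min_overlap: int = 1) -> List[str]:
    """Keep scenes sharing at least min_overlap keywords, most related first."""
    keywords = extract_keywords(utterance)
    scored = []
    for name, tags in preset_scenes.items():
        overlap = len(keywords & tags)  # assumed relevance measure: shared keywords
        if overlap >= min_overlap:
            scored.append((name, overlap))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in scored]


candidates = first_class_candidates("I want to go to fitness", PRESET_SCENES)
# candidates is ["fitness_item", "fitness_location"]
```

Because the result depends only on the utterance and the preset table, the same input yields the same first-class candidates for every user, which matches the statement that this information does not change with the user.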
Optionally, in an embodiment of the terminal of the present invention, the operation, performed by the processor, of screening the preset candidate scenes by using a scene discrimination model based on the user dialog information to obtain second-class candidate scenes corresponding to the user dialog information includes:
extracting characteristic information in the user dialogue information;
and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.
In this embodiment, when dialog information input by the user is received, feature information is extracted from it, the matching degree between each preset candidate scene and the feature information is calculated, and the scenes with a higher matching degree are taken as the second-class candidate scenes. The extracted feature information includes the user intention, the relevance between the dialog information and the previous segment of dialog information, the user's preference regarding such questions in the preset candidate scenes, and the like.
For example, when the dialog information input by the user is "What should I eat at noon", the user information is obtained together with similar questions the user has asked before. The feature information may then include the user's frequent consumption behavior, the places where the user often eats, the user's taste, and so on. A suitable place to eat is selected by comparing this feature information, and the robot may reply "Let's go eat at the x restaurant" or "Cook a meal at home".
For another example, when the user dialog information is "I want to exercise", the discrimination model obtains the user information and learns from it that the user wants to exercise. Beyond the information contained in the dialog information itself, it also retrieves from the historical dialog the set of questions about exercise or fitness raised in earlier dialogs, and extracts from that set the user's selection preferences, the forms of exercise the user cares about, the places the user frequents, and the like.
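The matching-degree calculation can be sketched with the three kinds of feature information named above. The embodiment does not limit the discriminant algorithm, so the hand-weighted linear score below is only a stand-in for a trained scene discrimination model; the classes `Features` and `Scene`, the weights, and the scene names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Features:
    intent: str                    # understood user intention, e.g. "exercise"
    previous_topic: str            # topic of the previous segment of dialog
    preferences: Dict[str, float]  # user preference per scene name, in [0, 1]


@dataclass
class Scene:
    name: str
    intent: str
    topic: str


# Hypothetical weights for intention, dialog relevance, and preference.
WEIGHTS = (0.5, 0.2, 0.3)


def matching_degree(scene: Scene, feats: Features) -> float:
    """Weighted sum of the three feature matches (a stand-in for a learned model)."""
    w_intent, w_history, w_pref = WEIGHTS
    intent_score = 1.0 if scene.intent == feats.intent else 0.0
    history_score = 1.0 if scene.topic == feats.previous_topic else 0.0
    pref_score = feats.preferences.get(scene.name, 0.0)
    return w_intent * intent_score + w_history * history_score + w_pref * pref_score


def second_class_candidates(scenes: List[Scene], feats: Features, top_k: int = 2) -> List[str]:
    """Keep the top_k scenes with the highest matching degree."""
    ranked = sorted(scenes, key=lambda s: matching_degree(s, feats), reverse=True)
    return [s.name for s in ranked[:top_k]]


scenes = [
    Scene("fitness_club", "exercise", "fitness"),
    Scene("run_by_moat", "exercise", "fitness"),
    Scene("decline_exercise", "refuse", "fitness"),
]
feats = Features("exercise", "fitness", {"fitness_club": 0.9, "run_by_moat": 0.4})
selected = second_class_candidates(scenes, feats)
# selected is ["fitness_club", "run_by_moat"]
```

In this sketch the "I don't want to exercise" scene is filtered out because its intention does not match the user's, which mirrors the screening behavior described in the examples.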
Further optionally, in an embodiment of the terminal of the present invention, the operation, performed by the processor, of performing reinforcement learning processing on the user dialog information based on the first-class candidate scenes and the second-class candidate scenes to obtain an optimal dialog scene corresponding to the user dialog information includes:
receiving the first-class candidate scenes and the second-class candidate scenes, and taking the received candidate scenes as the state set for reinforcement learning;
and taking the dialogue information as an action of reinforcement learning, screening out the optimal state in the state set through the reinforcement learning, and further determining the optimal dialogue scene.
In this embodiment, after the first-class and second-class candidate scenes are received, each candidate scene is regarded as a state and the user dialog information is regarded as an action. A value is calculated for each state by combining the value function or Q function in reinforcement learning with the current action; the calculated values are sorted, and the scene with the largest value is selected as the optimal state. The scene corresponding to the optimal state is the optimal dialog scene.
For example, when the user dialog information is "I want to go work out", the set of first-class and second-class candidate scenes obtained by the rule and the scene discrimination model may include "Let's go to the x fitness club", "Let's go running by the city moat", "Let's find a fitness coach", "Let's get a gym membership card", "I don't want to exercise", and the like. The intelligent robot knows the user's goal, and the value function assigns a value to each scene. For example, the value of "I don't want to exercise" is 54, the value of "Let's go running by the city moat" is 63, the value of "Let's go to the x fitness club" is 72, the value of "Let's get a gym membership card" is 81, and the value of "Let's find a fitness coach" is 90, so the optimal answer may be "Let's find a fitness coach".
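The state-and-action formulation can be sketched with a small Q table. This is only one possible reading: the embodiment does not limit how the value function or Q function is defined, so the one-step update toward an observed reward used here is a simplified, bandit-style assumption rather than a full Q-learning procedure. The class `SceneSelector`, the reward signal, and the scene identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class SceneSelector:
    """Keeps Q(state, action), where a state is a candidate scene and an action is the user dialog."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.q: Dict[Tuple[str, str], float] = defaultdict(float)
        self.learning_rate = learning_rate

    def select(self, first_class: Iterable[str], second_class: Iterable[str], action: str) -> str:
        """Merge both candidate classes into one state set and pick the highest-valued state."""
        states = list(dict.fromkeys(list(first_class) + list(second_class)))  # ordered, no duplicates
        if not states:
            raise ValueError("no candidate scenes")
        return max(states, key=lambda state: self.q[(state, action)])

    def update(self, state: str, action: str, reward: float) -> None:
        """Move Q(state, action) toward the observed reward (assumed feedback signal)."""
        key = (state, action)
        self.q[key] += self.learning_rate * (reward - self.q[key])


selector = SceneSelector()
selector.update("find_fitness_coach", "I want to go work out", 90)
selector.update("get_gym_card", "I want to go work out", 81)
optimal = selector.select(["get_gym_card"], ["find_fitness_coach", "get_gym_card"],
                          "I want to go work out")
# optimal is "find_fitness_coach"
```

When several states have equal values, `max` returns the first one in the merged order, so first-class candidates take precedence in a tie; this tie-breaking choice is an artifact of the sketch, not something the embodiment specifies.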
The embodiment of the invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores thereon a dialog scene recognition program that, when executed by a processor, implements the steps of the dialog scene recognition method in the embodiments described above.
In a specific scene change process, when the dialog scene changes, the preset scene recognition rule and the scene discrimination model are first used to perform a first screening of the preset candidate scenes and obtain the corresponding candidate scenes. A reinforcement learning strategy is then used to perform a second screening of those candidate scenes and obtain the optimal dialog scene. Finally, a suitable dialog result is selected according to the optimal dialog scene to reply to the user. Changes in the current dialog scene are thereby accurately recognized, which improves the user's human-computer interaction experience.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A conversation scene recognition method, characterized by comprising the steps of:
receiving input user dialogue information;
screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; extracting feature information in the user dialogue information, and screening the preset candidate scene by adopting a scene discrimination model based on the feature information to obtain a second type of candidate scene corresponding to the user dialogue information, wherein the feature information comprises user intention, relevance between the user dialogue information and the previous section of dialogue information and preference of a user about the problems in the preset candidate scene;
performing reinforcement learning processing on the user dialogue information based on the first-class candidate scene and the second-class candidate scene to obtain an optimal dialogue scene corresponding to the user dialogue information, specifically including determining a value corresponding to the first-class candidate scene and a value corresponding to the second-class candidate scene through value function calculation in the reinforcement learning processing, performing scoring sorting based on the values corresponding to the first-class candidate scene and the second-class candidate scene, and determining a scene with the highest score as the optimal dialogue scene corresponding to the user dialogue information according to the scoring sorting;
and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.
2. The method for recognizing conversation scene according to claim 1, wherein the step of obtaining the first kind of candidate scene corresponding to the user conversation information by screening the candidate scene based on the user conversation information by using a preset scene recognition rule comprises:
extracting key words in the user dialogue information;
based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.
3. The method for recognizing conversation scene according to claim 1, wherein the step of obtaining the second type of candidate scene corresponding to the user conversation information by screening the preset candidate scene with a scene discrimination model based on the feature information comprises:
and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.
4. The method for recognizing conversation scene according to claim 1, wherein the step of performing reinforcement learning processing on the user conversation information based on the first category candidate scene and the second category candidate scene to obtain an optimal conversation scene corresponding to the user conversation information comprises:
and taking the user dialogue information as an action and the first-class candidate scenes and the second-class candidate scenes as states, and performing reinforcement learning processing on the user dialogue information so as to screen out the optimal dialogue scene from the first-class candidate scenes and the second-class candidate scenes.
5. A conversation scene recognition terminal, characterized in that the conversation scene recognition terminal comprises:
a memory storing a dialogue scene recognition program;
a processor configured to execute the dialog scene recognition program to perform the following operations:
receiving input user dialogue information;
screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; extracting feature information in the user dialogue information, and screening the preset candidate scene by adopting a scene discrimination model based on the feature information to obtain a second type of candidate scene corresponding to the user dialogue information, wherein the feature information comprises user intention, relevance between the user dialogue information and the previous section of dialogue information and preference of a user about the problems in the preset candidate scene;
performing reinforcement learning processing on the user dialogue information based on the first-class candidate scene and the second-class candidate scene to obtain an optimal dialogue scene corresponding to the user dialogue information, specifically including determining a value corresponding to the first-class candidate scene and a value corresponding to the second-class candidate scene through value function calculation in the reinforcement learning processing, performing scoring sorting based on the values corresponding to the first-class candidate scene and the second-class candidate scene, and determining a scene with the highest score as the optimal dialogue scene corresponding to the user dialogue information according to the scoring sorting;
and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.
6. The dialog scene recognition terminal of claim 5, wherein the performing of the operation of filtering preset candidate scenes by using a preset scene recognition rule based on the user dialog information to obtain a first type of candidate scenes corresponding to the user dialog information comprises:
extracting key words in the user dialogue information;
based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.
7. The dialog scene recognition terminal of claim 5, wherein the performing of the operation of filtering the preset candidate scenes by using a scene discrimination model based on the feature information to obtain the second type of candidate scenes corresponding to the user dialog information includes:
and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.
8. The dialog scene recognition terminal of claim 5, wherein performing the operation of performing reinforcement learning processing on the user dialog information based on the first category candidate scenes and the second category candidate scenes to obtain an optimal dialog scene corresponding to the user dialog information comprises:
and taking the user dialogue information as an action and the first-class candidate scenes and the second-class candidate scenes as states, and performing reinforcement learning processing on the user dialogue information so as to screen out the optimal dialogue scene from the first-class candidate scenes and the second-class candidate scenes.
9. A computer-readable storage medium, characterized in that a dialog scene recognition program is stored on the computer-readable storage medium, which dialog scene recognition program, when executed by a processor, carries out the steps of the dialog scene recognition method according to one of claims 1 to 4.
CN201710636464.5A 2017-07-28 2017-07-28 Dialog scene recognition method, terminal and computer-readable storage medium Active CN107644641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710636464.5A CN107644641B (en) 2017-07-28 2017-07-28 Dialog scene recognition method, terminal and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710636464.5A CN107644641B (en) 2017-07-28 2017-07-28 Dialog scene recognition method, terminal and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN107644641A CN107644641A (en) 2018-01-30
CN107644641B true CN107644641B (en) 2021-04-13

Family

ID=61110969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710636464.5A Active CN107644641B (en) 2017-07-28 2017-07-28 Dialog scene recognition method, terminal and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107644641B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086860B (en) * 2018-05-28 2022-03-15 北京光年无限科技有限公司 Interaction method and system based on virtual human
CN108804603B (en) * 2018-05-29 2021-07-23 北京灵智优诺科技有限公司 Man-machine written dialogue method and system, server and medium
CN110858226A (en) * 2018-08-07 2020-03-03 北京京东尚科信息技术有限公司 Conversation management method and device
CN111475206B (en) * 2019-01-04 2023-04-11 优奈柯恩(北京)科技有限公司 Method and apparatus for waking up wearable device
CN111813900B (en) * 2019-04-10 2023-12-08 北京猎户星空科技有限公司 Multi-round dialogue processing method and device, electronic equipment and storage medium
CN110880324A (en) * 2019-10-31 2020-03-13 北京大米科技有限公司 Voice data processing method and device, storage medium and electronic equipment
CN111161739B (en) * 2019-12-28 2023-01-17 科大讯飞股份有限公司 Speech recognition method and related product
CN111290953B (en) * 2020-01-22 2021-09-14 华为技术有限公司 Method and device for analyzing test logs
CN111881254A (en) * 2020-06-10 2020-11-03 百度在线网络技术(北京)有限公司 Method and device for generating dialogs, electronic equipment and storage medium
CN113488036A (en) * 2020-06-10 2021-10-08 海信集团有限公司 Multi-round voice interaction method, terminal and server
CN112487170B (en) * 2020-12-14 2023-12-15 南京三眼精灵信息技术有限公司 Man-machine interaction dialogue robot system facing scene configuration
CN113822058A (en) * 2021-09-18 2021-12-21 上海明略人工智能(集团)有限公司 Dialog information extraction method, system and computer readable storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1417679A (en) * 2001-10-21 2003-05-14 微软公司 Application abstraction aimed at dialogue
CN1781140A (en) * 2003-03-20 2006-05-31 索尼株式会社 Audio conversation device, method, and robot device
CN1881206A (en) * 2005-06-15 2006-12-20 富士通株式会社 Dialog system
CN101551998A (en) * 2009-05-12 2009-10-07 上海锦芯电子科技有限公司 A group of voice interaction devices and method of voice interaction with human
CN102395013A (en) * 2011-11-07 2012-03-28 康佳集团股份有限公司 Voice control method and system for intelligent television
CN102708454A (en) * 2012-05-14 2012-10-03 北京奇虎科技有限公司 Method and device for providing solution of terminal fault
CN103413549A (en) * 2013-07-31 2013-11-27 深圳创维-Rgb电子有限公司 Voice interaction method and system and interaction terminal
CN103974366A (en) * 2014-04-28 2014-08-06 南京邮电大学 Wireless body area network routing method based on reinforcement learning
CN104464733A (en) * 2014-10-28 2015-03-25 百度在线网络技术(北京)有限公司 Multi-scene managing method and device of voice conversation
CN104506906A (en) * 2014-11-12 2015-04-08 科大讯飞股份有限公司 Voice interaction assisting method and system based on television scene elements and voice assistant
CN105575386A (en) * 2015-12-18 2016-05-11 百度在线网络技术(北京)有限公司 Method and device for voice recognition
CN105975511A (en) * 2016-04-27 2016-09-28 乐视控股(北京)有限公司 Intelligent dialogue method and apparatus
CN106020488A (en) * 2016-06-03 2016-10-12 北京光年无限科技有限公司 Man-machine interaction method and device for conversation system
CN106528522A (en) * 2016-08-26 2017-03-22 南京威卡尔软件有限公司 Scenarized semantic comprehension and dialogue generation method and system
CN106847271A (en) * 2016-12-12 2017-06-13 北京光年无限科技有限公司 A kind of data processing method and device for talking with interactive system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004233676A (en) * 2003-01-30 2004-08-19 Honda Motor Co Ltd Interaction controller
US20090265022A1 (en) * 2008-04-21 2009-10-22 Microsoft Corporation Playback of multimedia during multi-way communications
CN103456301B (en) * 2012-05-28 2019-02-12 中兴通讯股份有限公司 A kind of scene recognition method and device and mobile terminal based on ambient sound
JP6255274B2 (en) * 2014-02-19 2017-12-27 シャープ株式会社 Information processing apparatus, voice dialogue apparatus, and control program
WO2015130508A2 (en) * 2014-02-28 2015-09-03 Dolby Laboratories Licensing Corporation Perceptually continuous mixing in a teleconference
US9668073B2 (en) * 2015-10-07 2017-05-30 Robert Bosch Gmbh System and method for audio scene understanding of physical object sound sources
CN106356070B (en) * 2016-08-29 2019-10-29 广州市百果园网络科技有限公司 A kind of acoustic signal processing method and device

Also Published As

Publication number Publication date
CN107644641A (en) 2018-01-30

Similar Documents

Publication Publication Date Title
CN107644641B (en) Dialog scene recognition method, terminal and computer-readable storage medium
CN107908803B (en) Question-answer interaction response method and device, storage medium and terminal
CN105487663B (en) A kind of intension recognizing method and system towards intelligent robot
CN106096576B (en) A kind of intelligent Service method of robot
JP2018527638A (en) Automatic response method, automatic response device, automatic response device, automatic response program, and computer-readable storage medium
CN113360622B (en) User dialogue information processing method and device and computer equipment
CN104427109B (en) Method for establishing contact item by voices and electronic equipment
CN110110049A (en) Service consultation method, device, system, service robot and storage medium
CN110569344B (en) Method and device for determining standard question corresponding to dialogue text
CN111199149B (en) Sentence intelligent clarification method and system for dialogue system
US20210097288A1 (en) Method and system for generating video
CN111078856A (en) Group chat conversation processing method and device and electronic equipment
US20120226642A1 (en) Method and apparatus for considering multi-user preference based on multi-user-criteria group
CN109787885A (en) A kind of question and answer method of servicing
CN111161726A (en) Intelligent voice interaction method, equipment, medium and system
CN110299143A (en) The devices and methods therefor of voice speaker for identification
CN110245826A (en) A kind of data analysing method and device
CN111540355A (en) Personalized setting method and device based on voice assistant
WO2023273776A1 (en) Speech data processing method and apparatus, and storage medium and electronic apparatus
CN106356056B (en) Audio recognition method and device
CN106503189A (en) search system optimization method and device based on artificial intelligence
CN113205129B (en) Cheating group identification method and device, electronic equipment and storage medium
CN112182189A (en) Conversation processing method and device, electronic equipment and storage medium
CN115083412B (en) Voice interaction method and related device, electronic equipment and storage medium
CN116052646A (en) Speech recognition method, device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant