CN107644641B

CN107644641B - Dialog scene recognition method, terminal and computer-readable storage medium

Info

Publication number: CN107644641B
Application number: CN201710636464.5A
Authority: CN
Inventors: 卢道和; 郑德荣; 张超; 杨海军; 钟伟
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2017-07-28
Filing date: 2017-07-28
Publication date: 2021-04-13
Anticipated expiration: 2037-07-28
Also published as: CN107644641A

Abstract

The invention discloses a conversation scene recognition method, which comprises the following steps: receiving input user dialogue information; screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information; based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information; and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene. The invention also discloses a dialog scene recognition terminal and a computer readable storage medium. The invention realizes the accurate recognition of the scene in the conversation scene change process.

Description

Dialog scene recognition method, terminal and computer-readable storage medium

Technical Field

The present invention relates to the field of dialog scene recognition technologies, and in particular, to a dialog scene recognition method, a terminal, and a computer-readable storage medium.

Background

Automatic question answering is the task of using a computer to automatically answer questions posed by a user in order to meet the user's knowledge needs, and is an advanced form of information service. In recent years, with the rapid development of artificial intelligence, automatic question answering has become a closely watched research direction with broad prospects. It is regarded as one of the main tasks for verifying whether a machine has natural language understanding capability, and research on it helps advance the related disciplines of artificial intelligence.

However, current automatic question-answering systems are far from perfect and still face many specific problems and difficulties. Most existing intelligent robots are single-round dialogue systems that do not take into account data such as the dialogue context or the user's historical dialogue information, so many of the answers they give are disconnected and abrupt, which seriously degrades the user experience. To address this situation, a multi-scene recognition method is proposed that, during a dialogue, actively adapts to the user's dialogue scene according to the user's current input, historical dialogue information and other data, so that the dialogue is smoother and more natural.

Disclosure of Invention

The invention mainly aims to provide a conversation scene recognition method, a recognition terminal and a computer readable storage medium, and aims to solve the technical problem that scene change is difficult to accurately recognize in the voice interaction process of an intelligent robot and a human.

In order to achieve the above object, the present invention provides a dialog scene recognition method, including:

receiving input user dialogue information;

screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information; screening the preset candidate scenes by adopting a scene discrimination model based on the user dialogue information to obtain second type candidate scenes corresponding to the user dialogue information;

based on the first type of candidate scenes and the second type of candidate scenes, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;

and judging whether the optimal conversation scene is the same as the current conversation scene or not, and if not, taking the optimal conversation scene as the current conversation scene.

Preferably, the step of screening candidate scenes by using a preset scene recognition rule based on the user session information to obtain a first class of candidate scenes corresponding to the user session information includes:

extracting key words in the user dialogue information;

based on the keywords, screening scenes related to the keywords in the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.

Preferably, the step of screening the preset candidate scenes by using a scene discrimination model based on the user session information to obtain the second type of candidate scenes corresponding to the user session information includes:

extracting characteristic information in the user dialogue information;

and calculating the matching degree of the preset candidate scene and the characteristic information through the scene discrimination model based on the characteristic information, and taking the partial scene with higher matching degree as a second type of candidate scene corresponding to the user dialogue information.

Preferably, the step of performing reinforcement learning processing on the user dialog information based on the first-class candidate scene and the second-class candidate scene to obtain an optimal dialog scene corresponding to the user dialog information includes:

and taking the user dialogue information as an action and the first-class candidate scenes and the second-class candidate scenes as states, and performing reinforcement learning processing on the user dialogue information so as to screen out the optimal dialogue scene from the first-class candidate scenes and the second-class candidate scenes.

In order to achieve the above object, the present invention further provides an identification terminal, including:

a memory storing a dialogue scene recognition program;

a processor configured to execute the dialog scene recognition program to perform the following operations:

receiving input user dialogue information;

Optionally, the performing, based on the user dialog information, an operation of screening preset candidate scenes by using a preset scene recognition rule to obtain a first type of candidate scenes corresponding to the user dialog information includes:

extracting key words in the user dialogue information;

Optionally, the performing, based on the user dialog information, the operation of screening the preset candidate scene by using a scene discrimination model to obtain a second type of candidate scene corresponding to the user dialog information includes:

extracting characteristic information in the user dialogue information;

Preferably, the executing the operation of performing reinforcement learning processing on the user dialog information based on the first-class candidate scene and the second-class candidate scene to obtain an optimal dialog scene corresponding to the user dialog information includes:

To achieve the above object, the present invention further provides a computer-readable storage medium having a dialog scene recognition program stored thereon, which, when executed by a processor, implements the steps of the dialog scene recognition method according to any one of the above.

In the present invention, when the dialogue scene changes, the preset candidate scenes are first screened using a preset scene recognition rule and a scene discrimination model to obtain the corresponding candidate scenes; these candidate scenes are then screened a second time using a reinforcement learning strategy to obtain the optimal dialogue scene; finally, a suitable dialogue result is selected according to the optimal dialogue scene to reply to the user. The change of the current dialogue scene is thereby recognized accurately, and the user's human-computer interaction experience is improved.

Drawings

FIG. 1 is a schematic flow chart illustrating an embodiment of a dialog scene recognition method for an intelligent robot according to the present invention;

FIG. 2 is a detailed flowchart of step S20 in FIG. 1;

FIG. 3 is a detailed flowchart of step S30 in FIG. 1;

FIG. 4 is a detailed flowchart of step S40 in FIG. 1.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a conversation scene recognition method of an intelligent robot according to the present invention. In this embodiment, the dialog scene recognition method includes the following steps:

step S10, receiving input user dialogue information;

In this embodiment, when a user converses with the intelligent robot, the intelligent robot receives the dialogue information input by the user through its voice interaction system. The intelligent robot may receive the user dialogue information either by converting the sound into information or by using the voice recognition system within that system.

Converting the sound into information means converting the sound into electrical signals, and thus into information that the intelligent robot can recognize; the robot then gives corresponding feedback based on the information received.

When the voice recognition system is used, characteristics such as the user's tone and timbre are recognized, and corresponding feedback is given based on the information associated with that tone and timbre.

Step S20, based on the user dialogue information, adopting a preset scene recognition rule to screen preset candidate scenes to obtain a first type of candidate scenes corresponding to the user dialogue information;

In this embodiment, when dialogue information input by a user is received, the preset candidate scenes are screened using a scene recognition rule to obtain the corresponding candidate scenes. The scene recognition rule is a dialogue rule set on the basis of human practical experience, and this embodiment does not limit how the preset scene recognition rule is set.

For example, scene recognition may be performed by extracting keywords from the dialogue information. When the user inputs "Who are you, and how old are you this year?", the extracted keywords are "identity" and "age", and the scene recognition rule can be set to the robot's basic self-introduction, so the corresponding answer can be a greeting such as "Hello, my name is Xiaobai" followed by its age. As another example, when the user inputs "I want to query an air ticket", the extracted keyword is "query an air ticket", and the scene recognition rule is set to the ticket-query procedure, so the corresponding answer may be "On which day do you depart?"

As a further example, scene recognition may be performed through direct question-answer correspondence. When the user inputs "Who is more handsome, you or me?", the answer that the scene recognition rule maps to directly may be "Me, of course" or "I am more handsome". Likewise, when the user inputs "Good night", the robot may directly answer "Good night".
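The two styles of rule described above (keyword rules and direct question-answer correspondence) can be sketched as follows. This is a minimal illustration, not the patented implementation: the rule table, the regular expressions and the scene names are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule table: each entry maps a pattern to a scene label.
# Patterns and scene names are illustrative assumptions, not from the patent.
SCENE_RULES = [
    (re.compile(r"\bwho are you\b|\bhow old\b", re.I), "self_introduction"),
    (re.compile(r"\b(air )?ticket\b|\bflight\b", re.I), "ticket_query"),
]

# Hypothetical direct question-answer pairs (the "correspondence" style of rule).
DIRECT_ANSWERS = {
    "good night": "Good night",
}


def first_type_candidates(utterance: str) -> list[str]:
    """Return every scene whose rule pattern matches the utterance."""
    return [scene for pattern, scene in SCENE_RULES if pattern.search(utterance)]


def direct_answer(utterance: str) -> Optional[str]:
    """Return a fixed reply when the utterance matches a question-answer rule."""
    normalized = utterance.strip().lower().rstrip(".!")
    return DIRECT_ANSWERS.get(normalized)
```

Because the table is fixed, the same input always yields the same scenes, which matches the description of the rule as a preset rule based on practical experience.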

Step S30, based on the user dialogue information, adopting a scene discrimination model to screen the preset candidate scene, and obtaining a second type of candidate scene corresponding to the user dialogue information;

In this embodiment, the scene discrimination model is a machine learning algorithm. Specifically, a variety of relevant information is extracted and compared with the historical information to determine whether the two are similar; if so, the similar scenes are grouped into one category. Many discriminant model algorithms exist, and this embodiment does not limit which one is used. One possible calculation is to compute the correlation between the feature information and the candidate scene information, and to determine the candidate scenes from that correlation.

For example, when the user dialogue information is "I want to exercise", the user information is obtained from it, namely that the user wants to exercise, and related information about fitness or exercise mentioned in earlier dialogue is obtained from the dialogue history. When the relevant feature information includes the understood user intention, the historical dialogue information and the user preference information, the scenes are screened through the discriminant model according to this feature information, and the answer may then be "Let's go to the X fitness club" or "Let's go to the city moat" rather than "I don't want to exercise".

As another example, when the user dialogue information is "What shall we eat for dinner?", the related scenes may include "go to a Western restaurant for steak for dinner", "have Hunan cuisine for dinner", "cook egg fried rice yourself for dinner" and "I do not eat dinner". The four scenes are similar in that they all concern eating dinner, but the fourth differs from the first three in who is eating: in the first three the subject is the user, whereas in the fourth it is the robot. So when the user asks "What shall we eat for dinner?", the fourth scene clearly does not fit.
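The correlation-based screening mentioned above can be illustrated with a toy bag-of-words cosine similarity. The patent does not fix the discrimination algorithm, so the similarity measure, the threshold of 0.6 and the scene descriptions below are assumptions chosen only to mirror the dinner example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn a whitespace-separated description into word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def second_type_candidates(
    features: str, scenes: dict[str, str], threshold: float = 0.6
) -> list[str]:
    """Keep scenes whose correlation with the features reaches the threshold,
    ordered from most to least correlated."""
    query = bag_of_words(features)
    scored = {name: cosine(query, bag_of_words(desc)) for name, desc in scenes.items()}
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Hypothetical scene descriptions; the subject ("user" or "robot") is part of each one.
SCENES = {
    "steak": "user eat steak western restaurant dinner",
    "hunan": "user eat hunan cuisine dinner",
    "robot_no_dinner": "robot not eat dinner",
}
```

With the features "user eat dinner", the two user-centered scenes pass the threshold while the robot-centered scene falls below it, which is the behavior the dinner example describes.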

It should be noted that the order of executing steps S20 and S30 is not limited, for example, step S20 is executed first and then step S30 is executed, or step S30 is executed first and then step S20 is executed, or steps S20 and S30 are executed simultaneously.
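Since the two screening steps do not depend on each other, they can run in either order or at the same time. A minimal sketch of the simultaneous case, assuming two stand-in functions `screen_by_rules` and `screen_by_model` (hypothetical stubs for steps S20 and S30):

```python
from concurrent.futures import ThreadPoolExecutor


def screen_by_rules(utterance: str) -> set[str]:
    # Hypothetical stand-in for step S20 (rule-based screening).
    return {"dining"} if "eat" in utterance.lower() else set()


def screen_by_model(utterance: str) -> set[str]:
    # Hypothetical stand-in for step S30 (model-based screening).
    return {"dining", "cooking"} if "dinner" in utterance.lower() else set()


def screen_concurrently(utterance: str) -> tuple[set[str], set[str]]:
    """Run both screening steps in parallel and return (first type, second type)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rules_future = pool.submit(screen_by_rules, utterance)
        model_future = pool.submit(screen_by_model, utterance)
        return rules_future.result(), model_future.result()
```

The result is the same whichever step finishes first, because each future is collected by name rather than by completion order.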

Step S40, based on the first type candidate scene and the second type candidate scene, performing reinforcement learning processing on the user dialogue information to obtain an optimal dialogue scene corresponding to the user dialogue information;

In this embodiment, reinforcement learning is an autonomous learning process in which the optimal action for reaching the target is selected through continuous learning. Reinforcement learning can be implemented in various ways: it involves a value function (state value function), a Q function (action value function), a policy and so on, and these are defined differently for different tasks, so this embodiment does not limit the implementation.

In this embodiment, the first-type candidate scenes obtained in step S20 and the second-type candidate scenes obtained in step S30 together form the candidate scene set for reinforcement learning. The user dialogue information is taken as the action and the candidate scenes as the state set, and the candidate scenes are screened through reinforcement learning.

For example, when the user dialogue information is "What shall we eat for dinner?", the first-type and second-type candidate scenes are obtained after the first screening by the scene recognition rule and the scene discrimination model. The candidate scenes may then include "go to a Western restaurant for steak for dinner", "have Hunan cuisine for dinner", "cook egg fried rice yourself for dinner", "cook dinner yourself" and so on.

When the user asks "What shall we eat for dinner?", the intelligent robot knows the user's target, and the value function assigns a value to each scene. For example, "go to a Western restaurant for steak for dinner" has a value of 54, "have Hunan cuisine for dinner" 63, "cook dinner yourself" 72, "eat dinner after six o'clock" 81, and "cook egg fried rice yourself for dinner" 90. The values represent how close each scene is to the target (the full score is 100, meaning the target is reached), so the optimal answer can be "cook egg fried rice yourself for dinner".

In this embodiment, through calculation of a value function in reinforcement learning, a scoring ranking is performed on scenes by using a numerical value, and a scene with the highest score is selected as an optimal scene to be output.
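The scoring and ranking step can be sketched with the values from the dinner example. The dictionary keys are shortened, hypothetical labels for the five scenes; how the values themselves are computed is left open by the embodiment and is not modeled here.

```python
# Values taken from the dinner example; keys are shortened illustrative labels.
SCENE_VALUES = {
    "steak": 54,
    "hunan": 63,
    "cook_dinner": 72,
    "dinner_after_six": 81,
    "egg_fried_rice": 90,
}


def rank_scenes(values: dict[str, float]) -> list[tuple[str, float]]:
    """Sort scenes from highest to lowest value."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def best_scene(values: dict[str, float]) -> str:
    """Return the scene with the highest value (the optimal dialogue scene)."""
    return rank_scenes(values)[0][0]
```

Calling `best_scene(SCENE_VALUES)` selects the scene valued at 90, in line with the example.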

Step S50, judging whether the optimal dialog scene is the same as the current dialog scene, if not, taking the optimal dialog scene as the current dialog scene;

In this embodiment, once the optimal dialogue scene has been screened out through reinforcement learning, it is compared with the current dialogue scene. If the two are the same, the current dialogue scene remains unchanged; if they differ, the optimal dialogue scene is output as the new current dialogue scene. For example, suppose the user's previous interaction with the robot was in a topic scene about weather, and when the user interacts with the robot again, a topic scene about eating is screened out from the dialogue information. The dialogue scene then changes from the previous weather topic scene to the eating topic scene.

In the embodiment, the most suitable dialog scene in the optimal dialog scene and the current dialog scene is output, and corresponding operation is performed according to the most suitable dialog scene.
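Step S50 amounts to a compare-and-replace on the current scene. A minimal sketch, assuming a hypothetical `DialogState` holder for the current dialogue scene:

```python
from dataclasses import dataclass


@dataclass
class DialogState:
    """Hypothetical container for the scene the dialogue is currently in."""

    current_scene: str

    def update(self, optimal_scene: str) -> bool:
        """Switch to optimal_scene if it differs from the current scene.

        Returns True when a scene change happened, False otherwise.
        """
        if optimal_scene == self.current_scene:
            return False
        self.current_scene = optimal_scene
        return True
```

In the weather-to-eating example, `DialogState("weather").update("eating")` reports a change and leaves "eating" as the current scene.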

Embodiment two:

Referring to fig. 2, fig. 2 is a detailed flowchart of step S20 in fig. 1. Based on the first embodiment, in this embodiment, step S20 further includes:

step S201, extracting keywords in the user dialogue information;

step S202, based on the keywords, selecting scenes related to the keywords from the preset candidate scenes, and taking the related scenes as the first-class candidate scenes corresponding to the user dialog information.

In this embodiment, when receiving dialog information input by a user, extracting a keyword from the dialog information, then screening a scene with a higher degree of correlation with the keyword from preset candidate scenes according to the extracted keyword, and using a part of the scene with the higher degree of correlation as a first type of candidate scenes.

For example, the user inputs the dialogue information "I want to eat dishes". When this input is received, the keywords "eat" and "dishes" are extracted and matched against the preset candidate scenes to obtain the scenes with a high degree of correlation. With the keywords "eat" and "dishes", the matched scenes may include scene information about nearby "brassica oleracea", and further scenes about "eat" and "brassica oleracea" are obtained as the first-type candidate scenes.

As another example, the user inputs "I want to work out". On receiving this dialogue information, the keyword "fitness" can be clearly extracted, and the candidate scenes may then contain a series of scenes related to "fitness", such as "which fitness activity", "where to work out" and "how to exercise". For instance, the intelligent robot may output "Do you want to go running or go to a gym?", and the user may then choose "I want to run". At this point, the dialogue scene has been selected.

In this embodiment, the first-type candidate scene information is obtained through the recognition rule from the extracted keywords of the dialogue information, and the matched dialogue scenes are those that mention the keywords. These are preset dialogue scenes and do not change from user to user.
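Steps S201 and S202 can be sketched as keyword extraction followed by an overlap count against per-scene vocabularies. The stop-word list, the scene vocabularies and the `top_k` cutoff are hypothetical; a production system would likely use a proper tokenizer and keyword extractor.

```python
import re

# Hypothetical stop words and per-scene keyword vocabularies.
STOPWORDS = {"i", "want", "to", "go", "the", "a", "for"}
SCENE_KEYWORDS = {
    "fitness_item": {"fitness", "exercise", "run", "gym"},
    "fitness_location": {"fitness", "gym", "where"},
    "dining": {"eat", "dinner", "restaurant"},
}


def extract_keywords(utterance: str) -> set[str]:
    """Step S201: keep the lowercase words that are not stop words."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return {token for token in tokens if token not in STOPWORDS}


def related_scenes(utterance: str, top_k: int = 2) -> list[str]:
    """Step S202: rank scenes by keyword overlap and keep the top_k related ones."""
    keywords = extract_keywords(utterance)
    overlap = {scene: len(keywords & vocab) for scene, vocab in SCENE_KEYWORDS.items()}
    matched = [scene for scene, count in overlap.items() if count > 0]
    # Sort by overlap (descending), then by name so ties are deterministic.
    return sorted(matched, key=lambda scene: (-overlap[scene], scene))[:top_k]
```

Scenes with no keyword in common are dropped entirely, so an utterance about fitness never yields the dining scene.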

Embodiment three:

Referring to fig. 3, fig. 3 is a detailed flowchart of step S30 in fig. 1. Based on the first embodiment, in this embodiment, step S30 further includes:

step S301, extracting characteristic information in the user dialogue information;

step S302, based on the feature information, calculating the matching degree of the preset candidate scene and the feature information through the scene discrimination model, and taking the partial scene with higher matching degree as a second type candidate scene corresponding to the user dialogue information.

In this embodiment, when receiving dialog information input by a user, extracting feature information in the dialog information, and performing calculation on the matching degree between a preset candidate scene and the feature information to obtain a part of scenes with a higher matching degree as a second class of candidate scenes, where the extracted feature information includes a user intention, a relevance between the dialog information and a previous segment of dialog information, a preference of the user about such a problem in the preset candidate scene, and the like.

For example, when the user inputs "What shall we eat at noon?", the user information is obtained, and based on this information and similar questions the user has asked before, the feature information may include the user's frequent consumption behavior, where the user often eats, the user's taste and so on. A suitable place to eat is selected by comparing this feature information, and the reply may be "Let's go to the X restaurant" or "Let's cook at home".

As another example, when the user dialogue information is "I want to exercise", the discrimination model obtains the user information and derives from it that the user wants to exercise. In addition to the information contained in the dialogue information itself, it obtains from the dialogue history the set of questions about fitness or exercise mentioned in earlier dialogue, and extracts from that set the user's selection preferences, the exercise modes the user cares about, the places the user frequents and so on.
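One way to picture the matching degree of step S302 is as a weighted combination of the three kinds of feature information named in this embodiment: user intention, relevance to the previous dialogue, and user preference. The weights and the per-scene scores below are invented for illustration; the patent does not specify how the matching degree is computed.

```python
from dataclasses import dataclass


@dataclass
class SceneFeatures:
    intent_match: float       # fit with the recognized user intention, in [0, 1]
    context_relevance: float  # relevance to the previous dialogue turn, in [0, 1]
    preference: float         # user's historical preference for the scene, in [0, 1]


# Hypothetical weights for the three features (they sum to 1).
WEIGHTS = (0.5, 0.3, 0.2)


def matching_degree(features: SceneFeatures) -> float:
    """Weighted sum of the three feature scores."""
    w_intent, w_context, w_pref = WEIGHTS
    return (
        w_intent * features.intent_match
        + w_context * features.context_relevance
        + w_pref * features.preference
    )


def top_matches(scenes: dict[str, SceneFeatures], keep: int) -> list[str]:
    """Return the `keep` scenes with the highest matching degree."""
    ranked = sorted(scenes, key=lambda name: matching_degree(scenes[name]), reverse=True)
    return ranked[:keep]


# Illustrative scores for the "I want to exercise" example.
FITNESS_SCENES = {
    "fitness_club": SceneFeatures(0.9, 0.6, 0.8),
    "riverside_run": SceneFeatures(0.9, 0.4, 0.5),
    "no_exercise": SceneFeatures(0.1, 0.2, 0.0),
}
```

Keeping the two best matches retains the fitness club and the riverside run and discards the scene that contradicts the user's intention.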

The second and third embodiments show that, because the preset scene recognition rule is a fixed rule that cannot change, it lacks flexibility in the recognition process and can only reply with one or a few fixed answers to a given question. The preset discrimination model can identify the user's intention accurately and give a more appropriate answer, but it requires a certain amount of accumulated dialogue data. The preset candidate scenes are therefore screened a first time by both methods, and a suitable subset of scenes is selected to narrow the range for the precise second screening.
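The narrowed range handed to the second screening is simply the combination of both candidate lists. A small sketch, assuming the candidates are scene names and that a scene found by both methods should appear only once:

```python
def merge_candidates(first_type: list[str], second_type: list[str]) -> list[str]:
    """Order-preserving union of the rule-based and model-based candidates."""
    # dict.fromkeys keeps the first occurrence of each scene and drops repeats.
    return list(dict.fromkeys([*first_type, *second_type]))
```

For instance, merging `["dining", "cooking"]` with `["cooking", "takeout"]` gives three scenes, not four.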

Embodiment four:

Referring to fig. 4, fig. 4 is a detailed flowchart of step S40 in fig. 1. Based on the first embodiment, in this embodiment, step S40 further includes:

step S401, receiving the first-type candidate scenes and the second-type candidate scenes, and taking the received candidate scenes as the state set for reinforcement learning;

step S402, using the dialogue information as the action of reinforcement learning, screening out the best state in the state set through reinforcement learning, and further determining the best dialogue scene.

In this embodiment, after receiving a first-class candidate scene and a second-class candidate scene, regarding each scene in the candidate scenes as a state, regarding user session information as an action, calculating each state by combining a value function or a Q function in reinforcement learning with a current action, sorting the calculated values, and selecting a scene with a largest value as an optimal state, where the scene corresponding to the optimal state is the optimal session scene.

For example, when the user dialogue information is "I want to work out", the scene set of first-type and second-type candidate scenes obtained through the rule and the scene discrimination model may include "go to the X fitness club", "go running by the city moat", "find a fitness coach", "get a gym membership card", "I do not exercise" and so on. The intelligent robot knows the user's target, and the value function assigns a value to each scene: for example, "I do not exercise" has a value of 54, "go running by the city moat" 63, "go to the X fitness club" 72, "get a gym membership card" 81, and "find a fitness coach" 90, so the optimal answer may be "find a fitness coach".
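Steps S401 and S402 can be pictured with a small Q-table keyed by (state, action), where each candidate scene is a state and the user utterance is the action. The class name, the learning rate and, in particular, the `feedback` update rule are assumptions: the embodiment leaves the definition of the value function or Q function open, so the incremental update below is only one common way such values might be learned.

```python
from collections import defaultdict


class SceneSelector:
    """Hypothetical Q-table over (candidate scene, user utterance) pairs."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.learning_rate = learning_rate

    def rank(self, action: str, states: list[str]) -> list[str]:
        """Sort the state set by Q value for the given action, highest first."""
        return sorted(states, key=lambda state: self.q[(state, action)], reverse=True)

    def best(self, action: str, states: list[str]) -> str:
        """Return the state with the largest value, i.e. the optimal dialogue scene."""
        return self.rank(action, states)[0]

    def feedback(self, action: str, state: str, reward: float) -> None:
        """Assumed update rule: move the estimate part of the way toward the reward."""
        key = (state, action)
        self.q[key] += self.learning_rate * (reward - self.q[key])
```

After feedback that favors "find a fitness coach" over "I do not exercise" for the utterance "I want to work out", `best` returns the coach scene, mirroring the ordering in the example.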

The invention also protects a conversation scene recognition terminal.

In an embodiment of the terminal of the present invention, the dialog scene recognition terminal includes:

a memory storing a dialogue scene recognition program; a processor configured to execute the dialog scene recognition program to perform the following operations:

receiving input user dialogue information;

For example, scene recognition may be performed by extracting keywords from the dialogue information. When the user inputs "Who are you, and how old are you this year?", the extracted keywords are "identity" and "age", and the scene recognition rule can be set to the robot's basic self-introduction, so the corresponding answer can be a greeting such as "Hello, my name is Xiaobai" followed by its age. As another example, when the user inputs "I want to query an air ticket", the extracted keyword is "query an air ticket", and the scene recognition rule is set to the ticket-query procedure, so the corresponding answer may be "On which day do you depart?"

Further optionally, in an embodiment of the terminal of the present invention, the operation of the processor performing screening on preset candidate scenes by using a preset scene recognition rule based on the user session information to obtain a first class of candidate scenes corresponding to the user session information includes:

extracting key words in the user dialogue information;

Optionally, in an embodiment of the terminal of the present invention, the operation of the processor performing, based on the user session information, a scene discrimination model to filter the preset candidate scenes to obtain a second type of candidate scenes corresponding to the user session information includes:

extracting characteristic information in the user dialogue information;

Further optionally, in an embodiment of the terminal of the present invention, the executing, by the processor, the reinforcement learning processing on the user dialog information based on the first class of candidate scenes and the second class of candidate scenes to obtain an optimal dialog scene corresponding to the user dialog information includes:

receiving the first-type candidate scenes and the second-type candidate scenes, and taking the received candidate scenes as the state set for reinforcement learning;

and taking the dialogue information as an action of reinforcement learning, screening out the optimal state in the state set through the reinforcement learning, and further determining the optimal dialogue scene.

The embodiment of the invention also provides a computer readable storage medium.

The computer-readable storage medium of the present invention stores thereon a dialog scene recognition program that, when executed by a processor, implements the steps of the dialog scene recognition method in the embodiments described above.

In a specific scene change process, when the dialogue scene changes, the preset candidate scenes are first screened using a preset scene recognition rule and a scene discrimination model to obtain the corresponding candidate scenes; these candidate scenes are then screened a second time using a reinforcement learning strategy to obtain the optimal dialogue scene; finally, a suitable dialogue result is selected according to the optimal dialogue scene to reply to the user. The change of the current dialogue scene is thereby recognized accurately, and the user's human-computer interaction experience is improved.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A conversation scene recognition method, characterized by comprising the steps of:

receiving input user dialogue information;

screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information;

extracting feature information in the user dialogue information, and screening the preset candidate scene by adopting a scene discrimination model based on the feature information to obtain a second type of candidate scene corresponding to the user dialogue information, wherein the feature information comprises user intention, relevance between the user dialogue information and the previous section of dialogue information and preference of a user about the problems in the preset candidate scene;

performing reinforcement learning processing on the user dialogue information based on the first type of candidate scene and the second type of candidate scene to obtain an optimal dialogue scene corresponding to the user dialogue information, specifically including: determining a value corresponding to the first type of candidate scene and a value corresponding to the second type of candidate scene through value function calculation in the reinforcement learning processing, scoring and ranking the candidate scenes based on the values corresponding to the first type of candidate scene and the second type of candidate scene, and determining, according to the ranking, the scene with the highest score as the optimal dialogue scene corresponding to the user dialogue information;

2. The conversation scene recognition method according to claim 1, wherein the step of screening preset candidate scenes by adopting a preset scene recognition rule based on the user dialogue information to obtain a first type of candidate scenes corresponding to the user dialogue information comprises:

extracting keywords in the user dialogue information;

3. The conversation scene recognition method according to claim 1, wherein the step of screening the preset candidate scene by adopting a scene discrimination model based on the feature information to obtain a second type of candidate scene corresponding to the user dialogue information comprises:

4. The conversation scene recognition method according to claim 1, wherein the step of performing reinforcement learning processing on the user dialogue information based on the first type of candidate scene and the second type of candidate scene to obtain an optimal dialogue scene corresponding to the user dialogue information comprises:

5. A conversation scene recognition terminal, characterized in that the conversation scene recognition terminal comprises:

a memory storing a dialogue scene recognition program;

receiving input user dialogue information;

6. The dialog scene recognition terminal of claim 5, wherein the performing of the operation of filtering preset candidate scenes by using a preset scene recognition rule based on the user dialog information to obtain a first type of candidate scenes corresponding to the user dialog information comprises:

extracting keywords in the user dialogue information;

7. The dialog scene recognition terminal of claim 5, wherein the performing of the operation of filtering the preset candidate scenes by using a scene discrimination model based on the feature information to obtain the second type of candidate scenes corresponding to the user dialog information includes:

8. The dialog scene recognition terminal of claim 5, wherein performing the operation of performing reinforcement learning processing on the user dialog information based on the first category candidate scenes and the second category candidate scenes to obtain an optimal dialog scene corresponding to the user dialog information comprises:

9. A computer-readable storage medium, characterized in that a dialog scene recognition program is stored on the computer-readable storage medium, and the dialog scene recognition program, when executed by a processor, implements the steps of the dialog scene recognition method according to any one of claims 1 to 4.