CN111639168B - Multi-round dialogue processing method and device, electronic equipment and storage medium


Info

Publication number
CN111639168B
Authority
CN
China
Prior art keywords
current
entity
scene
data
determining
Prior art date
Legal status
Active
Application number
CN202010437955.9A
Other languages
Chinese (zh)
Other versions
CN111639168A
Inventor
赵筱军
罗雪峰
白常福
范良煌
何谐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010437955.9A
Publication of CN111639168A
Application granted
Publication of CN111639168B
Legal status: Active

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
            • G06F 16/30 - Information retrieval of unstructured textual data
              • G06F 16/33 - Querying
                • G06F 16/332 - Query formulation
                  • G06F 16/3329 - Natural language query formulation or dialogue systems
          • G06F 40/00 - Handling natural language data
            • G06F 40/20 - Natural language analysis
              • G06F 40/205 - Parsing
                • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F 40/279 - Recognition of textual entities
                • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
                  • G06F 40/295 - Named entity recognition
            • G06F 40/30 - Semantic analysis
      • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00 - Speech recognition
            • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a multi-round dialogue processing method and apparatus, an electronic device and a storage medium, and relates to the fields of natural language processing and cloud computing. One embodiment of the method comprises the following steps: acquiring current dialogue data of a user; identifying a current intention, a current entity and a current session scene in the current dialogue data; when it is determined according to the current intention and/or the current entity that a scene switching condition is met, taking the current session scene as a target session scene; and processing the multi-round dialogue in the target session scene. The method and apparatus improve the universality of the multi-round dialogue system, thereby improving the intelligence and fluency of the multi-round dialogue.

Description

Multi-round dialogue processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to natural language processing and cloud computing technologies.
Background
Multi-round dialogue systems, as a popular and promising application, are increasingly used in many fields. A multi-round dialogue system adopts technologies such as speech recognition and natural language understanding to provide various services to users through voice interaction, such as consultation, guidance and daily chat.
In multi-round dialogue application scenarios, the user remains dominant. In a continuous dialogue, the user's chat topics carry over from and supplement one another across the context, and jumps between topics occur randomly.
However, current multi-round dialogue systems generally use a simple question-answer library matching mechanism, and most of them implement single-round dialogue and slot-based task-oriented dialogue. They can only provide simple dialogue and services and are not suited to application scenarios that require scene switching; in multi-round interactions with users, phenomena such as the topic scene drifting, answers missing the question, or even failure to answer often occur.
Disclosure of Invention
The embodiments of the application provide a multi-round dialogue processing method, apparatus, electronic device and storage medium, so as to improve the universality of the multi-round dialogue system and thereby improve the intelligence and fluency of the multi-round dialogue.
In a first aspect, an embodiment of the present application provides a method for processing a multi-round dialogue, including:
acquiring current dialogue data of a user;
identifying a current intention, a current entity and a current session scene in the current dialogue data;
when it is determined according to the current intention and/or the current entity that a scene switching condition is met, taking the current session scene as a target session scene;
and processing the multi-round dialogue in the target session scene.
In a second aspect, an embodiment of the present application provides a processing apparatus for a multi-round dialogue, including:
the current dialogue data acquisition module is used for acquiring current dialogue data of a user;
the information identification module is used for identifying the current intention, the current entity and the current session scene in the current dialogue data;
the first target session scene determining module is used for taking the current session scene as a target session scene when the scene switching condition is met according to the current intention and/or the current entity;
and the multi-round dialogue processing module is used for processing multi-round dialogue in the target dialogue scene.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing multiple rounds of conversations provided by the embodiments of the first aspect.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions, where the computer instructions are configured to cause the computer to perform the method for processing a multi-round dialog provided in the embodiments of the first aspect.
According to the multi-round dialogue processing method and apparatus of the embodiments of the application, the current intention, the current entity and the current session scene in the user's current dialogue data are identified, so that when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the current session scene is taken as the target session scene and the multi-round dialogue is processed in it. This solves the problem that existing multi-round dialogue systems adapt poorly to scene switching, improves the universality of the multi-round dialogue system, and thereby improves the intelligence and fluency of the multi-round dialogue.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flowchart of a method for processing a multi-round dialogue according to an embodiment of the present application;
FIG. 2a is a flowchart of a method for processing a multi-round dialogue according to an embodiment of the present application;
FIG. 2b is a flowchart illustrating a method for processing a multi-round dialogue according to an embodiment of the present application;
FIG. 3 is a block diagram of a multi-round dialogue processing device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device for implementing a method for processing a multi-round dialogue according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In an example, fig. 1 is a flowchart of a method for processing a multi-round dialogue according to an embodiment of the present application. The embodiment is applicable to accurately determining the session scene in which to conduct a multi-round dialogue. The method may be performed by a multi-round dialogue processing apparatus, which may be implemented in software and/or hardware and may generally be integrated in an electronic device. The electronic device may be a terminal device for human-computer interaction. Accordingly, as shown in fig. 1, the method includes the following operations:
s110, acquiring current dialogue data of the user.
The current dialogue data may be dialogue data currently input by the user.
In the embodiment of the application, the multi-round dialogue system can collect the current dialogue data of the user in real time through a voice collection technology.
S120, identifying the current intention, the current entity and the current session scene in the current dialogue data.
Here, an intention represents a business action that the user wants to perform, such as checking the weather, checking a balance or transferring money. Intentions can be divided into top-level intentions and sub-intentions: a top-level intention can be triggered at any time during the conversation, whereas a sub-intention can be triggered only within its corresponding scene. An entity may be a parameter required to complete the business action, such as a time, a place or a card number. Entities are a core concept of dialogue data and are, to some extent, related to the user's intention; one intention plus several entities can complete the handling of one service. Correspondingly, the current intention is the intention corresponding to the current dialogue data, and the current entity is the entity corresponding to the current dialogue data. The current session scene may be a scene composed of the current intention and all dialogue interactions under that intention.
In the embodiment of the application, after the user's current dialogue data is acquired, the current intention, the current entity and the current session scene in the current dialogue data need to be identified. Optionally, a learning model, such as a convolutional neural network or another machine learning model, may be used to identify the current intention, the current entity and the current session scene in the current dialogue data. Alternatively, the current intention may be identified by template matching or text classification, and the current entity may be identified by keyword matching, template matching or a statistical model; the embodiment of the present application does not limit the specific type of recognition technique.
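For illustration, a minimal sketch of this recognition step is given below, assuming a hypothetical template-matching intention recognizer and a keyword-matching entity extractor; the intention templates and entity keyword tables are invented for the example, since the patent does not prescribe concrete models.

```python
import re

# Hypothetical intention templates and entity keyword tables; the patent allows
# template matching, text classification, keyword matching or statistical models.
INTENT_TEMPLATES = {
    "query_weather": [r"weather", r"rain", r"sunny"],
    "rent_car": [r"rent.*car", r"car rental"],
}
ENTITY_KEYWORDS = {
    "time": ["today", "tomorrow", "the day after tomorrow"],
    "city": ["beijing", "shanghai"],
}

def recognize(current_dialog_data):
    """Identify the current intention and current entities in one user utterance."""
    text = current_dialog_data.lower()
    intention = next(
        (name for name, patterns in INTENT_TEMPLATES.items()
         if any(re.search(p, text) for p in patterns)),
        None,
    )
    entities = {etype: value
                for etype, values in ENTITY_KEYWORDS.items()
                for value in values if value in text}
    return intention, entities

print(recognize("help me check the weather for the day after tomorrow"))
# ('query_weather', {'time': 'the day after tomorrow'})
```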
And S130, when the scene switching condition is met according to the current intention and/or the current entity, taking the current session scene as a target session scene.
The scene switching condition may be a condition used to determine whether scene switching should be performed. Optionally, in the embodiment of the present application, the scene switching condition may be set according to the current intention and/or the current entity; the embodiment of the present application does not limit the specific content of the scene switching condition.
Since a session scene is composed of an intention and all the dialogue interactions under that intention, scene switching can normally be performed by recognizing the current intention in the current dialogue data. However, if both a current intention and a current entity can be identified in the current dialogue data, switching the scene solely according to the current intention may not yield an ideal dialogue effect.
In a specific example, assume that the current session scene is a car rental scene, and that after the user provided city information in the previous round, the multi-round dialogue system asks the user, according to a predetermined configuration, to continue providing date information. Suppose the user wants to decide the specific rental date according to the weather; the user's current dialogue data may then express a weather-query intention, for example 'help me check the weather for the day after tomorrow'. Both a weather-query intention and a time entity can be identified in this dialogue data, and it is clearly more appropriate to switch the current session to the weather-query scene. Now suppose instead that the user has already decided to rent the car the day after tomorrow and simply inputs 'the day after tomorrow, then'; the user is not asking about the weather, but stating that the rental date is the day after tomorrow regardless of the weather, yet a weather-query intention and a time entity can still both be identified in this dialogue data. In this case it is clearly more appropriate to keep the current car rental scene, that is, not to switch scenes.
Therefore, judging whether to switch to the current session scene based only on the current intention lacks universality: in some scenes where the dialogue data contains both a current intention and a current entity, the dialogue effect suffers, and the intelligence and fluency of the multi-round dialogue fail to meet the dialogue requirements. The embodiment of the application therefore combines the current intention and the current entity to judge whether the scene switching condition is met, performs scene switching only when the condition is confirmed to be met, and takes the current session scene as the target session scene.
S140, processing multiple rounds of conversations in the target conversation scene.
In the embodiment of the application, when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the confidence of the current session scene is relatively high, so the current session scene is taken as the target session scene, and processing the multi-round dialogue in the target session scene yields a better effect.
In an alternative embodiment of the present application, the target session scenario may include: consultation scenes, guidance scenes, daily voice interaction scenes, or learning coaching scenes.
Optionally, the consultation scene may be a scene for consulting information, where the type of consulted information may include, but is not limited to, weather consultation or product consultation. The guidance scene may be a scene involving guidance information such as a navigation route, the daily voice interaction scene may be a chit-chat scene, and the learning coaching scene may be a scene such as test question searching, test question answering or online teaching. The embodiment of the present application does not limit the specific scene type of the target session scene.
According to the multi-round dialogue processing method and apparatus of the embodiments of the application, the current intention, the current entity and the current session scene in the user's current dialogue data are identified, so that when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the current session scene is taken as the target session scene and the multi-round dialogue is processed in it. This solves the problem that existing multi-round dialogue systems adapt poorly to scene switching, improves the universality of the multi-round dialogue system, and thereby improves the intelligence and fluency of the multi-round dialogue.
In an example, fig. 2a is a flowchart of a method for processing a multi-round dialogue provided by an embodiment of the present application, which is an optimization based on the technical solutions of the foregoing embodiments. It provides a specific implementation of determining whether the scene switching condition is met according to the current intention and/or the current entity, processing the multi-round dialogue with the current session scene as the target session scene when the condition is met, and processing the multi-round dialogue with the previous session scene as the target session scene when it is not.
A method for processing a multi-round dialogue as shown in fig. 2a, comprising:
s210, acquiring current dialogue data of the user.
S220, identifying the current intention, the current entity and the current session scene in the current dialogue data.
S230, judging whether the current intention and/or the current entity meet the scene switching conditions, if so, executing S240, otherwise, executing S250.
In an optional embodiment of the present application, determining that the scene switching condition is satisfied according to the current intention and/or the current entity may include: determining that the scene switching condition is satisfied when the current dialogue data includes the current intention and does not include the current entity; and determining that the scene switching condition is satisfied when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined to satisfy the scene switching sub-condition.
In an optional embodiment of the present application, determining that the scene switching condition is not satisfied according to the current intention and/or the current entity may include: determining that the scene switching condition is not satisfied when the current dialogue data includes the current entity and does not include the current intention; determining that the scene switching condition is not satisfied when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined not to satisfy the scene switching sub-condition; and determining that the scene switching condition is not satisfied when the current intention and the current entity are both empty.
The scene switching sub-condition may be a condition for determining whether to perform scene switching when the current dialogue data includes the current intention and the current entity at the same time.
Specifically, when the current dialogue data includes only the current intention, this indicates that the user's dialogue data indeed requires switching the session scene, and it is determined that the scene switching condition is satisfied. When the current dialogue data includes both the current intention and the current entity, it is determined that the scene switching condition is satisfied if the confidences of the current intention and the current entity satisfy the scene switching sub-condition, and otherwise that it is not satisfied. When the current dialogue data includes only the current entity, or includes neither the current intention nor the current entity, that is, when no current intention is recognized, it is determined that the scene switching condition is not satisfied.
In an optional embodiment of the present application, determining that the confidences of the current intention and the current entity satisfy the scene switching sub-condition may include: determining the intention confidence of the current intention and the entity confidence of the current entity; and determining that the confidences of the current intention and the current entity satisfy the scene switching sub-condition when the intention confidence is determined to be greater than the entity confidence.
The intention confidence may be a confidence calculated on the current intention, and the entity confidence may be a confidence calculated on the current entity.
Accordingly, when the current dialog data includes both the current intention and the current entity, it is necessary to determine the intention confidence of the current intention and the entity confidence of the current entity, respectively. If the intention confidence is greater than the entity confidence, determining that the current intention and the confidence of the current entity meet the scene switching sub-condition, namely determining that the scene switching condition is met; otherwise, determining that the confidence of the current intention and the current entity does not meet the scene-switching sub-condition, i.e., determining that the scene-switching condition is not met.
In the above scheme, as long as the current dialogue data does not include the current intention, it is determined that the scene switching condition is not satisfied; and when the current intention and the current entity are both included, whether scene switching is needed is judged according to the intention confidence of the current intention and the entity confidence of the current entity. This effectively solves the problem that scene switching is still performed when the intention confidence is not greater than the entity confidence, which leads to an unsatisfactory dialogue effect.
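As a sketch, the switching rule described above can be written as a single predicate (the argument names and default confidence values are illustrative; they are not part of the patent):

```python
def meets_scene_switching_condition(intention, entity,
                                    intention_confidence=0.0, entity_confidence=0.0):
    """Return True when the scene switching condition described above is met.

    - current dialogue data contains an intention and no entity -> switch
    - contains both -> switch only if the intention confidence exceeds
      the entity confidence (the scene switching sub-condition)
    - contains only an entity, or neither -> do not switch
    """
    if intention is not None and not entity:
        return True
    if intention is not None and entity:
        return intention_confidence > entity_confidence
    return False
```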
In an optional embodiment of the present application, determining the intention confidence of the current intention and the entity confidence of the current entity may include: determining the intention confidence of the current intention according to a set classification model; filtering out the interference words included in the current dialogue data to obtain filtered dialogue data; and calculating the text similarity between the entity value of the current entity and the filtered dialogue data, and determining the entity confidence of the current entity according to the calculation result.
The set classification model may be, for example, a rule-based classifier, a conventional machine learning algorithm or a deep learning algorithm; the present application does not limit the model type of the set classification model. The interference words may be words such as stop words or modal particles, and the filtered dialogue data is the data obtained by filtering the interference words out of the current dialogue data.
Specifically, intention recognition operates on the whole sentence: the current dialogue data can be recognized with the set classification model and the intention confidence of the current intention computed. When computing the entity confidence, the interference words included in the current dialogue data are first filtered out to obtain the filtered dialogue data, and the text similarity between the entity value of the current entity and the filtered dialogue data is then calculated on that basis, yielding the entity confidence of the current entity.
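A sketch of the two confidence computations follows. The classifier interface, the interference-word list and the character-level Dice similarity are stand-ins chosen for illustration; the patent leaves the classification model and the text-similarity measure unspecified.

```python
# Illustrative interference words (stop words, modal particles); the real list
# is configuration, not specified by the patent.
INTERFERENCE_WORDS = {"please", "the", "a", "an", "for", "me", "to", "um", "well"}

def intention_confidence(classifier, current_dialog_data):
    """Intention confidence: score of the intention predicted for the whole
    sentence by the configured classification model (hypothetical interface)."""
    _, score = classifier.classify(current_dialog_data)
    return score

def entity_confidence(entity_value, current_dialog_data):
    """Entity confidence: text similarity between the entity value and the
    dialogue data after filtering interference words. A character-level Dice
    coefficient stands in for the unspecified similarity measure."""
    filtered = " ".join(t for t in current_dialog_data.split()
                        if t.lower() not in INTERFERENCE_WORDS)
    a, b = set(entity_value.lower()), set(filtered.lower())
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0
```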
S240, taking the current session scene as a target session scene.
S250, taking the previous session scene as the target session scene.
Accordingly, when it is determined according to the current intention and/or the current entity that the scene switching condition is not satisfied, the previous session scene may be taken as the target session scene; that is, no scene switching is performed.
In the above scheme, when the scene switching condition is not met according to the current intention and/or the current entity, the previous session scene is used as the target session scene, which avoids the unsatisfactory dialogue effect that would result from switching scenes as soon as one is recognized.
S260, processing multiple rounds of conversations in the target conversation scene.
In an alternative embodiment of the present application, processing the multi-round dialogue in the target session scene may include: acquiring the dialogue-associated data stored for the previous session scene, the dialogue-associated data comprising preamble entity data and the correspondence between dialogue nodes and entity collection scripts; updating the target entity data of the target session scene according to the preamble entity data; and determining the response script according to the target entity data and the correspondence between dialogue nodes and entity collection scripts.
The preamble entity data may be the entity data collected in the previous session scene. A dialogue node may be one round of interaction between the user and the machine during the dialogue: the user's dialogue data may trigger a certain dialogue node to process the request and produce an answer, and the specific processing may include entity collection, context inheritance, context transition, interfacing with the business system, answer generation and the like. The correspondence between dialogue nodes and entity collection scripts can be used to switch entities or to correct the collected entity data. The target entity data may be the entity data collected in the target session scene.
In the embodiment of the present application, if the target session scene is the current session scene that has been switched to, then when processing the multi-round dialogue in the target session scene, the preamble entity data stored for the previous session scene and the correspondence between dialogue nodes and entity collection scripts may first be acquired. The target entity data of the target session scene is then updated according to the preamble entity data stored for the previous session scene, and the response script is determined according to the updated target entity data and the correspondence between dialogue nodes and entity collection scripts. For example, node migration is performed according to the dialogue-node jump configuration, entities are switched or the collected entity data is corrected according to the correspondence between nodes and entity collection scripts, and the entity collection script of the target session scene is then generated according to the finally obtained target entity data, the entity collection script being generated from the entity data already collected.
In a specific example, assume again that the current session scene is a car rental scene, and that after the user provided city information in the previous round, the multi-round dialogue system asks the user, according to a predetermined configuration, to continue providing date information. Suppose the user wants to decide the specific rental date according to the weather; the user's current dialogue data may express a weather-query intention, for example 'help me check the weather for the day after tomorrow'. This sentence contains both a weather-query intention and a time entity, and the confidence of the intention will be greater than that of the entity, so the session is switched to the weather-query scene. Considering the session experience and the correlation between the two scenes, the weather-query scene can inherit the location information already collected in the previous car rental scene, or guide the user to clarify the location. After the user finishes the weather query and expresses a car rental intention again, the session switches back to the car rental scene, and the earlier process continues collecting the rental date information.
In the technical scheme, the target entity data of the target session scene is updated by utilizing the preamble entity data of the previous session scene, so that the data inheritance of the previous session scene can be realized, the multi-round dialogue is processed based on the scene context, and the multi-round dialogue is more intelligent and smoother.
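A minimal sketch of this inheritance step is shown below. The SessionScene container and the set of inheritable entity types are assumptions made for the example; the following paragraphs describe the data inheritance tags that such a set would be derived from.

```python
from dataclasses import dataclass, field

@dataclass
class SessionScene:
    name: str
    entity_data: dict = field(default_factory=dict)       # entities collected so far
    node_script_map: dict = field(default_factory=dict)   # dialogue node -> entity collection script

def switch_to_scene(previous, target, inheritable_types):
    """Temporarily store the previous scene's dialogue-associated data and update
    the target scene's entity data from the inheritable preamble entities."""
    stored = {
        "preamble_entity_data": dict(previous.entity_data),
        "node_script_map": dict(previous.node_script_map),
    }
    for entity_type, value in stored["preamble_entity_data"].items():
        if entity_type in inheritable_types and entity_type not in target.entity_data:
            target.entity_data[entity_type] = value
    return stored, target

# Example: the weather-query scene inherits the city collected while renting a car.
rental = SessionScene("rent_car", {"city": "Beijing"})
weather = SessionScene("query_weather")
_, weather = switch_to_scene(rental, weather, inheritable_types={"city", "time"})
print(weather.entity_data)   # {'city': 'Beijing'}
```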
In an optional embodiment of the present application, updating the target entity data of the target session scene according to the preamble entity data may include: and when the preamble entity data meets the entity data inheritance condition, updating the target entity data of the target session scene according to the preamble entity data.
The entity data inheritance condition may be a condition for judging whether inheritance of the preamble entity data is required.
Accordingly, only when the preamble entity data meets the entity data inheritance condition is the target entity data of the target session scene updated according to the preamble entity data. If the preamble entity data does not meet the data inheritance condition, the entity data of the current session scene is collected directly from the current session data, and the collected entity data of the current session scene is used as the target entity data. The advantage of this arrangement is that it avoids mismatched response scripts caused by inheriting preamble entity data that does not meet the entity data inheritance condition.
In an optional embodiment of the present application, determining that the preamble entity data meets the entity data inheritance condition may include: determining that the preamble entity data meets the entity data inheritance condition when it is determined that the preamble entity data is configured with a data inheritance tag.
The data inheritance tag may be a manually configured tag identifying data that can be inherited. For example, if an operator has configured a data inheritance tag for the time entity of the multi-round dialogue system in advance, then when the preamble entity data includes a preceding time entity, the multi-round dialogue system configures the data inheritance tag for that preceding time entity.
Optionally, determining whether the preamble entity data meets the entity data inheritance condition may specifically be: when it is determined that the preamble entity data is configured with a data inheritance tag, the preamble entity data is determined to satisfy the entity data inheritance condition.
In this scheme, the data inheritance tags of the entity data are configured manually, which avoids the problem of mismatched response scripts caused by inheriting entity data that should not be inherited because of errors in automatic system configuration.
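As a sketch, such a manually configured tag might be checked as follows (the configuration table and the field name inheritance_tag are hypothetical):

```python
# Hypothetical per-entity configuration: an operator marks which entity types
# carry the data inheritance tag.
ENTITY_CONFIG = {
    "city": {"inheritance_tag": True},
    "time": {"inheritance_tag": True},
    "card_number": {"inheritance_tag": False},
}

def meets_inheritance_condition(entity_type):
    """The preamble entity meets the entity data inheritance condition only when
    its configuration carries the data inheritance tag."""
    return ENTITY_CONFIG.get(entity_type, {}).get("inheritance_tag", False)
```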
In an alternative embodiment of the present application, processing the multi-round dialogue in the target session scene may include: determining the scene type of the target session scene; when the scene type of the target session scene is determined to be a target scene type, determining the response script according to the current dialogue data; and when the scene type of the target session scene is determined to be empty, determining the response script according to a preset unmatched-script list.
The target scene type may be any specific scene type. The preset unmatched-script list may contain preset scripts used when no match succeeds, for example 'Sorry, I do not understand what you mean'; it may be set according to actual requirements, and the embodiment of the present application does not limit the content or the number of scripts in the preset unmatched-script list.
Accordingly, if the target session scene is the previous session scene, then when processing the multi-round dialogue in the target session scene it may first be determined whether the previous session scene belongs to a target scene type, that is, whether the session is currently in some scene. If the scene type of the target session scene is a target scene type, that is, the session is in some scene, the response script may be determined according to the current dialogue data. If the scene type of the target session scene is empty, that is, the session is not in any scene, the response script may be determined according to the preset unmatched-script list. The advantage of this arrangement is that when no scene switch is determined and the previous session scene is identified as empty, the response is determined from the preset unmatched-script list, which improves the user experience and the intelligence of the multi-round dialogue and suits the situation where the user and the machine are starting their first round of dialogue.
In an alternative embodiment of the present application, determining the response script according to the current dialogue data may include: when the current intention and the current entity are not both empty, the semantics expressed by the user are determined from the current dialogue data to be an entity query, and it is determined that entity clarification is to be performed, determining the response script according to the matched entity clarification script; when the current intention and the current entity are not both empty, the semantics expressed by the user are determined from the current dialogue data to be an entity query, and it is determined that entity clarification is refused, determining the response script according to the entity collection script; when the current intention and the current entity are not both empty and the semantics expressed by the user are determined from the current dialogue data to be a non-entity query, collecting the current entity data and determining the response script according to the current entity data; when the current intention and the current entity are both empty and it is determined that entity clarification is to be performed, determining the response script according to the matched entity clarification script; and when the current intention and the current entity are both empty and it is determined that entity clarification is refused, determining the response script according to the entity collection script.
Here, an entity query asks about a specific entity, for example 'What is an SUV (Sport Utility Vehicle)?'. Entity clarification means clarifying the entity to the user, for example 'Do you want to ask what an SUV is?'. The clarification script list may include multiple types of clarification scripts, for example scripts for clarifying an entity or scripts for clarifying an intention; a clarification script is the question the machine uses when it wants to obtain certain information. It may be set according to actual requirements, and the embodiment of the present application does not limit the content or the number of clarification scripts in the clarification script list.
Accordingly, if the scene type of the previous session scene belongs to a target scene type, that is, the session is in some scene, the response script needs to be determined according to the current intention, the current entity, the semantics expressed by the current dialogue data, whether entity clarification is needed, and so on. Specifically, when the current intention and the current entity are not both empty and the semantics expressed by the user are determined from the current dialogue data to be an entity query, the response script is determined according to the matched entity clarification script if entity clarification is to be performed; otherwise, the response script may be determined directly from the entity collection script. When the current intention and the current entity are not both empty and the semantics expressed by the user are determined from the current dialogue data to be a non-entity query, the current entity data may be collected and the response script determined from it; for example, node migration is performed according to the dialogue-node jump configuration, and a script for collecting another entity or a service-completion script, such as 'OK, goodbye', is generated. If the current intention and the current entity are both empty, the response script is determined according to the matched entity clarification script if entity clarification is to be performed; otherwise, the response script is determined from the entity collection script.
In this scheme, the response script is determined according to the current intention, the current entity, the semantics expressed by the current dialogue data, whether entity clarification is needed, and so on. This clarifies the user's dialogue requirements and determines an accurate response script, further improving the intelligence and fluency of the multi-round dialogue.
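The five branches above reduce to a small dispatch, sketched here; the script arguments and the collect_and_reply callable are placeholders standing in for configuration and node-migration logic that the patent does not spell out.

```python
def determine_response_script(intention, entity, is_entity_query, has_matched_clarification,
                              clarification_script, collection_script, collect_and_reply):
    """Select the response script for the no-switch case, following the five
    branches described above. Script contents and collect_and_reply are
    configuration-dependent placeholders."""
    both_empty = intention is None and not entity
    if not both_empty and not is_entity_query:
        # Collect the current entity data and determine the script from it
        # (e.g. prompt for the next entity or a service-completion script).
        return collect_and_reply(entity)
    # Entity query, or neither intention nor entity: clarify if a match exists,
    # otherwise fall back to the entity collection script.
    return clarification_script if has_matched_clarification else collection_script
```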
In an alternative embodiment of the present application, determining that entity clarification is to be performed may include: performing fuzzy matching between the candidate clarification script list and the current dialogue data, and determining that entity clarification is to be performed when a matched clarification script exists. Determining that entity clarification is refused may include: performing fuzzy matching between the candidate clarification script list and the current dialogue data, and refusing to perform entity clarification when it is determined that no matched clarification script exists.
The candidate clarification script list may be a preset list containing multiple clarification scripts.
Specifically, fuzzy matching can be performed between the candidate clarification script list and the current dialogue data: entity clarification is performed when a matched clarification script exists, and is refused when no matched clarification script exists.
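As a sketch, the fuzzy match might be done with standard string similarity; the example below uses Python's difflib, and the candidate scripts and the similarity cutoff are illustrative assumptions.

```python
import difflib

# Illustrative candidate clarification script list; real contents are configuration.
CANDIDATE_CLARIFICATIONS = [
    "Do you want to ask what an SUV is?",
    "Do you want to ask about the weather for a specific day?",
]

def match_clarification(current_dialog_data, cutoff=0.4):
    """Fuzzy-match the utterance against the candidate clarification script list.
    Returns the matched script, or None when entity clarification is refused.
    The cutoff value is an assumption, not specified by the patent."""
    matches = difflib.get_close_matches(current_dialog_data,
                                        CANDIDATE_CLARIFICATIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```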
Fig. 2b is a flowchart of a method for processing a multi-round dialogue according to an embodiment of the present application. In a specific example, as shown in fig. 2b, the intention and entity expressed in the user's current dialogue data (query) are identified, and the following four cases are distinguished according to the recognition result:
(1) The user expresses an intention but no entity: the session is switched to the new scene corresponding to the intention, and the dialogue-associated data collected in the previous scene is temporarily stored, including the preamble entity data and the correspondence between dialogue nodes and entity collection scripts. The entity data of the new scene is updated according to the entity data inheritable from the previous scene, node migration is performed according to the dialogue-node jump configuration, and the entity collection script of the new scene is generated.
(2) The user expresses an entity but no intention: if the session is in a scene, it is judged from the current dialogue whether the semantics expressed by the user are an entity query. If they are, the entity is considered unrecognized and it is judged whether entity clarification can be performed: if a matched clarification script exists, the entity clarification script is replied; otherwise, the entity collection script is replied. If the semantics expressed by the user are not an entity query, the entity data is collected, node migration is performed according to the dialogue-node jump configuration, and a script for collecting another entity or a service-completion script is generated. If the session is not in any scene, a preset unmatched script is replied.
(3) The user expresses both an intention and an entity: the intention confidence and the entity confidence need to be compared. If the intention confidence is greater than the entity confidence, the session is switched to the new scene corresponding to the intention, and the subsequent flow is the same as in (1) above. If the intention confidence is less than or equal to the entity confidence, the session scene is not switched, and the subsequent flow is the same as in (2) above.
(4) The user expresses neither an intention nor an entity: if the session is in a scene, it is judged whether entity clarification can be performed; if a matched clarification script exists, the entity clarification script is replied, otherwise the entity collection script is replied. If the session is not in any scene, a preset unmatched script is replied.
Therefore, this technical scheme supports switching among and returning to multiple scenes, determines scene switching according to the identified intention and entity and their confidences, and has good universality for multi-scene switching. It also supports cross-scene data inheritance configured according to service requirements, which improves the intelligence and fluency of the dialogue and gives the human-machine dialogue a good experience close to human-to-human conversation.
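Tying the four cases together, a top-level dispatch might look like the following sketch; the returned strings are only labels for the actions described above, and the flags in_scene, is_entity_query and has_clarification are assumed to come from the checks sketched earlier.

```python
def dispatch(intention, entity, intention_conf=0.0, entity_conf=0.0,
             in_scene=True, is_entity_query=False, has_clarification=False):
    """Map one user turn to the four cases of Fig. 2b and return the chosen action
    as a label (a sketch; real handling also migrates dialogue nodes and fills
    the scripts from configuration)."""
    if intention and not entity:                                  # case (1)
        return "switch scene, inherit data, reply entity collection script"
    if intention and entity and intention_conf > entity_conf:     # case (3), intention wins
        return "switch scene, inherit data, reply entity collection script"
    if not in_scene:                                              # cases (2)/(4), no active scene
        return "reply preset unmatched script"
    if entity:                                                    # case (2), or (3) with entity winning
        if is_entity_query:
            return ("reply entity clarification script" if has_clarification
                    else "reply entity collection script")
        return "collect entity data, migrate node, reply next script"
    return ("reply entity clarification script" if has_clarification  # case (4)
            else "reply entity collection script")
```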
According to the technical scheme of this embodiment, the current intention, the current entity and the current session scene in the user's current dialogue data are identified; when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the current session scene is taken as the target session scene and the multi-round dialogue is processed in it, and when the condition is not met, the previous session scene is taken as the target session scene instead. This improves the universality of the multi-round dialogue system and thereby the intelligence and fluency of the multi-round dialogue.
In an example, fig. 3 is a block diagram of a multi-round dialogue processing apparatus provided in an embodiment of the present application. The embodiment is applicable to accurately determining the session scene in which to conduct a multi-round dialogue; the apparatus is implemented in software and/or hardware and is specifically configured in an electronic device. The electronic device may be a terminal device for human-computer interaction.
The multi-round dialogue processing apparatus 300 shown in fig. 3 comprises: a current dialogue data acquisition module 310, an information identification module 320, a first target session scene determination module 330, and a multi-round dialogue processing module 340. Wherein:
a current dialogue data obtaining module 310, configured to obtain current dialogue data of a user;
an information identifying module 320, configured to identify a current intention, a current entity, and a current session scene in the current dialogue data;
a first target session scene determining module 330, configured to take the current session scene as a target session scene when it is determined that a scene switching condition is satisfied according to the current intention and/or the current entity;
a multi-round dialogue processing module 340, configured to process multi-round dialogues in the target session scenario.
According to the multi-round dialogue processing method and apparatus of the embodiments of the application, the current intention, the current entity and the current session scene in the user's current dialogue data are identified, so that when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the current session scene is taken as the target session scene and the multi-round dialogue is processed in it. This solves the problem that existing multi-round dialogue systems adapt poorly to scene switching, improves the universality of the multi-round dialogue system, and thereby improves the intelligence and fluency of the multi-round dialogue.
Optionally, the first target session scene determining module includes: a scene switching condition determining unit configured to determine that a scene switching condition is satisfied when the current dialogue data includes the current intention and does not include the current entity; and when the current dialogue data simultaneously comprises the current intention and the current entity and the confidence degrees of the current intention and the current entity are determined to meet the scene switching sub-condition, determining that the scene switching condition is met.
Optionally, the scene switching condition determining unit is specifically configured to: determine the intention confidence of the current intention and the entity confidence of the current entity; and determine that the confidences of the current intention and the current entity satisfy the scene switching sub-condition when the intention confidence is determined to be greater than the entity confidence.
Optionally, the scene switching condition determining unit is specifically configured to: determine the intention confidence of the current intention according to a set classification model; filter out the interference words included in the current dialogue data to obtain filtered dialogue data; and calculate the text similarity between the entity value of the current entity and the filtered dialogue data, and determine the entity confidence of the current entity according to the calculation result.
Optionally, the multi-round dialogue processing module 340 includes: a dialogue-associated data acquisition unit, configured to acquire the dialogue-associated data stored for the previous session scene, the dialogue-associated data comprising preamble entity data and the correspondence between dialogue nodes and entity collection scripts; a target entity data updating unit, configured to update the target entity data of the target session scene according to the preamble entity data; and a first response script determining unit, configured to determine the response script according to the target entity data and the correspondence between dialogue nodes and entity collection scripts.
Optionally, the target entity data updating unit is specifically configured to: and when the preamble entity data meets the entity data inheritance condition, updating the target entity data of the target session scene according to the preamble entity data.
Optionally, the target entity data updating unit is specifically configured to: determine that the preamble entity data meets the entity data inheritance condition when it is determined that the preamble entity data is configured with a data inheritance tag.
Optionally, the processing apparatus 300 for multi-round conversations further includes: and the second target session scene determining module is used for taking the previous session scene as the target session scene when the scene switching condition is not met according to the current intention and/or the current entity.
Optionally, the second target session scene determining module is specifically configured to: determining that a scene switching condition is not satisfied when the current dialog data includes the current entity and does not include the current intent; when the current dialogue data simultaneously comprises the current intention and the current entity and the confidence degrees of the current intention and the current entity are determined to not meet the scene switching sub-condition, determining that the scene switching condition is not met; and when the current intention and the current entity are both empty, determining that the scene switching condition is not met.
Optionally, the multi-round dialogue processing module 340 includes: a scene type determining unit, configured to determine the scene type of the target session scene; a second response script determining unit, configured to determine the response script according to the current dialogue data when the scene type of the target session scene is determined to be a target scene type; and a third response script determining unit, configured to determine the response script according to a preset unmatched-script list when the scene type of the target session scene is determined to be empty.
Optionally, the second response script determining unit is specifically configured to: when the current intention and the current entity are not both empty, the semantics expressed by the user are determined from the current dialogue data to be an entity query, and it is determined that entity clarification is to be performed, determine the response script according to the matched entity clarification script; when the current intention and the current entity are not both empty, the semantics expressed by the user are determined from the current dialogue data to be an entity query, and it is determined that entity clarification is refused, determine the response script according to the entity collection script; when the current intention and the current entity are not both empty and the semantics expressed by the user are determined from the current dialogue data to be a non-entity query, collect the current entity data and determine the response script according to the current entity data; when the current intention and the current entity are both empty and it is determined that entity clarification is to be performed, determine the response script according to the matched entity clarification script; and when the current intention and the current entity are both empty and it is determined that entity clarification is refused, determine the response script according to the entity collection script.
Optionally, the second response script determining unit is specifically configured to: perform fuzzy matching between the candidate clarification script list and the current dialogue data, and determine that entity clarification is to be performed when a matched clarification script exists; and perform fuzzy matching between the candidate clarification script list and the current dialogue data, and refuse to perform entity clarification when it is determined that no matched clarification script exists.
Optionally, the target session scene includes: consultation scenes, guidance scenes, daily voice interaction scenes, or learning coaching scenes.
The processing device for multi-round conversations can execute the processing method for multi-round conversations provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be referred to the method for processing multiple rounds of conversations provided in any embodiment of the present application.
Since the above multi-round dialogue processing apparatus is an apparatus capable of executing the multi-round dialogue processing method of the embodiments of the present application, based on the method described in the embodiments of the present application, those skilled in the art can understand the specific implementation of the apparatus and its various modifications; how the apparatus implements the multi-round dialogue processing method is therefore not described in detail here. Any apparatus used by those skilled in the art to implement the multi-round dialogue processing method of the embodiments of the present application falls within the scope of the present application.
In one example, the present application also provides an electronic device and a readable storage medium.
Fig. 4 is a schematic structural diagram, shown as a block diagram, of an electronic device for implementing the multi-round dialogue processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, a memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected using different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing some of the necessary operations (for example, as a server array, a set of blade servers, or a multiprocessor system). One processor 401 is taken as an example in fig. 4.
Memory 402 is a non-transitory computer-readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of processing multiple rounds of conversations provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the processing method of the multi-round dialog provided by the present application.
As a non-transitory computer-readable storage medium, the memory 402 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the multi-round dialogue processing method in the embodiments of the present application (e.g., the current dialogue data acquisition module 310, the information identification module 320, the first target session scene determination module 330, and the multi-round dialogue processing module 340 shown in fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, that is, implements the multi-round dialogue processing method in the above method embodiments.
The memory 402 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the electronic device implementing the multi-round dialogue processing method, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 402 may optionally include memories remotely located with respect to the processor 401, and these remote memories may be connected via a network to the electronic device implementing the multi-round dialogue processing method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the multi-round dialogue processing method may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403, and the output device 404 may be connected by a bus or in other manners; in fig. 4, connection by a bus is taken as an example.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the multi-round dialogue processing method, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input device. The output device 404 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, the current intention, the current entity, and the current session scene in the current dialogue data of the user are identified, so that when it is determined according to the current intention and/or the current entity that the scene switching condition is met, the current session scene is taken as the target session scene and the multi-round dialogue is processed in that scene. This solves the problem that existing multi-round dialogue systems adapt poorly to scene switching, improves the universality of the multi-round dialogue system, and improves the intelligence and fluency of the multi-round dialogue.
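As a purely illustrative aid, the flow summarized above can be sketched in Python roughly as follows; the class, function, and field names (Recognition, meets_switch_condition, choose_target_scene) are placeholders assumed for this sketch rather than an implementation of the embodiments, and the switching rule shown covers only the confidence-based case described above.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    intent: Optional[str]            # current intention, None if none was recognized
    entity: Optional[str]            # current entity value, None if none was recognized
    intent_confidence: float = 0.0
    entity_confidence: float = 0.0
    scene: Optional[str] = None      # current session scene implied by this turn


def meets_switch_condition(rec: Recognition) -> bool:
    # Switch when only an intention is present, or when both are present and
    # the intention confidence is greater than the entity confidence.
    if rec.intent and not rec.entity:
        return True
    if rec.intent and rec.entity:
        return rec.intent_confidence > rec.entity_confidence
    return False


def choose_target_scene(rec: Recognition, previous_scene: str) -> str:
    # Take the current session scene as the target scene only when the
    # scene switching condition is met; otherwise stay in the previous scene.
    if meets_switch_condition(rec) and rec.scene:
        return rec.scene
    return previous_scene


rec = Recognition(intent="ask_weather", entity=None,
                  intent_confidence=0.9, scene="weather consultation")
print(choose_target_scene(rec, previous_scene="flight booking"))   # -> weather consultation

Under this sketch, a turn that carries only a new intention switches the scene, while a turn that carries only an entity value keeps the previous session scene.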
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (26)

1. A method of processing a multi-round dialogue, comprising:
acquiring current dialogue data of a user;
identifying a current intention, a current entity and a current session scene in the current dialogue data;
when it is determined according to the current intention and/or the current entity that a scene switching condition is met, taking the current session scene as a target session scene;
processing the multi-round dialogue in the target session scene;
wherein processing the multi-round dialogue in the target session scene comprises:
acquiring dialogue-associated data stored in a previous session scene; the dialogue-associated data comprises preamble entity data and a correspondence between dialogue nodes and entity collection scripts;
updating target entity data of the target session scene according to the preamble entity data;
and determining a dialogue script according to the target entity data and the correspondence between the dialogue nodes and the entity collection scripts.
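By way of illustration of the processing in the target session scene defined in claim 1 (and of the inheritance condition of claims 5 and 6), a minimal Python sketch is given below; the dictionary layout, the "inheritable" field standing in for the data inheritance label, and the example scripts are assumptions made for this sketch only.

def inherit_entities(preamble_entities: dict, target_entities: dict) -> dict:
    # Update the target entity data: only preamble entities configured with a
    # data inheritance label (assumed field name "inheritable") are carried over.
    updated = dict(target_entities)
    for name, slot in preamble_entities.items():
        if slot.get("inheritable"):
            updated.setdefault(name, slot["value"])
    return updated


def next_script(target_entities: dict, node_to_script: dict) -> str:
    # The correspondence between dialogue nodes and entity collection scripts is
    # modelled as a mapping from the entity each node collects to its script.
    for entity_name, script in node_to_script.items():
        if entity_name not in target_entities:
            return script                      # ask for the first entity still missing
    return "all entities collected"


# Example with assumed data: the city collected in the previous scene carries a
# data inheritance label, so the dialogue directly asks for the missing date.
preamble = {"city": {"value": "Beijing", "inheritable": True}}
node_to_script = {"city": "Which city do you mean?", "date": "For which date?"}
target = inherit_entities(preamble, {})
print(next_script(target, node_to_script))     # -> For which date?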
2. The method of claim 1, wherein determining according to the current intention and/or the current entity that the scene switching condition is met comprises:
determining that the scene switching condition is met when the current dialogue data includes the current intention and does not include the current entity;
and determining that the scene switching condition is met when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined to meet a scene switching sub-condition.
3. The method of claim 2, wherein determining that the confidences of the current intention and the current entity meet the scene switching sub-condition comprises:
determining the intention confidence of the current intention and the entity confidence of the current entity;
and when the intention confidence is determined to be greater than the entity confidence, determining that the confidences of the current intention and the current entity meet the scene switching sub-condition.
4. The method of claim 3, wherein determining the intention confidence of the current intention and the entity confidence of the current entity comprises:
determining the intention confidence of the current intention according to a preset classification model;
filtering the interference words included in the current dialogue data to obtain filtered dialogue data;
and calculating a text similarity between the entity value of the current entity and the filtered dialogue data, and determining the entity confidence of the current entity according to the calculation result.
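By way of illustration of claims 2 to 4 above, the confidence comparison can be sketched in Python as follows; the stubbed classifier score, the interference-word list, and the character-overlap similarity are stand-ins chosen for brevity and are not the preset classification model or the text similarity measure actually used.

def intent_confidence(utterance: str) -> float:
    # Stand-in for the score produced by the preset classification model.
    return 0.8 if "book" in utterance else 0.3


def entity_confidence(entity_value: str, utterance: str,
                      interference_words=("please", "uh", "um")) -> float:
    # Filter interference words, then compare the entity value with the
    # filtered dialogue data via a simple overlap ratio.
    filtered = " ".join(w for w in utterance.lower().split() if w not in interference_words)
    if not entity_value:
        return 0.0
    overlap = sum(1 for ch in entity_value.lower() if ch in filtered)
    return overlap / len(entity_value)


def meets_switch_sub_condition(entity_value: str, utterance: str) -> bool:
    # The sub-condition holds when the intention confidence is greater than
    # the entity confidence for the same current dialogue data.
    return intent_confidence(utterance) > entity_confidence(entity_value, utterance)


print(meets_switch_sub_condition("Beijing", "please book a flight"))   # -> True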
5. The method of claim 1, wherein updating the target entity data of the target session scene according to the preamble entity data comprises:
and when the preamble entity data meets the entity data inheritance condition, updating the target entity data of the target session scene according to the preamble entity data.
6. The method of claim 5, wherein determining that the preamble entity data satisfies an entity data inheritance condition comprises:
and when it is determined that the preamble entity data is configured with a data inheritance label, determining that the preamble entity data meets an entity data inheritance condition.
7. The method of claim 1 or 4, wherein the method further comprises:
and when it is determined according to the current intention and/or the current entity that the scene switching condition is not met, taking the previous session scene as the target session scene.
8. The method of claim 7, wherein determining according to the current intention and/or the current entity that the scene switching condition is not met comprises:
determining that the scene switching condition is not met when the current dialogue data includes the current entity and does not include the current intention;
determining that the scene switching condition is not met when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined not to meet the scene switching sub-condition;
and when the current intention and the current entity are both empty, determining that the scene switching condition is not met.
9. The method of claim 7, wherein processing the multi-round dialogue in the target session scene comprises:
determining a scene type of the target session scene;
when the scene type of the target session scene is determined to be a target scene type, determining a dialogue script according to the current dialogue data;
and when the scene type of the target session scene is determined to be empty, determining the dialogue script according to a preset no-match script list.
10. The method of claim 9, wherein determining the dialogue script according to the current dialogue data comprises:
when the current intention and the current entity are not both empty, the semantics of the user's expression is determined according to the current dialogue data to be an entity inquiry, and it is determined to perform entity clarification, determining the dialogue script according to the matching entity clarification script;
when the current intention and the current entity are not both empty, the semantics of the user's expression is determined according to the current dialogue data to be an entity inquiry, and it is determined to reject entity clarification, determining the dialogue script according to an entity collection script;
when the current intention and the current entity are not both empty and the semantics of the user's expression is determined according to the current dialogue data to be a non-entity inquiry, collecting the current entity data and determining the dialogue script according to the current entity data;
when the current intention and the current entity are both empty and it is determined to perform entity clarification, determining the dialogue script according to the matching entity clarification script;
and when the current intention and the current entity are both empty and it is determined to reject entity clarification, determining the dialogue script according to the entity collection script.
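By way of illustration of the branching in claim 10, the script selection can be written as a single Python function; the boolean flags stand in for the semantic judgment and the clarification decision of claim 11, and the returned strings are assumed script identifiers only.

def choose_script(intent, entity, is_entity_inquiry: bool, do_clarify: bool) -> str:
    # Mirror the five branches of claim 10; they collapse to three cases here
    # because the clarification decision is shared between them.
    both_empty = intent is None and entity is None
    if not both_empty and not is_entity_inquiry:
        # Non-entity inquiry: collect the current entity data and answer from it.
        return f"script determined from collected entity data: {entity}"
    # Remaining branches: entity inquiry, or both intention and entity empty.
    if do_clarify:
        return "matching entity clarification script"
    return "entity collection script"


print(choose_script("ask_price", "insurance", is_entity_inquiry=False, do_clarify=False))
print(choose_script(None, None, is_entity_inquiry=False, do_clarify=True))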
11. The method of claim 10, wherein determining to perform entity clarification comprises:
performing fuzzy matching between a candidate clarification script list and the current dialogue data, and determining to perform entity clarification when a matching clarification script exists;
determining to reject entity clarification comprises:
and performing fuzzy matching on the current dialogue data according to the candidate clarification script list, and determining to reject entity clarification when no matching clarification script exists.
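By way of illustration of the fuzzy matching in claim 11, a minimal Python sketch using the standard-library difflib is given below; difflib and the 0.6 threshold are assumptions made for this sketch, not the matching algorithm or threshold of the embodiments.

import difflib


def decide_entity_clarification(current_dialogue_data: str,
                                candidate_clarification_scripts: list,
                                threshold: float = 0.6) -> bool:
    # Fuzzy-match the current dialogue data against each candidate clarification
    # script; clarification is performed only if some script matches well enough.
    for script in candidate_clarification_scripts:
        ratio = difflib.SequenceMatcher(None, current_dialogue_data.lower(),
                                        script.lower()).ratio()
        if ratio >= threshold:
            return True        # a matching clarification script exists
    return False               # no matching script: reject entity clarification


candidates = ["do you mean the departure city", "do you mean the arrival city"]
print(decide_entity_clarification("Do you mean the departure city?", candidates))  # -> True
print(decide_entity_clarification("book a hotel for tomorrow", candidates))        # -> False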
12. The method of claim 1, wherein the target session scene comprises: consultation scenes, guidance scenes, daily voice interaction scenes, or learning coaching scenes.
13. A multi-round dialogue processing device, comprising:
the current dialogue data acquisition module is used for acquiring current dialogue data of a user;
the information identification module is used for identifying the current intention, the current entity and the current session scene in the current dialogue data;
the first target session scene determining module is used for taking the current session scene as a target session scene when it is determined according to the current intention and/or the current entity that a scene switching condition is met;
the multi-round dialogue processing module is used for processing the multi-round dialogue in the target session scene;
wherein, the multi-round dialogue processing module includes:
a dialogue-associated data acquisition unit, configured to acquire dialogue-associated data stored in a previous session scene; the dialogue-associated data comprises preamble entity data and a correspondence between dialogue nodes and entity collection scripts;
a target entity data updating unit, configured to update target entity data of the target session scene according to the preamble entity data;
and a first dialogue script determining unit, configured to determine a dialogue script according to the target entity data and the correspondence between the dialogue nodes and the entity collection scripts.
14. The apparatus of claim 13, wherein the first target session scene determination module comprises:
a scene switching condition determining unit, configured to determine that the scene switching condition is met when the current dialogue data includes the current intention and does not include the current entity;
and determine that the scene switching condition is met when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined to meet a scene switching sub-condition.
15. The apparatus of claim 14, wherein the scene-switching-condition determining unit is specifically configured to:
Determining the intention confidence of the current intention and the entity confidence of the current entity;
and when the intention confidence is determined to be greater than the entity confidence, determining that the confidences of the current intention and the current entity meet the scene switching sub-condition.
16. The apparatus of claim 15, wherein the scene-switching-condition determining unit is specifically configured to:
determining the intention confidence of the current intention according to a preset classification model;
filtering the interference words included in the current dialogue data to obtain filtered dialogue data;
and calculating a text similarity between the entity value of the current entity and the filtered dialogue data, and determining the entity confidence of the current entity according to the calculation result.
17. The apparatus of claim 13, wherein the target entity data updating unit is specifically configured to:
and when the preamble entity data meets the entity data inheritance condition, updating the target entity data of the target session scene according to the preamble entity data.
18. The apparatus of claim 17, wherein the target entity data updating unit is specifically configured to:
and when it is determined that the preamble entity data is configured with a data inheritance label, determining that the preamble entity data meets the entity data inheritance condition.
19. The apparatus according to claim 13 or 16, wherein the apparatus further comprises:
and the second target session scene determining module is used for taking the previous session scene as the target session scene when it is determined according to the current intention and/or the current entity that the scene switching condition is not met.
20. The apparatus of claim 19, wherein the second target session scene determination module is specifically configured to:
determining that the scene switching condition is not met when the current dialogue data includes the current entity and does not include the current intention;
determining that the scene switching condition is not met when the current dialogue data includes both the current intention and the current entity and the confidences of the current intention and the current entity are determined not to meet the scene switching sub-condition;
and when the current intention and the current entity are both empty, determining that the scene switching condition is not met.
21. The apparatus of claim 19, wherein the multi-round dialogue processing module comprises:
a scene type determining unit, configured to determine a scene type of the target session scene;
a second dialogue script determining unit, configured to determine a dialogue script according to the current dialogue data when the scene type of the target session scene is determined to be a target scene type;
and a third dialogue script determining unit, configured to determine the dialogue script according to a preset no-match script list when the scene type of the target session scene is determined to be empty.
22. The apparatus of claim 21, wherein the second dialogue script determining unit is specifically configured to:
when the current intention and the current entity are not both empty, the semantics of the user's expression is determined according to the current dialogue data to be an entity inquiry, and it is determined to perform entity clarification, determine the dialogue script according to the matching entity clarification script;
when the current intention and the current entity are not both empty, the semantics of the user's expression is determined according to the current dialogue data to be an entity inquiry, and it is determined to reject entity clarification, determine the dialogue script according to an entity collection script;
when the current intention and the current entity are not both empty and the semantics of the user's expression is determined according to the current dialogue data to be a non-entity inquiry, collect the current entity data and determine the dialogue script according to the current entity data;
when the current intention and the current entity are both empty and it is determined to perform entity clarification, determine the dialogue script according to the matching entity clarification script;
and when the current intention and the current entity are both empty and it is determined to reject entity clarification, determine the dialogue script according to the entity collection script.
23. The apparatus of claim 22, wherein the second dialogue script determining unit is specifically configured to:
perform fuzzy matching between a candidate clarification script list and the current dialogue data, and determine to perform entity clarification when a matching clarification script exists;
and perform fuzzy matching on the current dialogue data according to the candidate clarification script list, and determine to reject entity clarification when no matching clarification script exists.
24. The apparatus of claim 13, wherein the target session scene comprises: consultation scenes, guidance scenes, daily voice interaction scenes, or learning coaching scenes.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-12.
CN202010437955.9A 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium Active CN111639168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010437955.9A CN111639168B (en) 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111639168A CN111639168A (en) 2020-09-08
CN111639168B (en) 2023-06-09

Family

ID=72331480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010437955.9A Active CN111639168B (en) 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111639168B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164401B (en) * 2020-09-18 2022-03-18 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112183098B (en) * 2020-09-30 2022-05-06 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN112597287B (en) * 2020-12-15 2024-03-15 深圳市优必选科技股份有限公司 Statement processing method, statement processing device and intelligent equipment
CN112528002B (en) * 2020-12-23 2023-07-18 北京百度网讯科技有限公司 Dialogue identification method, device, electronic equipment and storage medium
CN113282725A (en) * 2021-05-21 2021-08-20 北京市商汤科技开发有限公司 Dialogue interaction method and device, electronic equipment and storage medium
CN113282708B (en) * 2021-05-31 2023-04-07 平安国际智慧城市科技股份有限公司 Method and device for replying to robot dialog, computer equipment and storage medium
CN113282736B (en) * 2021-07-08 2022-07-22 北京百度网讯科技有限公司 Dialogue understanding and model training method, device, equipment and storage medium
CN113704432A (en) * 2021-08-31 2021-11-26 广州方舟信息科技有限公司 Artificial intelligence customer service system construction method and device based on Internet hospital
CN115809669B (en) * 2022-12-30 2024-03-29 联通智网科技股份有限公司 Dialogue management method and electronic equipment
CN117076620A (en) * 2023-06-25 2023-11-17 北京百度网讯科技有限公司 Dialogue processing method and device, electronic equipment and storage medium
CN116777568A (en) * 2023-08-17 2023-09-19 浙江网新恒天软件有限公司 Financial market transaction advanced intelligent dialogue ordering method, device and storage medium
CN117059074B (en) * 2023-10-08 2024-01-19 四川蜀天信息技术有限公司 Voice interaction method and device based on intention recognition and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197191A (en) * 2017-12-27 2018-06-22 神思电子技术股份有限公司 A kind of scene of more wheel dialogues is intended to interrupt method
CN109918546A (en) * 2019-02-02 2019-06-21 上海奔影网络科技有限公司 Task configuration method and device for dialogue
CN110096191A (en) * 2019-04-24 2019-08-06 北京百度网讯科技有限公司 A kind of interactive method, device and electronic equipment
CN110675867A (en) * 2019-08-26 2020-01-10 北京百度网讯科技有限公司 Intelligent dialogue method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A survey of intent recognition methods in human-machine dialogue systems; Liu Jiao et al.; Computer Engineering and Applications; 2019-03-25 (Issue 12); full text *

Similar Documents

Publication Publication Date Title
CN111639168B (en) Multi-round dialogue processing method and device, electronic equipment and storage medium
US10217463B2 (en) Hybridized client-server speech recognition
CN107111475B (en) Managing user interactions for input understanding determination
US20210097410A1 (en) Recommodation method, recommodation apparatus, electronic device and storage medium
CN111611368B (en) Method and device for backtracking public scene dialogue in multiple rounds of dialogue
CN110955675B (en) Robot dialogue method, apparatus, device and computer readable storage medium
CN111105800B (en) Voice interaction processing method, device, equipment and medium
CN111680517B (en) Method, apparatus, device and storage medium for training model
CN112507735B (en) Training method and device of machine translation model and electronic equipment
CN111241259B (en) Interactive information recommendation method and device
CN111241245B (en) Human-computer interaction processing method and device and electronic equipment
CN111797216B (en) Search term rewriting method, apparatus, device and storage medium
CN112527998A (en) Reply recommendation method, reply recommendation device and intelligent device
CN111241234B (en) Text classification method and device
CN111984774B (en) Searching method, searching device, searching equipment and storage medium
CN117077791A (en) Model reasoning method, device, equipment and medium based on graph data structure
CN111259162A (en) Dialogue interaction method, device, equipment and storage medium
CN111291184B (en) Expression recommendation method, device, equipment and storage medium
CN116303962A (en) Dialogue generation method, training method, device and equipment for deep learning model
CN113312451B (en) Text label determining method and device
CN112559718B (en) Method, device, electronic equipment and storage medium for dialogue processing
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN113342946A (en) Model training method and device for customer service robot, electronic equipment and medium
CN111680599B (en) Face recognition model processing method, device, equipment and storage medium
CN112825256B (en) Guiding method, device, equipment and computer storage medium for recording voice packet function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant