CN111159371A

CN111159371A - Dialogue strategy method for task-oriented dialogue system

Info

Publication number: CN111159371A
Application number: CN201911331882.9A
Authority: CN
Inventors: 赵阳洋; 王振宇; 王佩
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-12-21
Filing date: 2019-12-21
Publication date: 2020-05-15
Anticipated expiration: 2039-12-21
Also published as: CN111159371B; WO2021121436A1

Abstract

The invention discloses a dialogue strategy method for a task-oriented dialogue system, applied to a knowledge-graph-based intelligent music search scenario, comprising the following steps: S1, constructing a Markov decision model for a specific domain; S2, calculating a state value function matrix using the Bellman equation; S3, matching the knowledge graph and searching the knowledge base according to the dialogue state at the current moment to obtain music results that meet the user goal; S4, calculating the attribute information entropy of the search results; S5, analyzing the calculated attribute information entropy; and S6, calculating the next-round action through the state transition matrix. The invention addresses the complete cold-start problem in task-oriented dialogue systems: it calculates the state value function matrix by constructing a reinforcement learning model, obtains the next-round action by combining the state value function matrix with the attribute information entropy of the state, completes the knowledge search task in fewer dialogue rounds, and has good usability.

Description

Dialogue strategy method for task-oriented dialogue system

Technical Field

The invention relates to knowledge-graph-based intelligent search in task-oriented dialogue systems, and in particular to a dialogue strategy method for a task-oriented dialogue system.

Background

With the rapid development of artificial intelligence technology, interaction between people and intelligent devices is becoming more intelligent, gradually shifting from the traditional Graphical User Interface (GUI) to the Conversational User Interface, in which an artificial intelligence assistant helps users complete multiple tasks or services. By function, human-machine dialogue systems fall into two main categories: non-task-oriented dialogue systems and task-oriented dialogue systems. Task-oriented dialogue systems, also known as goal-driven dialogue systems, such as customer service robots and airline ticket booking systems, provide domain-specific services that help users complete tasks such as shopping and booking airline tickets. Human-machine dialogue systems can greatly reduce labor costs, simplify human-machine interaction and make applications more intelligent, and therefore have broad research and application value.

In a task-oriented dialogue system, a user conducts multiple rounds of dialogue with the system to complete a particular task. In knowledge-graph-based intelligent search with multi-turn dialogue, the system needs to help the user find the knowledge items that satisfy the constraints in as few turns as possible. In this process, the guidance given by the system plays a decisive role in the path the dialogue follows. A good dialogue strategy guides users directly and simply to express target attributes, thereby determining the constraints for knowledge graph matching and knowledge base searching. The intelligence of the dialogue strategy therefore bears directly on the search efficiency of the system. Industrial applications of task-oriented dialogue systems often lack domain-specific training data sets, so supervised training cannot be performed. Currently, most dialogue systems solve the complete cold-start problem by manually formulating dialogue rules. Manually formulated dialogue strategies can be set up quickly, but the process consumes a large amount of manpower and lacks generalization and domain migration capability. How to construct a dialogue robot that suits a complete cold-start scenario while offering intelligence and domain migration capability is therefore the background of the present invention.

The mainstream models currently used to implement dialogue strategies fall mainly into the following classes: dialogue strategies based on finite state automata (Zhu Xiao Yan. Method research based on slot-feature finite state automata in dialog management [J]. 2004, 27(8): 1092-); slot-filling (frame-based) methods (free-switches, Tianhuafeng, Dubo, et al. Study and implementation of framework-based dialog management models [J]. Computer Engineering, 2005(13): 221-); and dialogue strategies based on probabilistic models (zhangbo, zai celebration, guobaining. POMDP model and solution for spoken dialog systems [J]. Computer Research and Development, 2002(02): 90-97).

A dialogue strategy based on finite state automata defines the interaction between the user and the system as an alternation of states and triggered actions, "initial state -> action -> updated state -> ... -> termination state". It is a typical system-led method: the rhythm of the dialogue is determined entirely by the system, the user must supply information in the order the system specifies, and flexibility and extensibility are lacking.

A slot-filling dialogue strategy improves on the finite-state-automaton approach to some extent by modeling the dialogue as a slot-filling process. It gives the user a relatively flexible way to provide input, supports mixed initiative between user and system, and suits relatively complex information acquisition scenarios. However, when the number of slots becomes too large, the complexity of the algorithm rises sharply, so the method is not suitable for more complex scenarios.

For complex scenarios with a large number of slots, methods based on probabilistic models scale well. When the state or action space is so large that traditional reinforcement learning can hardly explore it efficiently, deep reinforcement learning can greatly improve the convergence rate of the model.

Building on the three dialogue strategy approaches above, the invention provides a multi-turn dialogue strategy method that integrates reinforcement learning and information entropy, aimed at two problems in knowledge-graph-based search dialogue systems. The two problems are as follows:

(1) In a task-oriented multi-turn dialogue system, large-scale dialogue data for the specific domain is usually lacking because of the specificity of the domain, so a supervised model cannot be trained. Before the system has collected dialogue data online in a real application environment, how to construct a dialogue strategy model for cold start is an important problem.

(2) A knowledge-graph-based knowledge search dialogue system must generate a knowledge base query statement from the user goal, combine the external knowledge base and the knowledge graph to help the user find the required information, and give the system response. The dialogue strategy task must consider not only the current dialogue state but also the query result of the knowledge base and the matching result of the knowledge graph. How to construct a dialogue strategy model that takes knowledge-graph-based knowledge base search results into account is a major problem for the dialogue strategy task.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a dialogue strategy method for a task-oriented dialogue system.

The invention is realized by at least one of the following technical schemes.

A dialogue strategy method for a task-oriented dialogue system comprises the following steps:

S1, constructing a Markov decision model for a knowledge-graph-based search dialogue system in a vertical domain;

S2, obtaining a state value function matrix using the Bellman equation, based on step S1;

S3, matching the knowledge graph and searching the knowledge base according to the dialogue state at the current moment to obtain results that meet the user goal;

S4, calculating the attribute information entropy of the search results;

S5, analyzing the calculated attribute information entropy;

and S6, obtaining the system action for the next round through the state transition matrix.

The multi-turn dialogue strategy optimization method is applied to the dialogue strategy module of a multi-turn dialogue music search system; a complete multi-turn dialogue music search system is implemented and integrated with a WeChat official account for demonstration.

Further, the step S1 specifically includes the following steps:

S11, defining the quintuple (S, A, P, R, γ) for the domain according to the number of slots in the dialogue, wherein S is the set of all states including termination states, A is the set of all actions, P is the state transition probability, R is the reward function, and γ is a discount factor in the interval 0 to 1;

and S12, defining examples, searching the database with them, and defining the termination states of the dialogue according to the database search results.

Further, in the step S2, the state value function of a state s in the Markov decision process is the expectation of its return, i.e. $v(s) = E[G_t \mid S_t = s]$, wherein $G_t$ is the return at time t and $S_t$ is the state at time t. Based on the Bellman equation of the state value function

$$v_\pi(s) = \sum_{a \in A} \pi(a \mid s) \left( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \right)$$

the state value function matrix v(s) is calculated iteratively, wherein $\pi(a \mid s)$ is the probability distribution over actions under the strategy in a given state, $R_s^a$ is the immediate reward for performing action a when the state is s, γ is the discount factor, $P_{ss'}^a$ is the probability that the state at the next moment becomes s' when the state at the current moment is s, $v_\pi(s')$ is the state value function of the next state s', and A is the set of all actions a.

Further, the step S3 specifically includes the following steps:

S31, receiving the triple output by the natural language understanding module of the dialogue system, namely domain identification, intention identification and slot-value pairs, to obtain the understanding result for a single sentence;

S32, performing dialogue state tracking in combination with the historical slot-value state, updating the current dialogue state, and converting it into the state $S_t$;

S33, taking the constraints of the current user goal, namely the list of slot-value pairs, from the dialogue state tracker, converting them into a knowledge base query statement, and performing knowledge graph matching and knowledge search.

Further, the step S4 specifically includes the following steps:

S41, checking the number of search results: if the number is greater than N, calculating the attribute information entropy of the results; if it is not greater than N, directly informing the system to give the search result list;

S42, calculating the information entropy of the attribute attr according to the formula $H(attr) = -\sum_{x \in \chi} p(x) \log p(x)$, wherein χ is the set of possible values of the attribute attr and p(x) is the probability that the attribute attr takes the value x.

Further, the step S5 specifically includes the following steps:

S51, counting the attributes whose information entropy is greater than 0; if that number is not greater than 1, only one attribute can still distinguish the results, so in the next turn the system should ask the user for the target constraint on that slot;

S52, if the number of attributes whose information entropy is greater than 0 is greater than 1, looking up the column vector $P_s$ corresponding to the current state s in the state transition matrix P, converting the transition probability vector $P_s$ of the state s into a 0/1 vector $T_s$ in which every entry with transition probability > 0 takes the value 1, and using $T_s$ to filter the state value function matrix v to obtain the next states s' that can be reached and their corresponding state values;

S53, selecting as the next state s' the state with the maximum state value, i.e. $v_* = \max v(s')$, wherein $v_*$ denotes the maximum state value function, then comparing s with s' and finding the slots that are 0 in s and 1 in s'; if s and s' differ in several slots, enumerating the combinations of those slots, filtering out the slots whose information entropy is 0 to obtain a new s', comparing the state values again, and taking the slot with the largest information entropy as the slot queried by the system action.

Further, the step S6 specifically splices the selected slots into the query action required of the system in the next round.

Compared with the prior art, the invention has at least the following beneficial effects:

1. The method constructs a Markov decision model of the dialogue strategy by defining a dialogue state set S, a system action set A, a state transition probability P, a reward function R and a discount factor γ;

2. The method combines the attribute information entropy of the music search results with the state value function to find the slot attribute with the highest value for the music search, thereby determining the query action of the system;

3. The invention overcomes the cold-start difficulty of a multi-turn dialogue system. Without training on a domain-specific dialogue data set, it constructs a knowledge base search statement from the dialogue state of each turn and derives a dynamic dialogue strategy from the attribute information entropy of the knowledge base search results and the knowledge graph matching results. By combining reinforcement learning with attribute information entropy, it builds the dialogue strategy submodule of the dialogue management module and improves the intelligence of the system.

Drawings

FIG. 1 is a flowchart illustrating a dialog strategy method for a task-oriented dialog system according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the calculation and selection process for the information entropy of the music search results according to this embodiment.

Detailed Description

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and the present invention is further described in detail with reference to the drawings and the embodiments.

FIG. 1 shows a dialogue strategy method for a task-oriented dialogue system, which takes a music search task as an example and includes the following steps:

S1, constructing a Markov decision model for a knowledge-graph-based search dialogue system in a vertical domain (music search, book search and the like), and defining the quintuple (S, A, P, R, γ) for the domain, wherein S is the set of all states (including termination states), A is the set of all actions, P is the state transition probability, R is the reward function, and γ is a discount factor in the interval 0 to 1 (generally defaulting to 0.9);

the step S1 specifically includes the following steps:

S11, defining the quintuple (S, A, P, R, γ) for the domain according to the number of slots in the dialogue: the dialogue state set S, the system action set A, the state transition probability P, the reward function R and the discount factor γ;

(1) State set S

In the music search task, the dialogue state is represented by the values of 6 slots, and each slot is either filled or unfilled. The dialogue state output by the dialogue state tracking module is converted into a numbered representation (Table 2 enumerates the state representations of the dialogue state tracking module), giving $2^6 = 64$ states in total. The states are encoded by a six-bit 0/1 subscript whose bits indicate, in order, whether the song, singer, album, lyricwriter, composer and label slots are filled; the states and their corresponding numbers are shown in Table 2. For example, if the current dialogue state is <singer(Zhoujilun), song(rice incense)>, the corresponding state code is $S_{110000}$. The state set is therefore $S = \{S_{000000}, S_{100000}, \ldots, S_{111111}\}$.

TABLE 2 numbered representation of dialog states
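
As a non-limiting illustration, the six-bit state numbering described above might be sketched in Python as follows. The function names `encode_state` and `all_states` are hypothetical and do not appear in the original disclosure; only the slot order and the 64-state count are taken from the text.

```python
# Slot order follows the text: song, singer, album, lyricwriter, composer, label.
SLOTS = ("song", "singer", "album", "lyricwriter", "composer", "label")


def encode_state(filled_slots):
    """Return the six-bit 0/1 code: '1' where a slot is filled, '0' otherwise.

    `filled_slots` may be any collection of slot names, for example the keys
    of a slot-value dictionary produced by a dialogue state tracker.
    """
    unknown = set(filled_slots) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return "".join("1" if slot in filled_slots else "0" for slot in SLOTS)


def all_states():
    """Enumerate all 2**6 = 64 state codes, from '000000' to '111111'."""
    return [format(i, "06b") for i in range(2 ** len(SLOTS))]


# Example from the text: singer and song are known, so the code is 110000.
example = encode_state({"singer": "Zhoujilun", "song": "rice incense"})
```

Representing each state as a fixed-order bit string keeps the comparison of two states (which slots changed from 0 to 1) a simple position-by-position check.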

(2) Action set A

System actions are divided into the query action request() and the provide-song-list action offer(). According to the slot being queried, the query action is further divided into six actions: query song name request(song), query singer request(singer), query album request(album), query lyricist request(lyricwriter), query composer request(composer) and query song type request(label). The action set is therefore A = {offer(songs), request(attrs)}, where attrs = {song, singer, album, lyricwriter, composer, label}.

(3) Transition probability P between states

The transition probability between the states (s, s') is defined as P(s, s') = 1/N, where N is the number of possible values of the next state s' and the current state s is a non-termination state. Because a user may give information on more than one slot in a single dialogue turn, the transition probabilities between dialogue states are defined as in Tables 3 and 4:

TABLE 3 dialog state transition probability example table

Table 4 dialog state transition probability example table
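
As a non-limiting illustration, one way to build a row of the transition matrix with P(s, s') = 1/N is sketched below in Python. Because the bodies of Tables 3 and 4 are not reproduced here, the rule for which next states are possible is an assumption: a filled slot is taken to stay filled, and the user is taken to fill at least one new slot per turn. The names `successors` and `transition_row` are hypothetical.

```python
from itertools import product


def successors(state):
    """Next states reachable in one turn from a six-bit state code.

    ASSUMPTION (not stated in the source): filled slots stay filled and at
    least one currently empty slot becomes filled.
    """
    free = [i for i, bit in enumerate(state) if bit == "0"]
    result = []
    for bits in product("01", repeat=len(free)):
        if "1" not in bits:
            continue  # nothing new was filled, so this is not a transition
        nxt = list(state)
        for i, bit in zip(free, bits):
            nxt[i] = bit
        result.append("".join(nxt))
    return result


def transition_row(state, terminal_states=frozenset()):
    """Map each possible next state s' to P(s, s') = 1/N.

    Termination states have no outgoing transitions, so their row is empty.
    """
    if state in terminal_states:
        return {}
    nxt = successors(state)
    return {s2: 1.0 / len(nxt) for s2 in nxt}
```

Under this assumption a state with k filled slots has N = 2^(6-k) - 1 possible next states, so the empty state 000000 spreads its probability uniformly over 63 states.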

(4) Immediate reward R

When the dialogue state reaches one of the 49 defined termination states, the user has completed the current task, and the reward for that transition is set to 100; the reward for every other transition between dialogue states is -1, as shown in Tables 5 and 6, where termination states are shown in bold:

TABLE 5 example State transition reward matrix

TABLE 6 example State transition reward matrix

(5) Discount factor gamma

The discount factor represents the importance of future reward to the current state, γ ∈ [0,1]; this embodiment sets the discount factor γ to 0.8.

S12, defining examples (as shown in Table 1), searching the database with them, and defining the termination states of the dialogue according to the database search results.

Table 1 example of finding termination state

A termination state represents the end of the dialogue: once a termination state is reached, the system should give the song list with offer() and end the dialogue. Based on experience and common knowledge, the following rules are established to define the termination states of the dialogue:

1. when the user gives the song name (song) and any one other attribute of the song, the state is a termination state, 5 states in total;

2. when the user gives the album name (album) of the song together with the lyricist (lyricwriter) or the composer (composer), the state is a termination state, 2 states in total;

3. when any three or more of the six attributes are known, the state is a termination state, 20 + 15 + 6 + 1 = 42 states in total. Therefore, 5 + 2 + 42 = 49 termination states are defined, as shown in Tables 7 and 8:

TABLE 7 description of the termination status of a conversation

TABLE 8 description of the termination status of a conversation
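
As a non-limiting illustration, the three termination rules and the reward definition (100 on reaching a termination state, -1 otherwise) can be checked by enumeration. The Python sketch below is an assumption-labeled aid, not the original implementation; `is_terminal` and `reward` are hypothetical names.

```python
from itertools import combinations

SLOTS = ("song", "singer", "album", "lyricwriter", "composer", "label")


def is_terminal(filled):
    """Apply the three termination rules to a collection of filled slot names."""
    filled = set(filled)
    # Rule 1: the song name plus at least one other attribute.
    if "song" in filled and len(filled) >= 2:
        return True
    # Rule 2: the album name plus the lyricist or the composer.
    if "album" in filled and filled & {"lyricwriter", "composer"}:
        return True
    # Rule 3: any three or more attributes.
    return len(filled) >= 3


def reward(next_filled):
    """Reward for a transition, determined by the state that is reached."""
    return 100 if is_terminal(next_filled) else -1


# Enumerate every subset of the six slots and keep the termination states.
terminal_states = [
    frozenset(combo)
    for size in range(len(SLOTS) + 1)
    for combo in combinations(SLOTS, size)
    if is_terminal(combo)
]
```

The enumeration yields 7 two-slot termination states (5 from rule 1 and 2 from rule 2) and 42 states with three or more slots, for 49 in total.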

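S2, based on the step S1, obtaining the state value function matrix using the Bellman equation, which specifically comprises the following step:

S21, in the Markov decision process, the state value function of a state s is the expectation of its return, i.e. $v(s) = E[G_t \mid S_t = s]$, wherein $G_t$ is the return at time t and $S_t$ is the state at time t. Based on the Bellman equation of the state value function

$$v_\pi(s) = \sum_{a \in A} \pi(a \mid s) \left( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \right)$$

the state value function matrix v(s) is calculated iteratively, wherein $\pi(a \mid s)$ is the probability distribution over actions under the strategy in a given state, $R_s^a$ is the immediate reward for performing action a when the state is s, γ is the discount factor, $P_{ss'}^a$ is the probability that the state at the next moment becomes s' when the state at the current moment is s, and $v_\pi(s')$ is the state value function of the next state s'.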
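
As a non-limiting illustration, the iterative calculation of the state value function might be sketched in Python as below. Several points are assumptions rather than statements of the original disclosure: the action is folded into the state-to-state probability P(s, s') = 1/N, the next states of s are those in which at least one additional slot is filled, termination states have value 0, and γ = 0.8 as in this embodiment. The helper names are hypothetical.

```python
from itertools import product

GAMMA = 0.8  # discount factor used in this embodiment


def is_terminal(s):
    """Termination rules applied to a six-bit code ordered as
    song, singer, album, lyricwriter, composer, label."""
    song, _singer, album, lyricwriter, composer, _label = (b == "1" for b in s)
    filled = s.count("1")
    return (song and filled >= 2) or (album and (lyricwriter or composer)) or filled >= 3


def successors(s):
    """ASSUMPTION: filled slots stay filled and at least one new slot is filled."""
    free = [i for i, b in enumerate(s) if b == "0"]
    out = []
    for bits in product("01", repeat=len(free)):
        if "1" in bits:
            nxt = list(s)
            for i, b in zip(free, bits):
                nxt[i] = b
            out.append("".join(nxt))
    return out


def evaluate(gamma=GAMMA, tol=1e-9):
    """Sweep v(s) = sum_s' P(s, s') * (R(s, s') + gamma * v(s')) until stable."""
    states = ["".join(bits) for bits in product("01", repeat=6)]
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if is_terminal(s):
                continue  # the dialogue ends here, so the value stays 0
            nxt = successors(s)
            new = sum(
                (100.0 if is_terminal(t) else -1.0) + gamma * v[t] for t in nxt
            ) / len(nxt)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v


values = evaluate()
```

Under these assumptions every next state of 100000 (song only) is a termination state, so its value is 100, while 010000 (singer only) has 27 terminating and 4 non-terminating next states, giving (27 × 100 + 4 × (-1 + 0.8 × 100)) / 31 = 3016/31.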

S3, matching the knowledge graph and searching the knowledge base according to the dialogue state at the current moment to obtain results that meet the user goal, which specifically comprises the following steps:

S31, receiving the triple output by the natural language understanding module of the dialogue system, namely domain identification, intention identification and slot-value pairs, to obtain the understanding result for a single sentence;

S32, performing dialogue state tracking in combination with the historical slot-value state, updating the current dialogue state, and converting it into the state $S_t$;

S33, taking the constraints of the current user goal, namely the list of slot-value pairs, from the dialogue state tracker, converting them into a knowledge base query statement, and performing knowledge graph matching and knowledge search. The conversion generates the corresponding query constraint from the value of each slot.
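
As a non-limiting illustration, the conversion of slot-value pairs into query constraints might look like the Python sketch below. The source does not name a query language or schema, so the parameterized SQL form, the table name `songs` and the function name `build_query` are all assumptions made purely for illustration.

```python
SLOTS = ("song", "singer", "album", "lyricwriter", "composer", "label")


def build_query(constraints, table="songs"):
    """Turn filled slots into one equality constraint each.

    Returns a parameterized statement and its parameter list. Slots are
    visited in the fixed SLOTS order so the output is deterministic; unknown
    or empty slots are ignored.
    """
    items = [(slot, constraints[slot]) for slot in SLOTS if constraints.get(slot)]
    where = " AND ".join(f"{slot} = ?" for slot, _ in items)
    statement = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return statement, [value for _, value in items]


statement, params = build_query({"singer": "Zhoujilun", "song": "rice incense"})
```

Keeping the values in a separate parameter list, rather than splicing them into the statement text, avoids quoting problems when a song or singer name contains special characters.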

S4, as shown in FIG. 2, calculating the attribute information entropy of the search results, which specifically comprises the following steps:

S41, checking the number of search results: if the number is greater than 10, calculating the attribute information entropy of the results; if it is not greater than 10, directly informing the system to provide the search result list;

S42, calculating the information entropy of the attribute attr according to the formula $H(attr) = -\sum_{x \in \chi} p(x) \log p(x)$, wherein attr is the name of the attribute, χ is the set of possible values of the attribute attr, and p(x) is the probability that the attribute attr takes the value x.
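
As a non-limiting illustration, steps S41 and S42 might be sketched in Python as follows. The base of the logarithm is not stated in the source, so base 2 is assumed here, and p(x) is estimated as the relative frequency of the value x among the search results. The function names are hypothetical.

```python
import math
from collections import Counter


def attribute_entropy(results, attr):
    """H(attr) = -sum over x of p(x) * log2 p(x), with p(x) taken as the
    relative frequency of value x among the search results."""
    values = [r[attr] for r in results if r.get(attr) is not None]
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropies_or_offer(results, attrs, threshold=10):
    """Step S41: with at most `threshold` results return None, signalling that
    the result list should be offered directly; otherwise return the entropy
    of every attribute."""
    if len(results) <= threshold:
        return None
    return {attr: attribute_entropy(results, attr) for attr in attrs}


# Twelve hypothetical results: one singer, two albums split evenly.
sample = [{"singer": "A", "album": "X"}] * 6 + [{"singer": "A", "album": "Y"}] * 6
sample_entropy = entropies_or_offer(sample, ["singer", "album"])
```

An attribute on which all results agree has entropy 0 and cannot narrow the search, whereas an attribute split evenly between two values has entropy 1 bit under the base-2 assumption.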

S5, analyzing the calculated attribute information entropy, which specifically comprises the following steps:

S51, counting the attributes whose information entropy is greater than 0; if that number is not greater than 1, only one attribute can still distinguish the results, so in the next turn the system should ask the user for the target constraint on that slot;

S52, if the number of attributes whose information entropy is greater than 0 is greater than 1, looking up the column vector $P_s$ corresponding to the current state s in the state transition matrix P and converting the transition probability vector $P_s$ of the state s into a 0/1 vector $T_s$, in which every entry with transition probability > 0 takes the value 1 (that is, a 0/1 matrix consistent in dimension with the state transition matrix is constructed from the entries with transition probability > 0 and defined as the 0/1 vector $T_s$); $T_s$ is then used to filter the state value function matrix v to obtain the next states s' that can be reached and their corresponding state values; the filtering sets every node whose state transition probability is 0 to 0 by means of the constructed 0/1 vector;

S53, selecting as the next state s' the state with the maximum state value, i.e. $v_* = \max v(s')$ ($v_*$ denotes the maximum state value function), then comparing s with s' and finding the slots that are 0 in s and 1 in s'; if s and s' differ in several slots, enumerating the combinations of those slots, filtering out the slots whose information entropy is 0 to obtain a new s', comparing the state values again, and taking the slot with the largest information entropy as the slot queried by the system action.

S6, obtaining the system action for the next round through the state transition matrix, specifically by splicing the selected slots into the query action required of the system in the next round.
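
As a non-limiting illustration, steps S51 to S53 and the splicing of step S6 might be combined as in the Python sketch below. This is one possible reading of the procedure, under stated assumptions: the transition row and the state values are supplied as dictionaries keyed by six-bit state codes, a candidate next state is kept only if every newly filled slot has entropy greater than 0, and `None` is returned when nothing informative can be asked. The function name `next_action` is hypothetical.

```python
SLOTS = ("song", "singer", "album", "lyricwriter", "composer", "label")


def next_action(state, p_row, v, entropy):
    """Choose the slot to request next.

    state   : six-bit code of the current state s
    p_row   : dict mapping next state s' to P(s, s')
    v       : dict mapping state code to its state value
    entropy : dict mapping slot name to the entropy of that attribute
    """
    informative = [slot for slot in SLOTS if entropy.get(slot, 0.0) > 0.0]
    if not informative:
        return None  # ASSUMPTION: the source does not cover this case
    if len(informative) == 1:
        return f"request({informative[0]})"  # S51: only one attribute left

    # S52: the 0/1 mask T_s keeps only states with transition probability > 0.
    reachable = [s2 for s2, p in p_row.items() if p > 0]

    candidates = []
    for s2 in reachable:
        # Slots that are 0 in s and 1 in s'.
        new_slots = [SLOTS[i] for i in range(len(SLOTS))
                     if state[i] == "0" and s2[i] == "1"]
        # Drop candidates that rely on a slot with zero entropy.
        if new_slots and all(entropy.get(x, 0.0) > 0.0 for x in new_slots):
            candidates.append((v[s2], s2, new_slots))
    if not candidates:
        return None

    # S53: take the reachable state with the largest value, then ask for the
    # newly filled slot with the largest entropy.
    _, _, new_slots = max(candidates, key=lambda c: c[0])
    slot = max(new_slots, key=lambda x: entropy[x])
    return f"request({slot})"  # S6: splice the slot into the system action
```

For example, with current state 010000, reachable states 011000 (value 80.0) and 010001 (value 79.0), and positive entropy for song, album and label, the sketch returns request(album); if the album entropy were 0, the 011000 candidate would be filtered out and request(label) would be returned instead.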

The method constructs an effective multi-turn dialogue management model and has good usability.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A dialogue strategy method for a task-oriented dialogue system, characterized by comprising the following steps:

S5, analyzing the calculated attribute information entropy;

2. The dialog strategy method for a task-oriented dialog system according to claim 1, wherein the step S1 specifically comprises the steps of:

3. The dialog strategy method for a task-oriented dialog system according to claim 1, characterized in that in said step S2 the state value function of a state s in the Markov decision process is the expectation of its return, i.e. $v(s) = E[G_t \mid S_t = s]$, wherein $G_t$ is the return at time t and $S_t$ is the state at time t, based on the Bellman equation of the state value function

4. The dialog strategy method for a task-oriented dialog system according to claim 1, wherein the step S3 specifically comprises the steps of:

S32, performing dialogue state tracking in combination with the historical slot-value state, updating the current dialogue state, and converting it into the state $S_t$;

5. The dialog strategy method for a task-oriented dialog system according to claim 1, wherein the step S4 specifically comprises the steps of:

6. The dialog strategy method for a task-oriented dialog system according to claim 1, wherein the step S5 specifically comprises the steps of:

7. The dialog strategy method for task-oriented dialog systems of claim 1, wherein the step S6 is specifically to splice the slots into the actions required to be queried by the next round of system.