CN115617997A - Dialog state tracking method, device, equipment and medium - Google Patents


Info

Publication number: CN115617997A
Authority: CN (China)
Prior art keywords: slot, vector, probability distribution, preset
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202211286338.9A
Other languages: Chinese (zh)
Inventor
王丹
陶高峰
邢凯
陈力
孙仕康
黄超
侯晓晖
孙羽
朱静
夏丹丹
罗永璨
秦树鑫
刘杨
周婷婷
Current Assignee: Network Communication and Security Zijinshan Laboratory (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Network Communication and Security Zijinshan Laboratory
Application filed by Network Communication and Security Zijinshan Laboratory
Priority to CN202211286338.9A
Publication of CN115617997A

Classifications

    • G06F16/353 — Physics; Computing; Electric digital data processing; Information retrieval of unstructured textual data; Clustering; Classification into predefined classes
    • G06F16/3329 — Information retrieval of unstructured textual data; Querying; Query formulation; Natural language query formulation or dialogue systems
    • G06F16/3346 — Information retrieval of unstructured textual data; Querying; Query processing; Query execution using probabilistic model
    • G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods


Abstract

The application discloses a dialog state tracking method, apparatus, device, and medium in the field of computing, comprising the following steps: encoding the historical dialogue with a first BiGRU neural network to obtain a first encoding result, and performing feature extraction on the first encoding result based on attention and a domain vector to obtain a first extraction result containing at least one domain; encoding the first extraction result with a second BiGRU neural network to obtain a second encoding result, and extracting features from the second encoding result based on attention and the slot word vector to obtain a second extraction result; and inputting the second extraction result, the domain-slot word vector, and the vocabulary vector into a decoder to obtain a decoded vector, calculating the predicted probability distribution of slot values based on the decoded vector, and filling the slots in the domain based on that predicted probability distribution.

Description

Dialog state tracking method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for tracking a dialog state.
Background
With the continuous development of artificial intelligence, human-machine dialogue technology is widely applied in navigation, entertainment, communication, and other areas, in forms such as personal assistants, customer-service robots, and voice control systems. Dialogue systems can be classified into two types, task-oriented and non-task-oriented, and there are two main approaches to task-oriented dialogue systems: the pipeline method and the end-to-end method. The pipeline method divides the whole human-machine dialogue process into five modules: Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue Management (DM), Natural Language Generation (NLG), and speech synthesis (Text To Speech, TTS). The dialogue management module is the brain of the dialogue system and plays a very important role in it, determining the corresponding actions the system should take, including inquiry, clarification, confirmation, and the like; its main subtasks are Dialogue State Tracking (DST) and Dialogue Policy (DP) generation. The pipeline method can model and optimize each part separately, but errors easily propagate and accumulate in a multi-turn dialogue system, degrading overall performance. The end-to-end method reduces the number of modules, simplifies the model structure, and is conducive to global optimization. Since, in a task-oriented dialogue system, the user's needs are difficult to express completely and clearly in a single turn, multiple turns of dialogue are required for the user to express his or her needs gradually.
The first prior art is early dialogue tracking technology, which mainly uses methods based on hand-crafted rules. Dialogue management based on finite state machines, slot filling, and information-state updating all require a large number of manually defined rules. Rule-based methods require rules tailored to the task scenario, but cannot guarantee that all possible conditions and dialogue rules are exhausted. In general, the rule-template-based dialogue state tracking method is only suitable for tracking the dialogue state of simple tasks; it is not suitable for complex tasks, and the rules cannot be reused after the task changes.
In view of the limitations of rule-based methods, most researchers have adopted data-driven approaches, giving rise to generative and discriminative methods, i.e., the second prior art. It has the following disadvantages: (1) historical dialogue features are not fully extracted. In deep learning, a time series can be modeled with a Recurrent Neural Network (RNN) to extract historical dialogue features. However, RNNs suffer from vanishing and exploding gradients; Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU) can alleviate the long-range dependency problem through a gating mechanism, but they capture only forward context and ignore backward context. (2) In a multi-domain scenario, domain-slot value pair information is not fully shared.
Therefore, how to solve the long-range dependency problem of historical dialogue in a multi-domain scenario without ignoring forward and backward context, and how to realize information sharing between different domain-slot pairs, are problems urgently awaiting solution in this field.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a dialog state tracking method, apparatus, device, and medium that can solve the long-range dependency problem of historical dialogue in a multi-domain scenario without ignoring forward and backward context, and realize information sharing between slot pairs of different domains. The specific scheme is as follows:
in a first aspect, the present application discloses a dialog state tracking method, including:
encoding the historical dialogue record with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain;
encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on an attention mechanism and a preset slot word vector to obtain a second feature extraction result containing slots;
inputting the second feature extraction result, a preset domain-slot word vector, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector, calculating the predicted probability distribution of slot values to be filled based on the decoded vector, and filling the slots in at least one domain based on this predicted probability distribution so as to realize dialogue state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector.
Optionally, before the encoding of the historical dialogue record with the first BiGRU neural network to obtain a first encoding result, the method further includes:
acquiring multiple turns of dialogue records and concatenating them to obtain the historical dialogue record.
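As a minimal illustrative sketch (not the claimed implementation; the turn contents and list representation are invented for illustration), concatenating multiple turns of dialogue records into one historical record can look like this:

```python
def splice_history(turns):
    """Concatenate multiple (user, system) dialogue turns into one
    flat historical record: user utterance, then system utterance,
    turn by turn."""
    history = []
    for user_utt, system_utt in turns:
        history.append(user_utt)
        history.append(system_utt)
    return history

turns = [("book a taxi", "where to?"), ("to the station", "done")]
x = splice_history(turns)
# x == ["book a taxi", "where to?", "to the station", "done"]
```

The spliced record is what the first BiGRU encoder later consumes as a single sequence.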
Optionally, after the inputting of the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary dictionary vector into a decoder to obtain the decoded vector, the method further includes:
mapping the decoded vector to a target probability distribution based on a dialog state tracking policy;
when the target probability distribution is a first probability distribution, indicating that the user has not mentioned the slot in the at least one domain; when the target probability distribution is a second probability distribution, indicating that the user does not care about the slot value in the at least one domain; and when the target probability distribution is a third probability distribution, indicating that the user has mentioned the slot in the at least one domain.
Optionally, the calculating of the predicted probability distribution of slot values to be filled based on the decoded vector includes:
when the target probability distribution is the third probability distribution, calculating the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record based on the decoded vector;
and calculating the predicted probability distribution of the slot values to be filled according to the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record.
Optionally, the formulas for calculating the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record based on the decoded vector are as follows:

p_vocab = softmax(V · H_decode)

p_history = softmax(h · H_decode)

where V represents the preset vocabulary dictionary vector, H_decode represents the decoded vector, h represents the first encoding result, p_vocab represents the probability distribution of slot values in the preset vocabulary dictionary, and p_history represents the probability distribution of slot values in the historical dialogue record.
Optionally, the formula for calculating the predicted probability distribution of the slot values to be filled according to the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record is as follows:

p_value = p_gen × p_vocab + (1 − p_gen) × p_history

where p_gen represents the weight of generating the slot value to be filled from the preset vocabulary dictionary.
Optionally, the inputting of the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary dictionary vector into a decoder to obtain a decoded vector includes:
inputting the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary dictionary vector into a decoder constructed from a third BiGRU neural network to obtain the decoded vector.
In a second aspect, the present application discloses a dialog state tracking apparatus, comprising:
the first coding module is used for coding the historical dialogue record by utilizing a first BiGRU neural network to obtain a first coding result;
the domain level feature extraction module is used for performing domain level feature extraction on the first coding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain;
the second coding module is used for coding the first feature extraction result by utilizing a second BiGRU neural network to obtain a second coding result;
the slot-level feature extraction module is used for performing slot-level feature extraction on the second encoding result based on an attention mechanism and a preset slot word vector to obtain a second feature extraction result containing slots;
the decoding module is used for inputting the second feature extraction result, a preset domain-slot word vector, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector;
the dialog state tracking module is used for calculating the predicted probability distribution of slot values to be filled based on the decoded vector, and then filling the slots in at least one domain based on this predicted probability distribution so as to realize dialog state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the dialog state tracking method disclosed above.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the dialog state tracking method disclosed above.
Therefore, the present application provides a dialog state tracking method, including: encoding the historical dialogue record with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain; encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on an attention mechanism and a preset slot word vector to obtain a second feature extraction result containing slots; inputting the second feature extraction result, a preset domain-slot word vector, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector, calculating the predicted probability distribution of slot values to be filled based on the decoded vector, and filling the slots in at least one domain based on this predicted probability distribution so as to realize dialog state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector. In summary, because the BiGRU (bidirectional gated recurrent unit) neural network can extract context information through its forward and backward gated recurrent unit (GRU) structures, the application solves the long-range dependency problem of historical dialogue in a multi-domain scenario without ignoring forward and backward context; in addition, traditional dialog state tracking extracts features of the domain-slot pair as a whole, so that different slot information in the same domain is unrelated and the same slot information in different domains is unrelated, whereas the present application processes the domain and the slot separately, thereby realizing information sharing between domain-slot pairs.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of a dialog state tracking method disclosed herein;
FIG. 2 is a flow chart of a specific dialog state tracking method disclosed herein;
FIG. 3 is a schematic diagram of a dialog state tracking device according to the present disclosure;
fig. 4 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The generative and discriminative methods have the following disadvantages: (1) historical dialogue features are not fully extracted. In deep learning, a time series can be modeled with a recurrent neural network to extract historical dialogue features. However, RNNs suffer from vanishing and exploding gradients; long short-term memory networks and gated recurrent units can alleviate the long-range dependency problem of sequences through a gating mechanism, but they capture only forward context and ignore backward context. (2) In a multi-domain scenario, domain-slot pair information is not fully shared.
Therefore, an embodiment of the present application provides a dialog state tracking scheme that can solve the long-range dependency problem of historical dialogue in a multi-domain scenario without ignoring forward and backward context, and realize information sharing between different domain-slot pairs.
The embodiment of the application discloses a dialog state tracking method, and as shown in fig. 1, the method includes:
step S11: the method comprises the steps of coding a historical dialogue record by utilizing a first BiGRU neural network to obtain a first coding result, and carrying out domain-level feature extraction on the first coding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain.
It should be noted that during a multi-turn dialogue task, meaningless exchanges and topic shifts often occur, which can blur domain discrimination in a multi-domain scenario and cause contexts to interfere with each other. In order to capture the historical dialogue information of different domains, so that each domain attends to its own information without interference from other domains, the application adopts a domain attention mechanism. Because an attention mechanism can ignore the distance between elements of the input sequence and directly capture the key information of the task, the application can better focus on each domain's own information without interference from the information of other domains. The specific process is as follows:
In this embodiment, multiple turns of dialogue records are first acquired and concatenated to obtain the historical dialogue record. Specifically, a T-turn dialogue D_T is sequentially concatenated into x = {u_1, s_1, u_2, s_2, ..., u_T, s_T}, where u_t represents the user utterance at turn t and s_t represents the system utterance at turn t. Then x is encoded with the first BiGRU neural network to obtain the first encoding result, denoted h = {h_1, h_2, ..., h_T}, which is the hidden-layer output of the BiGRU; this encoding is performed so that domain-level features can subsequently be extracted. The BiGRU neural network can extract context information through its forward and backward GRU structures and fully capture the semantic information contained in the dialogue sentences. After the concatenated dialogue record is encoded, the domain words in a preset domain word database are encoded to obtain the preset domain vectors; specifically, a particular domain word d_i in the preset domain word database is encoded into a corresponding domain vector.
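To make the bidirectional encoding concrete, the following is a toy scalar BiGRU sketch: one-dimensional hidden states and hand-set weights are assumptions for illustration, whereas the real model uses learned, high-dimensional parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, w):
    # Toy scalar GRU cell: hidden state and input are single floats.
    z = sigmoid(w["wz"] * x + w["uz"] * h)               # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)               # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand

def bigru(xs, w):
    # Forward pass over the sequence.
    h, fwd = 0.0, []
    for x in xs:
        h = gru_step(h, x, w)
        fwd.append(h)
    # Backward pass over the reversed sequence.
    h, bwd = 0.0, []
    for x in reversed(xs):
        h = gru_step(h, x, w)
        bwd.append(h)
    bwd.reverse()
    # Each position is represented by (forward state, backward state),
    # so both left and right context are captured.
    return list(zip(fwd, bwd))

weights = {"wz": 1.0, "uz": 1.0, "wr": 1.0, "ur": 1.0, "wh": 1.0, "uh": 1.0}
states = bigru([1.0, -1.0, 0.5], weights)
```

The pairing of a forward state and a backward state at every position is exactly what lets the encoder see backward context that a plain GRU would ignore.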
It should be noted that this encoding turns the domain words into domain vectors. Then, attention is computed between the encoded dialogue record and the encoded domain vectors, thereby completing domain feature extraction. This attention computation can be expressed by the following formulas:

y_i = h^T · d_i

y_i^softmax = softmax(y_i)

y_i^context = Σ_t (y_i^softmax)_t · h_t

where y_i represents the correlation between the i-th domain in the preset domain word vectors and the historical dialogue record; y_i^softmax denotes the normalized y_i, i.e., the weight of each domain i over the historical dialogue record; y_i^context represents the history vector h weighted by the domain weights; the history vector is the context vector, i.e., the first encoding result obtained after encoding the historical dialogue record; T denotes transposition.
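The domain attention step — correlation scores, softmax normalization, and a weighted context vector — can be sketched numerically as follows; the two-dimensional vectors are toy stand-ins for the learned representations:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def domain_attention(h, d_i):
    """h: list of hidden states h_t; d_i: one domain vector.
    Returns the domain-weighted context vector y_i^context."""
    scores = [dot(h_t, d_i) for h_t in h]   # correlation y_i
    weights = softmax(scores)               # y_i^softmax
    dim = len(h[0])
    # y_i^context = sum_t weight_t * h_t
    return [sum(w * h_t[k] for w, h_t in zip(weights, h)) for k in range(dim)]

h = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy first encoding result
d_i = [1.0, 0.0]                          # toy domain vector
context = domain_attention(h, d_i)
```

Hidden states most similar to the domain vector receive the largest weights, so the context vector leans toward the parts of the history relevant to that domain.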
Since y_i^context contains the context information of the entire historical dialogue record, the present application uses y_i^context as the feature representation of the specific domain d_i. The context is computed with the attention mechanism, and the implicit representation output by the neural network is extracted by assigning weights; in this embodiment, the first feature extraction result is exactly y_i^context.
Step S12: encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on an attention mechanism and a preset slot word vector to obtain a second feature extraction result containing slots.
In this embodiment, after the first feature extraction result carrying the key domain information is obtained, the second BiGRU neural network is used to encode it, yielding a second encoding result denoted h'; this encoding is performed so that slot-level features can subsequently be extracted. After the first feature extraction result is encoded, the slot words in a preset slot word database are encoded to obtain the preset slot word vectors; specifically, a particular slot word s_j in the preset slot word database is encoded into a corresponding slot word vector.
It should be noted that this encoding turns the slot words into slot word vectors. Then, attention is computed between the second encoding result and the encoded slot word vectors, thereby completing slot word feature extraction. This attention computation can be expressed by the following formulas:

z_j = h'^T · s_j

z_j^softmax = softmax(z_j)

z_j^context = Σ_t (z_j^softmax)_t · h'_t

where z_j represents the correlation between the j-th slot in the preset slot word vectors and the historical dialogue record; z_j^softmax denotes the normalized z_j, i.e., the weight of each slot word over the historical dialogue record; z_j^context represents the history vector h' weighted by the slot word weights.
Analogously to the processing of the domain features, since z_j^context contains the context information of the entire historical dialogue, the present application uses z_j^context as the representation of the specific slot s_j. The context is computed with the attention mechanism, and the implicit representation output by the neural network is extracted by assigning weights; in this embodiment, the second feature extraction result is exactly z_j^context. In this way, the information about the slots in the historical dialogue can be highlighted.
It should be noted that in dialog state tracking, the conventional practice is to encode a domain-slot pair as a whole, so that different slots in the same domain are unrelated to each other, and the same slot in different domains is likewise unrelated. But in a multi-domain dialogue, the domain-slot pairs are not completely unrelated; for example, a venue slot can be present in both the taxi-hailing and reservation domains. Moreover, the data scale of a multi-domain dataset is often unbalanced across domains; some domains have less data, so the model is insufficiently trained for them. In other words, the conventional practice does not fully share information between domain-slot pairs. For example, suppose domain A and domain B exist, a certain slot in domain A takes values A1 and A2, and the same slot in domain B takes values B1 and B2. The conventional practice encodes the domain-slot pairs as wholes, i.e., A-A1, A-A2, B-B1, and B-B2, so the encoding leaves the same slot in domains A and B unrelated. But in a multi-domain dialogue the domain-slot pairs are not completely unrelated, so the present application processes the domain and the slot separately and applies attention to each in turn: the features of domains A and B are extracted first, then attention is applied to the slot, and the slot value to be filled may then be A1, A2, B1, or B2. At this point, the candidate values for the slot include not only A1 and A2 but also B1 and B2, and information sharing between domain-slot pairs is thereby realized.
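To make the sharing argument concrete, the following hypothetical sketch contrasts joint domain-slot keys with slot-level sharing; the domain names, slot name, and values are invented for illustration:

```python
# Conventional practice: each domain-slot pair is an opaque key, so
# candidate values never cross domain boundaries.
joint_values = {
    ("domain_A", "venue"): ["A1", "A2"],
    ("domain_B", "venue"): ["B1", "B2"],
}

# Separate processing: the slot is considered on its own, so every
# value ever seen for "venue" becomes a shared candidate.
def shared_candidates(joint, slot):
    candidates = []
    for (domain, s), values in joint.items():
        if s == slot:
            candidates.extend(values)
    return candidates

venue_candidates = shared_candidates(joint_values, "venue")
# candidates now include A1, A2, B1 and B2
```

This mirrors the text above: once the slot is attended to independently of the domain, values observed in one domain become available when filling the same slot in another.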
Step S13: inputting the second feature extraction result, a preset domain-slot word vector, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector, calculating the predicted probability distribution of slot values to be filled based on the decoded vector, and filling the slots in at least one domain based on this predicted probability distribution so as to realize dialog state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector.
Specifically, the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary dictionary vector are input to a decoder constructed from a third BiGRU neural network, and the decoded vector is obtained.
In this embodiment, after the second feature extraction result, the preset domain-slot word vector, and the preset vocabulary dictionary vector are input into the decoder and the decoded vector is obtained, the method further includes: mapping the decoded vector to a target probability distribution based on a dialog state tracking policy; when the target probability distribution is a first probability distribution, the user has not mentioned the slot in the at least one domain; when the target probability distribution is a second probability distribution, the user does not care about the slot value in the at least one domain; and when the target probability distribution is a third probability distribution, the user has mentioned the slot in the at least one domain. The dialog state tracking policy is specifically the current mainstream DST (Dialog State Tracking) update policy, i.e., a classifier maps the decoded vector to the probability distributions NONE, DONTCARE, and MENTIONED: NONE (the first probability distribution) indicates that the user has not mentioned the slot in the at least one domain, DONTCARE (the second probability distribution) indicates that the user does not care about the slot value in the at least one domain, and MENTIONED (the third probability distribution) indicates that the user has mentioned the slot in the at least one domain.
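A minimal sketch of this three-way update decision, assuming for illustration that the classifier emits one logit per label (the logit values here are arbitrary):

```python
import math

LABELS = ["NONE", "DONTCARE", "MENTIONED"]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_slot_state(logits):
    """Map decoder logits to the three-way DST update decision
    and its probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

label, probs = classify_slot_state([0.1, -0.3, 2.0])
```

Only the MENTIONED outcome triggers the slot-value prediction described next; NONE and DONTCARE update the dialog state directly.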
In this embodiment, because the user's expression in the historical dialogue record may be ambiguous, the slot value to be filled cannot always be computed directly from the user's utterances; a preset vocabulary dictionary containing a number of slot values is therefore introduced. Further, because the vocabulary dictionary cannot contain all possible slot values, the probability distribution of the slot value to be filled is computed from both the vocabulary dictionary and the historical dialogue record. Specifically, when the mapping yields the third probability distribution, the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record are computed from the decoded vector; the predicted probability distribution of the slot values to be filled is then calculated from these two distributions. The formulas for computing the probability distribution of slot values in the preset vocabulary dictionary and in the historical dialogue record from the decoded vector are as follows:
p_vocab = Softmax(V · H_decode)

p_history = Softmax(h · H_decode)
wherein V represents the preset vocabulary dictionary vector, H_decode represents the decoded vector, h represents the first encoding result, p_vocab represents the probability distribution of slot values in the preset vocabulary dictionary, and p_history represents the probability distribution of slot values in the historical dialogue record.
The formula for calculating the prediction probability distribution of the slot value to be filled from the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record is as follows:
p_value = p_gen × p_vocab + (1 − p_gen) × p_history
wherein p_gen represents the weight of generating the slot value to be filled from the preset vocabulary dictionary, and 1 − p_gen the weight of copying it from the historical dialogue record.
The corresponding slot value is then determined according to the prediction probability distribution, and the slot in the at least one domain is filled with that slot value.
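The mixing of the dictionary and history distributions above follows the pointer-generator pattern. The sketch below uses toy sizes and random stand-in vectors; the token-to-dictionary-id mapping and the fixed p_gen are assumptions, as the patent would learn p_gen from the decoder state:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy sizes; all vectors are random stand-ins for the patent's learned ones.
d, vocab_size, history_len = 8, 5, 7
rng = np.random.default_rng(1)
V = rng.normal(size=(vocab_size, d))    # preset vocabulary-dictionary vectors
H = rng.normal(size=(history_len, d))   # first encoding result h (one row per token)
h_decode = rng.normal(size=d)           # decoded vector for one domain-slot pair

p_vocab = softmax(V @ h_decode)         # distribution over dictionary slot values
p_history = softmax(H @ h_decode)       # distribution over history tokens

# Scatter the copy probabilities into the dictionary index space so the two
# distributions can be mixed (hypothetical token-to-id mapping).
history_ids = rng.integers(0, vocab_size, size=history_len)
p_copy = np.zeros(vocab_size)
np.add.at(p_copy, history_ids, p_history)

p_gen = 0.6                              # fixed here; learned in the patent
p_value = p_gen * p_vocab + (1 - p_gen) * p_copy
assert abs(p_value.sum() - 1.0) < 1e-9   # still a valid probability distribution
best = int(p_value.argmax())             # index of the slot value to fill
```

Because both mixands are valid distributions and the mixture weights sum to one, p_value is itself a valid distribution, and its argmax selects the slot value used to fill the slot.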
Therefore, the present application provides a dialog state tracking method, including: encoding the historical dialogue record with a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and preset domain vectors to obtain a first feature extraction result containing at least one domain; encoding the first feature extraction result with a second BiGRU neural network to obtain a second encoding result, and performing slot-level feature extraction on the second encoding result based on an attention mechanism and preset slot word vectors to obtain a second feature extraction result containing slots; and inputting the second feature extraction result, a preset domain-slot word vector, and a preset vocabulary dictionary vector into a decoder to obtain a decoded vector, calculating the prediction probability distribution of the slot value to be filled based on the decoded vector, and filling the slot in the at least one domain based on that distribution, thereby realizing dialog state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector. In summary, because a BiGRU (bidirectional gated recurrent unit) neural network extracts context information with forward and backward GRU structures, the application alleviates the long-range dependence on dialogue history in a multi-domain scenario without discarding forward or backward context. In addition, traditional dialog state tracking extracts features for each domain-slot pair as a whole, so that different slots in the same domain, and the same slot in different domains, are treated as unrelated; by processing domains and slots separately and applying attention to each in turn, the application enables information sharing between different domain-slot pairs.
Fig. 2 is a flowchart of a specific dialog state tracking method disclosed in the present application. Referring to Fig. 2, in the present application: (1) a first BiGRU neural network first encodes the concatenated historical dialogue to obtain a first encoding result; (2) domain-level feature extraction is performed on the first encoding result based on an attention mechanism and preset domain vectors to obtain a first feature extraction result; (3) a second BiGRU neural network encodes the first feature extraction result to obtain a second encoding result; (4) slot-level feature extraction is performed on the second encoding result based on an attention mechanism and preset slot word vectors to obtain a second feature extraction result; (5) the second feature extraction result, the preset vocabulary dictionary vector, and the domain-slot word vector are input into a decoder, which adopts a third BiGRU structure, to obtain a decoded vector; (6) the probability distribution of slot values in the preset vocabulary dictionary and the probability distribution of slot values in the historical dialogue record are calculated; the final prediction probability distribution of the slot value is then calculated from these two distributions and the decoded vector, the corresponding slot value is determined from the final distribution, and the filling is completed.
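Steps (2) and (4) of the flowchart can be sketched as two rounds of dot-product attention. The BiGRU passes of steps (1), (3), and (5) are omitted, and all sizes and vectors below are illustrative assumptions, not the patent's actual parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: weight the key vectors by similarity to the query."""
    w = softmax(keys @ query)
    return w @ keys

# Illustrative sizes and random stand-in vectors.
d, seq_len, n_domains = 16, 10, 3
rng = np.random.default_rng(2)
encoded = rng.normal(size=(seq_len, d))       # stand-in for the first encoding result
domain_vecs = rng.normal(size=(n_domains, d)) # preset domain vectors
slot_vec = rng.normal(size=d)                 # one preset slot word vector

# Step (2): domain-level attention over the encoded history, one query per domain.
domain_feats = np.array([attend(q, encoded) for q in domain_vecs])
# Step (3) would re-encode domain_feats with the second BiGRU; skipped here.
# Step (4): slot-level attention over the domain-level features.
slot_feat = attend(slot_vec, domain_feats)
assert slot_feat.shape == (d,)
```

Because every domain query attends over the same encoded history, and every slot query attends over the same domain-level features, information is shared across domain-slot pairs rather than extracted per pair in isolation.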
Thus, because the bidirectional gated recurrent unit extracts context information with forward and backward GRU structures, the BiGRU-based method alleviates the long-range dependence on dialogue history in a multi-domain scenario without discarding forward or backward context; and by processing domains and slots separately and applying attention to each in turn, the method realizes information sharing between different domain-slot pairs.
Correspondingly, an embodiment of the present application further discloses a dialog state tracking device. As shown in Fig. 3, the device includes:
the first coding module 11 is configured to code the historical dialogue record by using a first BiGRU neural network to obtain a first coding result;
a domain level feature extraction module 12, configured to perform domain level feature extraction on the first coding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result including at least one domain;
a second encoding module 13, configured to encode the first feature extraction result by using a second BiGRU neural network to obtain a second encoding result;
a slot position level feature extraction module 14, configured to perform slot position level feature extraction on the second coding result based on an attention mechanism and a preset slot position word vector, to obtain a second feature extraction result including a slot position;
the decoding module 15 is configured to input the second feature extraction result, a preset field-slot word vector and a preset word list dictionary vector to a decoder to obtain a decoded vector;
the dialog state tracking module 16 is configured to calculate the prediction probability distribution of the slot value to be filled based on the decoded vector, and then fill the slot in the at least one domain based on that distribution, so as to implement dialog state tracking; the preset domain-slot word vector is formed by concatenating the preset domain vector and the preset slot word vector.
For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Therefore, as summarized above, the present application provides a dialog state tracking method in which a first BiGRU neural network encodes the historical dialogue record and domain-level features are extracted with attention and preset domain vectors; a second BiGRU neural network encodes that result and slot-level features are extracted with attention and preset slot word vectors; the second feature extraction result, the preset domain-slot word vector (formed by concatenating the preset domain vector and the preset slot word vector), and the preset vocabulary dictionary vector are then decoded, the prediction probability distribution of the slot value to be filled is calculated from the decoded vector, and the slot in the at least one domain is filled to realize dialog state tracking. Because the BiGRU (bidirectional gated recurrent unit) neural network extracts context information with forward and backward GRU structures, the application alleviates the long-range dependence on dialogue history in a multi-domain scenario without discarding forward or backward context; and unlike traditional dialog state tracking, which extracts features for each domain-slot pair as a whole and so treats different slots in the same domain and the same slot in different domains as unrelated, processing domains and slots separately enables information sharing between different domain-slot pairs.
Further, the embodiment of the application also provides electronic equipment. Fig. 4 is a block diagram of electronic device 20 shown in accordance with an exemplary embodiment, and the contents of the diagram should not be construed as limiting the scope of use of the present application in any way.
Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a display 23, an input output interface 24, a communication interface 25, a power supply 26, and a communication bus 27. Wherein the memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps in the dialog state tracking method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may be specifically an electronic computer.
In this embodiment, the power supply 26 is used for providing an operating voltage for each hardware device on the electronic device 20; the communication interface 25 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 24 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 is used as a carrier for resource storage, and may be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc., and the resource stored thereon may include the computer program 221, and the storage manner may be a transient storage or a permanent storage. The computer program 221 may further include a computer program that can be used to perform other specific tasks in addition to the computer program that can be used to perform the dialog state tracking method executed by the electronic device 20 disclosed in any of the foregoing embodiments.
Further, the embodiment of the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the dialog state tracking method disclosed above.
For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
The embodiments in the present application are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments can be referred to one another. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the method part.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above is a detailed description of the dialog state tracking method, apparatus, device, and storage medium provided by the present application. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, the specific implementation and application scope may vary according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A dialog state tracking method, comprising:
encoding the historical dialogue records by using a first BiGRU neural network to obtain a first encoding result, and performing domain-level feature extraction on the first encoding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain;
coding the first feature extraction result by using a second BiGRU neural network to obtain a second coding result, and performing slot position level feature extraction on the second coding result based on an attention mechanism and a preset slot position word vector to obtain a second feature extraction result containing slot positions;
inputting the second feature extraction result, a preset field-slot position word vector and a preset word table dictionary vector into a decoder to obtain a decoded vector, calculating the prediction probability distribution of slot position values to be filled based on the decoded vector, and filling the slot positions in at least one field based on the prediction probability distribution of the slot position values to be filled so as to realize dialogue state tracking; the preset field-slot word vector is formed by splicing the preset field vector and the preset slot word vector.
2. The dialog state tracking method of claim 1, wherein before encoding the historical dialog record using the first BiGRU neural network to obtain the first encoding result, the method further comprises:
and acquiring a plurality of turn conversation records, and splicing the turn conversation records to obtain the historical conversation record.
3. The dialog state tracking method according to claim 1, wherein after inputting the second feature extraction result, the preset field-slot position word vector and the preset vocabulary dictionary vector into a decoder to obtain a decoded vector, the method further comprises:
mapping the decoded vector to a target probability distribution based on a dialog state tracking policy;
when the target probability distribution is the first probability distribution, the slot position in the at least one domain has not been mentioned by the user; when the target probability distribution is the second probability distribution, the user does not care about the value of the slot position in the at least one domain; and when the target probability distribution is the third probability distribution, the slot position in the at least one domain has been mentioned by the user.
4. The dialog state tracking method of claim 3 wherein said computing a predictive probability distribution of slot values to be filled based on the decoded vector comprises:
when the target probability distribution is the third probability distribution, calculating the probability distribution of slot position values in the preset vocabulary dictionary and the probability distribution of slot position values in the historical dialogue record based on the decoded vector;
and calculating the prediction probability distribution of the slot values to be filled according to the probability distribution of the slot values in the preset vocabulary dictionary and the probability distribution of the slot values in the historical dialogue records.
5. The dialog state tracking method of claim 4,
the formulas for calculating the probability distribution of slot position values in the preset vocabulary dictionary and the probability distribution of slot position values in the historical dialogue record based on the decoded vector are as follows:
p_vocab = Softmax(V · H_decode)

p_history = Softmax(h · H_decode)
wherein V represents the preset vocabulary dictionary vector, H_decode represents the decoded vector, h represents the first encoding result, p_vocab represents the probability distribution of slot position values in the preset vocabulary dictionary, and p_history represents the probability distribution of slot position values in the historical dialogue record.
6. The dialog state tracking method of claim 5,
the formula for calculating the prediction probability distribution of the slot position value to be filled according to the probability distribution of slot position values in the preset vocabulary dictionary and the probability distribution of slot position values in the historical dialogue record is:
p_value = p_gen × p_vocab + (1 − p_gen) × p_history
wherein p_gen represents the weight of generating the slot position value to be filled from the preset vocabulary dictionary.
7. The method according to any one of claims 1 to 6, wherein the step of inputting the second feature extraction result, the predetermined field-slot word vector and the predetermined vocabulary dictionary vector into a decoder to obtain a decoded vector comprises:
and inputting the second feature extraction result, the preset field-slot position word vector and the preset word table dictionary vector into a decoder constructed by a third BiGRU neural network to obtain a decoded vector.
8. A dialog state tracking device, comprising:
the first coding module is used for coding the historical dialogue record by utilizing a first BiGRU neural network to obtain a first coding result;
the domain level feature extraction module is used for extracting the features of the domain level of the first coding result based on an attention mechanism and a preset domain vector to obtain a first feature extraction result containing at least one domain;
the second coding module is used for coding the first feature extraction result by utilizing a second BiGRU neural network to obtain a second coding result;
the slot position level feature extraction module is used for extracting the features of the slot position level of the second coding result based on an attention mechanism and a preset slot position word vector to obtain a second feature extraction result containing the slot position;
the decoding module is used for inputting the second feature extraction result, a preset field-slot position word vector and a preset word list dictionary vector into a decoder to obtain a decoded vector;
the dialog state tracking module is used for calculating the prediction probability distribution of the slot position value to be filled based on the decoded vector, and then filling the slot position in the at least one field based on that prediction probability distribution, so as to realize dialog state tracking; the preset field-slot position word vector is formed by splicing the preset field vector and the preset slot position word vector.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the dialog state tracking method of any of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements a dialog state tracking method according to any of claims 1 to 7.
CN202211286338.9A 2022-10-20 2022-10-20 Dialog state tracking method, device, equipment and medium Pending CN115617997A (en)

Priority application (1): CN202211286338.9A — priority date 2022-10-20, filing date 2022-10-20 — Dialog state tracking method, device, equipment and medium.

Publication (1): CN115617997A, published 2023-01-17. Family ID: 84864058; family application: CN202211286338.9A (CN).


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination