CN112445902A

CN112445902A - Method for identifying user intention in multi-turn conversation and related equipment

Info

Publication number: CN112445902A
Application number: CN201910833404.1A
Authority: CN
Inventors: 陈涛; 张毅
Original assignee: Shenzhen TCL Digital Technology Co Ltd
Current assignee: Shenzhen TCL Digital Technology Co Ltd
Priority date: 2019-09-04
Filing date: 2019-09-04
Publication date: 2021-03-05
Also published as: WO2021042902A1

Abstract

The invention discloses a method for identifying user intention in multi-turn conversations and related equipment. A first conversation state of the preceding information and a second conversation state of the following information are obtained from two consecutive turns of a multi-turn conversation; a first correlation between the first conversation state and the second conversation state is calculated, and whether to identify the user intention as a single-turn conversation is decided according to the magnitude of the first correlation. Because the correlation between the information in consecutive turns is fully considered, when the information in the two turns differs greatly, the later turn is analyzed as a single message, so that a more accurate analysis result can be obtained, providing a basis for accurate feedback on the information sent by the user.

Description

Method for identifying user intention in multi-turn conversation and related equipment

Technical Field

The invention relates to the technical field of voice interaction, in particular to a method for identifying user intention in multi-turn conversations and related equipment.

Background

In the prior art, natural language parsing technology is generally data-driven and based on machine learning. Dialog techniques based on natural language parsing are divided into single-turn dialogs and multi-turn dialogs.

A multi-turn dialog is a way of acquiring the necessary information, after the user's intention has been preliminarily determined in a man-machine dialog, so as to finally obtain a definite user instruction. The multiple turns of a conversation correspond to the handling of one task. A multi-turn dialog system has modules for language understanding, language generation, dialog management, a knowledge base and the like. Dialog management further comprises state tracking and action selection submodules. A multi-turn dialog system can be regarded as an extension of parsing-based single-turn dialog: in each turn, the utterance is semantically understood and an internal representation is generated. Dialog management uses a finite state machine to represent the entire process of obtaining information in a dialog. Over several turns, the system gradually acquires the required information and performs the task.

However, existing multi-turn conversations in the prior art always query and match the next turn on the basis of the search for the previous turn. For example, when the user's previous turn is 'piggy cake' and the next turn is 'horror', 'piggy cake' is searched first, and information related to 'horror' is then searched within the search results for 'piggy cake'. Because the two phrases are unrelated, the user intention finally analyzed is inaccurate. As a result, the system behavior returned is inaccurate, communication between the user and the conversation equipment is not smooth, or the conversation equipment executes instructions incorrectly, which directly inconveniences the user.

Therefore, the prior art needs further improvement.

Disclosure of Invention

In view of the defects in the prior art, the invention provides a method for identifying user intention in multi-turn conversations and related equipment. It overcomes the defect that, in prior-art multi-turn conversations, the relation between the preceding turn and the following turn is not considered and the query and matching for the next turn are always performed on the basis of the search for the previous turn, so that the accuracy of the query and matching for the next turn is low.

In a first aspect, the present embodiment discloses a method for identifying a user's intention in multiple rounds of conversations, wherein the method includes the following steps:

acquiring a first conversation state of the preceding information and a second conversation state of the following information in a multi-turn conversation;

and if the first correlation between the first dialogue state and the second dialogue state is smaller than a preset first threshold value, identifying the user intention according to the following information to obtain a user intention identification result.

Optionally, before the step of acquiring the first conversation state of the preceding information and the second conversation state of the following information in the multi-turn conversation, the method further includes:

judging whether the voice information corresponding to the preceding information and the following information is the same;

and if so, re-acquiring the conversation information of the next turn after the following information, and replacing the following information with the re-acquired conversation information.

Optionally, the step of respectively acquiring a first conversation state of the preceding information and a second conversation state of the following information in the multi-turn conversation includes:

acquiring the preceding information and the following information in the multi-turn conversation;

and respectively carrying out voice recognition and language analysis on the preceding information and the following information to obtain the first dialogue state and the second dialogue state.

Optionally, the step of identifying the user intention according to the following information to obtain a user intention identification result if the first correlation between the first dialog state and the second dialog state is smaller than a preset first threshold includes:

if the correlation between the first dialogue state and the second dialogue state is larger than or equal to a preset first threshold value, obtaining system feedback information corresponding to the first dialogue state;

and if the correlation between the system feedback information and the second dialogue state is smaller than a preset second threshold, identifying the user intention according to the following information to obtain a user intention identification result.

Optionally, before the step of identifying the user intention according to the following information to obtain a user intention identification result if the first correlation between the first dialog state and the second dialog state is smaller than a preset first threshold, the method further includes:

acquiring first slot position information of the first conversation state;

acquiring second slot position information of the second dialogue state;

calculating the first correlation according to the first slot position information and the second slot position information;

and judging whether the first correlation is smaller than a preset first threshold value or not.

Optionally, the step of calculating the first correlation according to the first slot position information and the second slot position information includes:

acquiring character strings contained in each slot position in the first slot position information, and merging the character strings into first character string information;

acquiring character strings contained in each slot position in the second slot position information, and merging the character strings into second character string information;

calculating the edit distance between each character string in the first character string information and each character string in the second character string information;

and calculating the first correlation according to the edit distance.

Optionally, before the step of identifying the user intention according to the following information to obtain a user intention identification result if the correlation between the system feedback information and the second session state is smaller than a preset second threshold, the method further includes:

acquiring third slot position information of the system feedback information;

acquiring second slot position information of the second dialogue state;

calculating the second correlation according to the third slot position information and the second slot position information;

and judging whether the second correlation is smaller than a preset second threshold value.

Optionally, the step of calculating the second correlation according to the third slot information and the second slot information includes:

acquiring character strings contained in each slot position in the third slot position information, and merging the character strings into third character string information;

calculating the edit distance between each character string in the third character string information and each character string in the second character string information;

and calculating the second correlation according to the edit distance.

Optionally, after the step of respectively obtaining the first slot information and the second slot information of the first session state and the second session state, the method further includes:

judging whether the slot positions contained in the first conversation state and/or the second conversation state are completely filled or not;

if the filling is not complete, obtaining the keyword information missing from the slot position which is not completely filled in the first dialogue state and/or the second dialogue state, and completely filling the slot position contained in the first dialogue state and/or the second dialogue state according to the obtained keyword information.

Optionally, after the step of respectively obtaining the third slot position information of the system feedback information and the second slot position information of the second session state, the method further includes:

judging whether slot positions contained in the system feedback information and/or the second dialogue state are completely filled or not;

if the filling is not complete, obtaining keyword information which is missing from the slot position which is not completely filled in the system feedback information and/or the second dialogue state, and completely filling the slot position contained in the system feedback information and/or the second dialogue state according to the obtained keyword information.

Optionally, the step of identifying the user intention according to the following information to obtain a user intention identification result includes:

acquiring character information of information contained in the second dialogue state, and extracting a second keyword set;

determining second user instruction information corresponding to the second keyword set;

and obtaining a user intention identification result according to the second user instruction information.

Optionally, the identification method further includes:

and if the first correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold value, combining the first dialogue state, the system feedback information and the following information to identify the user intention to obtain a user intention identification result.

Optionally, the identification method further includes:

and if the correlation between the system feedback information and the second dialogue state is greater than or equal to a preset second threshold value, combining the first dialogue state, the system feedback information and the following information to identify the user intention to obtain a user intention identification result.

Optionally, the step of identifying the user intention by combining the first dialogue state, the system feedback information, and the following information to obtain a user intention identification result includes:

acquiring character information of information contained in the first dialogue state and the system feedback information, and extracting a first keyword set;

searching for first user instruction information corresponding to the first keyword set;

searching second user instruction information corresponding to the second keyword set in the first user instruction information;

In a second aspect, this embodiment further discloses a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

In a third aspect, the present embodiment also discloses a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method.

Compared with the prior art, the embodiment of the invention has the following advantages:

According to the method provided by the embodiment of the invention, a first dialogue state of the preceding information and a second dialogue state of the following information are obtained from two consecutive turns of a multi-turn dialogue; a first correlation between the first dialogue state and the second dialogue state is calculated and compared with a preset first threshold, and if the first correlation is smaller than the preset first threshold, the user intention is identified according to the following information alone to obtain a user intention identification result. Because the relevance of the interactive information between consecutive turns is fully considered in the embodiment, when the interactive information of the two turns differs greatly, the later turn is analyzed as a single piece of information, so that a more accurate analysis result can be obtained, providing a basis for accurate feedback on the information sent by the user.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart illustrating steps of a method for identifying user intent in multiple sessions in accordance with an embodiment of the present invention;

FIG. 2 is a schematic diagram of the information flow of a multi-turn dialog system;

FIG. 3 is a schematic structural diagram of a multi-turn dialog system;

FIG. 4 is a block diagram of an exemplary application scenario in an embodiment of the present invention;

FIG. 5 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A multi-turn conversation is a way of acquiring the necessary information, after the user intention has been preliminarily determined during man-machine interaction, so as to finally obtain a definite user instruction. It corresponds to the handling of one task and takes the form of multiple turns of interaction between human and machine; if the user instruction can be determined in a single turn, the multi-turn conversation reduces to a single turn of interaction.

The inventors found that current multi-turn conversation techniques ignore the arbitrariness of what a user says from one turn to the next. Whether a decision should rely on the preceding conversation context needs to be judged from the correlation between the preceding and following content, because a multi-turn conversation may contain consecutive turns whose content is unrelated.

The embodiment discloses a method for identifying user intention in multi-turn conversations, which analyzes the correlation between the information of the previous turn and that of the next turn to judge whether the query for the next turn needs to be carried out on the basis of the query result of the previous turn. For example, when a user first says 'piggy cake' and then 'horror', the relevance of the two is judged first. If the two are not related, single-turn searches are executed, that is, 'piggy cake' and 'horror' are each searched separately; if the two are related, 'piggy cake' is searched first and 'horror' is then searched within the search results for 'piggy cake'. Because 'piggy cake' and 'horror' are unrelated, the method disclosed in this embodiment yields better and more accurate results.

Exemplary method

Referring to fig. 1, a method for identifying a user intention in multiple rounds of conversations is shown in an embodiment of the present invention. In this embodiment, the method may include, for example, the steps of:

Step S101: a first conversation state of the preceding information and a second conversation state of the following information in the multi-turn conversation are respectively obtained.

In the multi-turn conversation, a first conversation state corresponding to the preceding information sent by the user and a second conversation state corresponding to the following information are obtained respectively. The following information is the voice information sent by the user in the later of two consecutive man-machine interactions in the multi-turn conversation, and the preceding information is the voice information sent by the user in the earlier one. Both are natural speech, so the multi-turn conversation with the dialog system can be carried out in Chinese, English or another natural language. Upon receiving voice information from the user, the dialog system assists the user in performing a task, which is typically an information access task.

The dialog state comprises the text converted from the voice information sent by the user and the information related to that text obtained by parsing it. After the preceding information and the following information are obtained, speech recognition and semantic recognition are performed on each to obtain the first dialogue state of the preceding information and the second dialogue state of the following information.

As shown in fig. 2, a multi-turn dialog involves language understanding, language generation, dialogue management, knowledge base search and the like. Dialogue management comprises dialogue state tracking, action selection and the like. A multi-turn dialog can be considered an extension of parsing-based single-turn dialog, in which the utterance in each turn is semantically understood to produce an internal representation. Dialogue management uses a finite state machine to represent the entire process of obtaining information in a dialog. Over several turns, the system gradually acquires the required information and performs the task, such as a flight information inquiry.

With reference to fig. 3, the preceding information sent by the user is first obtained, and speech recognition (ASR) is performed on it to generate a speech recognition result, that is, the text information corresponding to the preceding information; the semantic parsing module NLU maps the text information into a user dialogue state, namely the first dialogue state. Similarly, speech recognition (ASR) is performed on the following information sent by the user to obtain a speech recognition result, which is mapped into a user dialogue state to obtain the second dialogue state.

In this step, in order to obtain the first dialogue state of the preceding information, speech recognition is first performed on the preceding information to recognize the text it contains, and semantic analysis is then performed on the recognized text to obtain the information it carries. In general, there are two ways of performing the semantic analysis: one retrieves the information corresponding to the text, and the other generates the information corresponding to the text with a generative method. The retrieval approach generally requires a storage database in which a large amount of dialogue data is stored and indexed by dialogue keywords; after the keywords contained in the text are identified, the corresponding dialogue data in the database is output according to the keywords, and this is the parsed first dialogue state corresponding to the preceding information. In the generative approach, the semantic analysis processing module constructs an analysis model from a large amount of data, and after the text is input, the analysis model outputs the corresponding analysis result. The analysis model is a deep-learning neural network built on a large amount of dialogue data. The result output by the analysis model is the dialogue state corresponding to the preceding information or the following information.
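The retrieval approach described above can be pictured as a keyword index over stored dialogue data. The following is only a minimal sketch: the index contents, the names `DIALOGUE_INDEX` and `retrieve_dialog_state`, and the whitespace tokenizer are all hypothetical stand-ins for whatever database and keyword recognition an actual system uses.

```python
from typing import Optional

# Hypothetical index from dialogue keywords to stored dialogue data.
DIALOGUE_INDEX = {
    "weather": {"intent": "query_weather", "slots": {"city": None, "date": None}},
    "exchange": {"intent": "query_exchange_rate", "slots": {"from": None, "to": None}},
}


def retrieve_dialog_state(text: str) -> Optional[dict]:
    """Return the stored dialogue data for the first indexed keyword found in text."""
    for token in text.lower().split():
        entry = DIALOGUE_INDEX.get(token.strip("?.,!"))
        if entry is not None:
            # Keep the recognized text alongside the retrieved information.
            return {"text": text, **entry}
    return None
```

A text that contains no indexed keyword yields `None`, which a caller could treat as a parse failure.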

The following information is processed with the same speech recognition and semantic analysis methods, so that a first dialogue state corresponding to the preceding information and a second dialogue state corresponding to the following information are obtained.

Step S102: if the first correlation between the first dialogue state and the second dialogue state is smaller than a preset first threshold value, identifying the user intention according to the following information to obtain a user intention identification result.

After the first dialogue state and the second dialogue state are obtained in the above step, a first correlation between them is calculated and compared with the preset first threshold. If the first correlation is smaller than the preset first threshold, the correlation between the preceding information and the following information is judged to be small, and the user intention is identified directly from the following information to obtain a user intention identification result.

Specifically, since the correlation between session states is calculated as the correlation between the slot information corresponding to the information contained in the session states, the step of calculating the first correlation between the first session state and the second session state includes:

acquiring first slot position information of the first conversation state;

acquiring second slot position information of the second dialogue state;

and calculating the first correlation according to the first slot position information and the second slot position information.

The slot position information is the information needed to turn the preliminary user intention into a definite user instruction during a multi-turn dialog; one slot corresponds to one type of information that needs to be obtained in handling a task. Not every slot has to be filled during the multi-turn conversation: slot information is divided into mandatory slot information, which must be acquired, and optional slot information. Since optional slot information may be derived from context information, it may exist in the form of a default value.

For example: "What is the weather today?" Because weather conditions differ from region to region, the search for the weather must be made according to a geographical location. However, since the location of the user is known, for example Beijing, the weather of Beijing can be queried by default for this dialog, so the system can directly produce the feedback information: search for Beijing weather.
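The distinction between mandatory and optional slots, and the use of a context-derived default, can be sketched as follows. The `Slot` class, its field names and the helper `missing_required` are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    name: str
    required: bool                 # mandatory vs. optional slot
    value: Optional[str] = None    # filled from the user's utterance
    default: Optional[str] = None  # fallback derived from context, if any

    def resolved(self) -> Optional[str]:
        """Prefer the explicit value, then the context default."""
        return self.value if self.value is not None else self.default


def missing_required(slots: List[Slot]) -> List[str]:
    """Names of mandatory slots that have neither a value nor a default."""
    return [s.name for s in slots if s.required and s.resolved() is None]
```

For the weather query, a hypothetical `Slot("city", required=False, default="Beijing")` resolves to "Beijing" without asking the user, while an unfilled mandatory slot would be reported by `missing_required` so that the system can ask for it.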

Specifically, the step of calculating the first correlation according to the first slot position information and the second slot position information includes:

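respectively acquiring first character string information and second character string information of the information contained in each slot position in the first slot position information and the second slot position information;

calculating the edit distance between each character string in the first character string information and each character string in the second character string information;

and calculating the first correlation according to the edit distance.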

For example, in a pair of consecutive turns, the following information is: "What is the exchange rate of the RMB to the U.S. dollar?". The slot information contained in the following information is roughly of the form "inquiry (slot 1 is renminbi, slot 2 is dollar)", which serves as the input of the dialogue management module. The state tracking module then determines the query state of the current turn from the information of the previous turn combined with this input, and determines that the user state of the previous turn is a currency information inquiry. Next, the edit distance is calculated between the character strings "renminbi" and "dollar" corresponding to the two slots of the following information and the slot information character string "currency information inquiry" corresponding to the system feedback information of the previous turn. This gives the minimum number of edit operations required to convert the character string information corresponding to the following information into the character string corresponding to the preceding information, from which the correlation between the two is obtained.
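The text does not state how the edit distances are turned into a correlation value. The sketch below makes one possible choice explicit: each pairwise distance is normalized by the longer string's length, and the best pairwise similarity is taken as the correlation. Both the normalization and the use of the maximum are assumptions, as is the function name `slot_correlation`.

```python
from itertools import product
from typing import Iterable


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, keeping only one row of the table at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def slot_correlation(first: Iterable[str], second: Iterable[str]) -> float:
    """ASSUMPTION: correlation = best normalized similarity over all slot-string pairs."""
    best = 0.0
    for a, b in product(list(first), list(second)):
        longest = max(len(a), len(b))
        if longest == 0:
            continue  # two empty slot strings carry no evidence either way
        best = max(best, 1.0 - edit_distance(a, b) / longest)
    return best
```

Under these assumptions the result lies between 0 and 1, so it can be compared directly against a preset threshold; averaging over the pairs instead of taking the maximum would be an equally plausible reading.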

Specifically, the algorithm for calculating the edit distance between character strings includes:

Let d[i, j] denote the minimum number of operations required to convert the string s[1..i] into the string t[1..j] (these values can be stored in a two-dimensional array). In the most basic cases, when i equals 0 the string s is empty, and converting it into t[1..j] requires adding j characters, so d[0, j] = j; when j equals 0 the string t is empty, and converting s[1..i] into it requires deleting i characters, so d[i, 0] = i.

Next, consider the general case using dynamic programming. If s[1..i] is to be converted into t[1..j] with the fewest add, delete or replace operations, then immediately before the final operation there must already be a minimal sequence of operations after which at most one further operation completes the conversion from s[1..i] to t[1..j]. This "before" state falls into the following three cases:

1) We can convert s[1..i] to t[1..j-1] in k operations

2) We can convert s[1..i-1] to t[1..j] in k operations

3) We can convert s[1..i-1] to t[1..j-1] in k operations

For case 1, we only need to add t[j] at the end to complete the match, so k+1 operations are needed in total.

For case 2, we only need to remove s[i] and then perform the k operations, so k+1 operations are needed in total.

For case 3, we only need to replace s[i] with t[j] at the end so that s[1..i] = t[1..j] holds, so k+1 operations are needed in total. If s[i] is exactly equal to t[j] in case 3, no replacement is needed and k operations suffice.

Finally, to ensure that the number of operations obtained is always the minimum, we choose the least expensive of the three cases as the minimum number of operations needed to convert s[1..i] into t[1..j].
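The base cases and the three cases above can be collected into a single recurrence. The symbol cost(i, j) is introduced here only for compactness; it is 0 when s[i] equals t[j] and 1 otherwise:

$$
d[i,j] =
\begin{cases}
j, & i = 0,\\
i, & j = 0,\\
\min\bigl\{\, d[i,j-1] + 1,\;\; d[i-1,j] + 1,\;\; d[i-1,j-1] + \mathrm{cost}(i,j) \,\bigr\}, & \text{otherwise.}
\end{cases}
$$

The three arguments of the minimum correspond to cases 1, 2 and 3 respectively.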

The algorithm comprises the following basic steps:

(1) constructing a matrix with n+1 rows and m+1 columns for storing the number of operations required to complete each conversion, so that the number of operations required to convert the character string s[1..n] into the character string t[1..m] is the value of matrix[n][m], written d[n, m] below;

(2) initializing the first row of the matrix to 0 through m and the first column to 0 through n.

matrix[0][j] is the value in row 1, column j+1 of the matrix, and represents the number of operations required to convert the empty string s[1..0] into t[1..j]. Converting an empty string into a string of length j requires exactly j add operations, so the value of matrix[0][j] is j; likewise, the value of matrix[i][0] is i.

(3) examining each character s[i], for i from 1 to n;

(4) examining each character t[j], for j from 1 to m;

(5) comparing s[i] with t[j]: if the two characters are equal, cost is 0; if they are not equal, cost is 1;

(6) a, if we can convert s[1..i-1] to t[1..j] within k operations, that is d[i-1, j] = k, we can remove s[i] and then perform the k operations, so k+1 operations are required in total.

(6) b, if we can convert s[1..i] to t[1..j-1] within k operations, that is d[i, j-1] = k, we can add t[j] at the end, so k+1 operations are required in total.

(6) c, if we can convert s[1..i-1] to t[1..j-1] within k operations, that is d[i-1, j-1] = k, we can replace s[i] with t[j] so that s[1..i] = t[1..j] holds, so k+cost operations are required in total. (cost is added here because if s[i] is exactly equal to t[j], no replacement is needed and k operations suffice; otherwise one more replacement operation is needed, giving k+1 operations.)

Because the minimum number of operations is required, the operation counts of the three cases are compared and the minimum is taken as the value of d[i, j];

steps (3), (4), (5) and (6) are then repeated, and the final result is d[n, m].
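The basic steps can be written out directly as a table-filling routine. This is a sketch of the standard Levenshtein distance, taking s to have length n and t to have length m so that the answer is read from the last cell; the function name is chosen here for illustration.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    n, m = len(s), len(t)
    # (1) (n + 1) x (m + 1) table; d[i][j] is the cost of converting s[:i] into t[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    # (2) Base cases: delete i characters, or insert j characters.
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # (3), (4) Visit every pair of prefixes.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # (5) A substitution is free when the characters already match.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # (6) Cheapest of deletion, insertion and substitution.
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[n][m]
```

Python strings are indexed from 0, so the character written s[i] in the text is `s[i - 1]` in the code. The table needs (n+1)(m+1) cells; when only the distance is needed, keeping just the previous row is a common space saving.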

Whether the user intention is identified as a single-turn or a multi-turn conversation is decided by comparing the calculated first correlation with the preset first threshold: if the first correlation is greater than or equal to the preset first threshold, the user intention is identified through the multi-turn conversation; if the first correlation is smaller than the preset first threshold, the user intention is identified as a single-turn conversation.

In an implementation of this embodiment, in order to judge the correlation more accurately, the step of identifying the user intention according to the following information to obtain a user intention identification result if the first correlation between the first dialog state and the second dialog state is smaller than a preset first threshold includes:

step S103, if the correlation between the first dialog state and the second dialog state is greater than or equal to the preset first threshold, obtaining system feedback information corresponding to the first dialog state;

The system feedback information is the reply that the machine system automatically returns to the user's dialog in the first dialog state, and it is produced by the dialog management module. Referring to fig. 3, the dialog management module (DM) selects the system feedback behavior to be executed, that is, the system feedback information, according to the first dialog state and the second dialog state; if the system feedback information requires interaction with the user, the natural language generation module (NLG) is triggered to generate a natural-language system utterance; finally, the generated utterance is read aloud to the user by the speech synthesis module (TTS).

The tasks of dialog management mainly include dialog state maintenance and system decision generation. Dialog state maintenance means maintaining and updating the dialog state. For example, the dialog state $S_{t+1}$ at time t+1 depends on the state $S_t$ at the previous time t, the system behavior $a_t$ at the previous time t, and the user behavior $a_{t+1}$ at the current time t+1, which can be written as $S_{t+1} \leftarrow S_t + a_t + a_{t+1}$. System decision generation means generating a system feedback behavior from the dialog state produced by dialog state tracking and deciding what to do next; the system feedback behavior represents the feedback the system makes in response to the dialog state input by the user. The input of the dialog management model is therefore the user's voice information and the current dialog state obtained by analyzing that voice information, and its output is the next system feedback behavior and the updated dialog state. Consequently, the more semantic information the input carries, the more accurate the information fed back by the dialog management module.
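As a non-limiting illustration, the state update in which the dialog state at time t+1 is derived from the state at time t, the system behavior at time t and the user behavior at time t+1 may be sketched as follows. The class `DialogState`, the function `update_state` and the dictionary layout of the behaviors are hypothetical; reading the "+" of the update as merging slot values and appending to a history is an assumption made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogState:
    # Slot values accumulated so far (hypothetical representation).
    slots: Dict[str, str] = field(default_factory=dict)
    # Record of system and user behaviors seen so far.
    history: List[str] = field(default_factory=list)


def update_state(state: DialogState, system_action: dict, user_action: dict) -> DialogState:
    """Return the state at t+1 from the state at t, a_t and a_{t+1}."""
    slots = dict(state.slots)
    # Assumption: newer user-provided slot values override older ones.
    slots.update(user_action.get("slots", {}))
    history = state.history + [
        "system:" + system_action["act"],
        "user:" + user_action["act"],
    ]
    return DialogState(slots=slots, history=history)
```

The update returns a new state rather than modifying the old one, so the state at time t remains available for the correlation calculation against the state at time t+1.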

For example, when the preceding information is "I want to watch Peppa Pig", the corresponding dialog state includes: film and television, characters that are animals, comedy and family genres, and other information related to Peppa Pig; the system feedback information corresponding to this dialog state is: film and television search. If the following information is "Episode 3 of season 1", speech recognition and semantic analysis of the following information give the second dialog state: TV series or animation, episode 3, multiple seasons, and so on. Calculating the similarity between the first dialog state and the second dialog state shows that their correlation is greater than the preset first threshold, so the system feedback information for the first dialog state needs to be acquired: search for episode 3 of season 1 of Peppa Pig.

The dialog management module controls the course of the human-machine dialog and decides how to respond to the user at each moment according to the dialog history. The most common case is the task-driven multi-turn dialog, in which the user has a definite purpose such as ordering a meal or booking a ticket. The user's requirements are relatively complex and subject to many constraints, so they need to be stated over several turns. On the one hand, the user can keep modifying or refining the requirements during the dialog; on the other hand, when the stated requirements are not specific or clear enough, the machine can help the user find a satisfactory result by asking, clarifying or confirming. The dialog process is shown in fig. 4, in which the user and the system exchange information through questions and answers. The user sends the voice information "Hi, I want to order a meal" as a voice instruction. After receiving the instruction, the system (which may be a voice robot or other equipment capable of recognizing the user's voice information) parses it and recognizes the key information it contains: restaurant. The system then feeds back the query "What kind of food do you like?", with the corresponding dialog state: food. After receiving this, the user sends voice information again: "I like Kung Pao chicken". The system extracts the keyword contained in the voice information, Kung Pao chicken, and feeds back a confirmation to the user according to the received information, so that a satisfactory ordering result is finally obtained.

Step S104, if the correlation between the system feedback information and the second dialog state is smaller than a preset second threshold, identifying the user intention according to the following information to obtain a user intention identification result.

If the correlation between the second dialog state and the system feedback information corresponding to the first dialog state acquired in step S103 is smaller than the preset second threshold, the correlation between the preceding information and the following information is judged to be low, and the user intention is identified from the following information alone. Otherwise, the correlation between the preceding information and the following information is judged to be high, and the user intention is identified by combining the related contents of the following information and the preceding information.

Specifically, before the step in step S104 of identifying the user intention according to the following information to obtain a user intention identification result if the correlation between the system feedback information and the second dialog state is smaller than the preset second threshold, the method further includes:

acquiring third slot position information of the system feedback information;

acquiring second slot position information of the second dialogue state;

In the same way as the first slot position information of the first dialog state and the second slot position information of the second dialog state are obtained, the third slot position information of the system feedback information and the second slot position information of the second dialog state are obtained respectively, and the second correlation between the system feedback information and the second dialog state is calculated from the obtained slot position information.

Specifically, the step of calculating the second correlation according to the third slot position information and the second slot position information includes:

The correlation between pieces of slot position information is calculated as the correlation between the character strings they contain. The correlation between character strings is reflected by the edit distance between them, and the calculation principle is the same as that used for the correlation between the first slot position information and the second slot position information in the steps above.
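As a non-limiting illustration, a correlation between two sets of slot position information may be derived from the edit distance as sketched below. The description states only that the correlation is reflected by the edit distance; the normalization by the longer string length, the averaging over slots with the same name, and the function names `string_similarity` and `slot_correlation` are assumptions made for this sketch.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Edit distance computed row by row to keep memory small."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Assumed mapping from edit distance to a score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(a, b) / longest


def slot_correlation(slots_a: Dict[str, str], slots_b: Dict[str, str]) -> float:
    """Average string similarity over slot names present in both inputs."""
    shared = slots_a.keys() & slots_b.keys()
    if not shared:
        return 0.0  # assumption: no common slot means no correlation
    total = sum(string_similarity(slots_a[k], slots_b[k]) for k in shared)
    return total / len(shared)
```

The same function can be applied to the first and second slot position information to obtain the first correlation, and to the third and second slot position information to obtain the second correlation.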

In one implementation, the step of calculating the first correlation between the first dialog state and the second dialog state is preceded by:

If the slots corresponding to the first dialog state and/or the second dialog state are not completely filled, the calculated correlation may be inaccurate. Therefore, in the above step, it is judged whether the slots contained in the first dialog state and the second dialog state are completely filled; if not, the slots are filled first, and the correlation between the two states is then calculated.

Similarly, before the step of calculating the correlation between the third slot position information of the system feedback information and the second slot position information of the second session state, the method further includes:

judging whether the third slot position information and the second slot position information are complete or not;

If the slots corresponding to the system feedback information and/or the second dialog state are not completely filled, the calculated correlation may be inaccurate. Therefore, in the above step, it is judged whether the slots contained in the system feedback information and the second dialog state are completely filled; if not, the slots are filled first, and the correlation between the two is then calculated.
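As a non-limiting illustration, the completeness check that precedes the correlation calculation may be sketched as follows. The description does not specify how an incomplete slot is filled, so the `fill` callback (which could, for example, ask the user or supply a default) and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


def missing_slots(slots: Dict[str, Optional[str]], required: Iterable[str]) -> List[str]:
    """Names of required slots that are absent or empty."""
    return [name for name in required if not slots.get(name)]


def complete_slots(
    slots: Dict[str, Optional[str]],
    required: Iterable[str],
    fill: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Return a copy of slots in which every required slot has a value."""
    completed = dict(slots)
    for name in missing_slots(slots, required):
        # The filling strategy is supplied by the caller (assumption).
        completed[name] = fill(name)
    return completed
```

Only after `missing_slots` returns an empty list for both inputs would the correlation between them be calculated.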

When the correlation between the preceding round and the following round of the multi-turn dialog is high, that is, when the first correlation is higher than the preset first threshold, or when the first correlation is less than or equal to the preset first threshold but the second correlation is less than the preset second threshold, the user intention is recognized by a multi-turn dialog, that is, by combining the first dialog state, the system feedback information and the following information, to obtain a user intention recognition result.

Specifically, the step of identifying the user intention by combining the first dialogue state, the system feedback information and the following information to obtain the user intention identification result includes:

searching second user instruction information corresponding to the second keyword set under the searched first user instruction information;

When the correlation between the first dialog state and/or the system feedback information and the second dialog state satisfies the preset threshold condition, the user intention is recognized from the text information contained in the first dialog state and the system feedback information to obtain a search result for the preceding information, and the following information is then searched on the basis of that search result, so that the user instruction and the search result corresponding to both the preceding information and the following information sent by the user are fed back.

When the correlation between the preceding round and the following round of the multi-turn dialog is low, that is, when the first correlation is smaller than the preset first threshold, the user intention is identified by a single-turn dialog, that is, from the following information alone, to obtain the user intention identification result.

When the correlation among the first dialog state, the system feedback information and the second dialog state does not satisfy the preset threshold condition, a single-turn dialog is executed and the user intention is recognized only according to the following information. Specifically, the step of recognizing the user intention according to the following information to obtain the user intention recognition result includes:

searching second user instruction information corresponding to the second keyword set;

In a single-turn dialog, the user intention recognition result is obtained from the following information alone, and the search is not restricted to content related to the preceding information, so that search information better matching the user intention can be obtained.
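As a non-limiting illustration, the difference between searching the second keyword set under the result found for the first keyword set (multi-turn) and searching it on its own (single-turn) may be sketched as follows. The catalog, its entries and the function `search` are hypothetical toy data introduced only for this sketch.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Item(NamedTuple):
    title: str
    keywords: FrozenSet[str]


# Hypothetical catalog used only to demonstrate the two search modes.
CATALOG: List[Item] = [
    Item("Peppa Pig S1E3", frozenset({"peppa pig", "season 1", "episode 3"})),
    Item("Peppa Pig S2E1", frozenset({"peppa pig", "season 2", "episode 1"})),
    Item("Other Show S1E3", frozenset({"other show", "season 1", "episode 3"})),
]


def search(keywords: Set[str], within: Optional[List[Item]] = None) -> List[Item]:
    """Items whose keywords contain every requested keyword.

    If `within` is given, only that earlier result is searched (multi-turn);
    otherwise the whole catalog is searched (single-turn).
    """
    pool = CATALOG if within is None else within
    return [item for item in pool if keywords <= item.keywords]


first_result = search({"peppa pig"})
multi_turn = search({"season 1", "episode 3"}, within=first_result)
single_turn = search({"season 1", "episode 3"})
```

Here `multi_turn` contains only the Peppa Pig episode, whereas `single_turn` contains every catalog entry that matches the second keyword set.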

In the method disclosed in this embodiment, the correlation is calculated for the dialog states and the system feedback information corresponding to adjacent preceding and following information in the multi-turn dialog. This avoids generating the search result for the following sentence on the basis of the search result for the preceding sentence when the two sentences are unrelated; instead, a new search is performed according to the content of the following sentence, which improves the accuracy of user intention identification.

In one embodiment, to prevent the dialog system from repeatedly calculating the correlation for the same voice information when the user sends the same voice information twice in succession, which would increase the system's information processing load, before the step of acquiring a first dialog state of the preceding information in the multi-turn dialog, system feedback information corresponding to the first dialog state, and a second dialog state of the following information, the method further comprises:

and if so, ignoring the following information, re-acquiring the dialog information of the round after the following information, and taking the re-acquired dialog information as the following information.

In the above steps, after the preceding information and the following information are obtained, they are first compared to see whether they are the same. If they are the same, it is determined that the user has repeated the same voice information; the received following information is ignored, and the information of the round after the following information is received instead, which avoids unnecessary information processing by the system. For example, when the user says "I want to book a flight" twice in consecutive rounds, the two pieces of information are the same, so the second "I want to book a flight" can be ignored directly and is not subjected to semantic analysis, user intention identification or other processing. Instead, the voice information "from Xi'an to Beijing" that the user sends after the second "I want to book a flight" is acquired, and this newly received voice information is used as the following information for the similarity judgment.
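As a non-limiting illustration, skipping a repeated utterance and taking the next round as the following information may be sketched as follows. The description only requires that the two pieces of information be "the same"; comparing them after trimming whitespace and ignoring letter case, as done by the hypothetical helper `normalize`, is an assumption.

```python
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Assumed notion of sameness: ignore case and extra whitespace."""
    return " ".join(text.lower().split())


def pick_following(preceding: str, later_rounds: Iterable[str]) -> Optional[str]:
    """First later utterance that differs from the preceding information.

    Utterances that merely repeat the preceding information are ignored.
    Returns None when every later round is a repetition.
    """
    target = normalize(preceding)
    for utterance in later_rounds:
        if normalize(utterance) != target:
            return utterance
    return None
```

With the example above, `pick_following("I want to book a flight", ["I want to book a flight", "from Xi'an to Beijing"])` returns `"from Xi'an to Beijing"`.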

In an application example of this embodiment, the following steps may be adopted for user intention identification:

step H1, first determining whether the preceding information is the same as the following information; if so, executing step H2, otherwise executing step H3;

step H2, re-acquiring the following information;

step H3, acquiring a first dialog state of the preceding information and a second dialog state of the following information;

step H4, calculating a first correlation between the first dialog state and the second dialog state, and judging whether the first correlation is greater than a preset first threshold; if so, executing step H5, otherwise executing step H6;

step H5, obtaining system feedback information corresponding to the first dialog state, calculating a second correlation between the system feedback information and the second dialog state, and judging whether the second correlation is lower than a preset second threshold; if so, executing step H7, otherwise executing step H8;

step H6, judging that the two rounds of dialog are correlated, entering the two-round dialog, acquiring system feedback information corresponding to the first dialog state, and identifying the user intention by combining the first dialog state, the system feedback information and the second dialog state;

step H7, treating the current dialog as a single-turn dialog to identify the user intention;

step H8, executing a multi-turn dialog for the current dialog, and identifying the user intention by combining the first dialog state, the system feedback information and the second dialog state.
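As a non-limiting illustration, the decision flow of steps H1 to H8 may be sketched as follows. The branch taken when the first correlation does not exceed the first threshold is modelled here as single-turn recognition, following the statements elsewhere in this embodiment that a first correlation below the first threshold leads to a single-turn dialog; that reading, the function name `choose_mode`, the callback parameters and the returned labels are all assumptions made for this sketch.

```python
from typing import Callable, Dict

State = Dict[str, str]


def choose_mode(
    preceding_text: str,
    following_text: str,
    first_state: State,
    second_state: State,
    get_feedback: Callable[[State], State],
    correlation: Callable[[State, State], float],
    first_threshold: float,
    second_threshold: float,
) -> str:
    """Return "reacquire", "single" or "multi" for the current round."""
    # H1/H2: a repeated utterance is ignored and the next round is awaited.
    if preceding_text == following_text:
        return "reacquire"
    # H4: first correlation between the two dialog states.
    first = correlation(first_state, second_state)
    if first <= first_threshold:
        # Assumed reading: low first correlation means single-turn.
        return "single"
    # H5: second correlation between system feedback and the second state.
    feedback = get_feedback(first_state)
    second = correlation(feedback, second_state)
    if second < second_threshold:
        return "single"  # H7
    return "multi"       # H8
```

The `correlation` callback can be any function returning a score for two states, for example one based on the edit distance between slot strings, and the two thresholds are chosen by the implementer.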

For example, when the preceding information is "I want to watch Peppa Pig", the corresponding dialog state includes: film and television, characters that are animals, comedy and family genres, and other information related to Peppa Pig; the system feedback information corresponding to this dialog state is: film and television search. If the following information is "What is the weather like today", speech recognition and semantic analysis of the following information give the second dialog state: location, today, temperature, rain, and so on. Calculating the similarity between the first dialog state and the second dialog state shows that their correlation is lower than the preset first threshold, so the following information is judged to be unrelated to the preceding information. Therefore, system feedback information for the following information cannot be produced on the basis of the result for the preceding information; a new search must be performed for the second dialog state to produce the system feedback information for it: search for today's weather in the region where the user is currently located.

For example, when the preceding information is "I want to watch a comedy", the corresponding dialog state includes: film and television, with comedy and family drama as the genre; the system feedback information corresponding to this dialog state is: film and television search. If the following information is "Peppa Pig", the corresponding dialog state includes: film and television, characters that are animals, comedy and family genres, and other information related to Peppa Pig, and the system feedback information corresponding to the dialog state is: search for comedy films. Speech recognition and semantic analysis of the following information show that the correlation between the first dialog state and the second dialog state is higher than the preset first threshold, so the following information is judged to be correlated with the preceding information. Therefore, system feedback information for the following information is produced on the basis of the result for the preceding information, and the second dialog state is searched on that basis to produce the system feedback information for the second dialog state: search for film and television material related to Peppa Pig.

Exemplary device

On the basis of the above method, the embodiment further discloses a computer device, as shown in fig. 5, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

On the basis of the above method, the present embodiment also discloses a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method.

In an exemplary embodiment, a computer device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

According to the method provided by the embodiment of the invention, the voice information of the preceding and following rounds of a multi-turn dialog is acquired, and the correlation between the dialog states corresponding to the voice information of the two rounds is calculated and compared with a threshold. If the correlation does not exceed the threshold, the user intention is identified only from the dialog state of the following round. If it does, it is further judged whether the correlation between the system behavior of the preceding round and the dialog state of the following round exceeds a threshold: if not, the user intention is identified solely from the dialog state of the following round; if so, the user intention is identified by combining the dialog state of the preceding round with the second dialog state. Because the information correlation between the preceding and following rounds is fully considered in the embodiment, when the information in the two rounds differs greatly, the following round is treated as a single message for user intention analysis, so that a more accurate analysis result can be obtained, providing a basis for accurate feedback on the information sent by the user.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method of recognition of a user's intent in a multi-turn conversation, comprising:

2. The method of claim 1, wherein the step of obtaining a first dialog state of the preceding information and a second dialog state of the following information in a multi-turn dialog is preceded by the step of:

3. The method of claim 1, wherein the step of obtaining a first dialog state of the preceding information and a second dialog state of the following information in a multi-turn dialog respectively comprises:

4. The method for identifying the user intention in the multi-turn dialog according to claim 1, wherein the step of identifying the user intention according to the following information to obtain a user intention identification result if the first correlation between the first dialog state and the second dialog state is smaller than a preset first threshold comprises:

5. The method for identifying the user intention in the multi-turn dialog according to any one of claims 1-4, wherein the step of identifying the user intention according to the following information to obtain the user intention identification result if the first correlation between the first dialog state and the second dialog state is smaller than a preset first threshold further comprises:

acquiring first slot position information of the first conversation state;

acquiring second slot position information of the second dialogue state;

6. The method of claim 5, wherein the calculating the first correlation from the first slot information and the second slot information comprises:

and calculating the first correlation according to the editing distance.

7. The method for identifying the user intention in the multi-turn dialog according to claim 4, wherein, before the step of identifying the user intention according to the following information to obtain the user intention identification result if the correlation between the system feedback information and the second dialog state is smaller than a preset second threshold, the method further comprises:

acquiring third slot position information of the system feedback information;

acquiring second slot position information of the second dialogue state;

8. The method of claim 7, wherein the step of calculating the second correlation according to the third slot information and the second slot information comprises:

9. The method for identifying user intent in multiple sessions according to claim 5, wherein the step of obtaining the first slot information and the second slot information of the first session state and the second session state respectively further comprises:

10. The method for identifying user intention in multiple rounds of conversations according to claim 7, after the step of respectively obtaining the system feedback information and the third slot information and the second slot information of the second conversation state, further comprising:

11. The method for identifying the user intention in the multi-turn dialog according to any one of claims 1-4 and 6-10, wherein the step of identifying the user intention according to the following information to obtain the user intention identification result comprises:

12. The method for recognizing user's intention in multiple dialogs according to claim 1, wherein the recognition method further comprises:

13. The method for recognizing user's intention in multiple dialogs according to claim 4, wherein the recognition method further comprises:

14. The method for recognizing user's intention in multiple dialogs according to claim 12 or 13, wherein the step of combining the first dialog state, the system feedback information and the following information to recognize the user's intention and obtaining the user's intention recognition result includes:

15. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 14 when executing the computer program.

16. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.