CN112069302A - Training method of conversation intention recognition model, conversation intention recognition method and device


Info

Publication number
CN112069302A
Authority
CN
China
Prior art keywords
conversation
sample
sentence
learning
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010968515.6A
Other languages
Chinese (zh)
Other versions
CN112069302B (en)
Inventor
童丽霞
雷植程
吴俊江
陈岁迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010968515.6A priority Critical patent/CN112069302B/en
Publication of CN112069302A publication Critical patent/CN112069302A/en
Application granted granted Critical
Publication of CN112069302B publication Critical patent/CN112069302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method for a conversation intention recognition model, a conversation intention recognition method and a corresponding device, and relates to the field of machine learning. The method comprises the following steps: acquiring sample conversation sentences that carry reference label data, reference classification data and a generation time; generating sample conversation sentences marked with a fixed weight and a learning weight corresponding to the generation time; training the conversation intention recognition model on the sample conversation sentences to obtain sample feature vectors carrying sample label data and sample classification data; calculating the conversation intention recognition loss from the sample label data and the reference label data, and from the sample classification data and the reference classification data; and performing propagation training on the conversation intention recognition model according to the recognition loss to obtain the trained conversation intention recognition model. The method assigns sample weights according to the time sequence and trains the model with the weighted samples, so that the model can more accurately learn the newly added knowledge and the new meanings of the original knowledge.

Description

Training method of conversation intention recognition model, conversation intention recognition method and device
Technical Field
The present application relates to the field of machine learning, and in particular, to a training method for a conversation intention recognition model, a conversation intention recognition method, and a device.
Background
An intelligent customer service system can automatically answer questions input by users, and is therefore widely used across industries, providing enterprises and users with a fast and effective means of communication based on natural language.
When the intelligent customer service system replies to a conversation sentence input by the user, it first identifies the conversation intention of the sentence. For example, the intelligent customer service system maintains a data stream of tag data for a large number of conversation intentions; based on this data stream, a conversation intention recognition model can be called to encode the conversation sentence input by the user and determine its tag data, and the conversation intention is then determined from the tag data in order to reply to the sentence.
However, as time goes on, part of the tag data in the data stream changes: tag meanings change, tags become invalid, and new tags are added, so the original conversation intention recognition model can no longer accurately identify the user's real conversation intention.
Disclosure of Invention
The embodiments of the application provide a training method for a conversation intention recognition model, a conversation intention recognition method and a device. When the conversation intention recognition model undergoes incremental learning, weights are used to mark how important the historical sample conversation sentences and the newly added sample conversation sentences are to model training, so that the trained conversation intention recognition model can, on the basis of the original knowledge, more accurately learn the new knowledge and the new meanings of the original knowledge. The technical scheme is as follows:
according to an aspect of the present application, there is provided a training method of a conversation intention recognition model, the method including:
acquiring sample conversation sentences from a database, wherein the sample conversation sentences comprise historical sample conversation sentences and newly-added sample conversation sentences, and the sample conversation sentences carry reference tag data, reference classification data and conversation sentence generation time;
determining a fixed weight and a learning weight corresponding to the generation time, and generating a sample conversation sentence marked with the fixed weight and the learning weight, wherein the fixed weight and the learning weight are both used for indicating the importance degree of the sample conversation sentence on model training;
inputting the sample conversation sentences into a conversation intention recognition model to be trained for label learning and classification learning to obtain sample characteristic vectors carrying sample label data and sample classification data;
calculating the conversation intention recognition loss according to the sample label data and the reference label data, and according to the sample classification data and the reference classification data;
and carrying out propagation training on the conversation intention recognition model to be trained according to the recognition loss, and finally obtaining the trained conversation intention recognition model.
According to another aspect of the present application, there is provided a conversation intention recognition method, applied to a computer device running the conversation intention recognition model trained according to the above aspect, the method including:
calling a conversation intention recognition model to perform vector conversion processing on a conversation sentence to obtain a sentence coding vector;
calling the conversation intention recognition model to perform session-level encoding on the sentence coding vector to obtain a conversation feature vector, wherein the session level refers to a processing level at which vectors are encoded based on the conversation environment, and the conversation feature vector carries tag data;
and calling the conversation intention recognition model to perform label classification on the conversation feature vector based on the tag data to obtain a classified conversation feature vector, wherein the classified conversation feature vector is used for indicating the conversation intention of the conversation sentence.
According to another aspect of the present application, there is provided a training apparatus for a conversation intention recognition model, the apparatus including:
the acquisition module is used for acquiring sample conversation sentences from the database, wherein the sample conversation sentences comprise historical sample conversation sentences and newly-added sample conversation sentences, and the sample conversation sentences carry reference tag data, reference classification data and generation time of the conversation sentences;
the generating module is used for determining fixed weight and learning weight corresponding to the generating time, and generating a sample conversation sentence marked with the fixed weight and the learning weight, wherein the fixed weight and the learning weight are both used for indicating the importance degree of the sample conversation sentence on model training;
the learning module is used for inputting the sample conversation sentences into a conversation intention recognition model to be trained for label learning and classification learning to obtain sample feature vectors carrying sample label data and sample classification data; calculating the conversation intention recognition loss according to the sample label data and the reference label data, and according to the sample classification data and the reference classification data; and carrying out propagation training on the conversation intention recognition model to be trained according to the recognition loss to finally obtain the trained conversation intention recognition model.
According to another aspect of the present application, there is provided a conversation intention identifying apparatus including:
the embedded module is used for carrying out vector conversion processing on the conversation sentences to obtain sentence coding vectors;
the coding module is used for performing session-level encoding on the sentence coding vector to obtain a conversation feature vector, wherein the session level refers to a processing level at which vectors are encoded based on the conversation environment, and the conversation feature vector carries tag data;
and the classification module is used for performing label classification on the conversation feature vector based on the tag data to obtain a classified conversation feature vector, and the classified conversation feature vector is used for indicating the conversation intention of the conversation sentence.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, code set, or instruction set, which is loaded and executed by the processor to implement the training method of the conversation intention recognition model according to the above aspect, or the conversation intention recognition method according to the above aspect.
According to another aspect of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, code set, or set of instructions stored therein, which is loaded and executed by a processor to implement the method for training a conversation intention recognition model according to the above aspect, or the method for conversation intention recognition according to the above aspect.
According to another aspect of the application, a computer program product or a computer program is provided, comprising computer instructions, which are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions to cause the computer device to perform the training method of the conversation intention recognition model as described above, or the conversation intention recognition method as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
The method uses both the historical sample conversation sentences and the newly added sample conversation sentences as training samples for the conversation intention recognition model to be trained, so that the model can learn the original knowledge from the historical sample conversation sentences and the new knowledge from the newly added sample conversation sentences. In addition, different weights are given to the sample conversation sentences according to their time sequence during training, so as to highlight the importance of sample conversation sentences from different times; as a result, the model can more accurately learn the newly added knowledge, the original knowledge and the new meanings of the original knowledge, and can therefore grasp the user's real conversation intention more accurately when applied.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a block diagram of a computer system provided in an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a session intent recognition method provided by an exemplary embodiment of the present application;
FIG. 3 is a block diagram of a session intent recognition model provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a method for training a conversational intent recognition model provided by an exemplary embodiment of the present application;
FIG. 5 is a flowchart of a method for training a conversational intent recognition model provided by another exemplary embodiment of the present application;
FIG. 6 is a block diagram of a structure of a conversational intent recognition model to be trained provided by an exemplary embodiment of the present application;
FIG. 7 is a block diagram of a training apparatus for a conversational intent recognition model according to an exemplary embodiment of the present application;
fig. 8 is a block diagram illustrating a session intention recognition apparatus according to an exemplary embodiment of the present application;
fig. 9 is a schematic device structure diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms related to embodiments of the present application will be described.
Artificial Intelligence (AI): a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graphs and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The conversation intention identification method provided by the embodiment of the application relates to technologies such as artificial intelligence natural language processing and machine learning, and can be applied to the following scenes:
intelligent customer service
In this application scenario, the conversation intention recognition model trained by the method provided in the embodiments of the present application may be applied to applications such as shopping applications, group-purchase applications, and travel management applications (e.g., ticket booking and hotel booking applications). The application is provided with an intelligent customer service, and the user can obtain reply sentences to the conversation sentences through a conversation with the intelligent customer service. The intelligent customer service recognizes the conversation intention of the user's conversation through a conversation intention recognition model built into the background server of the application, and then automatically replies based on the recognized intention; the conversation intention recognition model is trained in advance. Illustratively, when a conversation input by the user is received, the conversation intention recognition model performs conversation intention recognition on the conversation and outputs a feature vector that indicates the conversation intention of the conversation sentence; the feature vector is then decoded to obtain a reply sentence. It should be noted that the decoding operation is performed by a decoder that is trained together with the above conversation intention recognition model and is dedicated to decoding the feature vector into a reply sentence of the conversation sentence. For example, the intelligent customer service belongs to a shopping application, and the user asks a question about item A: has item A been discounted recently? After recognizing the conversation intention, the intelligent customer service outputs the answer: item A participates in discount policy 1, discount policy 2, and discount policy 3.
Two, virtual assistant
In this application scenario, the conversation intention recognition model trained by the method provided in the embodiments of the present application may be applied to intelligent devices such as intelligent terminals or smart homes. Taking a virtual assistant installed in an intelligent terminal as an example, the virtual assistant recognizes the conversation intention of the user's conversation through the trained conversation intention recognition model and then outputs a reply sentence based on the recognized intention. Illustratively, when a conversation input by the user is received, the conversation intention recognition model performs conversation intention recognition on the conversation sentence and outputs a feature vector that indicates its conversation intention; the feature vector is then decoded to obtain a reply sentence. It should be noted that the decoding operation is performed by a decoder that is trained together with the above conversation intention recognition model and is dedicated to decoding the feature vector into a reply sentence of the conversation sentence. For example, user A records travel information in a notepad and asks the virtual assistant: what is on my schedule today? The virtual assistant outputs the answer: attend the classmates' dinner at 6 p.m. today.
The above description is given by taking only two application scenarios as examples, and the method provided in the embodiment of the present application may also be applied to other scenarios that require session intention identification, for example, a scenario in which a robot guides a visitor to reply to a question of the visitor when the visitor visits a museum, a scenario in which an intelligent home automatically adjusts a setting parameter according to a user session, and the like.
The conversation intention recognition method and the training method of the conversation intention recognition model provided by the embodiment of the application can be applied to computer equipment with stronger data processing capacity. In a possible implementation manner, the conversation intention recognition method and the training method of the conversation intention recognition model provided by the embodiment of the application can be applied to a personal computer, a workstation or a server, namely, the conversation intention recognition and the training of the conversation intention recognition model can be realized through the personal computer, the workstation or the server.
The trained conversation intention recognition model can be realized as a part of an application program and is installed in the terminal, so that the terminal recognizes the conversation intention of the user conversation when receiving the user conversation, and then outputs a reply sentence which accords with the conversation intention; or the trained conversation intention recognition model is arranged in a background server of the application program, so that the terminal provided with the application program can recognize the conversation intention of the user conversation by means of the background server, and further the conversation function with the user is realized.
FIG. 1 illustrates a schematic diagram of a computer system provided by an exemplary embodiment of the present application. The computer system 100 includes a computer device 110 and a server 120, wherein the computer device 110 and the server 120 perform data communication via a communication network, optionally, the communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.
The computer device 110 has installed therein an application program supporting a session function, where the application program may be a Virtual Reality application (VR), an Augmented Reality Application (AR), a game application, a picture album application, a social contact application, and the like, which is not limited in this embodiment of the present invention.
Optionally, the computer device 110 may be a mobile terminal such as a smart phone, a smart watch, a tablet computer, a laptop portable notebook computer, and an intelligent robot, or may also be a terminal such as a desktop computer and a projection computer, and the embodiment of the present application does not limit the type of the computer device.
The server 120 may be implemented as one server, or may be implemented as a server cluster formed by a group of servers, which may be physical servers, or may be implemented as a cloud server. In one possible implementation, server 120 is a backend server for applications in computer device 110.
As shown in fig. 1, in this embodiment, a chat application is running on the computer device 110, and the user can obtain replies to the input conversation by chatting with the intelligent customer service. Illustratively, the user inputs a conversation sentence in the user interface of the intelligent customer service, or inputs conversation speech in the user interface of the virtual assistant, and the computer device 110 sends the conversation sentence or speech to the server 120. The server 120 is provided with a trained conversation intention recognition model 10 and a decoder 20. The conversation intention recognition model 10 encodes the conversation sentence (including a conversation sentence converted from conversation speech) to obtain a sentence coding vector 11; the sentence coding vector 11 then simultaneously undergoes text-label labeling, global feature learning and local feature learning to obtain a first feature vector 12, a second feature vector 13 and a third feature vector 14, respectively; label classification is performed based on the first feature vector 12, the second feature vector 13 and the third feature vector 14 to obtain a classified conversation feature vector 15 in which the conversation intention has been learned; the decoder 20 then decodes the classified conversation feature vector to output a reply sentence corresponding to the conversation sentence. The server 120 returns the reply sentence to the terminal 110, or converts it into reply speech and returns that instead. For example, the input question: has item A been discounted recently? Output answer: yes. Then the question is input: what discount activities are there? Output answer: item A participates in discount policy 1, discount policy 2, and discount policy 3.
For training of the conversation intention recognition model 10, the server 120 may store in advance sample conversation sentences labeled with reference label data and reference classification data, train the conversation intention recognition model 10 on these sample conversation sentences so that it outputs sample label data and sample classification data, calculate the conversation intention recognition loss from the reference label data and the sample label data, and from the reference classification data and the sample classification data, and then adjust the model parameters of the conversation intention recognition model 10 using this recognition loss to finally obtain the trained conversation intention recognition model 10.
For convenience of description, the following embodiments are described as examples in which the conversation intention recognition method and the training method of the conversation intention recognition model are performed by the server.
Fig. 2 shows a flowchart of a session intention identification method provided by an exemplary embodiment of the present application. The embodiment is described by taking the method as an example for being used in the server 120 in the computer system 100 shown in fig. 1, and the method includes the following steps:
step 201, performing vector conversion processing on the input conversation sentence to obtain a sentence coding vector.
A conversation intention recognition model is built in the server, and the server calls the conversation intention recognition model to perform vector conversion processing on the input conversation sentence to obtain a sentence coding vector; illustratively, the server calls a conversation intention recognition model to perform sentence-level coding processing on the input conversation sentence to obtain a sentence coding vector. Optionally, the conversation intention recognition model is a model trained by an incremental learning method, which is the training method of the conversation intention recognition model provided by the present application. Incremental learning refers to a learning system continuously learning new knowledge from new samples and storing most of the previously learned knowledge. In this embodiment, the session intention recognition model continuously learns new label data and classification data from new training samples, and stores most of the label data and classification data that have been learned before.
Optionally, the conversation intention recognition model includes a sentence encoder, the sentence encoder includes a word embedding layer, a coding-decoding layer, and a self-attention layer, and the server invokes the word embedding layer to embed words in the conversation sentence to obtain a word vector when invoking the conversation intention recognition model; calling a coding-decoding layer to carry out coding and decoding processing on the word vectors to obtain sentence vectors; and calling a self-attention layer to encode the sentence vector to obtain a sentence encoding vector.
Illustratively, the encoding-decoding layer in the conversation intention recognition model employs a Transformer model. As shown in fig. 3, the conversation intention recognition model includes a sentence encoder 30 and a session encoder 40; the sentence encoder 30 includes a word embedding layer 31, a Transformer layer 32, and a self-attention layer 33. The output of the word embedding layer 31 is connected to the input of the Transformer layer 32, the output of the Transformer layer 32 is connected to the input of the self-attention layer 33, and the output of the self-attention layer 33 is connected to the session encoder 40. When encoding a conversation sentence, the server first performs word segmentation on the sentence; the words of the conversation sentence are input at the input of the word embedding layer 31, and the word embedding layer 31 embeds them to obtain word vectors; the word vectors are input into the Transformer layer 32, which encodes and decodes them to obtain the sentence vector of the conversation sentence; the sentence vector is input into the self-attention layer 33 for self-attention learning, finally yielding the sentence coding vector of the conversation sentence.
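The structure just described (word embedding, followed by a Transformer encoding layer, followed by self-attention pooling into a single sentence coding vector) can be sketched roughly as follows. This is only an illustrative PyTorch sketch; all layer sizes, layer counts and names are assumptions rather than values taken from the patent.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    # Illustrative sketch: word embedding layer -> Transformer layer -> self-attention pooling.
    # Vocabulary size, hidden size, number of layers and heads are assumed values.
    def __init__(self, vocab_size=30000, hidden=256, n_layers=2, n_heads=4):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.attn_score = nn.Linear(hidden, 1)  # scoring layer for self-attention pooling

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        word_vectors = self.word_embedding(token_ids)  # word vectors
        states = self.transformer(word_vectors)        # contextualized word states
        weights = torch.softmax(self.attn_score(states), dim=1)
        return (weights * states).sum(dim=1)           # sentence coding vector: (batch, hidden)
```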
Step 202, performing session level coding processing on the sentence coding vectors to obtain session feature vectors, wherein the session feature vectors carry tag data.
The server calls the conversation intention recognition model to perform session-level encoding on the sentence coding vector, where the session level refers to a processing level at which vectors are encoded based on the conversation environment, that is, a level at which a sentence is processed as part of a conversation during encoding. Illustratively, the server calls the conversation intention recognition model to perform feature learning on the sentence coding vector, learning the features of the conversation sentence as part of the dialog, including its tag data, and finally obtains the conversation feature vector, which therefore carries the tag data.
Optionally, if the conversation intention of the ith conversation sentence is currently being recognized, the server calls the conversation intention recognition model to concatenate the ith sentence coding vector with the first i-1 sentence coding vectors (or with the (i-1)th sentence coding vector), then performs session-level encoding on the concatenated sentence coding vectors, and finally obtains the conversation feature vector corresponding to the ith sentence coding vector, where i is a positive integer greater than 1.
Optionally, when encoding the conversation feature vector, the server may also learn the tag data, the global features and the local features of the sentence coding vector in a parallel manner. Illustratively, the server calls the conversation intention recognition model to label the tag data of the sentence coding vector to obtain a first feature vector, performs global feature learning on the sentence coding vector to obtain a second feature vector, performs local feature learning on the sentence coding vector to obtain a third feature vector, and generates the conversation feature vector from the first, second and third feature vectors.
Illustratively, the server calls the conversation intention recognition model to combine the first feature vector, the second feature vector and the third feature vector to generate the conversation feature vector; the combination may be vector addition, weighted vector addition, or another method.
In order to fully learn the semantics of the words and sentences in the sentence coding vector, the sentence coding vector can be learned using the architecture of a Hierarchical Attention Network (HAN) for document classification with label embedding. Optionally, the conversation intention recognition model comprises a session encoder, and the session encoder comprises a label-embedding attention layer, a bidirectional gated recurrent unit and a convolutional neural network layer. When encoding the conversation feature vector, the server calls the conversation intention recognition model: it calls the label-embedding attention layer to label the tag data of the sentence coding vector at the word level and then at the sentence level, obtaining the first feature vector; calls the bidirectional gated recurrent unit to perform word-level and then sentence-level global feature learning on the sentence coding vector, obtaining the second feature vector; and calls the convolutional neural network layer to perform word-level and then sentence-level local feature learning on the sentence coding vector, obtaining the third feature vector.
Illustratively, as shown in fig. 3, the session encoder 40 includes a label-embedding attention layer 41, a bidirectional gated recurrent unit (Bi-GRU) 42, and a convolutional neural network (CNN) layer 43; the output of the self-attention layer 33 is connected to the input of the label-embedding attention layer 41, the input of the Bi-GRU 42, and the input of the CNN layer 43, respectively. When learning the sentence coding vector, the label-embedding attention layer 41 learns the sentence coding vector at the word level and then at the sentence level, learns the semantics of the important words and important sentences in the sentence coding vector, and labels the tag data of the sentence coding vector based on those semantics. The Bi-GRU 42 performs word-level and then sentence-level global feature learning on the sentence coding vectors; for example, attention learning is performed on each sentence coding vector in the concatenated vector formed by the ith sentence coding vector and the first i-1 sentence coding vectors, and the important words and sentences in the ith sentence coding vector are learned in the global learning environment of all i sentence coding vectors, yielding a global feature vector, i.e., the second feature vector. The CNN layer 43 performs word-level and then sentence-level local feature learning on the sentence coding vectors; for example, attention learning is performed on the ith and (i-1)th sentence coding vectors in the concatenated vector, and the important words and sentences in the ith sentence coding vector are learned in the local learning environment of these two sentence coding vectors, yielding a local feature vector, i.e., the third feature vector.
Illustratively, the label-embedding attention layer 41 adopts the architecture of a Label-Embedding Attention Model (LEAM), which learns the text and the labels in the same space and constructs the text representation from the correlation between the text and the labels.
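Under the same assumptions, the three parallel branches of the session encoder (label-embedding attention, Bi-GRU and CNN) and their combination into a single conversation feature vector might be sketched as follows. The label-attention branch is reduced to a dot product between sentence encodings and label embeddings, which only approximates LEAM, and the addition at the end is one of the combination options mentioned above.

```python
import torch
import torch.nn as nn

class SessionEncoder(nn.Module):
    # Illustrative sketch of the session encoder: three parallel branches over the
    # sentence coding vectors of a session; num_labels and hidden are assumed values.
    def __init__(self, hidden=256, num_labels=50):
        super().__init__()
        self.label_embedding = nn.Embedding(num_labels, hidden)  # LEAM-style label embeddings
        self.bigru = nn.GRU(hidden, hidden // 2, bidirectional=True, batch_first=True)
        self.cnn = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)

    def forward(self, sentence_encodings):  # (batch, num_sentences, hidden)
        # Branch 1: label-embedding attention -> first feature vector
        scores = sentence_encodings @ self.label_embedding.weight.T
        attn = torch.softmax(scores, dim=-1)
        first = (attn @ self.label_embedding.weight).mean(dim=1)
        # Branch 2: Bi-GRU global feature learning -> second feature vector
        gru_out, _ = self.bigru(sentence_encodings)
        second = gru_out.mean(dim=1)
        # Branch 3: CNN local feature learning -> third feature vector
        cnn_out = self.cnn(sentence_encodings.transpose(1, 2)).transpose(1, 2)
        third = cnn_out.mean(dim=1)
        # Combine the three feature vectors by addition (one of the options above)
        return first + second + third
```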
And 203, performing label classification on the conversation feature vector based on the label data to obtain a classified conversation feature vector, wherein the classified conversation feature vector is used for indicating the conversation intention of a conversation sentence.
The conversation feature vector carries label data, and the server calls a conversation intention identification model to classify labels based on the label data to obtain the classified conversation feature vector. Optionally, the session feature vector further carries global features and local features, and the server calls the session intention recognition model to perform label classification based on the label data, the global features and the local features to obtain the classified session feature vector. Wherein the classified conversation feature vector is used for indicating the conversation intention of the conversation sentence.
Optionally, the generated classified conversational feature vector may be decoded by a decoder, so as to obtain a reply sentence of a conversational sentence; for example, if a conversation sentence is a question, the decoding obtains an answer to the question. Wherein the decoder is trained on the classified session feature vectors.
Optionally, the session intention recognition model in the server further comprises a classifier; and the server calls the classifier to perform label classification on the session feature vector to generate the classified session feature vector.
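The patent only states that a classifier performs the final label classification; one plausible, purely illustrative form is a pair of linear heads over the conversation feature vector (the use of two heads and all sizes below are assumptions):

```python
import torch.nn as nn

class IntentClassifier(nn.Module):
    # Illustrative sketch: maps a conversation feature vector to per-label scores and
    # per-class scores; hidden, num_labels and num_classes are assumed values.
    def __init__(self, hidden=256, num_labels=50, num_classes=20):
        super().__init__()
        self.label_head = nn.Linear(hidden, num_labels)   # label classification scores
        self.class_head = nn.Linear(hidden, num_classes)  # classification data scores

    def forward(self, session_feature):                   # (batch, hidden)
        return self.label_head(session_feature), self.class_head(session_feature)
```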
In summary, in the conversation intention recognition method provided by this embodiment, the server calls the conversation intention recognition model to recognize the conversation intention expressed by a conversation sentence. The conversation intention recognition model is trained using both historical sample conversation sentences and newly added sample conversation sentences as training samples, and different weights are given to the sample conversation sentences according to their time sequence during training to highlight the importance of samples from different times, so that the model can learn the newly added knowledge and the new meanings of the original knowledge more accurately and, when applied, grasp the user's conversation intention contained in the conversation sentence more accurately.
The method also performs session-level encoding based on the sentence coding vector of the conversation sentence, that is, it labels tag data on the sentence coding vector, and can therefore grasp the overall meaning of the conversation sentence accurately. As a result, the tag data labeled on the sentence coding vector matches the sentence semantics better, that is, the tag data carried by the conversation feature vector obtained from the encoding matches the semantics of the conversation sentence better, which in turn makes the label classification based on the conversation feature vector more accurate in identifying the user's real conversation intention.
The method also calls the conversation intention recognition model to perform text-label, global-feature and local-feature encoding learning simultaneously to generate the conversation feature vector, so that the conversation feature vector can learn the conversation intention of the user's conversation more fully. In addition, the text-label, global-feature and local-feature encoding learning uses a HAN-style architecture that can learn sentence semantics at both the word level and the sentence level, so the meaning of the conversation sentence can be grasped more accurately.
FIG. 4 is a flowchart illustrating a method for training a conversational intent recognition model according to an exemplary embodiment of the present application. The embodiment is described by taking the method as an example for being used in the server 120 in the computer system 100 shown in fig. 1, and the method includes the following steps:
step 301, obtaining sample conversation sentences from the database, where the sample conversation sentences include historical sample conversation sentences and newly added sample conversation sentences, and the sample conversation sentences carry reference tag data, reference classification data, and generation time of the conversation sentences.
The historical sample conversation sentence is a sample conversation sentence which is used for training the conversation intention recognition model to be trained last time, and the newly added sample conversation sentence is a sample conversation sentence which is newly added for training the conversation intention recognition model to be trained compared with the historical sample conversation sentence. For example, the to-be-trained conversation intention recognition model used this time refers to a conversation intention recognition model trained by using historical sample conversation sentences alone, or the to-be-trained conversation intention recognition model used this time is an untrained conversation intention recognition model. The historical sample conversation sentences and the newly added sample conversation sentences both carry reference label data, reference classification data and the generation time of the conversation sentences.
The server acquires a stored training sample from the database, the training sample being a sample conversation sentence that has been subjected to labeling processing, the training sample including n sample conversation sentences, each of the sample conversation sentences being labeled with reference label data and reference classification data. Wherein the reference tag data is correct tag data of the defined sample conversation sentence; optionally, the tag data refers to classification tag data. For example, the tag data may be tag data for labeling types of emotion, business, interest, and behavior included in the sample conversation sentence. For example, the emotion type of the sample conversation sentence is like, annoying, acceptable, or the like, or the business type is consulting service, shopping guide service, information registration service, or the interest type is mountain climbing, swimming, shopping, or the behavior type is purchasing behavior, payment behavior, unsubscribing behavior, or the like. The reference classification data is correct classification data of the defined sample conversation sentence, and the classification data is obtained by classifying based on the tag data.
Step 302, a fixed weight and a learning weight corresponding to the generation time are determined, and a sample conversation sentence marked with the fixed weight and the learning weight is generated.
The fixed weight and the learning weight are used for indicating the importance degree of the sample conversation sentence on model training. Optionally, a mapping relation between the time and the fixed weight and a corresponding relation table between the time period and the learning weight exist in the server; therefore, the determination of the fixed weight and the learning weight can be implemented by the following two steps:
1) a fixed weight corresponding to the generation time is calculated based on the mapping relation.
In the above mapping relation, the time and the fixed weight have a positive correlation, and for example, the closer the generation time of the sample conversation sentence is to the current time, the larger the fixed weight corresponding to the generation time is. Optionally, the generation time of the sample conversational sentence is in a linear relationship with the fixed weight, that is, the fixed weight is a linear weight, and the fixed weight increases linearly with the generation time of the sample conversational sentence from far to near.
Illustratively, the server counts the time span of the generation times of the sample conversation sentences in the training samples, i.e., determines the time interval Ds between the earliest generation time Do and the latest generation time Dl; segments the time interval Ds according to a specified interval d to obtain the number of segments Ns; determines the Ns time periods between the earliest generation time Do and the latest generation time Dl; calculates the time period Is to which the generation time Dt of a sample conversation sentence belongs, where Dt ∈ [Do, Dl], that is, Dt is greater than or equal to Do and less than or equal to Dl, and Is ∈ {0, 1, ……, Ns-1}; and calculates the fixed weight Wt corresponding to the generation time Dt based on the time period Is and the value range [Wb, We] of the fixed weight.
For example, since the fixed weight participates in the optimization of the loss function of the conversation intention recognition model, the fixed weight is neither zero nor greater than 1; for example, in the value range of the fixed weight, Wb is 0.3 and We is 1, i.e., the fixed weight is greater than or equal to 0.3 and less than or equal to 1.
Illustratively, the above calculation of the fixed weight can be expressed by the following formulas:

Ds = Dl - Do + do;    formula (1)

Ns = ceil(Ds / d);    formula (2)

Is = floor((Dt - Do) / d);    formula (3)

Wt = Wb + Is × (We - Wb) / (Ns - 1);    formula (4)

where do is one unit of the specified interval d (for example, if d is in days, do is 1 day; if d is in hours, do is 1 hour); ceil() means rounding up, i.e., rounding up the quotient of Ds divided by d; and floor() means rounding down, i.e., rounding down the quotient of (Dt - Do) divided by d.
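A direct transcription of formulas (1) to (4), assuming generation times measured in days and the example value range Wb = 0.3, We = 1 given above:

```python
import math

def fixed_weight(Dt, Do, Dl, d=1.0, Wb=0.3, We=1.0):
    """Fixed weight Wt for a sample generated at time Dt (all times in days here).

    Do / Dl are the earliest / latest generation times in the training samples,
    d is the specified interval, and [Wb, We] is the value range of the weight.
    """
    do = d                                  # one unit of the specified interval d
    Ds = Dl - Do + do                       # formula (1): time span
    Ns = math.ceil(Ds / d)                  # formula (2): number of time periods
    Is = math.floor((Dt - Do) / d)          # formula (3): time period of Dt
    if Ns == 1:
        return We                           # single period: avoid division by zero
    return Wb + Is * (We - Wb) / (Ns - 1)   # formula (4): linearly increasing weight

# Example: samples span 10 days; the newest sample gets weight 1.0, the oldest 0.3.
print(fixed_weight(Dt=9, Do=0, Dl=9))  # 1.0
print(fixed_weight(Dt=0, Do=0, Dl=9))  # 0.3
```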
2) The learning weight corresponding to the time period to which the generation time belongs is looked up from the correspondence table.
Optionally, after determining the time period Is to which the generation time Dt of the sample conversation sentence belongs, the server determines the weight vector VIs of the Ns-dimensional learning weight corresponding to the time period Is.
Optionally, the sample feature vector carries the generation time of the sample conversation sentence and the optimized learning weight, where the optimized learning weight is obtained by optimizing the learning weight during model training. The server updates the learning weight corresponding to the time period to which the generation time of the (R-1)th sample conversation sentence belongs in the correspondence table, based on the (R-1)th optimized learning weight corresponding to the (R-1)th sample conversation sentence, to obtain an updated correspondence table; the learning weight corresponding to the time period to which the generation time of the Rth sample conversation sentence belongs is then looked up from the updated correspondence table, where R is an integer greater than 1.
Alternatively, if there is no learning weight corresponding to the time period to which the generation time of the R-th (or R-1-th) sample conversation sentence belongs in the correspondence table, a learning weight corresponding to the generation time of the sample conversation sentence is randomly generated.
Illustratively, the time periods are divided based on the generation times of the sample conversation sentences, and the server randomly generates an Ns-dimensional learning-weight vector V for each time period. For each training round, the server looks up the learning weight V corresponding to the generation time of the (R-1)th sample conversation sentence from the correspondence table; the learning weight V is optimized while each sample conversation sentence is used for model training. When the time period Is to which a sample conversation sentence belongs has been determined, it is marked on that sample conversation sentence; after the b learning weights corresponding to the b sample conversation sentences of a batch have been optimized, b learning weights VIs_1, VIs_2, ……, VIs_b are obtained, and each of them is marked with the time period Is to which the generation time of the corresponding sample conversation sentence belongs. In the (R-1)th training round, the classified sample feature vector corresponding to the (R-1)th sample conversation sentence is determined; it carries the correspondence between the time period Is to which the generation time of the (R-1)th sample conversation sentence belongs and the optimized learning vector VIs, and this correspondence between Is and VIs is updated into the correspondence table. In the Rth training round, the server first determines, based on the correspondence table, the learning weight VIs' corresponding to the time period Is' to which the generation time of the Rth sample conversation sentence belongs; if no learning weight corresponding to the time period Is' exists among the optimized learning weights, an Ns-dimensional learning-weight vector V is randomly generated for the sample conversation sentence, and model training then continues.
For example, the learning weights V and VIs_1, VIs_2, ……, VIs_b may be calculated and updated during the optimization of the loss function. To prevent the model from simply shrinking the weight values in order to obtain a lower loss, the learning weights need to be normalized. Taking the normalization of the learning weight V as an example, the formula is as follows:
V = softmax(V), where V ∈ R^Ns
That is, the weights are normalized by a softmax function, where R^Ns refers to the value set of V, and the dimension of the values in this set is Ns.
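One way to realize the correspondence table of learning weights is a dictionary of trainable Ns-dimensional vectors, one per time period, softmax-normalized whenever they are read. This is only a sketch; the random initialization of missing entries follows the behaviour described above.

```python
import torch
import torch.nn as nn

class LearningWeightTable(nn.Module):
    # Illustrative sketch: one trainable Ns-dimensional learning-weight vector per
    # time period; weights are softmax-normalized before use so the model cannot
    # lower the loss simply by shrinking the raw weight values.
    def __init__(self, Ns):
        super().__init__()
        self.Ns = Ns
        self.table = nn.ParameterDict()

    def lookup(self, time_period):
        key = str(time_period)
        if key not in self.table:  # no entry yet for this period: random initialization
            # Note: a newly created entry would still need to be registered with the optimizer.
            self.table[key] = nn.Parameter(torch.randn(self.Ns))
        return torch.softmax(self.table[key], dim=0)  # normalized learning weight V
```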
For example, after determining the fixed weight and the learning weight, the server marks the fixed weight and the learning weight on a sample conversation sentence, and uses the sample conversation sentence marked with the fixed weight and the learning weight in model training, wherein the fixed weight and the learning weight are used for determining the loss of conversation intention.
Step 303, inputting the sample conversation sentence into a conversation intention recognition model to be trained for label learning and classification learning, and obtaining a sample feature vector carrying sample label data and sample classification data.
Illustratively, the server calls the conversation intention recognition model to be trained to perform vector conversion on the b sample conversation sentences of a batch, each of which is marked with a fixed weight and a learning weight. The conversation intention recognition model to be trained performs vector conversion on each sample conversation sentence based on the fixed weight and the learning weight to obtain b sample sentence vectors, and at the same time the b learning weights are optimized. Since each sample conversation sentence is marked with the time period to which its generation time belongs, each sample sentence vector is marked during vector conversion with its time period, fixed weight and optimized learning weight, which are continuously passed downwards. Alternatively, the optimized learning weight may be optimized based on the fixed weight.
The server calls a conversation intention recognition model to be trained, label data of b sample sentence vectors are labeled based on the fixed weight and the optimized learning weight, and a first feature vector is obtained; performing global feature learning on the b sample sentence vectors based on the fixed weights and the optimized learning weights to obtain second feature vectors; local feature learning is carried out on the b sample sentence vectors based on the fixed weight and the optimized learning weight, and a third feature vector is obtained; and generating a sample feature vector according to the first feature vector, the second feature vector and the third feature vector. Illustratively, in the process of generating the sample feature vector, the optimized learning weight is further optimized again, and then the belonged time period, the fixed weight and the optimized learning weight corresponding to each sample conversation sentence are continuously passed downwards.
The sample feature vector comprises k sample label data, the server calls a to-be-trained conversation intention recognition model to perform label classification on the k sample label data by combining a fixed weight and an optimized learning weight carried by the sample feature vector to obtain a classified sample feature vector, and k is a positive integer. In the tag classification process, the optimized learning weight is optimized again, and then the corresponding time period, the fixed weight and the optimized learning weight of each sample conversation sentence are continuously transmitted. For example, the optimized learning weight obtained this time is b learning weights VIs _1, VIs _2, … …, and VIs _ b obtained after optimizing the b learning weights corresponding to b sample conversation sentences of a batch.
Illustratively, the to-be-trained conversation intention recognition model further comprises a classifier, and the server calls the classifier to perform label classification on the sample feature vectors to generate the classified sample feature vectors. The classified sample feature vector still includes the k sample label data and also includes sample classification data.
Step 304, a conversation intention recognition loss is calculated based on the sample label data and the reference label data, and the sample classification data and the reference classification data.
The session intention recognition model to be trained further comprises a loss function, the server calls the loss function to calculate the label recognition loss between the reference label data and the sample label data, calculates the classification recognition loss between the reference classification data and the sample classification data, and further calculates the recognition loss of the session intention according to the label recognition loss and the classification recognition loss. Illustratively, the sum of the tag identification penalty and the classification identification penalty is determined as the identification penalty of the conversational intent.
Optionally, the server calls a to-be-trained conversation intention recognition model to add the label recognition loss and the classification recognition loss to obtain conversation intention recognition loss; alternatively, a weighted sum of the tag identification loss and the classification identification loss is calculated to obtain the session intent identification loss. Exemplarily, a first weight corresponding to the tag identification loss and a second weight corresponding to the classification identification loss are set in the conversation intention identification model to be trained; the server calls a conversation intention recognition model to be trained to calculate a first product of the label recognition loss and the first weight, calculates a second product of the classification recognition loss and the second weight, and determines the sum of the first product and the second product as the conversation intention recognition loss.
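The weighted combination of the two losses can be sketched as follows; the names alpha and beta stand for the first weight and the second weight mentioned above, and their default values are illustrative assumptions rather than values given in the patent:

```python
import torch

def intent_loss(label_loss: torch.Tensor, cls_loss: torch.Tensor,
                alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    # alpha * label loss + beta * classification loss;
    # alpha = beta = 1.0 reduces this to the plain sum of the two losses.
    return alpha * label_loss + beta * cls_loss
```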
Optionally, the server further calculates a tag identification loss according to the reference tag data and the sample tag data; calculates a classification identification loss according to the reference classification data and the sample classification data; and calculates the recognition loss of the conversation intention according to the tag recognition loss, the classification recognition loss, the fixed weight, and the learning weight.
Illustratively, the classified sample feature vector includes an optimized learning weight. Illustratively, for the calculation of the tag identification loss and the classification identification loss corresponding to each sample conversation sentence in the b sample conversation sentences, a conversation intention identification model to be trained in the server generates a first tag matrix of sample tag data according to the classified sample feature vector, and calculates the tag identification loss (Label _ loss) based on a second tag matrix corresponding to the first tag matrix and the reference tag data; calculating a classification recognition loss (Classifier _ loss) according to the reference classification data and the sample classification data; and finally calculating b label identification losses and b classification identification losses corresponding to the b sample conversation sentences.
Illustratively, the conversation intention recognition model to be trained calculates the sum of the label recognition loss and the classification recognition loss corresponding to the h-th sample conversation sentence to obtain the sample loss Loss_h of the h-th sample conversation sentence, finally obtaining b sample losses; based on the optimized learning weights, the weighted average of the b sample losses is calculated to obtain the average loss Loss. Meanwhile, the conversation intention recognition model to be trained also calculates the variance between the fixed weights Wt_h of the b sample conversation sentences and the optimized learning weights VIs_h, namely the weight loss Loss_W; the conversation intention recognition model to be trained then calculates the product of the hyper-parameter λ and the weight loss Loss_W, and determines the sum of this product and the average loss Loss as the conversation intention recognition loss Loss'; wherein h is a positive integer less than or equal to b.
Illustratively, the above calculation process of the recognition loss of the conversation intention is expressed by using a formula as follows:
Loss_h = Label_loss_h + Classifier_loss_h; ------(6)
Loss = Σ_{h=1..b} VIs_h~ * Loss_h; ------(7)
Loss' = Loss + λ * Loss_W; ------(8)
wherein the corner mark h represents the h-th of the b sample conversation sentences; the hyper-parameter λ is preset in the conversation intention recognition model to be trained; and VIs_h~ refers to the normalized VIs_h.
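Putting equations (6) to (8) together, a hedged PyTorch sketch of the batch loss might look as follows. Treating each learning weight VIs_h as a scalar, using lam = 0.1, and computing Loss_W as the mean squared difference between Wt_h and VIs_h are simplifying assumptions for the example; the patent only states that Loss_W is the variance between the fixed weights and the optimized learning weights.

```python
import torch

def batch_intent_loss(label_losses, cls_losses, fixed_w, learn_w, lam=0.1):
    sample_losses = label_losses + cls_losses      # Loss_h, equation (6)
    norm_w = torch.softmax(learn_w, dim=0)         # normalized VIs_h
    loss = torch.sum(norm_w * sample_losses)       # weighted average, equation (7)
    loss_w = torch.mean((fixed_w - learn_w) ** 2)  # weight loss (assumed form)
    return loss + lam * loss_w                     # Loss', equation (8)

# Usage on a batch of b = 4 sample conversation sentences (illustrative numbers).
b = 4
label_losses = torch.rand(b)
cls_losses = torch.rand(b)
fixed_w = torch.tensor([0.2, 0.4, 0.7, 1.0])       # Wt_h, increasing with time
learn_w = torch.randn(b, requires_grad=True)       # VIs_h, optimized during training
total = batch_intent_loss(label_losses, cls_losses, fixed_w, learn_w)
```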
Step 305, performing propagation training on the conversation intention recognition model to be trained according to the conversation intention recognition loss, and finally obtaining the trained conversation intention recognition model.
The server calls the conversation intention recognition model to be trained, performs propagation training starting from the sentence encoder based on the conversation intention recognition loss, and adjusts the model parameters in the conversation intention recognition model to be trained. Illustratively, the server performs back-propagation training on the conversation intention recognition model to be trained based on the conversation intention recognition loss, adjusts the model parameters, and finally obtains the trained conversation intention recognition model.
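For illustration only, the back-propagation step can be sketched as below, with a tiny linear layer standing in for the conversation intention recognition model and a stand-in loss; none of the names or values are taken from the patent.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                       # placeholder for the real model
learn_w = torch.randn(4, requires_grad=True)   # learnable per-sample weights
optimizer = torch.optim.Adam(list(model.parameters()) + [learn_w], lr=1e-4)

for step in range(3):                          # a few illustrative training rounds
    optimizer.zero_grad()
    logits = model(torch.randn(4, 16))         # dummy forward pass
    loss = logits.mean() + 0.1 * learn_w.pow(2).mean()  # stand-in for Loss'
    loss.backward()                            # back-propagate to all parameters
    optimizer.step()                           # adjust model parameters and weights
```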
In summary, in the training method for the conversation intention recognition model provided in this embodiment, the historical sample conversation sentences and the newly added sample conversation sentences are used as the training samples of the conversation intention recognition model to be trained, so that the model can learn the original knowledge based on the historical sample conversation sentences and can also learn new knowledge based on the newly added sample conversation sentences. Secondly, different weights are given to the sample conversation sentences according to their time order during training so as to highlight the importance of sample conversation sentences from different times, so that the model can more accurately learn the newly added knowledge and the new meanings of the original knowledge, and in turn can more accurately grasp the real conversation intention of the user conversation in the application process.
The conversation intention recognition model to be trained comprises a sentence encoder and a session encoder, and the server performs label learning and classification learning on the sample conversation sentences by using the sentence encoder and the session encoder. Based on the embodiment of fig. 4, the process of label learning and classification learning on the sample conversation sentences is described in detail below; as shown in fig. 5, step 303 may include steps 3031 to 3032, as follows:
step 3031, calling a sentence encoder to perform vector conversion processing on the sample conversation sentences to obtain sample sentence vectors.
Illustratively, a conversation intention recognition model to be trained is built in the server, and the server calls the conversation intention recognition model to be trained to perform sentence-level coding processing on an input sample conversation sentence to obtain a sample sentence vector.
Optionally, the sentence encoder comprises a word embedding layer, an encoding-decoding layer, and a self-attention layer; when the server calls the conversation intention recognition model, a word embedding layer is called to carry out word embedding on words in the sample conversation sentences to obtain sample word vectors; calling a coding-decoding layer to carry out coding and decoding processing on the sample word vector to obtain a sample sentence vector; and calling a self-attention layer to encode the sample sentence vector to obtain a final sample sentence vector.
Illustratively, the encoding-decoding layer in the conversation intention recognition model adopts a Transformer model. As shown in fig. 6, the conversation intention recognition model includes a sentence encoder 50 and a session encoder 60, and the sentence encoder 50 includes a word embedding layer 51, a Transformer layer 52, and a self-attention layer 53; the output of the word embedding layer 51 is connected to the input of the Transformer layer 52, the output of the Transformer layer 52 is connected to the input of the self-attention layer 53, and the output of the self-attention layer 53 is connected to the session encoder 60. In the process of encoding a sample conversation sentence, after the server performs word segmentation on the sample conversation sentence, the words in the sample conversation sentence are input to the word embedding layer 51, and the word embedding layer 51 performs word embedding on the words in the sample conversation sentence to obtain sample word vectors; the sample word vectors are input into the Transformer layer 52, and the Transformer layer 52 encodes and decodes the sample word vectors to obtain a sample sentence vector of the sample conversation sentence; the sample sentence vector is input to the self-attention layer 53 for self-attention learning, and a final sample sentence vector is obtained.
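To make the layer ordering above concrete, the following is a minimal PyTorch sketch of such a sentence encoder: word embedding, a Transformer encoding layer, and a simple self-attention pooling layer. All hyper-parameters (vocabulary size, dimensions, number of layers, heads) and the class name are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)                     # word embedding layer
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)  # encoding-decoding layer
        self.attn = nn.Linear(dim, 1)                                  # self-attention layer

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                  # sample word vectors
        x = self.transformer(x)                    # contextualized word vectors
        a = torch.softmax(self.attn(x), dim=1)     # attention weights over words
        return (a * x).sum(dim=1)                  # sample sentence vector (batch, dim)

# Usage: encode a batch of 3 sample conversation sentences of length 10.
enc = SentenceEncoder()
sentence_vectors = enc(torch.randint(0, 30000, (3, 10)))
```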
Illustratively, in the process of training the conversation intention recognition model, each round of training inputs a batch of b sample conversation sentences into the conversation intention recognition model for training, and simultaneously, a batch of b sample sentence vectors is adopted for training the conversation intention recognition model; and calling a conversation intention recognition model to be trained by the server to perform vector conversion processing on the sample conversation sentences of the r-th batch to obtain b sample sentence vectors corresponding to the b sample conversation sentences in the r-th batch, wherein r and b are positive integers.
Step 3032, calling a conversation encoder to perform conversation level encoding processing on the sample sentence vector to obtain a sample feature vector.
The server calls the conversation intention recognition model to be trained to perform conversation-level encoding processing on the sample sentence vectors, where the conversation level refers to a level at which sentences are processed as a dialogue during encoding. Illustratively, the conversation level refers to a level at which a sample conversation sentence is encoded and learned under the conversation scenario constructed by the first j sample conversation sentences, where the first j sample conversation sentences have an association relation with the sample conversation sentence and j is a positive integer. Finally, a sample feature vector carrying the sample label data and the sample classification data is obtained.
Optionally, the server calls a conversation intention recognition model to be trained to splice b sample sentence vectors in the r-th batch to obtain spliced vectors of the r-th batch, and then performs conversation level coding processing on the spliced vectors of the r-th batch to obtain sample feature vectors.
Optionally, the server calls a session encoder to label sample sentence vectors with sample tag data at a word level and a sentence level in sequence to obtain a first feature vector; calling a conversation encoder to sequentially perform word level and sentence level global feature learning on the sample sentence vector to obtain a second feature vector; calling a conversation encoder to sequentially perform word level and sentence level local feature learning on the sample sentence vector to obtain a third feature vector; and generating a sample feature vector according to the first feature vector, the second feature vector and the third feature vector.
Illustratively, the server calls a conversation intention recognition model to be trained, and labels data of each sample sentence vector in the splicing vectors of the r-th batch to obtain a first feature vector; global feature learning of each sample sentence vector is carried out on the whole of the spliced vectors of the r-th batch to obtain a second feature vector; and local feature learning of each sample sentence vector is carried out aiming at the adjacent sample sentence vector of each sample sentence vector to obtain a third feature vector. For example, the server calls the session intention recognition model to be trained to combine the first feature vector, the second feature vector and the third feature vector to generate a sample feature vector, and the combination mode may be a vector splicing mode, a vector adding mode or other modes.
In order to fully learn the semantics of the words and sentences in the sample sentence vectors, the sample sentence vectors can be learned by adopting an HAN (Hierarchical Attention Network) architecture. Optionally, the conversation intention recognition model to be trained comprises a label embedding attention layer, a bidirectional gated recurrent unit and a convolutional neural network layer. When the sample feature vectors are encoded, the server calls the conversation intention recognition model to be trained, calls the label embedding attention layer, and labels the sample sentence vectors in sequence with the sample label data at the word level and the sentence level to obtain first feature vectors; calls the bidirectional gated recurrent unit, and sequentially performs word-level and sentence-level global feature learning on the sample sentence vectors to obtain second feature vectors; and calls the convolutional neural network layer, and sequentially performs word-level and sentence-level local feature learning on the sample sentence vectors to obtain third feature vectors.
Illustratively, as shown in FIG. 6, the session encoder 60 includes a label-sentence attention layer 61, a Bi-GRU layer 62, and a CNN layer 63; the output of the self-attention layer 53 is connected to the input of the label-sentence attention layer 61, the input of the Bi-GRU layer 62, and the input of the CNN layer 63, respectively. In the process of learning the sample sentence vector, the label-sentence attention layer 61 sequentially learns the word level and the sentence level of the sample sentence vector, learns the semantics of the important words and important sentences in the sample sentence vector, and labels the sample label data of the sample sentence vector based on the semantics of those important words and sentences. The Bi-GRU layer 62 performs word-level and sentence-level global feature learning on the sample sentence vector in sequence to obtain a global feature vector, i.e., the second feature vector. The CNN layer 63 sequentially performs word-level and sentence-level local feature learning on the sample sentence vector to obtain a local feature vector, i.e., the third feature vector.
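The following is a minimal PyTorch sketch of such a session encoder, with a label-embedding attention branch, a Bi-GRU branch for global features, and a CNN branch for local features whose outputs are concatenated into the sample feature vector. The dimensions, the number of labels k, and all names are illustrative assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn

class SessionEncoder(nn.Module):
    def __init__(self, dim=128, k=10):
        super().__init__()
        self.label_emb = nn.Embedding(k, dim)    # label embeddings (LEAM-style branch)
        self.gru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.cnn = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, sent_vecs):                # sent_vecs: (1, b, dim), one spliced batch
        # label attention: similarity between sentence vectors and every label embedding
        sim = sent_vecs @ self.label_emb.weight.T                    # (1, b, k)
        first = torch.softmax(sim, dim=-1) @ self.label_emb.weight   # first feature vector
        second, _ = self.gru(sent_vecs)                              # global features (Bi-GRU)
        third = self.cnn(sent_vecs.transpose(1, 2)).transpose(1, 2)  # local features (CNN)
        return torch.cat([first, second, third], dim=-1)             # sample feature vector

# Usage: combine b = 3 sentence vectors of dimension 128 into sample feature vectors.
enc = SessionEncoder()
features = enc(torch.randn(1, 3, 128))
```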
Illustratively, the label-sentence attention layer 61 adopts the architecture of LEAM (Label-Embedding Attentive Model), learning the text and the labels in the same space and constructing the text representation by utilizing the correlation between the text and the labels.
Illustratively, the sample feature vector includes k sample label data, the server calls a to-be-trained session intention recognition model to perform label classification on the sample feature vector based on the k sample label data to obtain a classified sample feature vector, and k is a positive integer. The classified sample feature vector still includes the k sample label data and also includes sample classification data.
Illustratively, the to-be-trained conversation intention recognition model further comprises a classifier, and the server calls the classifier to perform label classification on the sample feature vectors to generate the classified sample feature vectors.
To sum up, in the training method of the conversation intention recognition model provided in this embodiment, the conversation intention recognition model to be trained performs conversation-level encoding based on the sentence coding vectors of the conversation sentences, that is, the sentence coding vectors are labeled with sample tag data to obtain conversation coding vectors; the conversation intention recognition loss of the classified coding vectors is calculated based on the sample tag data and the reference tag data, as well as the sample classification data and the reference classification data; and the model parameters are adjusted by using the conversation intention recognition loss. As a result, the trained conversation intention recognition model can more accurately grasp the whole meaning of a conversation sentence, and the tag data labeled on the sentence coding vector better conforms to the semantics of the sentence; that is, the tag data carried by the conversation feature vector obtained by encoding better conforms to the semantics of the conversation sentence, so that the tag classification based on the conversation feature vector more accurately identifies the real conversation intention of the user conversation.
The above-mentioned conversation intention recognition model to be trained can also simultaneously perform encoding learning of text-label, global features and local features, generate three feature vectors, and then generate the conversation feature vector based on these three feature vectors, so that the conversation feature vector learns the conversation intention of the user conversation more fully. Especially for multiple rounds of conversation, global feature learning and local feature learning can be carried out across the rounds, so that the meaning of a conversation sentence is grasped more accurately. For example, as shown in fig. 1, the first conversation is "Does item A have any discount activity recently?", and the first reply is "There are."; the second conversation is "What discount activities are there?", and the second reply, given in conjunction with the first conversation, is "Item A participates in discount policy 1, discount policy 2, and discount policy 3.". Moreover, when the conversation intention recognition model to be trained performs the encoding learning of text-label, global features and local features, the HAN architecture is adopted, so that sentence semantics can be learned from the two levels of words and sentences, and the trained conversation intention recognition model grasps the meaning of conversation sentences more accurately.
For example, the data of the most recent day on the service platform is used as model verification samples, the remaining data on the service platform is used as model training samples, and the conversation intention recognition model to be trained provided by the present application is trained and verified in a non-weighted manner, a linear-weighted manner (fixed weight only), and an adaptive-weighted manner (fixed weight combined with learning weight), so as to obtain the verification results shown in Table 1:
Table 1
Data processing mode | Accuracy | F1
Without weighting | 0.70 | 0.68
Linear weighting | 0.72 | 0.70
Adaptive weighting | 0.73 | 0.71
As can be seen from Table 1, when the adaptive weighting manner is adopted, both the Accuracy and the F1 are improved compared with the other two manners, where F1 is the model evaluation value (F-score) obtained when the parameter in the F-score formula is set to 1.
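For reference, a standard definition of the F-score mentioned above (background knowledge, not quoted from the patent text) is:

F_β = (1 + β²) * P * R / (β² * P + R), and F1 = 2 * P * R / (P + R) when β = 1,

where P denotes precision and R denotes recall.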
Fig. 7 is a block diagram of a session intention recognition model training apparatus provided in an exemplary embodiment of the present application, which may be a part or all of a server in the form of software, hardware, or a combination of the two, and the apparatus includes:
an obtaining module 401, configured to obtain sample conversation sentences from a database, where the sample conversation sentences include historical sample conversation sentences and newly-added sample conversation sentences, and the sample conversation sentences carry reference tag data, reference classification data, and generation time of the conversation sentences;
a generating module 402, configured to determine a fixed weight and a learning weight corresponding to a generating time, and generate a sample conversation sentence marked with the fixed weight and the learning weight, where both the fixed weight and the learning weight are used to indicate an importance degree of the sample conversation sentence to model training;
a learning module 403, configured to input sample conversational sentences into a conversational intent recognition model to be trained for tag learning and classification learning, so as to obtain sample feature vectors carrying sample tag data and sample classification data;
a learning module 403, configured to calculate a recognition loss of the conversation intention according to the sample label data and the reference label data, and the sample classification data and the reference classification data;
the learning module 403 is configured to perform propagation training on the session intention recognition model to be trained according to the recognition loss, and finally obtain a trained session intention recognition model.
In some embodiments, there is a mapping relationship between time and fixed weight, wherein the mapping relationship has a positive correlation between time and fixed weight, and a correspondence table between time periods and learning weights;
a generating module 402, configured to calculate a fixed weight corresponding to a generation time based on the mapping relation; and searching the learning weight corresponding to the time period to which the generation time belongs from the corresponding relation table.
In some embodiments, the sample feature vector carries the generation time and the optimized learning weight, and the optimized learning weight is obtained by performing optimized learning on the learning weight in the model training process;
a generating module 402, configured to update the learning weight corresponding to the time period to which the generation time of the R-1 th sample conversation sentence belongs in the correspondence table, based on the R-1 th optimized learning weight corresponding to the R-1 th sample conversation sentence, to obtain an updated correspondence table; and search the updated correspondence table for the learning weight corresponding to the time period to which the generation time of the R-th sample conversation sentence belongs; R is an integer greater than 1.
In some embodiments, a learning module 403 for calculating a tag identification loss from the reference tag data and the sample tag data; calculating classification identification loss according to the reference classification data and the sample classification data; the recognition loss of the conversation intention is calculated from the tag recognition loss, the classification recognition loss, the fixed weight, and the learning weight.
In some embodiments, the conversational intent recognition model to be trained includes a sentence encoder and a conversation encoder;
a learning module 403, configured to invoke a sentence encoder to perform vector conversion processing on a sample conversation sentence, so as to obtain a sample sentence vector; calling a conversation encoder to label sample sentence vectors in sequence with sample label data at a word level and a sentence level to obtain a first feature vector; calling a conversation encoder to sequentially perform word level and sentence level global feature learning on the sample sentence vector to obtain a second feature vector; calling a conversation encoder to sequentially perform word level and sentence level local feature learning on the sample sentence vector to obtain a third feature vector; and generating a sample feature vector according to the first feature vector, the second feature vector and the third feature vector.
In summary, in the training apparatus for the conversation intention recognition model provided in this embodiment, the historical sample conversation sentences and the newly added sample conversation sentences are used as training samples of the conversation intention recognition model to be trained, so that the model can learn the original knowledge based on the historical sample conversation sentences and can also learn new knowledge based on the newly added sample conversation sentences. Secondly, different weights are given to the sample conversation sentences according to their time order during training so as to highlight the importance of sample conversation sentences from different times, so that the model can more accurately learn the newly added knowledge and the new meanings of the original knowledge, and in turn can more accurately grasp the real conversation intention of the user conversation in the application process.
Fig. 8 is a block diagram of a session intention recognition apparatus provided in an exemplary embodiment of the present application, which may be a part or all of a server in the form of software, hardware, or a combination of the two, and the apparatus includes:
an embedding module 501, configured to perform vector conversion processing on a conversation sentence to obtain a sentence coding vector;
the encoding module 502 is configured to perform conversation-level encoding processing on the sentence coding vector to obtain a conversation feature vector, where the conversation level is a processing level for performing vector encoding based on a conversation environment, and the conversation feature vector carries tag data;
a classifying module 503, configured to perform label classification on the session vector features based on the label data to obtain a classified session feature vector, where the classified session feature vector is used to indicate a session intention of a session sentence.
In summary, the apparatus for recognizing a conversation intention according to this embodiment calls a conversation intention recognition model to recognize a conversation intention expressed by a conversation sentence, the conversation intention recognition model is trained by using a history sample conversation sentence and a newly added sample conversation sentence as training samples, and different weights are given to the sample conversation sentences according to a time sequence in a training process to highlight importance levels of the sample conversation sentences at different times, so that the model can more accurately learn new knowledge and new meanings of the original knowledge, and the model can more accurately grasp the conversation intention of a user included in the conversation sentence when applied.
Fig. 9 shows a schematic structural diagram of a server according to an exemplary embodiment of the present application. The server may be the server 120 in the computer system 100 shown in fig. 1.
The server 600 includes a Central Processing Unit (CPU) 601, a system Memory 604 including a Random Access Memory (RAM) 602 and a Read Only Memory (ROM) 603, and a system bus 605 connecting the system Memory 604 and the Central Processing Unit 601. The server 600 also includes a basic Input/Output System (I/O System)606, which facilitates the transfer of information between devices within the computer, and a mass storage device 607, which stores an operating System 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609 such as a mouse, keyboard, etc. for user input of information. Wherein a display 608 and an input device 609 are connected to the central processing unit 601 through an input output controller 610 connected to the system bus 605. The basic input/output system 606 may also include an input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input/output controller 610 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600. That is, mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or Compact Disc Read Only Memory (CD-ROM) drive.
Computer-readable media may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other Solid State Memory technology, CD-ROM, Digital Versatile Disks (DVD), or Solid State Drives (SSD), other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 604 and mass storage device 607 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 600 may also operate as a remote computer connected to a network through a network, such as the Internet. That is, the server 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
In an alternative embodiment, a computer device is provided that includes a processor and a memory, the memory having at least one instruction, at least one program, set of codes, or set of instructions stored therein, the at least one instruction, at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement the conversation intention recognition method and the training method of the conversation intention recognition model as described above.
In an alternative embodiment, a computer-readable storage medium is provided that has at least one instruction, at least one program, set of codes, or set of instructions stored therein, which is loaded and executed by a processor to implement the conversation intention recognition method and the training method of the conversation intention recognition model as described above.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are for description only and do not represent the merits of the embodiments.
Embodiments of the present application also provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and the computer instructions are executed by the processor to cause the computer device to perform the conversation intention recognition method and the training method of the conversation intention recognition model as described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is intended to be exemplary only, and not to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included therein.

Claims (10)

1. A training method of a conversation intention recognition model, the method comprising:
acquiring sample conversation sentences from a database, wherein the sample conversation sentences comprise historical sample conversation sentences and newly added sample conversation sentences, and the sample conversation sentences carry reference label data, reference classification data and conversation sentence generation time;
determining a fixed weight and a learning weight corresponding to the generation time, and generating the sample conversation sentence marked with the fixed weight and the learning weight, wherein the fixed weight and the learning weight are both used for indicating the importance degree of the sample conversation sentence on model training;
inputting the sample conversation sentence into a conversation intention recognition model to be trained for label learning and classification learning to obtain a sample characteristic vector carrying sample label data and sample classification data;
calculating a recognition loss of conversation intent from the sample label data and the reference label data, and the sample classification data and the reference classification data;
and carrying out propagation training on the conversation intention recognition model to be trained according to the recognition loss, and finally obtaining the trained conversation intention recognition model.
2. The method according to claim 1, wherein there are a mapping relation between time and fixed weight in which time and fixed weight have a positive correlation, and a correspondence table between time period and learning weight;
the determining a fixed weight and a learning weight corresponding to the generation time includes:
calculating the fixed weight corresponding to the generation time based on the mapping relation;
and searching the learning weight corresponding to the time period to which the generation time belongs from the corresponding relation table.
3. The method according to claim 2, wherein the sample feature vector carries the generation time and an optimized learning weight, and the optimized learning weight is obtained by performing optimized learning on the learning weight in a model training process;
the searching the learning weight corresponding to the time period to which the generation time belongs from the corresponding relation table includes:
updating the learning weight corresponding to the time period to which the generation time of the R-1 th sample conversation sentence belongs in the corresponding relation table based on the R-1 th optimized learning weight corresponding to the R-1 th sample conversation sentence to obtain an updated corresponding relation table;
searching a learning weight corresponding to the time period to which the generation time of the R-th sample conversation sentence belongs from the updated corresponding relation table; r is an integer greater than 1.
4. The method of any one of claims 1 to 3, wherein said calculating a recognition loss of conversational intent from said sample label data and said reference label data, and said sample classification data and said reference classification data, comprises:
calculating a tag identification loss according to the reference tag data and the sample tag data;
calculating a classification recognition loss from the reference classification data and the sample classification data;
calculating the recognition loss of conversation intent from the tag recognition loss, the classification recognition loss, the fixed weight, and the learning weight.
5. The method of any of claims 1 to 3, wherein the conversational intent recognition model to be trained comprises a sentence coder and a conversation coder;
the step of inputting the sample conversation sentence into a conversation intention recognition model to be trained for label learning and classification learning to obtain a sample feature vector carrying sample label data and sample classification data includes:
calling the sentence encoder to perform vector conversion processing on the sample conversation sentences to obtain sample sentence vectors;
calling the conversation encoder to label the sample sentence vectors in sequence with the sample label data at a word level and a sentence level to obtain the first characteristic vector;
calling the conversation encoder to sequentially perform global feature learning of the word level and the sentence level on the sample sentence vector to obtain a second feature vector;
calling the conversation encoder to sequentially perform word level and sentence level local feature learning on the sample sentence vector to obtain a third feature vector;
generating the sample feature vector according to the first feature vector, the second feature vector and the third feature vector.
6. A conversation intention recognition method applied to a computer device provided with the conversation intention recognition model according to claim 1, the method comprising:
calling the conversation intention recognition model to perform vector conversion processing on a conversation sentence to obtain a sentence coding vector;
calling the conversation intention recognition model to perform coding processing on a conversation level on the sentence coding vector to obtain a conversation characteristic vector, wherein the conversation level refers to a processing level for performing vector coding based on a conversation environment, and the conversation characteristic vector carries tag data;
and calling the conversation intention recognition model to perform label classification on the conversation vector characteristics based on the label data to obtain a classified conversation characteristic vector, wherein the classified conversation characteristic vector is used for indicating the conversation intention of the conversation sentence.
7. An apparatus for training a conversational intent recognition model, the apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring sample conversation sentences from a database, the sample conversation sentences comprise historical sample conversation sentences and newly added sample conversation sentences, and the sample conversation sentences carry reference tag data, reference classification data and generation time of the conversation sentences;
a generating module, configured to determine a fixed weight and a learning weight corresponding to the generation time, and generate the sample conversation sentence marked with the fixed weight and the learning weight, where both the fixed weight and the learning weight are used to indicate an importance degree of the sample conversation sentence to model training;
the learning module is used for inputting the sample conversation sentences into a conversation intention recognition model to be trained for label learning and classification learning to obtain sample characteristic vectors carrying sample label data and sample classification data;
the learning module is used for calculating the recognition loss of the conversation intention according to the sample label data and the reference label data and the sample classification data and the reference classification data;
and the learning module is used for carrying out propagation training on the session intention recognition model to be trained according to the recognition loss to finally obtain the trained session intention recognition model.
8. An apparatus for recognizing a conversation intention, the apparatus comprising:
the embedded module is used for carrying out vector conversion processing on the conversation sentences to obtain sentence coding vectors;
the coding module is used for coding the sentence coding vector to obtain a conversation feature vector, wherein the conversation hierarchy is a processing hierarchy for vector coding based on a conversation environment, and the conversation feature vector carries tag data;
and the classification module is used for performing label classification on the conversation vector features based on the label data to obtain classified conversation feature vectors, and the classified conversation feature vectors are used for indicating the conversation intention of the conversation sentences.
9. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the training method of the conversation intention recognition model according to any one of claims 1 to 5 or the conversation intention recognition method according to claim 6.
10. A computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of training a conversational intent recognition model according to any of claims 1 to 5 or the method of conversational intent recognition according to claim 6.
CN202010968515.6A 2020-09-15 2020-09-15 Training method of conversation intention recognition model, conversation intention recognition method and device Active CN112069302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010968515.6A CN112069302B (en) 2020-09-15 2020-09-15 Training method of conversation intention recognition model, conversation intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010968515.6A CN112069302B (en) 2020-09-15 2020-09-15 Training method of conversation intention recognition model, conversation intention recognition method and device

Publications (2)

Publication Number Publication Date
CN112069302A true CN112069302A (en) 2020-12-11
CN112069302B CN112069302B (en) 2024-03-08

Family

ID=73697106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010968515.6A Active CN112069302B (en) 2020-09-15 2020-09-15 Training method of conversation intention recognition model, conversation intention recognition method and device

Country Status (1)

Country Link
CN (1) CN112069302B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334496A (en) * 2018-01-30 2018-07-27 中国科学院自动化研究所 Human-computer dialogue understanding method and system and relevant device for specific area
CN108255656A (en) * 2018-02-28 2018-07-06 湖州师范学院 A kind of fault detection method applied to batch process
CN109345302A (en) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method, device, storage medium and computer equipment
US20200175961A1 (en) * 2018-12-04 2020-06-04 Sorenson Ip Holdings, Llc Training of speech recognition systems
US20200242486A1 (en) * 2019-01-29 2020-07-30 Ricoh Company, Ltd. Method and apparatus for recognizing intention, and non-transitory computer-readable recording medium
CN110298391A (en) * 2019-06-12 2019-10-01 同济大学 A kind of iterative increment dialogue intention classification recognition methods based on small sample
CN110377911A (en) * 2019-07-23 2019-10-25 中国工商银行股份有限公司 Intension recognizing method and device under dialogue frame
CN110674639A (en) * 2019-09-24 2020-01-10 拾音智能科技有限公司 Natural language understanding method based on pre-training model
CN110888968A (en) * 2019-10-15 2020-03-17 浙江省北大信息技术高等研究院 Customer service dialogue intention classification method and device, electronic equipment and medium
CN111078847A (en) * 2019-11-27 2020-04-28 中国南方电网有限责任公司 Power consumer intention identification method and device, computer equipment and storage medium
CN111114556A (en) * 2019-12-24 2020-05-08 北京工业大学 Lane change intention identification method based on LSTM under multi-source exponential weighting loss
CN111444952A (en) * 2020-03-24 2020-07-24 腾讯科技(深圳)有限公司 Method and device for generating sample identification model, computer equipment and storage medium
CN111651571A (en) * 2020-05-19 2020-09-11 腾讯科技(深圳)有限公司 Man-machine cooperation based session realization method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TONG Lixia et al.: "Human Collaboration Practice of Tencent Intelligent Customer Service", Artificial Intelligence, pages 1-8 *
GAO Yongbing; LI Yuechao: "Research on Social Intention Recognition and Classification Technology in Microblogs", Journal of Inner Mongolia University of Science and Technology, no. 02 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487817A (en) * 2020-12-14 2021-03-12 北京明略软件***有限公司 Named entity recognition model training method, sample labeling method, device and equipment
CN112966518A (en) * 2020-12-22 2021-06-15 西安交通大学 High-quality answer identification method for large-scale online learning platform
CN112966518B (en) * 2020-12-22 2023-12-19 西安交通大学 High-quality answer identification method for large-scale online learning platform
CN112766319A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Dialogue intention recognition model training method and device, computer equipment and medium
WO2022141864A1 (en) * 2020-12-31 2022-07-07 平安科技(深圳)有限公司 Conversation intent recognition model training method, apparatus, computer device, and medium
CN112989000A (en) * 2021-03-29 2021-06-18 网易(杭州)网络有限公司 Intention recognition system updating method and device, electronic equipment and storage medium
CN113377933B (en) * 2021-04-27 2023-05-30 中国联合网络通信集团有限公司 Intention classification method and device for multi-round dialogue
CN113377933A (en) * 2021-04-27 2021-09-10 中国联合网络通信集团有限公司 Intention classification method and device for multi-turn conversation
CN113723111B (en) * 2021-09-04 2023-11-14 中国科学院新疆理化技术研究所 Small sample intention recognition method, device, equipment and storage medium
CN113723111A (en) * 2021-09-04 2021-11-30 中国科学院新疆理化技术研究所 Small sample intention recognition method, device, equipment and storage medium
CN113935407A (en) * 2021-09-29 2022-01-14 光大科技有限公司 Abnormal behavior recognition model determining method and device
WO2023245523A1 (en) * 2022-06-22 2023-12-28 极纳人工智能有限公司 Method and apparatus for generating training data
CN115114407A (en) * 2022-07-12 2022-09-27 平安科技(深圳)有限公司 Intention recognition method and device, computer equipment and storage medium
CN115114407B (en) * 2022-07-12 2024-04-19 平安科技(深圳)有限公司 Intention recognition method, device, computer equipment and storage medium
CN115221301A (en) * 2022-07-19 2022-10-21 重庆理工大学 Conversation emotion classification and conversation behavior identification method for joint multi-task learning
CN115221301B (en) * 2022-07-19 2024-01-23 重庆理工大学 Dialogue emotion classification and dialogue behavior recognition method combining multitask learning
CN116028880A (en) * 2023-02-07 2023-04-28 支付宝(杭州)信息技术有限公司 Method for training behavior intention recognition model, behavior intention recognition method and device

Also Published As

Publication number Publication date
CN112069302B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN112069302B (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN111444709A (en) Text classification method, device, storage medium and equipment
CN112131883B (en) Language model training method, device, computer equipment and storage medium
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN112988979A (en) Entity identification method, entity identification device, computer readable medium and electronic equipment
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
Ali et al. Automatic text‐to‐gesture rule generation for embodied conversational agents
CN112131345B (en) Text quality recognition method, device, equipment and storage medium
CN115114407B (en) Intention recognition method, device, computer equipment and storage medium
CN112528136A (en) Viewpoint label generation method and device, electronic equipment and storage medium
CN112949758A (en) Response model training method, response method, device, equipment and storage medium
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN115186147B (en) Dialogue content generation method and device, storage medium and terminal
CN114282055A (en) Video feature extraction method, device and equipment and computer storage medium
CN115269828A (en) Method, apparatus, and medium for generating comment reply
CN117236410B (en) Trusted electronic file large language model training and reasoning method and device
CN116932731B (en) Multi-mode knowledge question-answering method and system for 5G message
CN114091452A (en) Adapter-based transfer learning method, device, equipment and storage medium
CN116205700A (en) Recommendation method and device for target product, computer equipment and storage medium
CN113704393A (en) Keyword extraction method, device, equipment and medium
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
CN114492465B (en) Dialogue generation model training method and device, dialogue generation method and electronic equipment
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN114510561A (en) Answer selection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant