CN111640425B - Model training and intention recognition method, device, equipment and storage medium - Google Patents

Model training and intention recognition method, device, equipment and storage medium

Info

Publication number
CN111640425B
Authority
CN
China
Prior art keywords
training
model
network
target
distillation
Prior art date
Legal status
Active
Application number
CN202010444204.XA
Other languages
Chinese (zh)
Other versions
CN111640425A (en)
Inventor
王晶
彭程
罗雪峰
王健飞
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010444204.XA
Publication of CN111640425A
Application granted
Publication of CN111640425B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a model training and intention recognition method, device, equipment and storage medium, and relates to the technical field of artificial intelligence. The model training method comprises the following steps: performing precipitation training on a pre-training model at least twice according to a training task data set to obtain a reinforcement model of the pre-training model, where the training object of each precipitation training comprises at least a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network; taking at least two networks in the reinforcement model as target networks and constructing a distillation model according to the target networks, where the target networks comprise a feature recognition network and the prediction layer network, and the feature recognition network comprises at least the bottom layer network; extracting target knowledge of the training task data set through the target networks of the reinforcement model; and training the distillation model according to the target knowledge and the training task data set to obtain a target learning model, so as to improve the efficiency and accuracy of prediction by the target learning model.

Description

Model training and intention recognition method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an artificial intelligence technology.
Background
With the development of artificial intelligence technology, deep learning models are increasingly widely applied in the field of human-computer interaction. A pre-training model, as a deep learning model, has a complex structure and a huge number of model parameters, so it is time-consuming and slow at the inference stage. To improve the response speed of the pre-training model, the prior art usually requires a developer to manually select network layers with smaller weight values from the pre-training model and prune them from the model, thereby compressing the pre-training model and reducing its structural complexity. However, a pre-training model pruned in this way is strongly affected by human factors and has low accuracy, which seriously degrades the human-computer interaction effect and needs to be improved.
Disclosure of Invention
A model training and intention recognition method, apparatus, device and storage medium are provided.
According to a first aspect, there is provided a knowledge distillation based model training method, the method comprising:
performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model; the training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top;
Taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks, wherein the target networks comprise a characteristic recognition network and the prediction layer network; the feature recognition network at least comprises the bottom layer network;
extracting target knowledge of the training task data set through a target network of the reinforcement model;
and training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
According to a second aspect, there is provided an intention recognition method comprising:
acquiring user voice data acquired by man-machine interaction equipment;
inputting the user voice data into a target learning model to obtain a user intention recognition result output by the target learning model; the target learning model is determined based on training of the knowledge distillation-based model training method according to any embodiment of the application;
and determining a response result of the man-machine interaction device according to the user intention recognition result.
According to a third aspect, there is provided a knowledge distillation based model training apparatus, the apparatus comprising:
The precipitation training module is used for carrying out precipitation training on the pre-training model at least twice according to the training task data set to obtain an enhanced model of the pre-training model; the training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top;
the distillation model construction module is used for taking at least two networks in the reinforcement model as target networks and constructing a distillation model according to the target networks, wherein the target networks comprise a characteristic recognition network and the prediction layer network; the feature recognition network at least comprises the bottom layer network;
the target knowledge extraction module is used for extracting target knowledge of the training task data set through a target network of the reinforcement model;
and the distillation model training module is used for training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
According to a fourth aspect, there is provided an intention recognition apparatus comprising:
The voice data acquisition module is used for acquiring user voice data acquired by the man-machine interaction equipment;
the intention recognition module is used for inputting the user voice data into a target learning model so as to acquire a user intention recognition result output by the target learning model; the target learning model is determined based on training of the knowledge distillation-based model training method according to any embodiment of the application;
and the response result determining module is used for determining a response result of the man-machine interaction device according to the user intention recognition result.
According to a fifth aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a knowledge distillation based model training method or an intent recognition method in accordance with any of the embodiments of the present application.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions. The computer instructions are for causing the computer to perform a knowledge distillation based model training method or an intent recognition method according to any of the embodiments of the present application.
According to the technology of the present application, the problem of low accuracy in the prior art of manually compressing a pre-training model is solved, and a high-precision target learning model can be trained through low-cost automatic compression, so as to improve the human-computer interaction effect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1A is a flow chart of a knowledge distillation based model training method provided in accordance with an embodiment of the present application;
FIG. 1B is a schematic diagram of a network structure of a pre-training model according to an embodiment of the present application;
FIG. 2 is a flow chart of another knowledge distillation based model training method provided in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of another knowledge distillation based model training method provided in accordance with an embodiment of the present application;
FIGS. 4-5 are flow charts of two knowledge distillation based model training methods provided in accordance with embodiments of the present application;
FIG. 6A is a flow chart of another knowledge distillation based model training method provided in accordance with an embodiment of the present application;
FIG. 6B is a schematic diagram of a distillation model training principle according to an embodiment of the present application;
FIG. 7 is a flow chart of a knowledge distillation based model training method provided in accordance with an embodiment of the present application;
FIG. 8 is a flow chart of an intent recognition method provided in accordance with an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a knowledge distillation based model training apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an intent recognition apparatus according to an embodiment of the present application;
FIG. 11 is a block diagram of an electronic device for implementing a knowledge distillation based model training method or intent recognition method in accordance with an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1A is a flow chart of a knowledge distillation based model training method provided in accordance with an embodiment of the present application; FIG. 1B is a schematic diagram of a network structure of a pre-training model according to an embodiment of the present application. This embodiment is applicable to the case where a pre-training model with a complex network structure is compressed and trained, based on the knowledge distillation technique, into a target learning model with a simple network structure. This embodiment may be performed by a knowledge distillation based model training apparatus configured in an electronic device, which may be implemented in software and/or hardware. As shown in FIGS. 1A-1B, the method includes:
s101, performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model.
The training task data set in the embodiment of the present application may be obtained by acquiring, according to the prediction task to be executed by the pre-training model, sample data related to that prediction task and using it as the training task data set. For example, if the prediction task to be executed by the pre-training model is to identify the intention of user voice data in shopping platform A, all the historical user voice data in shopping platform A may be obtained, and relevant processing (such as labeling and deleting invalid data) may be performed to obtain a training task data set corresponding to the prediction task.
The pre-training model of the embodiment of the present application may be built on a deep learning architecture; it is a high-precision model that has been trained on massive data and can execute a certain learning task, and it typically has deep network layers, wide per-layer dimensions, many model parameters, and so on. The pre-training model may be trained by the user using a large amount of sample data, or may be obtained directly from a pre-training model database, which is not limited in this embodiment. Optionally, the pre-training model may include, from bottom to top, a bottom layer network, at least one middle-high layer network, and a prediction layer network. The bottom layer network and the middle-high layer network are used for feature recognition; the prediction layer network is used for task prediction according to the recognized features. The bottom layer network is typically used to recognize simple features, and the middle-high layer network is typically used to abstract complex features from the simple features. For example, if the pre-training model is a bert model for intention recognition, the bottom layer network of the bert model is typically used to recognize simpler grammatical features, and the middle-high layer network is typically used to abstract complex features from the grammatical features. The prediction layer network performs task prediction according to the features recognized by the bottom layer network and the middle-high layer network. Optionally, the pre-training model of the embodiments of the present application may be a bert model.
Illustratively, the pre-training model 1 shown in fig. 1B is composed of 12 network layers, wherein the 1st to 3rd network layers are the bottom layer network 10, the 4th to 11th network layers are the middle and high layer networks 11, and the 12th network layer is the prediction layer network 12, wherein the middle and high layer networks 11 further comprise the middle layer network 110 (i.e., the 4th to 7th network layers) and the high layer network 111 (i.e., the 8th to 11th network layers).
Alternatively, in general, complex features of the middle-high level network abstraction of the pre-training model have low correlation with the prediction task itself, and the prediction task needs to be accurately completed, mainly depending on the underlying network. Therefore, the operation can perform multiple precipitation training on the bottom network of the pre-training model, and continuously adjust the training object (namely the network layer needing training in the pre-training model) in the process of multiple precipitation training. The training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network. That is, although the present embodiment focuses on performing precipitation training on the bottom layer network of the pre-training model, in order to ensure accuracy of the training result, the object to be trained at least includes the bottom layer network and the prediction layer network, and for the middle-high layer network, the number of layers of the middle-high layer network included in the training object decreases as the number of precipitation training increases. For example, assuming that the bottom layer network 10 of the pre-training model 1 shown in fig. 1B is subjected to five precipitation training, the bottom layer network 10 and the prediction layer network 12 are included in the training object of the five precipitation training, and for the middle layer network, all network layers may be included in the training object of the first precipitation training; the training object of the second precipitation training may be decremented to include network layers 4 through 9; the training object for the third precipitation training may be decremented again to include the 4 th network layer through the 7 th network layer, successively decrementing, and by the time of the fifth precipitation training, the training object may have been decremented to not include the middle-higher network 11.
Specifically, when the training task data set is used for carrying out at least two rounds of precipitation training on the bottom layer network of the pre-training model, a part of the training task data set is input into the pre-training model each time and used for carrying out one round of precipitation training on the training object, and the pre-training model subjected to multiple rounds of precipitation training is then used as the reinforcement model. Because the middle-high layer network in the training object of this step gradually decreases as the number of training rounds increases, this operation updates the bottom layer network more and more accurately as the number of precipitation training rounds grows, so that the parameters of the bottom layer network become increasingly accurate. That is, compared with the pre-training model, the network structure of the reinforcement model in this embodiment is not changed; if the pre-training model is a bert model, the reinforcement model after precipitation training is also a bert model, only with more accurate network parameters in the bottom layer network.
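For illustration only, the following is a minimal PyTorch-style sketch of this precipitation training step, assuming a 12-layer toy model partitioned as in FIG. 1B (layers 0-2 as the bottom layer network, 3-10 as the middle-high layer network, 11 as the prediction layer network); the freezing helper, the decrement schedule and all names here are assumptions of the sketch, not the reference implementation of the present application.

    import torch
    import torch.nn as nn

    # Toy stand-in for the pre-training model of FIG. 1B: 12 stacked layers,
    # layers 0-2 = bottom network, 3-10 = middle-high network, 11 = prediction layer.
    class ToyPretrained(nn.Module):
        def __init__(self, dim=64, n_layers=12, n_classes=3):
            super().__init__()
            self.layers = nn.ModuleList(
                [nn.Linear(dim, dim) for _ in range(n_layers - 1)] + [nn.Linear(dim, n_classes)]
            )

        def forward(self, x):
            for layer in self.layers[:-1]:
                x = torch.relu(layer(x))
            return self.layers[-1](x)  # logits from the prediction layer

    def set_training_object(model, top_mid_high):
        # Freeze everything except the bottom network (0-2), the prediction layer
        # (last) and the middle-high layers up to index `top_mid_high`.
        last = len(model.layers) - 1
        for idx, layer in enumerate(model.layers):
            trainable = idx <= 2 or idx == last or (3 <= idx <= top_mid_high)
            for p in layer.parameters():
                p.requires_grad = trainable

    def precipitation_training(model, data_subsets):
        loss_fn = nn.CrossEntropyLoss()
        n = len(model.layers)
        for k, (x, y) in enumerate(data_subsets, start=1):
            # The training object shrinks with the round number k: the highest retained
            # middle-high layer index decreases each round (here: n - 2*k - 1, floored at 2).
            set_training_object(model, top_mid_high=max(2, n - 2 * k - 1))
            opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        return model  # the reinforcement model: same structure, sharper bottom-network weights

    # Example: six training data subsets of random "task" data
    model = ToyPretrained()
    subsets = [(torch.randn(8, 64), torch.randint(0, 3, (8,))) for _ in range(6)]
    reinforcement_model = precipitation_training(model, subsets)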
S102, taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks.
The target network may be a network selected from networks included in the reinforcement model and required for completing the current prediction task. The target network comprises a characteristic identification network and a prediction layer network; the feature recognition network is a network for performing feature recognition, and in the embodiment of the present application, the feature recognition network at least includes a bottom layer network. Alternatively, the feature recognition network may include a part or all of a middle-high layer network in addition to the bottom layer network, which is not limited in this embodiment.
Optionally, in this embodiment, at least two networks may be selected from the reinforcement model as the target network, where if two networks are selected, the two networks are a bottom layer network and a prediction layer network of the reinforcement model, and at this time, the feature recognition network in the target network only includes the bottom layer network; if three or more networks are selected, the rest of networks can be selected from the middle and high networks based on the selection of the bottom layer network and the prediction layer network, and the characteristic network in the target network comprises at least one middle and high network besides the bottom layer network. It should be noted that whether the middle-high level network of the reinforcement model is used as the feature recognition network of the target network may depend on factors such as the actual prediction task and the type of target knowledge to be extracted later. This embodiment is not limited.
Alternatively, when the distillation model is constructed according to the target network, the distillation model also including the target network may be constructed according to the target network. The type of the target network of the distillation model constructed in this step is the same as the type of the target network of the reinforcement model. Specifically, the distillation model also comprises a prediction layer network and a feature recognition network, and if the feature recognition network selected from the reinforced model comprises only a bottom layer network, the feature recognition network of the constructed distillation model also comprises only the bottom layer network; if the feature recognition network selected from the reinforcement model comprises not only a bottom layer network but also a middle layer network in a middle-high layer network, the feature recognition network of the constructed distillation model also comprises the bottom layer network and the middle layer network.
Alternatively, when the distillation model is constructed according to the target network, the operation may be to combine the network layer structure of the target network of the reinforcement model to construct a distillation model having the same structure as the reinforcement model, i.e. an isomorphic model of the reinforcement model. For example, if the reinforcement model is a bert model, the distillation model constructed is a bert model containing only the target network structure in the reinforcement model. The distillation model may also be constructed differently from the reinforcement model, but also includes the target network type of the reinforcement model, i.e., the heterogeneous model of the reinforcement model. For example, the reinforcement model is a bert model, and the distillation model is a CNN model, but the CNN model also includes the same type of target network as the reinforcement model. The method of constructing the isomorphic or heterogeneous distillation model will be described in detail in the examples that follow.
It should be noted that the distillation model constructed by the operation can be a machine learning model or a small model based on a neural network, and the distillation model has the characteristics of few parameters, high reasoning speed and good portability.
S103, extracting target knowledge of the training task data set through a target network of the reinforcement model.
The target knowledge can be a result obtained after the target network in the reinforcement model processes the training task data set, and the target knowledge is used for being subsequently injected into the distillation model and used as a supervision signal when the distillation model is trained.
Optionally, when extracting the target knowledge of the training task data set, the step may be to use the training task data set as an input of the enhancement model, obtain a first data feature representation output by a feature recognition network of the enhancement model, and obtain a first prediction probability representation output by a prediction layer network of the enhancement model; and taking the acquired first data characteristic representation and the first prediction probability representation as target knowledge of the training task data set. Specifically, the training task data set may be divided into multiple parts according to a preset size, such as a batch_size. And inputting each piece of divided training task data into the strengthening model, operating the strengthening model, and obtaining the characteristic representation output by the characteristic recognition network of the strengthening model as a first data characteristic representation. If the feature recognition network only has the bottom network, the first data feature representation is only the feature representation output by the bottom network; if the feature recognition network comprises a bottom layer network and a part of a middle and high layer network, the first data feature representation comprises not only the feature representation output by the bottom layer network but also the feature representation output by the part of the middle and high layer network. And obtaining a characteristic representation, such as a prediction probability value, output by a prediction layer network of the reinforcement model as a first prediction probability representation, and further using the obtained first data characteristic representation and the first prediction probability representation as target knowledge corresponding to the training task data input at the time.
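Purely as an illustration of this knowledge extraction step, the sketch below uses toy stand-ins for the feature recognition network and the prediction layer network of the reinforcement model and splits the training task data by batch size; all shapes and names are assumptions.

    import torch
    import torch.nn as nn

    # Minimal stand-ins: three bottom-network layers and one prediction layer.
    bottom_network = nn.Sequential(
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
    )
    prediction_network = nn.Linear(64, 3)

    @torch.no_grad()
    def extract_target_knowledge(batch):
        feature_rep = bottom_network(batch)                              # first data feature representation
        probs = torch.softmax(prediction_network(feature_rep), dim=-1)   # first prediction probability representation
        return {"feature": feature_rep, "probs": probs}

    # Split the training task data into batch_size-sized parts and extract knowledge per part
    training_task_data = torch.randn(24, 64)
    target_knowledge = [extract_target_knowledge(part) for part in training_task_data.split(8)]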
And S104, training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
Optionally, the target knowledge acquired in the step S103 is used as a supervisory signal for training the distillation model, and the distillation model is induced to train based on the training task data set, so that the target knowledge is migrated into the distillation model in the training process, and the distillation model learns to strengthen the prediction task of the model. Specifically, the step may be to calculate a soft supervision tag according to the data feature representation and the prediction probability representation in the target knowledge and the data feature representation and the prediction probability representation obtained by processing the training task data by the distillation model, calculate a hard supervision tag according to the processing result of processing the training task data by the distillation model, and further combine the soft supervision tag with the hard supervision tag, thereby performing distillation training on the distillation model with higher learning efficiency through fewer training task data. How to calculate the hard and soft supervision labels, and how to perform the distillation training based on the two supervision labels, will be described in detail in the following examples.
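One common way to combine such soft and hard supervision signals is sketched below, assuming KL divergence on the probability representations, MSE on the feature representations and cross-entropy on the hard labels; the particular loss terms and the weights alpha and beta are assumptions of this sketch, since the exact label construction is detailed only in later embodiments.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, student_feature, teacher_probs, teacher_feature,
                          hard_labels, alpha=0.5, beta=0.1):
        # Soft supervision: match the teacher's prediction probabilities (KL divergence)
        # and its feature representation (MSE). Hard supervision: cross-entropy on labels.
        soft_loss = F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs,
                             reduction="batchmean")
        feature_loss = F.mse_loss(student_feature, teacher_feature)
        hard_loss = F.cross_entropy(student_logits, hard_labels)
        return alpha * soft_loss + beta * feature_loss + (1.0 - alpha) * hard_loss

    # Example with random tensors standing in for one batch of training task data
    student_logits = torch.randn(8, 3, requires_grad=True)
    student_feature, teacher_feature = torch.randn(8, 64, requires_grad=True), torch.randn(8, 64)
    teacher_probs = torch.softmax(torch.randn(8, 3), dim=-1)
    labels = torch.randint(0, 3, (8,))
    loss = distillation_loss(student_logits, student_feature, teacher_probs, teacher_feature, labels)
    loss.backward()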
Optionally, the distillation model is trained in the step, and the trained distillation model is the target learning model, and because the target learning model is obtained by distilling knowledge of the pre-training model, the target learning model can accurately execute the prediction task of the pre-training model, and the target learning model has a simple structure relative to the pre-training model, so that the time consumption is short and the response speed is high when the prediction task is executed.
Optionally, after the target learning model is obtained by training, this embodiment may deploy the target learning model to an actual human-computer interaction scenario to perform online task prediction. Preferably, if the pre-training model and the target learning model are models for intention recognition, then correspondingly, after training the distillation model according to the target knowledge and the training task data set to obtain the target learning model, the embodiment of the present application may further: deploy the target learning model into the human-computer interaction device, so as to recognize in real time the intention of the user voice data collected by the human-computer interaction device. Specifically, after the target learning model is deployed in the human-computer interaction device, the human-computer interaction device transmits collected user voice data to the target learning model, the target learning model performs intention recognition on the input user voice data and feeds the intention recognition result back to the human-computer interaction device, and the human-computer interaction device generates a response result corresponding to the user voice data according to the intention recognition result of the target learning model and feeds the response result back to the user. In the scheme provided by the embodiment of the present application, the target learning model is obtained by training in a knowledge distillation manner; its network structure is simpler than that of the pre-training model while its prediction effect approximates that of the complex pre-training model, so intention recognition can be carried out rapidly and accurately to meet the real-time response requirement of the human-computer interaction device.
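For illustration, a possible device-side inference flow is sketched below, assuming an upstream component has already turned the collected user voice data into a feature tensor; the intent label set, model stand-in and helper names are hypothetical.

    import torch
    import torch.nn as nn

    INTENT_LABELS = ["query_order", "buy_product", "cancel_order"]  # hypothetical intent set

    def recognize_intent(target_learning_model, utterance_features):
        # utterance_features: a tensor produced from the collected user voice data by an
        # upstream feature extractor on the device (assumed to exist, not shown here).
        target_learning_model.eval()
        with torch.no_grad():
            logits = target_learning_model(utterance_features)
            intent_id = int(logits.argmax(dim=-1))
        return INTENT_LABELS[intent_id]

    def respond(target_learning_model, utterance_features):
        intent = recognize_intent(target_learning_model, utterance_features)
        # Device-side response generation keyed by the recognized intent (placeholder).
        return {"intent": intent, "reply": "Handling intent: " + intent}

    # Example with a stand-in for the distilled target learning model
    target_learning_model = nn.Linear(64, len(INTENT_LABELS))
    print(respond(target_learning_model, torch.randn(1, 64)))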
According to the technical scheme of the embodiment, according to a training task data set, a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network are used as training objects, and at least two precipitation training is carried out on a pre-training model to obtain a strengthening model; the distillation model is built based on the target network determined from the reinforcement model. Extracting target knowledge of the training task data set through a target network of the reinforcement model; and training the distillation model based on the extracted target knowledge and the training task data set to obtain a target learning model. In the embodiment, the bottom layer network of the pre-training model is subjected to multiple precipitation training in a gradually decreasing middle-high layer network mode, so that parameters of the bottom layer network of the pre-training model can be more accurate. And constructing a distillation model at least according to the accurate bottom layer network and the prediction layer network after precipitation, and carrying out distillation training on the distillation model based on the extracted target knowledge, so that the target learning model distilled from the pre-training model maintains the prediction accuracy of the pre-training model while simplifying the network structure, and improves the generalization capability of the model. And the whole distillation process is not influenced by human factors, and the target learning model is deployed into the man-machine interaction equipment, so that a quick and accurate execution task can be realized, and the real-time response requirement of the man-machine interaction equipment is met.
Optionally, the pre-training model in the embodiment of the present application is a model that is already trained and can perform a certain prediction task, and when the coverage area of the prediction task is relatively wide, the pre-training model may perform task prediction in multiple fields, but the prediction effect may not be very good for a certain field. For example, if the pre-training model is a model for intent recognition, it may be intended to recognize user speech in numerous areas such as shopping, business handling, and smart furniture control, but for certain areas of these, the predicted effect may not be very accurate. For this case, the present embodiment may perform domain training on the pre-training model according to the training domain data set before performing at least two precipitation exercises on the pre-training model according to the training task data set, and update the pre-training model.
Specifically, the training field data set may be obtained by acquiring, for the working field in which the pre-training model is to be deployed, sample data related to that field as the training field data set. For example, if the pre-training model needs to perform intention recognition on the voice of users in the shopping field, all voice data of each shopping platform may be subjected to related processing (such as labeling and deleting invalid data) to obtain a training field data set corresponding to that field. The training field data set is input into the pre-training model to update and train the pre-training model for that field, and the parameters of the pre-training model are fine-tuned, so that the updated pre-training model can more accurately execute the prediction task of that field. In this method, the training field data set is used to perform field training on the pre-training model, and after the pre-training model is updated, the precipitation training operation of S101 is performed on the updated pre-training model. This arrangement has the advantage of greatly improving the prediction precision of the pre-training model in the field to which the prediction task belongs, and provides a guarantee for distilling an accurate target learning model from the pre-training model.
FIG. 2 is a flow chart of another knowledge-based distillation model training method provided in accordance with an embodiment of the application; based on the above embodiment, the present embodiment is further optimized, and a specific description of performing at least two precipitation training on the pre-training model according to the training task data set is given. As shown in fig. 2, the method includes:
s201, dividing the training task data set to determine a plurality of training data subsets.
Optionally, the present operation may divide the training task data set into a plurality of training data subsets according to a preset precipitation strategy, such as the number of network layers for each knowledge extraction. For example, if the pre-training model is the model shown in fig. 1B, and the precipitation strategy is to extract knowledge of one layer of network at a time, the training task data set may be divided into 12 parts. The number of divided training data subsets is smaller than or equal to the total number of layers of the pre-training model. For example, when the total number of layers of the pre-training model is N, the number K of training data subsets divided in this step may be equal to half of the total number of layers N. Alternatively, the number of training data in each divided training data subset may be the same or different, which is not limited in this embodiment.
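A minimal sketch of this division step, assuming K is taken as half the total number of layers as in the example above:

    def split_training_task_data(samples, total_layers):
        # Divide the training task data set into K subsets, K <= total number of layers;
        # here K is taken as half the layer count, as in the example above.
        k = max(1, total_layers // 2)
        size = len(samples) // k
        return [samples[i * size:(i + 1) * size] for i in range(k)]

    subsets = split_training_task_data(list(range(120)), total_layers=12)  # 6 subsets of 20 samples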
S202, according to the set precipitation training times, determining the training objects corresponding to each training data subset.
The training object may be the network layers that need to be trained in the pre-training model each time precipitation training is performed. In the embodiment of the present application, the training objects corresponding to the training data subsets are different. Specifically, the training object corresponding to each training data subset comprises the bottom layer network, a middle-high layer network and the prediction layer network of the pre-training model, and the number of included middle-high layer network layers is inversely proportional to the order of precipitation training. The middle-high layers included in each training object are the network layers adjacent to the bottom layer network and continuing upwards. That is, in the training objects corresponding to the training data subsets, the bottom layer network and the prediction layer network remain unchanged, while the number of middle-high layers gradually decreases from top to bottom as the precipitation training order corresponding to the training data subsets moves back. Optionally, as the number of precipitation training rounds increases, the number of middle-high layers included in the training object is decremented to zero. Therefore, as precipitation training proceeds, eventually only the bottom layer network is updated and trained.
Optionally, in this embodiment, when determining the training object corresponding to each training data subset, the bottom layer network and the prediction layer network are kept unchanged, and the number of middle-high layers in the training object corresponding to each training data subset may be determined according to the total number of layers of the pre-training model and the precipitation training order corresponding to that training data subset. For example, if the total number of layers of the pre-training model is N and the precipitation training order corresponding to a certain training data subset is the k-th round, then the highest middle-high layer included in the training object corresponding to that training data subset is S = N - 2*k; that is, all middle-high layers at or below the S-th layer belong to the training object corresponding to that training data subset.
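A small worked example of this decrement rule, using 1-based layer numbering as in FIG. 1B; the concrete schedule below is only an assumption following S = N - 2*k:

    N = 12                   # total number of layers in the pre-training model (FIG. 1B)
    BOTTOM = [1, 2, 3]       # bottom layer network, always trained
    PREDICTION = [N]         # prediction layer network, always trained

    def training_object(k):
        # S = N - 2*k is the highest middle-high layer still included in round k.
        s = N - 2 * k
        mid_high = [layer for layer in range(4, N) if layer <= s]
        return BOTTOM + mid_high + PREDICTION

    for k in range(1, 5):
        print(k, training_object(k))
    # k=1 -> layers 1-3, 4-10, 12   (S = 10)
    # k=2 -> layers 1-3, 4-8,  12   (S = 8)
    # k=3 -> layers 1-3, 4-6,  12   (S = 6)
    # k=4 -> layers 1-3, 4,    12   (S = 4)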
And S203, performing one-time precipitation training on the training object corresponding to each training data subset in the pre-training model according to each training data subset to obtain a strengthening model of the pre-training model.
Optionally, after determining the training object corresponding to each divided training data subset, each training data subset may be sequentially input into the pre-training model according to the precipitation training sequence corresponding to each training data subset, and the input training data subset is utilized to train each network layer corresponding to the training object in the pre-training model, so as to update the parameters of each network layer corresponding to the training object. Because the number of layers of the middle-high layer network in the training object corresponding to each training data subset gradually decreases along with the increase of the precipitation training times, in the process of multiple precipitation training, the updated parameters of the middle-high layer network are smaller and smaller, the training process is gradually concentrated to the bottom layer network, the bottom layer network of the pre-training model is more accurate after multiple precipitation training, and the pre-training model after multiple precipitation training can be used as the strengthening model at the moment.
S204, taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks.
Wherein the target network comprises a feature recognition network and the prediction layer network; the feature recognition network includes at least an underlying network.
S205, extracting target knowledge of the training task data set through a target network of the reinforcement model.
S206, training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
According to the technical scheme of the embodiment, the training task data set is divided into a plurality of training data subsets, the training object of each training data subset is determined based on the principle that the layer number of the middle-high layer network in the training object is inversely proportional to the precipitation training sequence, and the precipitation training is carried out on the training object of each training data subset after the division according to each training data subset to obtain the strengthening model. Constructing a distillation model according to the reinforcement model, and extracting target knowledge; and training the distillation model based on the extracted target knowledge and training task data set to obtain a target learning model. According to the method, based on the principle that the number of layers of the middle-high-level network in the training object is inversely proportional to the order of precipitation training, the training object of each precipitation training is determined, and parameters of the bottom-level network of the pre-training model after multiple precipitation training are more accurate. Provides a new idea for knowledge precipitation operation in the knowledge distillation process. And a guarantee is provided for the subsequent operation of training the target learning model by knowledge distillation.
Fig. 3 is a flowchart of another knowledge distillation-based model training method according to an embodiment of the present application, where further optimization is performed on the basis of the foregoing embodiment, and a detailed description is given of when an enhanced model of a pre-training model is obtained in the process of performing precipitation training on the pre-training model multiple times. As shown in fig. 3, the method includes:
s301, performing precipitation training on the pre-training model successively according to the training task data set.
It should be noted that, the specific implementation manner of performing the precipitation training on the pre-training model successively in this step is already described in detail in the above embodiment, and will not be described in detail herein.
S302, testing the pre-training model after precipitation training according to the testing task data set.
The test task data set may be test data for testing whether the pre-training model after precipitation training can accurately complete the prediction task. Optionally, sample data related to the prediction task to be executed by the pre-training model may be obtained and divided into two parts, where one part serves as the training task data set in the embodiment of the present application and the other part serves as the test task data set in the embodiment of the present application.
Optionally, in this embodiment, the test task data set may be input into the pretrained model after multiple precipitation training in S301, to obtain a predicted result of the pretrained model after precipitation training based on the output of the test task data, and finally, the predicted result is analyzed according to the real label in the test task data, and an evaluation index value indicating whether the output result of the pretrained model after multiple precipitation training is accurate is calculated, and the evaluation index value is used as the test result. Alternatively, the evaluation index value may be determined according to a prediction task, for example, may be the accuracy, precision, recall rate, etc. of the output result of the pre-training model after multiple precipitation training.
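As an illustration of computing such an evaluation index value, the sketch below uses plain accuracy on a held-out test batch; the metric choice and the toy model are assumptions:

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def evaluate(model, test_inputs, test_labels):
        # Evaluation index value used as the test result: plain accuracy here, though
        # precision, recall, etc. could be chosen depending on the prediction task.
        model.eval()
        preds = model(test_inputs).argmax(dim=-1)
        return (preds == test_labels).float().mean().item()

    # Example with a toy model and a random test task data batch
    toy_model = nn.Linear(64, 3)
    accuracy = evaluate(toy_model, torch.randn(32, 64), torch.randint(0, 3, (32,)))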
Optionally, in order to ensure accuracy of the test result, the embodiment may use multiple sets of test task data sets to test the pre-training model after the sediment training for multiple times, and determine the final test result according to the multiple test results.
And S303, if the test result meets the precipitation ending condition, taking the pre-training model after precipitation training as a strengthening model.
The precipitation ending condition may be a judgment condition for judging whether the pre-training model after multiple rounds of precipitation training qualifies as the reinforcement model. Specifically, it may be an index threshold corresponding to the evaluation index value in the test result.
Optionally, in this embodiment, the pre-training model after the precipitation training may be tested in S302, the obtained test result (i.e. the evaluation index value) is compared with the index threshold in the precipitation ending condition, if the evaluation index value meets the index threshold, it is indicated that the test result meets the precipitation ending condition, and at this time, the pre-training model after the precipitation training may be used as the strengthening model; if the test result does not meet the precipitation ending condition, the method needs to return to S301 to continue to perform the precipitation training on the pre-training model successively according to the training task data set until the test result meets the precipitation ending condition.
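Putting S301 to S303 together, the control flow could be sketched as follows; the threshold value, the maximum number of rounds and the stub callables are assumptions of this sketch:

    def train_until_precipitation_ends(model, data_subsets, test_set,
                                       precipitate_once, evaluate,
                                       metric_threshold=0.9, max_rounds=10):
        # Alternate precipitation training and testing; stop once the evaluation index
        # value reaches the index threshold in the precipitation ending condition.
        for _ in range(max_rounds):
            for subset in data_subsets:
                model = precipitate_once(model, subset)   # one precipitation training pass
            if evaluate(model, test_set) >= metric_threshold:
                return model                              # reinforcement model
        return model  # fall back to the latest model if the threshold is never reached

    # Example usage with trivial stand-ins for the training and evaluation callables
    reinforced = train_until_precipitation_ends(
        model={"name": "pre-training model"}, data_subsets=[1, 2, 3], test_set=None,
        precipitate_once=lambda m, s: m, evaluate=lambda m, t: 0.95)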
S304, taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks.
Wherein the target network comprises a feature recognition network and the prediction layer network; the feature recognition network includes at least an underlying network.
S305, extracting target knowledge of the training task data set through a target network of the reinforcement model.
S306, training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
According to the technical scheme of the embodiment, after the bottom layer network of the pre-training model is subjected to precipitation training for multiple times according to the training task data set, the pre-training model after precipitation training is tested according to the testing task data set, and if the test is passed, the pre-training model after precipitation training can be used as a strengthening model. Further constructing a distillation model according to the reinforcement model, and extracting target knowledge; and training the distillation model based on the extracted target knowledge and training task data set to obtain a target learning model. According to the embodiment, whether the knowledge precipitation achieves the expected effect of precipitation training or not is determined by testing the pre-training model after the knowledge precipitation, and the knowledge precipitation can be used as the strengthening model only if the knowledge precipitation achieves the expected effect, so that the accuracy of the bottom network parameters of the obtained strengthening model is ensured. And a guarantee is provided for the subsequent operation of training the target learning model by knowledge distillation.
Optionally, the foregoing embodiment describes a process of determining when to obtain the reinforcement model of the pre-training model in the process of performing multiple precipitation training on the underlying network of the pre-training model, and in the same way, in the process of training the distillation model according to the target knowledge and the training task data set, a similar method may also be used to determine whether the distillation model is trained, so as to obtain the target learning model. Specific: the embodiment of the application can be specifically executed when training the distillation model according to the target knowledge and the training task data set to obtain the target learning model: training the distillation model according to the target knowledge and the training task data set; testing the trained distillation model according to the test task data set; and if the test result meets the training ending condition, taking the trained distillation model as a target learning model. The process of testing the trained distillation model according to the training task data set is similar to the process of testing the pre-training model after precipitation training according to the training task data set described in the above embodiment, for example, the test task data set may be input into the trained distillation model, the evaluation index value is calculated according to the output prediction result of the trained distillation model and the real label of the test task data set, and if the evaluation index value meets the index threshold in the training end condition, it is indicated that the test result of the trained distillation model meets the training end condition, and the distillation model after the training can be used as the target learning model. The method has the advantages that whether the task prediction precision of the trained distillation model reaches the expected effect is determined by testing the trained distillation model, and the task prediction precision can be used as a final target learning model only if the task prediction precision reaches the expected effect, so that the accuracy of the target learning model distilled based on the knowledge distillation technology is improved.
Fig. 4-5 are flowcharts of two knowledge-based distillation model training methods according to embodiments of the present application, which are further optimized based on the above embodiments, and an introduction of two specific implementations of constructing a distillation model according to a target network is given.
Alternatively, fig. 4 shows an embodiment of constructing a distillation model with the same structure as the reinforcement model according to the target network, specifically:
s401, performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model.
The training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top.
S402, taking at least two networks in the reinforcement model as target networks, and acquiring network structure blocks of the target networks.
The target network comprises a feature recognition network and the prediction layer network of the reinforcement model; the feature recognition network includes at least the bottom layer network of the reinforcement model, and optionally some or all of its middle-high layer network. Because the distillation model constructed in this embodiment has a simpler network structure than the reinforcement model, the feature recognition network of the target network in the embodiment of the present application typically includes no, or only a small number of, middle-high layer networks. A network structure block may be obtained by encapsulating the network structure of one or more network layers in the reinforcement model. For example, assuming that the reinforcement model in this embodiment is obtained by performing precipitation training on the pre-training model shown in fig. 1B, the network structure of the reinforcement model is also as shown in fig. 1B; the network structures of the 1st to 3rd network layers in fig. 1B may be encapsulated as the network structure block of the bottom layer network 10, the 4th to 7th network layers as the network structure block of the middle layer network 110, the 8th to 11th network layers as the network structure block of the high layer network 111, and the 12th network layer as the network structure block of the prediction layer network 12.
Alternatively, if a distillation model having the same structure as the reinforcement model is to be constructed, a network structure block corresponding to the target network in the reinforcement model may be obtained after the target network is selected from the reinforcement model. For example, the underlying network and the prediction layer network in the enhancement model are taken as target networks, and then the network structure blocks of the underlying network and the prediction layer network can be taken as the network structure blocks of the target networks.
S403, constructing a distillation model with the same structure as the reinforcement model according to the obtained network structure block.
Optionally, since the target network corresponds to at least two networks in the reinforcement model, the obtained network structure blocks are also network structure blocks of at least two networks, and this step may be to arrange the at least two network structure blocks in the order from bottom to top in the reinforcement model, and take the output of the network structure block located below as the input of the network structure block located above adjacent to the network structure block located below, so as to form a new model composed of the target network, where the new model is the distillation model that is constructed.
For example, assuming that the network blocks of the underlying network 10 and the prediction layer network 12 in fig. 1B are acquired in S402, since the underlying network 10 is located below the prediction layer network 12, it may be that the network block of the underlying network 10 is located below the network block of the prediction layer network 12, and the output of the network block of the underlying network 10 is connected to the input of the network block of the prediction layer network 12, so as to generate a distillation model composed of the network block of the underlying network 10 and the network block of the prediction layer network 12. Similarly, if the network blocks of the bottom layer network 10, the middle layer network 110 and the prediction layer network 12 are obtained in S402, it may be that the network block of the bottom layer network 10 is located at the lowest, the network block of the middle layer network 110 is located at the middle, the network block of the prediction layer network 12 is located at the uppermost, the output of the network block of the bottom layer network 10 is connected to the input of the network block of the middle layer network 110, and the output of the network block of the middle layer network 110 is connected to the input of the network block of the prediction layer network 12, thereby generating a distillation model composed of the network block of the bottom layer network 10, the network block of the middle layer network 110 and the network block of the prediction layer network 12.
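A minimal sketch of assembling such an isomorphic distillation model from two structure blocks; the block contents below are toy stand-ins for the blocks encapsulated from the reinforcement model:

    import torch
    import torch.nn as nn

    # Stand-ins for the network structure blocks taken from the reinforcement model:
    # a bottom-network block and a prediction-layer block (sizes are assumptions).
    bottom_block = nn.Sequential(
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
    )
    prediction_block = nn.Linear(64, 3)

    # Isomorphic distillation model: the selected blocks stacked bottom-to-top, with the
    # output of the lower block feeding the input of the block directly above it.
    distillation_model = nn.Sequential(bottom_block, prediction_block)
    logits = distillation_model(torch.randn(8, 64))

    # A three-block variant would simply insert a middle-network block in between:
    # nn.Sequential(bottom_block, middle_block, prediction_block)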
S404, extracting target knowledge of the training task data set through a target network of the reinforcement model.
And S405, training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
Alternatively, fig. 5 shows an embodiment of constructing a distillation model with a different structure from the reinforcement model according to the target network, specifically:
and S501, performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model.
The training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top.
S502, taking at least two networks in the reinforcement model as target networks.
Optionally, the process of selecting the target network from the enhancement model has been described in the above embodiment, and this embodiment will not be described in detail herein.
S503, selecting a neural network model with a different structure from the reinforcement model as a distillation model according to the target network.
The output layer network of the neural network model is consistent with the type of the prediction layer network in the target network, and the non-output layer network of the neural network model is consistent with the type of the feature recognition network in the target network. The type of the prediction layer network means that the network performs task prediction. The types of feature recognition networks include the bottom layer network, the middle layer network, the higher layer network, and the like.
Alternatively, since the distillation model constructed in the present embodiment is different from the reinforcement model in structure, a neural network model having a simple structure and being used for realizing the prediction task may be selected as the distillation model at this time according to the requirements. The neural network model which can be selected as the distillation model is generally simpler in structure and fewer in layer number, but the output layer of the neural network model is required to be consistent with the type of the prediction layer network in the target network, and the non-output layer network is required to be consistent with the type of the feature identification network in the target network. That is, the output layer of the neural network model needs to be a network capable of task prediction, and the non-output layer needs to be a network consistent with the type of the feature recognition network of the target network, for example, if the type of the feature recognition network of the target network is an underlying network, the type of the non-output layer of the neural network model should also be the underlying network; if the types of the feature recognition networks of the target network are the bottom layer network and the middle layer network, the types of the non-output layers of the neural network model should also be the bottom layer network and the middle layer network.
The distillation model constructed in this step has simple structural units and a small number of layers, and is therefore generally a heterogeneous model with respect to the reinforcement model, which has a complicated structure. For example, assuming that the reinforcement model is the bert model, a CNN model may be selected as the distillation model.
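As a purely illustrative sketch of such a heterogeneous distillation model, the small text-CNN below has non-output layers that play the role of the feature recognition network and a final linear layer that plays the role of the prediction layer network; all hyperparameters (vocabulary size, dimensions, number of intents) are assumptions and not values taken from the embodiment.

import torch
import torch.nn as nn

class TextCnnDistillModel(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, num_filters=64,
                 kernel_sizes=(2, 3, 4), num_intents=10):
        super().__init__()
        # Non-output layers: feature recognition (bottom-level text features).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        # Output layer: task prediction (intent classification).
        self.predict = nn.Linear(num_filters * len(kernel_sizes), num_intents)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)                  # (B, E, T)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        features = torch.cat(pooled, dim=1)   # feature recognition network output
        logits = self.predict(features)       # prediction layer network output
        return features, logits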
S504, extracting target knowledge of the training task data set through a target network of the reinforcement model.
S505, training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
According to the technical scheme of this embodiment, in the process of training the target learning model of the pre-training model based on the knowledge distillation technology, two specific implementations of constructing a distillation model, with either the same structure as or a different structure from the reinforcement model, according to the target network of the reinforcement model after precipitation training are provided. If a distillation model with the same structure as the reinforcement model is constructed, the isomorphic distillation model retains the network structure blocks of the reinforcement model and is therefore easier to distill and train to the prediction effect of the reinforcement model; if a distillation model with a different structure from the reinforcement model is constructed, the heterogeneous distillation model can learn characteristics different from those of the reinforcement model, so that the generalization capability of the model is improved. The embodiment of the application can be selected according to actual requirements and has strong flexibility.
FIG. 6A is a flow chart of another knowledge distillation based model training method provided according to an embodiment of the application; fig. 6B is a schematic structural diagram of training a distillation model according to an embodiment of the present application. This embodiment is further optimized on the basis of the above embodiments and gives a specific description of training the distillation model according to the target knowledge and the training task data set. As shown in fig. 6A-6B, the method includes:
And S601, performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model.
The training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top.
S602, taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks.
The target network comprises a characteristic identification network and a prediction layer network; the feature recognition network includes at least an underlying network.
Illustratively, it is assumed that the target networks selected from the reinforcement model shown in fig. 6B are a bottom layer network, a middle layer network, and a prediction layer network, and the distillation model shown in fig. 6B is constructed based on these three networks.
S603, extracting target knowledge of the training task data set through a target network of the reinforcement model.
For example, this operation may be: inputting training data of a preset size, such as a batch_size, in the training task data set into the reinforcement model shown in fig. 6B, and obtaining the feature representation output by the bottom layer network of the reinforcement model (knowledge_seq_l) and the feature representation output by the middle layer network (knowledge_seq_m) as the first data feature representation (knowledge_seq); and obtaining the feature representation output by the prediction layer network of the reinforcement model (knowledge_predict) as the first prediction probability representation. The first data feature representation and the first prediction probability representation acquired in this step are the extracted target knowledge.
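A minimal sketch of this extraction step is given below, assuming the reinforcement model returns its per-network outputs as a dictionary; the key names and the function extract_target_knowledge are illustrative only.

import torch

@torch.no_grad()
def extract_target_knowledge(reinforce_model, batch_inputs):
    # One batch of preset size (e.g. batch_size) from the training task data set
    # is fed through the reinforcement model; gradients are not needed here.
    outputs = reinforce_model(batch_inputs)
    knowledge_seq = {                          # first data feature representation
        "low": outputs["bottom_layer"],        # knowledge_seq_l
        "mid": outputs["middle_layer"],        # knowledge_seq_m
    }
    knowledge_predict = outputs["prediction_layer"]  # first prediction probability representation
    return knowledge_seq, knowledge_predict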
S604, inputting the training task data set into a distillation model, and determining a soft supervision label and a hard supervision label according to the processing result and target knowledge of the distillation model on the training task data set.
Wherein the soft supervision tag and the hard supervision tag are two supervision signals in the process of training the distillation model. Wherein the soft supervision labels are calculated based on the extracted target knowledge and the hard supervision labels are calculated based on the actual labels in the training task dataset.
Alternatively, the embodiment may be that the training task data set is input into the distillation model, and the distillation model processes the input training data set to obtain output results of each network layer of the distillation model, where the output results are used to determine the soft supervision labels by combining the target knowledge on one hand. And on the other hand, is used for calculating the hard supervision labels by combining the related information of the training task data set. The specific determination process comprises the following three substeps:
S6041, inputting the training task data set into the distillation model to obtain a second data characteristic representation output by a characteristic recognition network of the distillation model and a second prediction probability representation output by a prediction layer network of the distillation model.
Specifically, after training data of a preset size, such as a batch_size, in the training task data set is input into the distillation model, the prediction result (i.e., the feature representation) output by the prediction layer network in the distillation model is obtained as the second prediction probability representation; if the feature recognition network in the distillation model contains only the bottom layer network, the feature representation output by the bottom layer network is obtained as the second data feature representation; if the feature recognition network in the distillation model comprises part of the middle-high layer networks in addition to the bottom layer network, the feature representations output by the bottom layer network and by those middle-high layer networks are obtained as the second data feature representation. For example, as shown in fig. 6B, the training task data set is input into the distillation model; since the feature recognition network of the target network in fig. 6B includes the bottom layer network and the middle layer network, after the distillation model processes the training task data set, the feature representation output by its bottom layer network (small_seq_l) and the feature representation output by its middle layer network (small_seq_m) are taken as the second data feature representation (small_seq), and the feature representation output by the prediction layer network of the distillation model (small_predict) is taken as the second prediction probability representation.
S6042, determining the soft supervision labels according to the target knowledge, the second data characteristic representation and the second prediction probability representation.
Optionally, since the target knowledge is formed by the first data feature representation and the first prediction probability representation, in this embodiment the first data feature representation, the first prediction probability representation, the second data feature representation and the second prediction probability representation may be calculated according to a preset algorithm to obtain the soft supervision label. The specific calculation algorithm is not limited in this embodiment. For example, the mean square error of the first data feature representation in the target knowledge and the second data feature representation may be taken as a data feature label; the mean square error of the first prediction probability representation in the target knowledge and the second prediction probability representation may be taken as a probability prediction label; and then label fusion is performed on the data feature label and the probability prediction label according to the weight value of the feature recognition network of the reinforcement model to obtain the soft supervision label. In this embodiment, the soft supervision label is determined according to the feature representations output by the reinforcement model and the distillation model on the same training task data set, so that the determined soft supervision label is more accurate, which in turn improves the accuracy of the target learning model trained subsequently.
Specifically, the data feature label may be calculated according to the following formula (1), and the probability prediction label may be calculated according to the following formula (2); finally, the soft supervision labels are calculated according to the following formula (3).
loss_i=MSE(knowledge_seq,small_seq) (1)
loss_p=MSE(knowledge_predict,small_predict) (2)
loss_soft=W i *loss_i+loss_p (3)
Wherein loss_i is the data feature label; MSE() is the mean square error function; knowledge_seq is the first data feature representation; small_seq is the second data feature representation; loss_p is the probability prediction label; knowledge_predict is the first prediction probability representation; small_predict is the second prediction probability representation; loss_soft is the soft supervision label; W_i is the weight value of the feature recognition network.
Alternatively, when the feature recognition network includes a plurality of networks (such as the bottom layer network and the middle layer network), the first data feature representation and the second data feature representation are both composed of the feature representations output by a plurality of network layers; in this case, a data feature label may be calculated according to formula (1) for the feature representation output by each of these network layers. For example, as shown in fig. 6B, the first data feature representation includes knowledge_seq_l and knowledge_seq_m, and the second data feature representation includes small_seq_l and small_seq_m. At this time, the data feature label loss_i_l of the bottom layer network may be calculated from knowledge_seq_l and small_seq_l, and the data feature label loss_i_m of the middle layer network from knowledge_seq_m and small_seq_m. Correspondingly, when calculating the soft supervision label, the products of the weight value of each network and its data feature label are summed together with the probability prediction label to obtain the final soft supervision label. For example, for the scenario shown in fig. 6B, the calculation formula of the soft supervision label may be loss_soft = W_l*loss_i_l + W_m*loss_i_m + loss_p.
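The following sketch restates formulas (1)-(3) in code form, including the multi-network case just described; the dictionary layout of the feature representations and the per-network weights W_i supplied by the caller are assumptions.

import torch.nn.functional as F

def soft_supervision_label(knowledge_seq, small_seq, knowledge_predict,
                           small_predict, layer_weights):
    """knowledge_seq / small_seq: dicts mapping a network name (e.g. "low", "mid")
    to its feature representation; layer_weights: dict mapping the same names to
    the weight value W_i of that feature recognition network."""
    loss_p = F.mse_loss(small_predict, knowledge_predict)              # formula (2)
    loss_soft = loss_p
    for name, w in layer_weights.items():
        loss_i = F.mse_loss(small_seq[name], knowledge_seq[name])      # formula (1)
        loss_soft = loss_soft + w * loss_i                             # formula (3)
    return loss_soft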
And S6043, determining the hard supervision labels according to the second predictive probability representation and the training task data set information.
Wherein the training task data set information includes: the number of training samples in the training task data set, the number of training labels, and the actual label values.
Alternatively, this sub-step may be to calculate the hard supervisory tag according to equation (4) below.
Wherein loss_hart is the hard supervision label; N is the number of training samples in the training task data set; M is the number of training labels; i denotes the i-th training sample; c denotes the c-th training label; y_ic is the actual label value indicating whether the i-th sample belongs to the c-th training label; small_predict_ic is the probability value, output by the prediction layer network of the distillation model, that the i-th training sample belongs to the c-th training label. Optionally, the value of y_ic may be 0 or 1.
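The text of formula (4) is not reproduced above; judging from the variable descriptions (N samples, M training labels, y_ic taking the value 0 or 1, small_predict_ic a probability), the sketch below assumes the usual cross-entropy form, which is an assumption rather than a statement of the patented formula.

import torch

def hard_supervision_label(small_predict, y_onehot, eps=1e-8):
    """small_predict: (N, M) probabilities from the prediction layer network of
    the distillation model; y_onehot: (N, M) actual label values y_ic in {0, 1}."""
    log_p = torch.log(small_predict.clamp_min(eps))   # guard against log(0)
    # Sum over the M training labels, average over the N training samples.
    return -(y_onehot * log_p).sum(dim=1).mean()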
In this embodiment, the soft supervision labels are determined according to the reinforcement model and the distillation model based on the same training task data set and the output characteristic representation, the hard supervision labels are determined according to the actual label values of the training task data and the prediction probability of the distillation model, a new idea is provided for determining the soft supervision labels and the hard supervision labels, and the accuracy of the soft and hard supervision labels is improved.
S605, determining the target label according to the soft supervision label and the hard supervision label.
The target label is a label value which is finally used for supervising the distillation model training after combining the characteristics of the soft supervision label and the hard supervision label. Optionally, this step determines the target tag according to the following formula (5):
loss=alpha*loss_soft+(1-alpha)*loss_hart (5)
wherein loss is a target label, and alpha is a parameter variable; loss_soft is a soft supervision tag; loss_hart is a hard supervision tag.
The parameter variables in the above formula (5) may be constants set based on preset rules or variables trained with the distillation model. This embodiment is not limited.
S606, according to the target label, carrying out iterative updating on parameters of the distillation model to obtain a target learning model.
Alternatively, in this embodiment, the parameters of the distillation model may be updated and adjusted according to the target label determined in S605 and a preset rule, such as the back propagation algorithm (BP algorithm), so as to complete one iterative update of the parameters of the distillation model. Then the next piece of training data of a preset size, such as a batch_size, is acquired from the training task data set and input into the distillation model, and the operations of S603-S606 are executed again to perform the next iterative update on the parameters of the distillation model, thereby continuing the training of the distillation model. After the distillation model has been trained a plurality of times, the trained distillation model can be tested through the test task data set; if the test result meets the training ending condition, training of the distillation model is complete, and the trained distillation model can be used as the target learning model.
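Putting S603-S606 together, a possible training pass could look like the sketch below, which reuses the illustrative helpers from the earlier sketches; the optimizer, the value of alpha and the batching are assumptions, and alpha is treated here as a constant rather than a trained variable.

import torch

def train_distill_model(distill_model, reinforce_model, data_loader,
                        layer_weights, alpha=0.5, lr=1e-4, epochs=3):
    optimizer = torch.optim.Adam(distill_model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch_inputs, y_onehot in data_loader:
            knowledge_seq, knowledge_predict = extract_target_knowledge(
                reinforce_model, batch_inputs)                          # S603
            # The distillation model is assumed to return its feature
            # representations as a dict and its prediction probabilities.
            small_seq, small_predict = distill_model(batch_inputs)      # S6041
            loss_soft = soft_supervision_label(
                knowledge_seq, small_seq, knowledge_predict,
                small_predict, layer_weights)                           # S6042
            loss_hard = hard_supervision_label(small_predict, y_onehot) # S6043
            loss = alpha * loss_soft + (1 - alpha) * loss_hard          # formula (5)
            optimizer.zero_grad()
            loss.backward()                                             # BP algorithm
            optimizer.step()                                            # S606
    return distill_model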
According to the technical scheme of this embodiment, a reinforcement model is obtained by performing precipitation training on the bottom layer network of the pre-training model, a distillation model is constructed, and target knowledge is extracted; a soft supervision label and a hard supervision label are determined according to the processing result of the distillation model on the task training data and the extracted target knowledge, and a target label is then determined based on the soft and hard supervision labels to iteratively update the parameters of the distillation model, obtaining the target learning model. In this embodiment, the distillation model is trained by combining the soft supervision label and the hard supervision label, so that the generalization capability of the distillation model is improved while the trained distillation model approximates the prediction effect of the pre-training model, thereby better meeting the real-time response requirement of the man-machine interaction equipment.
FIG. 7 is a flow chart of a knowledge distillation based model training method provided according to an embodiment of the application. This embodiment provides a preferred example on the basis of the above embodiments. Specifically, as shown in fig. 7, the method includes:
s701, obtaining a pre-training model.
Optionally, the pre-training model obtained in the step is a model which is already trained based on massive training samples, and the pre-training model can better complete an online prediction task.
S702, performing field training on the pre-training model according to the training field data set, and updating the pre-training model.
S703, performing successive precipitation training on the pre-training model according to the training task data set.
The training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top.
And S704, testing the pre-training model after precipitation training according to the test task data set.
S705, determining whether the test result satisfies the precipitation ending condition, if yes, executing S706, and if no, returning to executing S702.
Optionally, if the test result meets the precipitation ending condition, it indicates that the precipitation training has reached the expected effect, and S706 may be executed; otherwise, it indicates that the precipitation training is insufficient, and the flow returns to S702 to update and adjust the parameters of the pre-training model based on the training field data set.
S706, if the test result meets the precipitation ending condition, taking the pre-training model after precipitation training as the strengthening model.
S707, taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks.
S708, extracting target knowledge of the training task data set through the target network of the reinforcement model.
S709, training the distillation model according to the target knowledge and the training task data set.
S710, testing the trained distillation model according to the test task data set.
S711, judging whether the test result meets the training ending condition, if so, executing S712, and if not, returning to executing S709.
And S712, if the test result meets the training ending condition, taking the trained distillation model as a target learning model.
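For orientation only, the overall S701-S712 flow of fig. 7 can be summarized as the following sketch; every helper passed in is a caller-supplied placeholder, not an interface defined by this application.

def distill_pipeline(pretrained_model, domain_train, precipitation_train,
                     test_model, precipitation_done, build_distill_model,
                     train_distill, training_done):
    """All arguments except pretrained_model are caller-supplied callables that
    stand in for the corresponding steps of fig. 7."""
    while True:
        domain_train(pretrained_model)                   # S702
        precipitation_train(pretrained_model)            # S703
        result = test_model(pretrained_model)            # S704
        if precipitation_done(result):                   # S705
            break
    reinforce_model = pretrained_model                   # S706
    distill_model = build_distill_model(reinforce_model) # S707
    while True:
        train_distill(distill_model, reinforce_model)    # S708, S709
        result = test_model(distill_model)               # S710
        if training_done(result):                        # S711
            break
    return distill_model                                 # S712: target learning model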
According to the technical scheme provided by the embodiment of the application, a specific implementation scheme for distilling the target learning model from the pre-training model based on the knowledge distillation technology is provided. The target learning model distilled by this scheme simplifies the network structure and improves the generalization capability of the model while maintaining the accurate prediction capability of the pre-training model. By deploying the target learning model into the man-machine interaction equipment, tasks can be executed rapidly and accurately, and the real-time response requirement of the man-machine interaction equipment is met.
Fig. 8 is a flowchart of an intention recognition method according to an embodiment of the present application. The present embodiment is applicable to the case of performing intention recognition based on the target learning model trained in each of the above embodiments. The embodiment may be performed by an intention recognition device configured in an electronic apparatus, which may be implemented in software and/or hardware. Optionally, the electronic device may be a man-machine interaction device or a service end in communication interaction with the man-machine interaction device. The man-machine interaction device can be an intelligent robot, an intelligent sound box, an intelligent mobile phone and the like. As shown in fig. 8, the method includes:
s801, user voice data collected by the man-machine interaction equipment is obtained.
Optionally, the man-machine interaction device of the embodiment of the application can collect the user voice data in the environment in real time through a voice collection device (such as a microphone) configured in the man-machine interaction device. If the execution subject of the embodiment is a man-machine interaction device, the man-machine interaction device may directly perform the following operation of S802 after collecting the user voice data. If the execution body of the embodiment is a server that performs communication interaction with the man-machine interaction device, after the man-machine interaction device collects the user voice data, the man-machine interaction device transmits the user voice data to the server that performs communication interaction, and the server performs the following operation of S802 after obtaining the user voice data.
S802, inputting the user voice data into the target learning model to acquire a user intention recognition result output by the target learning model.
The target learning model in this embodiment is determined based on training by the knowledge distillation-based model training method described in any one of the above embodiments. And the target learning model of the present embodiment is a model for performing intention recognition.
Optionally, after the user voice data is acquired by the man-machine interaction device or the service end interacting with the man-machine interaction device, the acquired user voice data is input into the target learning model, at this time, the target learning model performs on-line analysis and prediction on the user voice data by adopting an algorithm during training based on the input user voice data, and a user intention recognition result is output, at this time, the user intention recognition result output by the target learning model is acquired by the man-machine interaction device or the service end interacting with the man-machine interaction device.
S803, determining a response result of the man-machine interaction device according to the user intention recognition result.
Optionally, the man-machine interaction device or the server side interacting with the man-machine interaction device determines a target man-machine interaction response rule corresponding to the user intention recognition result based on the obtained user intention recognition result, determines the response result based on the target man-machine interaction response rule, and feeds back the response result to the user so as to realize man-machine interaction based on the user voice data.
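A minimal sketch of the S801-S803 flow, on either the man-machine interaction device or the server side, is given below; the speech front end, the response rule table and the model interface are all assumptions made for illustration.

def handle_user_voice(audio, speech_front_end, target_learning_model,
                      response_rules, default_response="Sorry, please say that again."):
    features = speech_front_end(audio)                    # S801: collected user voice data
    intent = target_learning_model(features)              # S802: user intention recognition result
    return response_rules.get(intent, default_response)   # S803: response result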
According to the technical scheme provided by the embodiment of the application, the target learning model which is trained based on the knowledge distillation-based model training method and is used for intention recognition is deployed in the man-machine interaction equipment or the service end which is in communication interaction with the man-machine interaction equipment, the man-machine interaction equipment or the service end which is in communication interaction with the man-machine interaction equipment can acquire user voice data and input the user voice data into the target learning model, and the response result is determined based on the user intention recognition result output by the target learning model. The target learning model deployed in the man-machine interaction equipment or the service end in communication interaction with the man-machine interaction equipment is obtained through training in a knowledge distillation mode, the network structure of the target learning model is simpler than that of a pre-training model, the prediction effect can approximate to a complex pre-training model, quick and accurate intention recognition can be realized, and the real-time response requirement of the man-machine interaction equipment is met.
Fig. 9 is a schematic structural diagram of a knowledge distillation based model training apparatus according to an embodiment of the present application; this embodiment is applicable to the case of compressing, based on the knowledge distillation technology, a pre-training model with a complex network structure into a target learning model with a simple network structure through training. The apparatus can implement the knowledge distillation-based model training method according to any embodiment of the present application, and the apparatus 900 specifically includes the following:
The precipitation training module 901 is configured to perform precipitation training on a pre-training model at least twice according to a training task data set, so as to obtain an enhanced model of the pre-training model; the training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top;
a distillation model construction module 902, configured to take at least two networks in the reinforcement model as target networks, and construct a distillation model according to the target networks, where the target networks include a feature recognition network and the prediction layer network; the feature recognition network at least comprises the bottom layer network;
a target knowledge extraction module 903, configured to extract target knowledge of the training task data set through a target network of the augmentation model;
and the distillation model training module 904 is configured to train the distillation model according to the target knowledge and the training task data set to obtain a target learning model.
Further, the bottom layer network and the middle-high layer network are used for carrying out feature recognition; the prediction layer network is used for carrying out task prediction according to the identified characteristics.
Further, the precipitation training module 901 includes:
the data subset dividing unit is used for dividing the training task data set to determine a plurality of training data subsets;
the training object determining unit is used for determining the training object corresponding to each training data subset according to the set precipitation training times; the training objects corresponding to the training data subsets comprise a bottom layer network, a middle-high layer network and a prediction layer network of the pre-training model, and the number of layers of the included middle-high layer network is inversely proportional to the order of precipitation training;
the precipitation training unit is used for carrying out one-time precipitation training on the training object corresponding to each training data subset in the pre-training model according to each training data subset;
and the number of divisions of the training data subset is less than or equal to the total number of layers of the pre-training model.
Further, the middle-high layer network included in each training object is a network layer adjacent to the bottom layer network and continuous upwards; and as the number of precipitation trainings increases, the number of layers of the middle-high layer network included in the training object is decremented to zero.
Further, the precipitation training module 901 is specifically configured to:
According to the training task data set, carrying out precipitation training on the pre-training model successively;
testing the pre-training model after precipitation training according to the test task data set;
and if the test result meets the precipitation ending condition, taking the pre-training model after the precipitation training as a strengthening model.
Further, the device further comprises:
and the domain training model is used for carrying out domain training on the pre-training model according to the training domain data set before carrying out precipitation training on the pre-training model at least twice according to the training task data set, and updating the pre-training model.
Further, the distillation model construction module 902 is specifically configured to:
taking at least two networks in the reinforcement model as target networks, and acquiring network structure blocks of the target networks;
and constructing a distillation model with the same structure as the strengthening model according to the obtained network structure block.
Further, the distillation model construction module 902 is specifically configured to:
taking at least two networks in the reinforcement model as target networks;
and selecting a neural network model with a structure different from that of the reinforcement model as a distillation model according to the target network, wherein the output layer network of the neural network model is consistent with the type of the prediction layer network in the target network, and the non-output layer network of the neural network model is consistent with the type of the feature identification network in the target network.
Further, the target knowledge extraction module 903 is specifically configured to:
taking the training task data set as the input of the strengthening model, and acquiring a first data characteristic representation output by a characteristic recognition network of the strengthening model and a first prediction probability representation output by a prediction layer network of the strengthening model;
the acquired first data feature representation and the first predictive probability representation are used as target knowledge of the training task data set.
Further, the distillation model training module 904 includes:
the supervision tag determining unit is used for inputting the training task data set into the distillation model and determining a soft supervision tag and a hard supervision tag according to the processing result of the distillation model on the training task data set and the target knowledge;
the target label determining unit is used for determining a target label according to the soft supervision label and the hard supervision label;
and the model parameter updating unit is used for iteratively updating the parameters of the distillation model according to the target label.
Further, the supervision tag determination unit specifically includes:
an output acquisition subunit, configured to input the training task data set into the distillation model, and obtain a second data feature representation output by a feature recognition network of the distillation model, and a second prediction probability representation output by a prediction layer network of the distillation model;
A soft label determination subunit configured to determine a soft supervision label based on the target knowledge, the second data characteristic representation, and the second predictive probability representation;
a hard tag determination subunit configured to determine a hard supervisory tag based on the second predictive probability representation and the training task dataset information.
Further, the training task data set information includes: the training task data set includes a training sample number, a training label number, and an actual label value.
Further, the soft tag determination subunit is specifically configured to:
taking the mean square error of the first data feature representation in the target knowledge and the second data feature representation as a data feature label;
taking the mean square error of the first prediction probability representation in the target knowledge and the second prediction probability representation as a probability prediction label;
and carrying out label fusion on the data characteristic labels and the probability prediction labels according to the weight value of the characteristic recognition network of the reinforcement model to obtain soft supervision labels.
Further, the distillation model training module 904 is specifically configured to:
training the distillation model according to the target knowledge and the training task data set;
Testing the trained distillation model according to the test task data set;
and if the test result meets the training ending condition, taking the trained distillation model as a target learning model.
Further, the pre-training model is a bert model.
Further, the pre-training model and the target learning model are models for performing intention recognition;
correspondingly, the device further comprises:
the model deployment module is used for deploying the target learning model into man-machine interaction equipment so as to conduct intention recognition on user voice data acquired by the man-machine interaction equipment in real time.
According to the technical scheme of the embodiment, according to a training task data set, a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network are used as training objects, and at least two precipitation training is carried out on a pre-training model to obtain a strengthening model; the distillation model is built based on the target network determined from the reinforcement model. Extracting target knowledge of the training task data set through a target network of the reinforcement model; and training the distillation model based on the extracted target knowledge and the training task data set to obtain a target learning model. In the embodiment, the bottom layer network of the pre-training model is subjected to multiple precipitation training in a gradually decreasing middle-high layer network mode, so that parameters of the bottom layer network of the pre-training model can be more accurate. And constructing a distillation model at least according to the accurate bottom layer network and the prediction layer network after precipitation, and carrying out distillation training on the distillation model based on the extracted target knowledge, so that the target learning model distilled from the pre-training model maintains the prediction accuracy of the pre-training model while simplifying the network structure, and improves the generalization capability of the model. And the whole distillation process is not influenced by human factors, and the target learning model is deployed into the man-machine interaction equipment, so that a quick and accurate execution task can be realized, and the real-time response requirement of the man-machine interaction equipment is met.
Fig. 10 is a schematic structural diagram of an intention recognition device according to an embodiment of the present application, and the embodiment is applicable to a case of performing intention recognition based on a target learning model trained in the above embodiments. The device can implement the intention recognition method according to any embodiment of the present application, and the device 1000 specifically includes the following:
the voice data acquisition module 1001 is configured to acquire user voice data acquired by the man-machine interaction device;
the intention recognition module 1002 is configured to input the user voice data into a target learning model, so as to obtain a user intention recognition result output by the target learning model; wherein the target learning model is determined based on training by the knowledge distillation-based model training method according to any one of the above embodiments;
and a response result determining module 1003, configured to determine a response result of the man-machine interaction device according to the user intention recognition result.
Further, the device is configured in the man-machine interaction equipment or a service end in communication interaction with the man-machine interaction equipment.
According to the technical scheme provided by the embodiment of the application, the target learning model which is trained based on the knowledge distillation-based model training method and is used for intention recognition is deployed in the man-machine interaction equipment or the service end which is in communication interaction with the man-machine interaction equipment, the man-machine interaction equipment or the service end which is in communication interaction with the man-machine interaction equipment can acquire user voice data and input the user voice data into the target learning model, and the response result is determined based on the user intention recognition result output by the target learning model. The target learning model deployed in the man-machine interaction equipment or the service end in communication interaction with the man-machine interaction equipment is obtained through training in a knowledge distillation mode, the network structure of the target learning model is simpler than that of a pre-training model, the prediction effect can approximate to a complex pre-training model, quick and accurate intention recognition can be realized, and the real-time response requirement of the man-machine interaction equipment is met.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
As shown in fig. 11, there is a block diagram of an electronic device of a knowledge-based model training method or an intention recognition method in accordance with an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 11, the electronic device includes: one or more processors 1101, memory 1102, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 11, a processor 1101 is taken as an example.
Memory 1102 is a non-transitory computer-readable storage medium provided by the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the knowledge distillation based model training method or the intent recognition method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the knowledge distillation-based model training method or the intention recognition method provided by the present application.
The memory 1102 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the knowledge distillation based model training method or the intention recognition method in the embodiments of the present application (e.g., the precipitation training module 901, the distillation model construction module 902, the target knowledge extraction module 903, and the distillation model training module 904 shown in fig. 9; or the voice data acquisition module 1001, the intention recognition module 1002, and the response result determination module 1003 shown in fig. 10). The processor 1101 executes various functional applications of the server and data processing, i.e., implements the knowledge distillation based model training method or the intention recognition method in the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 1102.
Memory 1102 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of an electronic device of a knowledge-based distillation model training method or an intention recognition method, or the like. In addition, memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 1102 optionally includes memory remotely located relative to processor 1101, which may be connected to the electronic device of the knowledge-based distillation model training method or the intent recognition method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the knowledge distillation-based model training method or the intention recognition method may further include: an input device 1103 and an output device 1104. The processor 1101, memory 1102, input device 1103 and output device 1104 may be connected by a bus or other means, for example in fig. 11.
The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the knowledge-based model training method or the intent recognition method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. input devices. The output device 1104 may include a display device, auxiliary lighting (e.g., LEDs), and haptic feedback (e.g., a vibration motor), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASIC (application specific integrated circuit), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment, according to a training task data set, a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network are used as training objects, and at least two precipitation training is carried out on a pre-training model to obtain a strengthening model; the distillation model is built based on the target network determined from the reinforcement model. Extracting target knowledge of the training task data set through a target network of the reinforcement model; and training the distillation model based on the extracted target knowledge and the training task data set to obtain a target learning model. In the embodiment, the bottom layer network of the pre-training model is subjected to multiple precipitation training in a gradually decreasing middle-high layer network mode, so that parameters of the bottom layer network of the pre-training model can be more accurate. And constructing a distillation model at least according to the accurate bottom layer network and the prediction layer network after precipitation, and carrying out distillation training on the distillation model based on the extracted target knowledge, so that the target learning model distilled from the pre-training model maintains the prediction accuracy of the pre-training model while simplifying the network structure, and improves the generalization capability of the model. And the whole distillation process is not influenced by human factors, and the target learning model is deployed into the man-machine interaction equipment, so that a quick and accurate execution task can be realized, and the real-time response requirement of the man-machine interaction equipment is met.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (38)

1. A knowledge distillation based model training method, the method comprising:
performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model; the training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top;
Taking at least two networks in the reinforcement model as target networks, and constructing a distillation model according to the target networks, wherein the target networks comprise a characteristic recognition network and the prediction layer network; the feature recognition network at least comprises the bottom layer network;
extracting target knowledge of the training task data set through a target network of the reinforcement model;
training the distillation model according to the target knowledge and the training task data set to obtain a target learning model; the target learning model is used for carrying out intention recognition on the voice data of the user.
2. The method of claim 1, wherein the underlay network and the mid-high layer network are used for feature recognition; the prediction layer network is used for carrying out task prediction according to the identified characteristics.
3. The method of claim 1, wherein performing precipitation training on the pre-training model at least twice according to the training task data set comprises:
dividing the training task data set to determine a plurality of training data subsets;
according to the set precipitation training times, determining the training objects corresponding to each training data subset; the training objects corresponding to the training data subsets comprise a bottom layer network, a middle-high layer network and a prediction layer network of the pre-training model, and the number of layers of the included middle-high layer network is inversely proportional to the order of precipitation training;
Performing one-time precipitation training on a training object corresponding to the training data subset in the pre-training model according to each training data subset;
and the number of divisions of the training data subset is less than or equal to the total number of layers of the pre-training model.
4. A method according to claim 3, wherein the middle-high layer network included in each of the training objects is a network layer adjacent to the bottom layer network and continuous upwards; and as the number of precipitation trainings increases, the number of layers of the middle-high layer network included in the training object is decremented to zero.
5. The method of claim 1, wherein performing precipitation training on the pre-training model at least twice according to the training task data set to obtain a strengthening model of the pre-training model comprises:
according to the training task data set, carrying out precipitation training on the pre-training model successively;
testing the pre-training model after precipitation training according to the test task data set;
and if the test result meets the precipitation ending condition, taking the pre-training model after the precipitation training as a strengthening model.
6. The method of claim 1, wherein before performing precipitation training on the pre-training model at least twice according to the training task data set, the method further comprises:
And performing field training on the pre-training model according to the training field data set, and updating the pre-training model.
7. The method of claim 1, wherein taking at least two networks of the reinforcement model as target networks and constructing a distillation model from the target networks comprises:
taking at least two networks in the reinforcement model as target networks, and acquiring network structure blocks of the target networks;
and constructing a distillation model with the same structure as the strengthening model according to the obtained network structure block.
8. The method of claim 1, wherein taking at least two networks of the reinforcement model as target networks and constructing a distillation model from the target networks comprises:
taking at least two networks in the reinforcement model as target networks;
and selecting a neural network model with a structure different from that of the reinforcement model as a distillation model according to the target network, wherein the output layer network of the neural network model is consistent with the type of the prediction layer network in the target network, and the non-output layer network of the neural network model is consistent with the type of the feature identification network in the target network.
9. The method of claim 1, wherein extracting target knowledge of the training task dataset through a target network of the augmentation model comprises:
taking the training task data set as the input of the reinforcement model, and acquiring a first data feature representation output by the feature recognition network of the reinforcement model and a first prediction probability representation output by the prediction layer network of the reinforcement model;
and taking the acquired first data feature representation and first prediction probability representation as target knowledge of the training task data set.
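The knowledge-extraction step of claim 9 amounts to a forward pass through the reinforcement model with its outputs cached. A minimal sketch follows, assuming the teacher returns a (features, logits) pair as in the student interface sketched above.

import torch

@torch.no_grad()
def extract_target_knowledge(teacher, batch_token_ids):
    teacher.eval()
    features, logits = teacher(batch_token_ids)      # feature recognition network + prediction layer
    probabilities = torch.softmax(logits, dim=-1)    # first prediction probability representation
    return features, probabilities                   # first data feature representation, probabilities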
10. The method of claim 1, wherein training the distillation model based on the target knowledge and the training task data set comprises:
inputting the training task data set into the distillation model, and determining a soft supervision label and a hard supervision label according to the processing result of the distillation model on the training task data set and the target knowledge;
determining a target label according to the soft supervision label and the hard supervision label;
and iteratively updating the parameters of the distillation model according to the target label.
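Read as a standard distillation update, claim 10 fuses a soft supervision term and a hard supervision term into a single target objective and takes one gradient step on the distillation model. The sketch below assumes both terms are already computed as scalar tensors and that the mixing weight alpha is 0.5; neither value comes from the patent.

def distillation_step(student, optimizer, soft_label_loss, hard_label_loss, alpha=0.5):
    # target label = weighted fusion of the soft and hard supervision terms (alpha assumed)
    target = alpha * soft_label_loss + (1.0 - alpha) * hard_label_loss
    optimizer.zero_grad()
    target.backward()                 # iteratively update the distillation model's parameters
    optimizer.step()
    return float(target)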
11. The method of claim 10, wherein inputting the training task data set into the distillation model and determining the soft and hard supervision labels according to the processing result of the distillation model on the training task data set and the target knowledge comprises:
inputting the training task data set into the distillation model to obtain a second data feature representation output by the feature recognition network of the distillation model and a second prediction probability representation output by the prediction layer network of the distillation model;
determining the soft supervision label according to the target knowledge, the second data feature representation and the second prediction probability representation;
and determining the hard supervision label according to the second prediction probability representation and the training task data set information.
12. The method of claim 11, wherein the training task data set information comprises: the number of training samples, the number of training labels and the actual label values contained in the training task data set.
13. The method of claim 11, wherein determining the soft supervision label according to the target knowledge, the second data feature representation and the second prediction probability representation comprises:
taking the mean variance of the first data feature representation in the target knowledge and the second data feature representation as a data feature label;
taking the mean variance of the first prediction probability representation in the target knowledge and the second prediction probability representation as a probability prediction label;
and performing label fusion on the data feature label and the probability prediction label according to the weight value of the feature recognition network of the reinforcement model to obtain the soft supervision label.
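Claim 13's soft supervision label combines two mean-square terms, weighted by a value tied to the feature recognition network. A minimal sketch follows; reading "mean variance" as mean squared error and choosing w = 0.3 are both assumptions.

import torch.nn.functional as F

def soft_supervision_label(teacher_feats, student_feats, teacher_probs, student_probs, w=0.3):
    data_feature_label = F.mse_loss(student_feats, teacher_feats)            # feature-level term
    probability_prediction_label = F.mse_loss(student_probs, teacher_probs)  # probability-level term
    # label fusion according to the feature recognition network's weight value w
    return w * data_feature_label + (1.0 - w) * probability_prediction_label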
14. The method of claim 1, wherein training the distillation model based on the target knowledge and the training task data set to obtain a target learning model comprises:
training the distillation model according to the target knowledge and the training task data set;
testing the trained distillation model according to the test task data set;
and if the test result meets the training ending condition, taking the trained distillation model as a target learning model.
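Claim 14's outer loop can be sketched with the same placeholder evaluate stub used after claim 5: distill epoch by epoch, test on the test task data set, and stop once the (assumed) threshold is met.

def distill_until_done(student, train_batches, test_set, max_epochs=20, end_threshold=0.95):
    for _ in range(max_epochs):
        for batch in train_batches:
            pass                                   # one distillation_step(...) per batch
        if evaluate(student, test_set) >= end_threshold:   # training ending condition
            break
    return student                                 # taken as the target learning model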
15. The method of any of claims 1-14, wherein the pre-training model is a BERT model.
16. The method of any of claims 1-14, wherein the pre-training model and the target learning model are models for intent recognition;
correspondingly, after training the distillation model according to the target knowledge and the training task data set to obtain a target learning model, the method further comprises:
and deploying the target learning model into man-machine interaction equipment so as to identify the intention of the user voice data acquired by the man-machine interaction equipment in real time.
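One plausible realisation of the deployment step in claim 16 is to export the trained target learning model in a self-contained form that the man-machine interaction equipment (or its companion server) can load without the training code. The TorchScript sketch below uses a stand-in module; the file name, feature size and shapes are assumptions.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))  # stand-in student
example = torch.randn(1, 128)                    # stand-in utterance feature vector
scripted = torch.jit.trace(model.eval(), example)
scripted.save("intent_student.pt")               # artefact shipped to the device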
17. An intent recognition method, the method comprising:
acquiring user voice data acquired by man-machine interaction equipment;
inputting the user voice data into a target learning model to obtain a user intention recognition result output by the target learning model; wherein the target learning model is determined based on training by the knowledge distillation based model training method of any one of claims 1-16;
and determining a response result of the man-machine interaction device according to the user intention recognition result.
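The runtime flow of claim 17 reduces to: featurise the collected user voice data, run the target learning model, read off the recognised intention, and map it to a device response. A hedged end-to-end sketch follows; the intent labels, the canned responses and the assumption that voice data arrives as a feature tensor are all illustrative.

import torch

RESPONSES = {0: "Playing music.", 1: "Setting an alarm.", 2: "Sorry, I did not catch that."}

def respond(model, voice_features):
    with torch.no_grad():
        logits = model(voice_features)            # user intention recognition result
    intent_id = int(torch.argmax(logits, dim=-1))
    return RESPONSES.get(intent_id, RESPONSES[2]) # response result of the interaction device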
18. The method of claim 17, wherein the method is executed by the man-machine interaction device or by a server in communication interaction with the man-machine interaction device.
19. A knowledge distillation based model training apparatus, the apparatus comprising:
the precipitation training module is used for carrying out precipitation training on the pre-training model at least twice according to the training task data set to obtain an enhanced model of the pre-training model; the training object of each precipitation training at least comprises a bottom layer network, a prediction layer network and a gradually decreasing middle-high layer network, and the pre-training model comprises the bottom layer network, at least one middle-high layer network and the prediction layer network from bottom to top;
The distillation model construction module is used for taking at least two networks in the reinforcement model as target networks and constructing a distillation model according to the target networks, wherein the target networks comprise a characteristic recognition network and the prediction layer network; the feature recognition network at least comprises the bottom layer network;
the target knowledge extraction module is used for extracting target knowledge of the training task data set through a target network of the reinforcement model;
the distillation model training module is used for training the distillation model according to the target knowledge and the training task data set to obtain a target learning model; the target learning model is used for carrying out intention recognition on the voice data of the user.
20. The apparatus of claim 19, wherein the bottom layer network and the middle-high layer network are configured to perform feature recognition; and the prediction layer network is used for performing task prediction according to the recognized features.
21. The apparatus of claim 19, wherein the precipitation training module comprises:
the data subset dividing unit is used for dividing the training task data set to determine a plurality of training data subsets;
the training object determining unit is used for determining the training object corresponding to each training data subset according to the set number of precipitation training rounds; the training object corresponding to each training data subset comprises the bottom layer network, a middle-high layer network and the prediction layer network of the pre-training model, and the number of layers of the included middle-high layer network is inversely proportional to the order of the precipitation training;
the precipitation training unit is used for performing, according to each training data subset, one round of precipitation training in the pre-training model on the training object corresponding to that training data subset;
and the number of training data subsets obtained by the division is less than or equal to the total number of layers of the pre-training model.
22. The apparatus of claim 21, wherein the middle-high layer network comprised in each training object consists of network layers adjacent to, and continuing upward from, the bottom layer network; and as the number of precipitation training rounds increases, the number of middle-high layer network layers included in the training object is decremented to zero.
23. The apparatus of claim 19, wherein the precipitation training module is specifically configured to:
according to the training task data set, carrying out precipitation training on the pre-training model successively;
testing the pre-training model after precipitation training according to the test task data set;
and if the test result meets the precipitation ending condition, taking the pre-training model after the precipitation training as the reinforcement model.
24. The apparatus of claim 19, further comprising:
and the domain training module is used for performing domain training on the pre-training model according to the training domain data set, and updating the pre-training model, before precipitation training is performed on the pre-training model at least twice according to the training task data set.
25. The apparatus of claim 19, wherein the distillation model building module is specifically configured to:
taking at least two networks in the reinforcement model as target networks, and acquiring network structure blocks of the target networks;
and constructing a distillation model with the same structure as the reinforcement model according to the acquired network structure blocks.
26. The apparatus of claim 19, wherein the distillation model building module is further specifically configured to:
taking at least two networks in the reinforcement model as target networks;
and selecting a neural network model with a structure different from that of the reinforcement model as the distillation model according to the target network, wherein the output layer network of the neural network model is consistent in type with the prediction layer network in the target network, and the non-output layer network of the neural network model is consistent in type with the feature recognition network in the target network.
27. The apparatus of claim 19, wherein the target knowledge extraction module is specifically configured to:
taking the training task data set as the input of the reinforcement model, and acquiring a first data feature representation output by the feature recognition network of the reinforcement model and a first prediction probability representation output by the prediction layer network of the reinforcement model;
and taking the acquired first data feature representation and first prediction probability representation as target knowledge of the training task data set.
28. The apparatus of claim 19, wherein the distillation model training module comprises:
the supervision tag determining unit is used for inputting the training task data set into the distillation model and determining a soft supervision tag and a hard supervision tag according to the processing result of the distillation model on the training task data set and the target knowledge;
the target label determining unit is used for determining a target label according to the soft supervision label and the hard supervision label;
and the model parameter updating unit is used for iteratively updating the parameters of the distillation model according to the target label.
29. The apparatus of claim 28, wherein the supervision tag determination unit specifically comprises:
an output acquisition subunit, configured to input the training task data set into the distillation model, and obtain a second data feature representation output by a feature recognition network of the distillation model, and a second prediction probability representation output by a prediction layer network of the distillation model;
a soft label determination subunit configured to determine a soft supervision label based on the target knowledge, the second data characteristic representation, and the second predictive probability representation;
A hard tag determination subunit configured to determine a hard supervisory tag based on the second predictive probability representation and the training task dataset information.
30. The apparatus of claim 29, wherein the training task data set information comprises: the training task data set includes a training sample number, a training label number, and an actual label value.
31. The apparatus of claim 29, wherein the soft tag determination subunit is specifically configured to:
taking the mean variance of the first data feature representation and the second data feature representation in the target knowledge as a data feature label;
taking the mean variance of the first predictive probability representation and the second predictive probability representation in the target knowledge as a probability prediction label;
and carrying out label fusion on the data characteristic labels and the probability prediction labels according to the weight value of the characteristic recognition network of the reinforcement model to obtain soft supervision labels.
32. The apparatus of claim 19, wherein the distillation model training module is further to:
training the distillation model according to the target knowledge and the training task data set;
testing the trained distillation model according to the test task data set;
And if the test result meets the training ending condition, taking the trained distillation model as a target learning model.
33. The apparatus of any of claims 19-32, wherein the pre-training model is a BERT model.
34. The apparatus of any of claims 19-32, wherein the pre-training model and the target learning model are models for intent recognition;
correspondingly, the apparatus further comprises:
the model deployment module is used for deploying the target learning model into man-machine interaction equipment so as to conduct intention recognition on user voice data acquired by the man-machine interaction equipment in real time.
35. An intent recognition device, the device comprising:
the voice data acquisition module is used for acquiring user voice data acquired by the man-machine interaction equipment;
the intention recognition module is used for inputting the user voice data into a target learning model so as to acquire a user intention recognition result output by the target learning model; wherein the target learning model is determined based on training by the knowledge distillation based model training method of any one of claims 1-16;
and the response result determining module is used for determining a response result of the man-machine interaction device according to the user intention recognition result.
36. The apparatus of claim 35, wherein the apparatus is configured in the man-machine interaction device or in a server in communication interaction with the man-machine interaction device.
37. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the knowledge distillation based model training method of any of claims 1-16, or to perform the intent recognition method of any of claims 17-18.
38. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the knowledge-based distillation model training method of any one of claims 1-16, or to perform the intent recognition method of any one of claims 17-18.
CN202010444204.XA 2020-05-22 2020-05-22 Model training and intention recognition method, device, equipment and storage medium Active CN111640425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010444204.XA CN111640425B (en) 2020-05-22 2020-05-22 Model training and intention recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010444204.XA CN111640425B (en) 2020-05-22 2020-05-22 Model training and intention recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111640425A CN111640425A (en) 2020-09-08
CN111640425B true CN111640425B (en) 2023-08-15

Family

ID=72333280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010444204.XA Active CN111640425B (en) 2020-05-22 2020-05-22 Model training and intention recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111640425B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220188643A1 (en) * 2020-12-11 2022-06-16 International Business Machines Corporation Mixup data augmentation for knowledge distillation framework
US11960842B2 (en) * 2021-02-27 2024-04-16 Walmart Apollo, Llc Methods and apparatus for natural language understanding in conversational systems using machine learning processes
CN113160801B (en) * 2021-03-10 2024-04-12 云从科技集团股份有限公司 Speech recognition method, device and computer readable storage medium
CN113157183B (en) * 2021-04-15 2022-12-16 成都新希望金融信息有限公司 Deep learning model construction method and device, electronic equipment and storage medium
CN113204614B (en) * 2021-04-29 2023-10-17 北京百度网讯科技有限公司 Model training method, method for optimizing training data set and device thereof
CN113239272B (en) * 2021-05-12 2022-11-29 烽火通信科技股份有限公司 Intention prediction method and intention prediction device of network management and control system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160350834A1 (en) * 2015-06-01 2016-12-01 Nara Logics, Inc. Systems and methods for constructing and applying synaptic networks

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247989A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of neural network training method and device
CN110832596A (en) * 2017-10-16 2020-02-21 因美纳有限公司 Deep convolutional neural network training method based on deep learning
WO2019143946A1 (en) * 2018-01-19 2019-07-25 Visa International Service Association System, method, and computer program product for compressing neural network models
CN110084368A (en) * 2018-04-20 2019-08-02 谷歌有限责任公司 System and method for regularization neural network
CN110837761A (en) * 2018-08-17 2020-02-25 北京市商汤科技开发有限公司 Multi-model knowledge distillation method and device, electronic equipment and storage medium
CN109543817A (en) * 2018-10-19 2019-03-29 北京陌上花科技有限公司 Model distillating method and device for convolutional neural networks
EP3648014A1 (en) * 2018-10-29 2020-05-06 Fujitsu Limited Model training method, data identification method and data identification device
CN109637546A (en) * 2018-12-29 2019-04-16 苏州思必驰信息科技有限公司 Knowledge distillating method and device
CN110162018A (en) * 2019-05-31 2019-08-23 天津开发区精诺瀚海数据科技有限公司 The increment type equipment fault diagnosis method that knowledge based distillation is shared with hidden layer
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110909775A (en) * 2019-11-08 2020-03-24 支付宝(杭州)信息技术有限公司 Data processing method and device and electronic equipment
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device
CN111079938A (en) * 2019-11-28 2020-04-28 百度在线网络技术(北京)有限公司 Question-answer reading understanding model obtaining method and device, electronic equipment and storage medium
CN111062951A (en) * 2019-12-11 2020-04-24 华中科技大学 Knowledge distillation method based on semantic segmentation intra-class feature difference

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuuki Tachioka. "Knowledge Distillation Using Soft and Hard Labels and Annealing for Acoustic Model Training." 2019 IEEE 8th Global Conference on Consumer Electronics, 2019, full text. *

Also Published As

Publication number Publication date
CN111640425A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111640425B (en) Model training and intention recognition method, device, equipment and storage medium
CN111639710B (en) Image recognition model training method, device, equipment and storage medium
CN112270379B (en) Training method of classification model, sample classification method, device and equipment
CN111539227B (en) Method, apparatus, device and computer storage medium for training semantic representation model
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN111582479B (en) Distillation method and device for neural network model
CN111259671B (en) Semantic description processing method, device and equipment for text entity
CN111831813B (en) Dialog generation method, dialog generation device, electronic equipment and medium
CN111667056B (en) Method and apparatus for searching model structures
EP3961476A1 (en) Entity linking method and apparatus, electronic device and storage medium
CN112560985B (en) Neural network searching method and device and electronic equipment
CN111859953B (en) Training data mining method and device, electronic equipment and storage medium
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111326251B (en) Question output method and device and electronic equipment
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN110675954A (en) Information processing method and device, electronic equipment and storage medium
CN115455171B (en) Text video mutual inspection rope and model training method, device, equipment and medium
JP2022078310A (en) Image classification model generation method, device, electronic apparatus, storage medium, computer program, roadside device and cloud control platform
CN114715145B (en) Trajectory prediction method, device and equipment and automatic driving vehicle
CN111753761A (en) Model generation method and device, electronic equipment and storage medium
CN112288483A (en) Method and device for training model and method and device for generating information
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN114943228B (en) Training method of end-to-end sensitive text recall model and sensitive text recall method
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
CN112329453B (en) Method, device, equipment and storage medium for generating sample chapter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant