CN111242297A - Knowledge distillation-based model training method, image processing method and device

Info

Publication number
CN111242297A
Authority
CN
China
Prior art keywords
data
model
distillation
training
student
Prior art date
Legal status
Pending
Application number
CN201911319895.4A
Other languages
Chinese (zh)
Inventor
张有才
戴雨辰
常杰
危夷晨
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201911319895.4A
Publication of CN111242297A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a knowledge distillation-based model training method applied to a student model, comprising: setting, according to a distillation position, a second output layer identical to the first output layer at the distillation position; acquiring a training set, wherein the training set comprises a plurality of training data; obtaining first data output by the first output layer and second data output by the second output layer based on the training data; acquiring supervision data output by a teacher model at a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a trained complex model that performs the same task as the student model; obtaining a distillation loss value according to a distillation loss function based on the second data and on the gap between the supervision data and the first data; and updating parameters of the student model based on the distillation loss value. Through the disclosed embodiments, knowledge distillation lets the teacher model transfer the knowledge of simple data to the student model with greater emphasis, which improves the training efficiency of knowledge distillation while ensuring the accuracy of the student model.

Description

Knowledge distillation-based model training method, image processing method and device
Technical Field
The present disclosure relates generally to the field of image recognition, and more particularly to a knowledge distillation-based model training method, a knowledge distillation-based model training apparatus, an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of artificial intelligence, models are commonly used for data processing, image recognition and the like. While recognition accuracy and recognition range keep improving, neural networks grow ever larger: computation is time-consuming, parameters are numerous, and the required storage is huge. This makes such models difficult to apply on mobile terminals, especially mobile terminals with weak hardware.
Knowledge distillation is a model compression method. In a teacher-student framework, the feature-representation "knowledge" learned by a complex teacher model with strong learning ability is distilled out and transferred to a student model with fewer parameters and weaker learning ability. In brief, a new small model learns the prediction results of a large model, and the intermediate knowledge of a complex model or an ensemble is transferred, in a suitable way, into a relatively simple model that is easier to popularize and deploy.
Current knowledge distillation techniques mainly explore two directions: the distillation position (for example, the output of the last feature layer, a feature-map output, or the logits output before the softmax of the neural network) and the distillation metric. However, conventional methods treat all training samples in the same way and make the student model imitate the teacher model as closely as possible. The student model has limited capacity after all; it cannot perfectly learn all of the teacher model's knowledge, and blind imitation often fails to reach optimal performance.
Disclosure of Invention
In order to solve the above problems in the prior art, a first aspect of the present disclosure provides a knowledge distillation-based model training method applied to a student model, the method comprising: setting, according to a distillation position, a second output layer identical to the first output layer at the distillation position; acquiring a training set, wherein the training set comprises a plurality of training data; obtaining first data output by the first output layer and second data output by the second output layer based on the training data; acquiring supervision data output by a teacher model at a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a trained complex model that performs the same task as the student model; obtaining a distillation loss value according to a distillation loss function based on the second data and the gap between the supervision data and the first data; and updating parameters of the student model based on the distillation loss value.
In one example, the distillation loss function is set such that the gap is positively correlated with the distillation loss value, and such that the distillation loss value decreases as the second data increases.
In one example, the distillation loss function is a formula, rendered as an image in the original publication, whose terms involve the supervision data, the first data, and the second data.
In one example, the training set further comprises standard annotation data in one-to-one correspondence with the training data. The method further comprises: obtaining student output data output by the student model based on the training data; and obtaining a task loss value according to a task loss function based on the standard annotation data and the student output data. Updating the parameters of the student model based on the distillation loss value then comprises: updating the parameters of the student model based on the total loss value of the task loss value and the distillation loss value.
In one example, the method further comprises: after the total loss value falls below a training threshold through multiple iterations, deleting the second output layer to obtain the trained student model.
In one example, the distillation position includes one or more of: any feature extraction layer in the student model, and a fully connected layer of the student model.
A second aspect of the present disclosure provides an image processing method, the method comprising: acquiring an image; and extracting image characteristics of the image through a model to obtain an image recognition result, wherein the model is a student model obtained through the knowledge distillation-based model training method of the first aspect.
A third aspect of the present disclosure provides a knowledge distillation-based model training device applied to a student model, the device comprising: a model building module, configured to set, according to a distillation position, a second output layer identical to the first output layer at the distillation position; a first acquisition module, configured to acquire a training set, wherein the training set comprises a plurality of training data; a data processing module, configured to obtain first data output by the first output layer and second data output by the second output layer based on the training data; a second acquisition module, configured to acquire supervision data output by a teacher model at a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a trained complex model that performs the same task as the student model; a loss calculation module, configured to obtain a distillation loss value according to a distillation loss function based on the second data and the gap between the supervision data and the first data; and a parameter adjusting module, configured to update parameters of the student model based on the distillation loss value.
A fourth aspect of the present disclosure provides an image processing apparatus comprising: the image acquisition module is used for acquiring an image; and the image recognition module is used for extracting image characteristics of the image through a model to obtain an image recognition result, wherein the model is a student model obtained through the knowledge distillation-based model training method in the first aspect.
A fifth aspect of the present disclosure provides an electronic device, comprising: a memory to store instructions; and a processor for invoking memory-stored instructions to perform the knowledge-distillation based model training method of the first aspect or the image processing method of the second aspect.
A sixth aspect of the present disclosure provides a computer-readable storage medium having stored therein instructions that, when executed by a processor, perform the knowledge-based distillation model training method of the first aspect or the image processing method of the second aspect.
According to the knowledge distillation-based model training method and device of the present disclosure, an output layer is added at the distillation position and a corresponding loss function is used, so that during knowledge distillation the teacher model transfers the knowledge of simple data to the student model with greater emphasis. In other words, adaptive knowledge transfer is achieved: the transfer of knowledge from dirty data and overly difficult sample data is reduced. The method can be adapted to any student model and can perform knowledge transfer at different positions as needed, which guarantees the training effect for a student model with a simple structure and few parameters and ensures the accuracy and reliability of the student model's recognition results.
Drawings
The above and other objects, features and advantages of the embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 shows a schematic flow diagram of a knowledge distillation-based model training method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a knowledge distillation-based model training method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a knowledge distillation-based model training apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an electronic device provided in an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely to enable those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way.
It should be noted that, although the expressions "first", "second", etc. are used herein to describe different modules, steps, data, etc. of the embodiments of the present disclosure, the expressions "first", "second", etc. are merely used to distinguish between different modules, steps, data, etc. and do not indicate a particular order or degree of importance. Indeed, the terms "first," "second," and the like are fully interchangeable.
In order to make the process of training a student network by knowledge distillation more efficient, and to transfer more reliable and useful knowledge from a complex teacher model to a student model with a more simplified structure, an embodiment of the present disclosure provides a knowledge distillation-based model training method 10, applied to the student model in a teacher-student knowledge distillation framework. As shown in FIG. 1, the knowledge distillation-based model training method 10 may include steps S11-S16, described in detail below:
step S11, setting a second output layer identical to the first output layer of the distillation position according to the distillation position.
At the position where knowledge distillation is to be performed, the student model is modified in a simple way: alongside the original first output layer at this position, a second output layer is added in parallel with the first output layer. The second output layer has the same position and structure as the first output layer, but their parameters may differ, and in the subsequent training process the parameters of the second output layer and of the first output layer are adjusted independently.
In some embodiments, the distillation position includes one or more of: any feature extraction layer in the student model, and a fully connected layer of the student model. The distillation position can be selected according to actual needs; several distillation positions can be selected in the course of training, and the structural modification described above is applied at each of them.
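For illustration only, the following sketch shows how the output at candidate distillation positions of an existing network can be captured with forward hooks. PyTorch, torchvision, and the ResNet-18 layer names used here are assumptions made for the example and are not prescribed by this disclosure.

```python
# Illustrative sketch (assumed framework and architecture): capturing the output
# at two candidate distillation positions of a stand-in student network.
import torch
import torchvision

student = torchvision.models.resnet18(num_classes=10)   # stand-in student network

captured = {}

def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output                          # output at this candidate distillation position
    return hook

# Two candidate distillation positions: an intermediate feature extraction layer
# and the fully connected layer of the student model.
student.layer3.register_forward_hook(save_output("layer3_feature_map"))
student.fc.register_forward_hook(save_output("fc_output"))

images = torch.randn(4, 3, 224, 224)                     # dummy image batch
_ = student(images)
print({name: tuple(t.shape) for name, t in captured.items()})
```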
Step S12, a training set is obtained, the training set including a plurality of training data.
A plurality of training data are acquired for training. In the iterative training, the training data are input into the model, a loss is calculated on the result against the supervision data, and the model parameters are updated accordingly.
Step S13 is to obtain first data output by the first output layer and second data output by the second output layer based on the training data.
After the training data are input into the student model, the first data are obtained from the first output layer through forward propagation, and the second data are obtained from the second output layer arranged in parallel at the same position. Depending on the distillation position, that is, on the first output layer and the second output layer, the first data and the second data may be feature data, a feature map, the logits output before the softmax of the neural network, or the like. The values of the first data and the second data may differ, but their format and dimensions are the same; for example, both are feature-vector representations.
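The following is a minimal sketch, under assumed module names and sizes, of a student model in which the chosen distillation position is given a second output layer parallel to the original first output layer; a forward pass returns the first data, the second data, and the student output data.

```python
# Minimal sketch, assuming PyTorch and a toy backbone; the layer sizes and names
# (StudentWithAuxiliaryHead, first_output, second_output) are hypothetical.
import torch
import torch.nn as nn

class StudentWithAuxiliaryHead(nn.Module):
    def __init__(self, feature_dim: int = 64, distill_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(                # simplified feature extractor
            nn.Flatten(),
            nn.Linear(3 * 32 * 32, feature_dim),
            nn.ReLU(),
        )
        # First output layer: the original layer at the distillation position.
        self.first_output = nn.Linear(feature_dim, distill_dim)
        # Second output layer: same position and structure, independently trained parameters.
        self.second_output = nn.Linear(feature_dim, distill_dim)
        self.classifier = nn.Linear(distill_dim, num_classes)

    def forward(self, x):
        h = self.backbone(x)
        first_data = self.first_output(h)             # matched against the teacher's supervision data
        second_data = self.second_output(h)           # weighting signal used by the distillation loss
        logits = self.classifier(first_data)          # student output data for the task
        return first_data, second_data, logits

student = StudentWithAuxiliaryHead()
images = torch.randn(8, 3, 32, 32)                    # dummy training batch
first_data, second_data, logits = student(images)
```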
Step S14, acquiring supervision data output by a teacher model at a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a trained complex model that performs the same task as the student model.
The teacher model is a more complex model; for example, it may have more layers, a more complex structure, and more parameters. As a result it usually has very good performance and generalization ability, but it is difficult to deploy on some terminal devices because it requires larger storage space and more computation.
The teacher model in this embodiment is a model that has completed training and performs the same task as the student model, for example both being used for image recognition. The supervision data has the same basic structure as the first data and the second data: corresponding to the distillation position of the student model, the teacher model has a teacher layer that outputs the corresponding supervision data, with the same format and dimensions as the first data and the second data.
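A minimal sketch of obtaining the supervision data is shown below. The stand-in teacher module, its sizes, and the use of PyTorch are assumptions; the only points taken from the description are that the teacher is already trained, stays fixed, and outputs data with the same format and dimensions as the first data.

```python
# Illustrative sketch: the supervision data at the teacher layer corresponding to
# the distillation position, produced without gradients (the teacher is fixed).
import torch
import torch.nn as nn

teacher = nn.Sequential(                     # stand-in for a large, already trained teacher model
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, 128),                     # teacher layer corresponding to the distillation position
)
teacher.eval()                               # the teacher is trained and stays fixed

images = torch.randn(8, 3, 32, 32)
with torch.no_grad():                        # no gradients flow into the teacher
    supervision_data = teacher(images)       # same format and dimensions as the first data
```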
Step S15, obtaining a distillation loss value according to the distillation loss function, based on the second data and the gap between the supervision data and the first data.
In some conventional techniques, the parameters of the student model are adjusted based only on the gap between the supervision data and the first data. In the embodiment of the present disclosure, the second data are added: the distillation loss function takes both the gap between the supervision data and the first data and the second data as inputs to obtain the distillation loss value, and the student model updates its parameters according to the distillation loss value.
In one embodiment, the distillation loss function is set such that the gap is positively correlated with the distillation loss value, and such that the distillation loss value decreases as the second data increase. The larger the gap between the supervision data and the first data, the larger the distillation loss value, the larger the adjustment the student model's parameters require, and the less well the output at the student model's distillation position expresses the features of that training data. On the other hand, an excessively large gap between the supervision data and the first data also indicates that the training data may be dirty data or overly difficult training data. For dirty data, knowledge transfer should be avoided as far as possible; overly difficult training data may be unusual in the actual application environment, and for a simplified student model there is little point in learning it. Therefore, in the distillation loss function, when the gap between the supervision data and the first data is too large, the parameters of the second output layer can be adjusted so that the second data increase and the distillation loss value given by the distillation loss function decreases. Since training is a process of reducing the loss value, when the student model updates its parameters according to the distillation loss value it can increase the second data by updating the parameters of the second output layer, thereby reducing the distillation loss value. In this way, the harm caused by transferring knowledge from dirty data or overly difficult data during training is reduced, the training effect on clean data and moderately difficult data is correspondingly improved, and training becomes more efficient.
In other embodiments, the distillation loss function L_distill can be given by a formula, rendered as an image in the original publication, whose terms involve the supervision data, the first data, and the second data, where d denotes the dimension of the data and N the number of training data in a batch. The supervision data, the first data, and the second data are all data at the corresponding distillation position and take the same form, for example d-dimensional feature vectors. The supervision data are output by the teacher model and convey the knowledge; the first data are output by the first output layer at the distillation position in the student model and represent its features; and the second data are output by the second output layer arranged at the distillation position and can represent the confidence in the training data, playing the role of an adjustable weight so that, when the training data are dirty data or the like, the influence on the other parameters of the model is reduced.
When the loss value exceeds a threshold, the parameters of the student model must be updated to reduce it, and by updating its parameters the student model can bring the first data closer to the supervision data, thereby reducing the distillation loss value. In this embodiment, the gap term of the formula shows that, in addition to reducing the loss value by updating the student model's parameters so that the first data approach the supervision data, the student model can, when the gap between the supervision data and the first data is too large, also reduce the loss value by updating parameters, specifically the parameters of the second output layer, so as to increase the second data; this likewise reduces the distillation loss value, in which case the adjustment of the other parameters of the student model is reduced and the adverse effect of dirty or overly difficult data on the student model is lessened. Meanwhile, to prevent the student model from adjusting the parameters of the second output layer too heavily and merely increasing the second data to lower the distillation loss value, which would keep more knowledge from being transferred from the teacher model, a restriction factor is also included in the formula: when the second data are too low, the value of the restriction factor increases, so that the student model is kept from blindly adjusting only the parameters of the second output layer to lower the loss.
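Because the exact formula is only available as an image in the original publication, the sketch below is not the patented loss. It is one generic uncertainty-weighted form, chosen as an assumption, that reproduces the two headline properties stated above: the loss grows with the squared gap between the supervision data and the first data, increasing the positive weight derived from the second data shrinks that term, and a logarithmic restriction term keeps the weight from being inflated without limit.

```python
# Hedged sketch of a distillation loss with the properties described above; the
# patent's exact formula is not reproduced here. Assumed form: squared gap divided
# by a positive weight derived from the second data, plus a log restriction term.
import torch
import torch.nn.functional as F

def distillation_loss(supervision_data: torch.Tensor,
                      first_data: torch.Tensor,
                      second_data: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """All three tensors have shape (N, d); returns a scalar loss."""
    n, d = first_data.shape
    weight = F.softplus(second_data) + eps               # positive per-element weight from the second data
    gap = (supervision_data.detach() - first_data) ** 2  # squared gap to the teacher's supervision data
    fit_term = (gap / weight).sum(dim=1) / d              # larger second data -> smaller contribution
    restriction = torch.log(weight).sum(dim=1) / d        # restriction term: penalizes inflating the weight
    return (fit_term + restriction).mean()                # average over the N samples in the batch
```

In this assumed form the per-element trade-off gap/weight + log(weight) is minimized when the weight equals the squared gap, so samples with a very large gap (for example dirty or overly difficult data) end up down-weighting their own contribution, which mirrors the adaptive behavior described in the text.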
Step S16, updating parameters of the student model based on the distillation loss value.
The parameters to be updated include all parameters of the student model. Adjusting the parameters of the first output layer has the strongest influence on the first data, and adjusting the parameters of the second output layer has the strongest influence on the second data. The parameters at the distillation position can therefore be trained and updated more effectively according to the distillation loss value; with the distillation loss function of any of the foregoing embodiments, the adverse effect of dirty data or overly difficult data on the student model is reduced, training is more efficient, and the output of the trained student model is more accurate.
In one embodiment, as shown in FIG. 2, in the knowledge distillation-based model training method 10 the training set further comprises standard annotation data in one-to-one correspondence with the training data, and the method 10 further includes: step S17, obtaining student output data output by the student model based on the training data; and step S18, obtaining a task loss value according to a task loss function based on the standard annotation data and the student output data. Accordingly, step S16 includes: updating the parameters of the student model based on the total loss value of the task loss value and the distillation loss value.
In order to train the student model completely and well, task-related training is also performed on the student model. According to the actual task the model is to perform, the training data are input into the student model in step S17 and passed through all of the student model's structures to produce the final student output data, such as a recognition result or a clustering result. Then, in step S18, a task loss value is obtained according to the task loss function by comparing the student output data with the standard annotation data of the training data. Finally, in step S16, the parameters of the student model are updated according to the total loss value of the task loss value and the distillation loss value. The student model is thus trained comprehensively and accurately, and local distillation is prevented from distorting the overall result of the model.
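Continuing the hypothetical names from the earlier sketches (student, teacher, images, distillation_loss), one training iteration combining the task loss and the distillation loss might look as follows; the equal weighting of the two losses is an assumption, since the description only refers to their total.

```python
# Sketch of one training iteration; reuses the hypothetical student, teacher,
# images, and distillation_loss defined in the earlier sketches.
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

labels = torch.randint(0, 10, (8,))                   # standard annotation data (dummy class labels)
first_data, second_data, logits = student(images)     # student forward pass (steps S13 / S17)
with torch.no_grad():
    supervision_data = teacher(images)                # supervision data from the teacher (step S14)

task_loss = F.cross_entropy(logits, labels)           # task loss against the annotations (step S18)
distill_loss = distillation_loss(supervision_data, first_data, second_data)  # step S15
total_loss = task_loss + distill_loss                 # equal weighting assumed here

optimizer.zero_grad()
total_loss.backward()                                 # gradients reach both output layers
optimizer.step()                                      # update the student parameters (step S16)
```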
In an embodiment, the knowledge distillation-based model training method 10 further includes deleting the second output layer after the total loss value has fallen below the training threshold over multiple iterations, so as to obtain the trained student model. When the output of the student model has converged sufficiently, that is, when the total loss value is below the training threshold, the parameter updates of the student model can be considered to meet the requirement, and the second output layer that was added beforehand can be deleted to obtain the trained student model. In the embodiments provided by the present disclosure, the second output layer only serves to provide the second data during knowledge distillation-based model training so as to achieve the effects described in the foregoing embodiments; once the parameter updates meet the requirement, the second output layer is no longer needed, the original structure of the student model can be restored, the storage footprint of the student model is reduced, and needless computation is avoided. This is particularly relevant when several distillation positions are distilled simultaneously. Restoring the original structure of the student model in this way also improves the general applicability of the disclosed embodiments.
Based on the same inventive concept, FIG. 3 illustrates an image processing method 20 provided by an embodiment of the present disclosure, which includes: step S21, acquiring an image; and step S22, extracting image features of the image through a model to obtain an image recognition result, where the model is a student model obtained by the knowledge distillation-based model training method 10 of any of the foregoing embodiments. In some scenarios the student model is used for image recognition; a student model obtained by the knowledge distillation-based model training method 10 is trained more efficiently, has a simple structure and a higher running speed, can be used on terminal devices, and still ensures the accuracy of the image processing result.
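Continuing the same hypothetical sketch, once the total loss has converged the second output layer can be deleted and the remaining student model used for image recognition:

```python
# Sketch only: drop the auxiliary second output layer of the hypothetical
# StudentWithAuxiliaryHead and run inference with the remaining layers.
import torch

del student.second_output          # the auxiliary head is only needed during distillation
student.eval()

def recognize(image_batch: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        h = student.backbone(image_batch)
        features = student.first_output(h)   # image features at the former distillation position
        logits = student.classifier(features)
    return logits.argmax(dim=1)              # image recognition result as class indices

predictions = recognize(torch.randn(2, 3, 32, 32))
```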
Based on the same inventive concept, FIG. 4 shows a knowledge distillation-based model training device 100 provided by an embodiment of the present disclosure, applied to a student model. As shown in FIG. 4, the knowledge distillation-based model training device 100 includes: a model building module 110, configured to set, according to a distillation position, a second output layer identical to the first output layer at the distillation position; a first obtaining module 120, configured to obtain a training set, where the training set includes a plurality of training data; a data processing module 130, configured to obtain first data output by the first output layer and second data output by the second output layer based on the training data; a second obtaining module 140, configured to obtain supervision data output by the teacher model at a teacher layer corresponding to the distillation position based on the training data, where the teacher model is a trained complex model that performs the same task as the student model; a loss calculation module 150, configured to obtain a distillation loss value according to a distillation loss function based on the second data and the gap between the supervision data and the first data; and a parameter adjusting module 160, configured to update the parameters of the student model based on the distillation loss value.
In one example, the distillation loss function is set such that the gap is positively correlated with the distillation loss value, and the distillation loss value decreases as the second data increases.
In one example, the distillation loss function is a formula, rendered as an image in the original publication, whose terms involve the supervision data, the first data, and the second data.
In one example, the training set further comprises standard annotation data in one-to-one correspondence with the training data; the data processing module 130 is further configured to obtain student output data output by the student model based on the training data; the loss calculation module 150 is further configured to obtain a task loss value according to a task loss function based on the standard annotation data and the student output data; and the parameter adjustment module 160 is further configured to update the parameters of the student model based on the total loss value of the task loss value and the distillation loss value.
In one example, model building module 110 is further configured to: and after the total loss value is lower than the training threshold value through multiple iterations, deleting the second output layer to obtain the student model which completes the training.
In one example, the distillation position includes one or more of: any feature extraction layer in the student model, and a fully connected layer of the student model.
With respect to the knowledge distillation-based model training apparatus 100 in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be repeated here.
Based on the same inventive concept, fig. 5 illustrates an image processing apparatus 200 according to an embodiment of the disclosure, and as shown in fig. 5, the image processing apparatus 200 includes: an image acquisition module 210 for acquiring an image; and an image recognition module 220, configured to extract image features of the image through a model to obtain an image recognition result, where the model is a student model obtained through the knowledge-based distillation model training method 10 according to the first aspect.
With regard to the image processing apparatus 200 in the above-described embodiment, the specific manner in which the respective modules perform operations has been described in detail in the embodiment related to the method, and will not be described in detail here.
As shown in FIG. 6, one embodiment of the present disclosure provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an Input/Output (I/O) interface 303. The memory 301 is used for storing instructions. The processor 302 is used for calling the instructions stored in the memory 301 to execute the knowledge distillation-based model training method or the image processing method of the embodiments of the present disclosure. The processor 302 is connected to the memory 301 and the I/O interface 303, for example via a bus system and/or another form of connection mechanism (not shown). The memory 301 may be used to store programs and data, including the programs of the knowledge distillation-based model training method or the image processing method involved in the embodiments of the present disclosure, and the processor 302 executes various functional applications and data processing of the electronic device 300 by running the programs stored in the memory 301.
The processor 302 in the embodiment of the present disclosure may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA), and the processor 302 may be a central processing unit (CPU) or one or a combination of other processing units with data processing capability and/or instruction execution capability.
The memory 301 in the disclosed embodiments may comprise one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
In the embodiment of the present disclosure, the I/O interface 303 may be used to receive input instructions (e.g., numeric or character information, and generate key signal inputs related to user settings and function control of the electronic device 300, etc.), and may also output various information (e.g., images or sounds, etc.) to the outside. The I/O interface 303 in the embodiments of the present disclosure may include one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a mouse, a joystick, a trackball, a microphone, a speaker, a touch panel, and the like.
It is to be understood that although operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
The methods and apparatus related to embodiments of the present disclosure can be implemented with standard programming techniques, using rule-based logic or other logic to accomplish the various method steps. It should also be noted that the words "means" and "module", as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving input.
Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code, which is executable by a computer processor for performing any or all of the described steps, operations, or procedures.
The foregoing description of the implementations of the disclosure has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosure. The embodiments were chosen and described in order to explain the principles of the disclosure and its practical application to enable one skilled in the art to utilize the disclosure in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims (11)

1. A knowledge-distillation-based model training method, wherein the method is applied to student models, the method comprising:
according to the distillation position, arranging a second output layer which is the same as the first output layer of the distillation position;
acquiring a training set, wherein the training set comprises a plurality of training data;
obtaining first data output by the first output layer and second data output by the second output layer based on the training data;
acquiring supervision data output by a teacher model on a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a complex model which is trained and completes the same task as the student model;
obtaining a distillation loss value according to a distillation loss function based on the difference between the supervision data and the first data and the second data;
updating parameters of the student model based on the distillation loss value.
2. The knowledge distillation-based model training method of claim 1, wherein the distillation loss function is set such that the gap is positively correlated with the distillation loss value; and as the second data increases, the distillation loss value decreases.
3. The knowledge distillation-based model training method of claim 2, wherein the distillation loss function is:
[a formula rendered as an image in the original publication]
wherein the symbols of the formula denote the supervision data, the first data, and the second data, respectively.
4. The knowledge distillation-based model training method according to any one of claims 1 to 3, wherein,
the training set further comprises: standard annotation data in one-to-one correspondence with the training data;
the method further comprises: obtaining student output data output by the student model based on the training data; and obtaining a task loss value according to a task loss function based on the standard annotation data and the student output data;
updating the parameters of the student model based on the distillation loss value comprises: updating the parameters of the student model based on the total loss value of the task loss value and the distillation loss value.
5. The knowledge distillation-based model training method of claim 4, wherein the method further comprises: deleting the second output layer after the total loss value falls below a training threshold through multiple iterations, to obtain the trained student model.
6. The knowledge distillation-based model training method of claim 1, wherein the distillation position comprises one or more of: any feature extraction layer in the student model, and a fully connected layer of the student model.
7. An image processing method, wherein the method comprises:
acquiring an image;
and extracting image characteristics of the image through a model to obtain an image recognition result, wherein the model is a student model obtained through the knowledge distillation-based model training method according to any one of claims 1-6.
8. A knowledge-distillation-based model training apparatus, for application to student models, the apparatus comprising:
the model building module is used for setting a second output layer which is the same as the first output layer at the distillation position according to the distillation position;
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training set, and the training set comprises a plurality of training data;
the data processing module is used for obtaining first data output by the first output layer and second data output by the second output layer based on the training data;
the second acquisition module is used for acquiring supervision data output by a teacher model on a teacher layer corresponding to the distillation position based on the training data, wherein the teacher model is a complex model which is trained and completes the same task as the student model;
a loss calculation module for obtaining a distillation loss value according to a distillation loss function based on the difference between the supervision data and the first data and the second data;
and the parameter adjusting module is used for updating the parameters of the student model based on the distillation loss value.
9. An image processing apparatus, wherein the apparatus comprises:
the image acquisition module is used for acquiring an image;
an image recognition module, configured to extract image features of the image through a model to obtain an image recognition result, where the model is a student model obtained through the knowledge distillation-based model training method according to any one of claims 1 to 6.
10. An electronic device, wherein the electronic device comprises:
a memory to store instructions; and
a processor for invoking the memory-stored instructions to perform the knowledge-distillation based model training method of any one of claims 1-6 or the image processing method of claim 7.
11. A computer readable storage medium having stored therein instructions which, when executed by a processor, perform the knowledge distillation based model training method of any one of claims 1-6 or the image processing method of claim 7.
CN201911319895.4A 2019-12-19 2019-12-19 Knowledge distillation-based model training method, image processing method and device Pending CN111242297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911319895.4A CN111242297A (en) 2019-12-19 2019-12-19 Knowledge distillation-based model training method, image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911319895.4A CN111242297A (en) 2019-12-19 2019-12-19 Knowledge distillation-based model training method, image processing method and device

Publications (1)

Publication Number Publication Date
CN111242297A true CN111242297A (en) 2020-06-05

Family

ID=70877619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911319895.4A Pending CN111242297A (en) 2019-12-19 2019-12-19 Knowledge distillation-based model training method, image processing method and device

Country Status (1)

Country Link
CN (1) CN111242297A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111898707A (en) * 2020-08-24 2020-11-06 鼎富智能科技有限公司 Model training method, text classification method, electronic device and storage medium
CN111950638A (en) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and device based on model distillation and electronic equipment
CN112116441A (en) * 2020-10-13 2020-12-22 腾讯科技(深圳)有限公司 Training method, classification method, device and equipment of financial risk classification model
CN112232397A (en) * 2020-09-30 2021-01-15 上海眼控科技股份有限公司 Knowledge distillation method and device of image classification model and computer equipment
CN112329885A (en) * 2020-11-25 2021-02-05 江苏云从曦和人工智能有限公司 Model training method, device and computer readable storage medium
CN112541122A (en) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN112561059A (en) * 2020-12-15 2021-03-26 北京百度网讯科技有限公司 Method and apparatus for model distillation
CN112699678A (en) * 2021-03-24 2021-04-23 达而观数据(成都)有限公司 Model distillation method combined with dynamic vocabulary enhancement
CN112819050A (en) * 2021-01-22 2021-05-18 北京市商汤科技开发有限公司 Knowledge distillation and image processing method, device, electronic equipment and storage medium
CN113222175A (en) * 2021-04-29 2021-08-06 深圳前海微众银行股份有限公司 Information processing method and system
CN113222123A (en) * 2021-06-15 2021-08-06 深圳市商汤科技有限公司 Model training method, device, equipment and computer storage medium
CN113255763A (en) * 2021-05-21 2021-08-13 平安科技(深圳)有限公司 Knowledge distillation-based model training method and device, terminal and storage medium
CN113392984A (en) * 2021-06-29 2021-09-14 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for training a model
CN113705317A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113707123A (en) * 2021-08-17 2021-11-26 慧言科技(天津)有限公司 Voice synthesis method and device
WO2022052997A1 (en) * 2020-09-09 2022-03-17 Huawei Technologies Co.,Ltd. Method and system for training neural network model using knowledge distillation
CN114298224A (en) * 2021-12-29 2022-04-08 云从科技集团股份有限公司 Image classification method, device and computer readable storage medium
WO2022077646A1 (en) * 2020-10-13 2022-04-21 上海依图网络科技有限公司 Method and apparatus for training student model for image processing
CN114677565A (en) * 2022-04-08 2022-06-28 北京百度网讯科技有限公司 Training method of feature extraction network and image processing method and device
WO2022178948A1 (en) * 2021-02-26 2022-09-01 平安科技(深圳)有限公司 Model distillation method and apparatus, device, and storage medium
CN115456167A (en) * 2022-08-30 2022-12-09 北京百度网讯科技有限公司 Lightweight model training method, image processing device and electronic equipment

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111554268B (en) * 2020-07-13 2020-11-03 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111950638A (en) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and device based on model distillation and electronic equipment
CN111950638B (en) * 2020-08-14 2024-02-06 厦门美图之家科技有限公司 Image classification method and device based on model distillation and electronic equipment
CN111898707A (en) * 2020-08-24 2020-11-06 鼎富智能科技有限公司 Model training method, text classification method, electronic device and storage medium
WO2022052997A1 (en) * 2020-09-09 2022-03-17 Huawei Technologies Co.,Ltd. Method and system for training neural network model using knowledge distillation
CN112232397A (en) * 2020-09-30 2021-01-15 上海眼控科技股份有限公司 Knowledge distillation method and device of image classification model and computer equipment
CN112116441B (en) * 2020-10-13 2024-03-12 腾讯科技(深圳)有限公司 Training method, classification method, device and equipment for financial risk classification model
CN112116441A (en) * 2020-10-13 2020-12-22 腾讯科技(深圳)有限公司 Training method, classification method, device and equipment of financial risk classification model
WO2022077646A1 (en) * 2020-10-13 2022-04-21 上海依图网络科技有限公司 Method and apparatus for training student model for image processing
CN112329885A (en) * 2020-11-25 2021-02-05 江苏云从曦和人工智能有限公司 Model training method, device and computer readable storage medium
CN112329885B (en) * 2020-11-25 2021-07-09 江苏云从曦和人工智能有限公司 Model training method, device and computer readable storage medium
CN112561059B (en) * 2020-12-15 2023-08-01 北京百度网讯科技有限公司 Method and apparatus for model distillation
CN112561059A (en) * 2020-12-15 2021-03-26 北京百度网讯科技有限公司 Method and apparatus for model distillation
CN112541122A (en) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN112819050A (en) * 2021-01-22 2021-05-18 北京市商汤科技开发有限公司 Knowledge distillation and image processing method, device, electronic equipment and storage medium
CN112819050B (en) * 2021-01-22 2023-10-27 北京市商汤科技开发有限公司 Knowledge distillation and image processing method, apparatus, electronic device and storage medium
WO2022156331A1 (en) * 2021-01-22 2022-07-28 北京市商汤科技开发有限公司 Knowledge distillation and image processing method and apparatus, electronic device, and storage medium
WO2022178948A1 (en) * 2021-02-26 2022-09-01 平安科技(深圳)有限公司 Model distillation method and apparatus, device, and storage medium
CN112699678A (en) * 2021-03-24 2021-04-23 达而观数据(成都)有限公司 Model distillation method combined with dynamic vocabulary enhancement
CN113705317A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113705317B (en) * 2021-04-14 2024-04-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113222175B (en) * 2021-04-29 2023-04-18 深圳前海微众银行股份有限公司 Information processing method and system
CN113222175A (en) * 2021-04-29 2021-08-06 深圳前海微众银行股份有限公司 Information processing method and system
CN113255763A (en) * 2021-05-21 2021-08-13 平安科技(深圳)有限公司 Knowledge distillation-based model training method and device, terminal and storage medium
CN113255763B (en) * 2021-05-21 2023-06-09 平安科技(深圳)有限公司 Model training method, device, terminal and storage medium based on knowledge distillation
CN113222123A (en) * 2021-06-15 2021-08-06 深圳市商汤科技有限公司 Model training method, device, equipment and computer storage medium
CN113392984B (en) * 2021-06-29 2022-10-14 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for training a model
CN113392984A (en) * 2021-06-29 2021-09-14 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for training a model
CN113707123B (en) * 2021-08-17 2023-10-20 慧言科技(天津)有限公司 Speech synthesis method and device
CN113707123A (en) * 2021-08-17 2021-11-26 慧言科技(天津)有限公司 Voice synthesis method and device
CN114298224A (en) * 2021-12-29 2022-04-08 云从科技集团股份有限公司 Image classification method, device and computer readable storage medium
CN114677565A (en) * 2022-04-08 2022-06-28 北京百度网讯科技有限公司 Training method of feature extraction network and image processing method and device
CN115456167A (en) * 2022-08-30 2022-12-09 北京百度网讯科技有限公司 Lightweight model training method, image processing device and electronic equipment
CN115456167B (en) * 2022-08-30 2024-03-12 北京百度网讯科技有限公司 Lightweight model training method, image processing device and electronic equipment

Similar Documents

Publication Publication Date Title
CN111242297A (en) Knowledge distillation-based model training method, image processing method and device
US10552737B2 (en) Artificial neural network class-based pruning
US9619749B2 (en) Neural network and method of neural network training
CN112801298B (en) Abnormal sample detection method, device, equipment and storage medium
JP7266674B2 (en) Image classification model training method, image processing method and apparatus
WO2023065859A1 (en) Item recommendation method and apparatus, and storage medium
CN110400575A (en) Interchannel feature extracting method, audio separation method and device calculate equipment
US20210224647A1 (en) Model training apparatus and method
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
US20200082213A1 (en) Sample processing method and device
JPWO2019146057A1 (en) Learning device, live-action image classification device generation system, live-action image classification device generation device, learning method and program
CN114974397A (en) Training method of protein structure prediction model and protein structure prediction method
JP2022543245A (en) A framework for learning to transfer learning
CN113326940A (en) Knowledge distillation method, device, equipment and medium based on multiple knowledge migration
CN114925320B (en) Data processing method and related device
WO2022265573A2 (en) Automatically and efficiently generating search spaces for neural network
Planas et al. Extrapolation with gaussian random processes and evolutionary programming
US11366984B1 (en) Verifying a target object based on confidence coefficients generated by trained models
CN113377964A (en) Knowledge graph link prediction method, device, equipment and storage medium
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
WO2023174189A1 (en) Method and apparatus for classifying nodes of graph network model, and device and storage medium
CN113743448B (en) Model training data acquisition method, model training method and device
WO2019200548A1 (en) Network model compiler and related product
CN113010687B (en) Exercise label prediction method and device, storage medium and computer equipment
JP2020052935A (en) Method of creating learned model, method of classifying data, computer and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200605