CN114972839B - Generalized continuous classification method based on an online contrastive distillation network - Google Patents

Generalized continuous classification method based on an online contrastive distillation network

Info

Publication number
CN114972839B
CN114972839B (application CN202210326319.8A; published as CN114972839A)
Authority
CN
China
Prior art keywords
model
feature
student model
samples
data
Prior art date
Legal status
Active
Application number
CN202210326319.8A
Other languages
Chinese (zh)
Other versions
CN114972839A (en
Inventor
冀中
黎晋
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202210326319.8A
Publication of CN114972839A
Application granted
Publication of CN114972839B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a generalized continuous classification method based on an online contrastive distillation network. A classification model based on knowledge distillation, comprising a teacher model and a student model, is established; a buffer is established and updated by a reservoir sampling method; S samples are randomly drawn from the buffer and input into the teacher model and the student model respectively to obtain each model's classification outputs and feature embeddings. Quality scores are calculated from the teacher model's classification outputs and used to adjust the knowledge distillation loss weights of different samples, giving the online distillation loss L_od; the feature embeddings of the two models are contrasted to calculate the contrastive relation distillation loss L_crd; the self-supervised loss L_ss and the supervised contrastive loss L_sc of the student model are calculated; the cross-entropy classification loss L_ce of the student model is calculated; and the parameters of the student model are optimized against the weighted sum of these losses as the total optimization objective. The parameters of the teacher model are updated from the parameters of the student model. The invention achieves good classification accuracy on both new tasks and old tasks.

Description

Generalized continuous classification method based on an online contrastive distillation network
Technical Field
The invention relates to a generalized continuous classification method, and in particular to a generalized continuous classification method based on an online contrastive distillation network.
Background
In recent years, deep learning has achieved good results in computer vision tasks such as image classification, object detection and semantic segmentation. However, when a neural network trained on an old task is trained directly on a new task, the new task severely interferes with performance on the old task, producing the catastrophic forgetting (Catastrophic Forgetting) problem. Retraining the neural network from scratch obviously consumes more time and computing resources, and the data of previous tasks cannot necessarily be re-acquired, for reasons such as privacy. Humans, by contrast, have the ability to learn continually: they quickly learn new knowledge on the basis of old knowledge without compromising the stability of previously learned knowledge. It is desirable for neural networks to have this human ability, and continuous learning (Continual Learning), also called incremental learning (Incremental Learning), was proposed to overcome the catastrophic forgetting problem. In recent years, a large amount of continuous learning work has adopted the idea of experience replay (Experience Replay): samples of a portion of the old tasks are stored, and the stored samples are replayed while training a new task to alleviate catastrophic forgetting.
Existing continuous learning techniques usually need to assume that the categories of different tasks are mutually disjoint, i.e. none of the categories in a new task has appeared in an old task, so that a clear task boundary exists between tasks; in real-world tasks, such prior knowledge is most likely unavailable. Many existing techniques exploit prior knowledge that is unlikely to hold in real-world tasks in order to simplify the continuous learning problem. For example, when the model output on old samples at a past moment is used to regularize the model output on old samples at the current moment to relieve catastrophic forgetting, the arrival of new categories makes the dimensions of the old and new model outputs inconsistent, and under the assumption that the categories of different tasks are mutually disjoint, the output of the new model can only partially overlap with that of the old model. Existing continuous learning methods relying on mutually disjoint categories between tasks therefore cannot be applied in the setting of generalized continuous learning. For this reason, generalized continuous learning (General Continual Learning), which addresses the catastrophic forgetting problem in real-world scenarios, is attracting attention. The goal of generalized continuous learning is to consolidate learned knowledge from a non-stationary, unbounded data stream while learning new knowledge quickly. Under the generalized continuous learning setting, the categories of different tasks may intersect and new samples of old categories may appear in new tasks, so previous methods that solve continuous learning by means of prior knowledge that does not necessarily exist in the real world are difficult to apply.
Generalized continuous learning is a general continuous learning scenario that also covers the classical class-incremental learning (Class Incremental Learning), task-incremental learning (Task Incremental Learning) and domain-incremental learning (Domain Incremental Learning) scenarios. However, the specific prior knowledge of these classical scenarios cannot be exploited to alleviate catastrophic forgetting when performing image classification in the generalized continuous learning scenario. This means that at experience replay, some inherent, scenario-agnostic information must be mined to consolidate the knowledge of old tasks.
Disclosure of Invention
The invention provides a generalized continuous classification method based on an online contrastive distillation network to solve the above technical problems existing in the prior art.
The technical solution adopted by the invention to solve the technical problems in the prior art is as follows: a generalized continuous classification method based on an online contrastive distillation network, comprising the following steps:
Step 1, establishing a classification model based on knowledge distillation, wherein the classification model comprises a teacher model and a student model; the teacher model and the student model are respectively provided with a feature encoder, a classifier and a feature mapper; setting an optimization target of a student model; initializing parameters of a teacher model and a student model and giving a buffer zone with a fixed size;
Step 2, when a batch data stream containing R samples arrives, counting the number of samples encountered so far and updating the buffer by a reservoir sampling method;
Step 3, randomly sampling S samples from the buffer and inputting them into the teacher model and the student model respectively; obtaining the classification output data of the teacher model and of the student model for the S samples through each model's feature encoder and classifier, and obtaining the feature embedding data of the teacher model and of the student model for the S samples through each model's feature encoder and feature mapper;
Step 4, calculating the quality scores of the teacher model's classification output data, adjusting the online knowledge distillation loss weights of different samples according to these quality scores, and then calculating the online distillation loss L_od of the teacher model and the student model;
Step 5, contrasting the feature embedding data of the teacher model and the student model, and calculating the contrastive relation distillation loss L_crd of the teacher model and the student model;
Step 6, using self-supervised learning and supervised contrastive learning to help the student model extract discriminative features, and calculating the self-supervised loss L_ss and the supervised contrastive loss L_sc of the student model;
Step 7, calculating the cross-entropy classification loss L_ce of the student model based on experience replay;
Step 8, calculating the total optimization objective of the student model, L = L_ce + α1·L_od + α2·L_crd + α3·L_col, where L_col = L_ss + L_sc is the collaborative contrastive loss and α1 to α3 are the hyper-parameters of the corresponding loss terms; optimizing the parameters of the student model with a stochastic gradient descent algorithm;
Step 9, directly updating the parameters of the teacher model with the parameters of the student model.
Further, in step 2, assuming that the non-stationary data stream consists of n sample-disjoint tasks {T_1, T_2, ..., T_n}, the training set of each task T_n consists of labeled data D_n = {(x_i, y_i)}_{i=1}^{m}, where m is the number of samples in the training set of task T_n, x_i is the i-th image sample in that training set, and y_i is the category labeling the i-th image sample x_i; the buffer B = {(x_j, y_j)}_{j=1}^{|B|} has capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; the reservoir sampling method comprises the following steps:
Step A1, comparing the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, storing the sample (x_i, y_i) directly into the buffer B, where x_i is the i-th image sample in the training set of task T_n and y_i is the category labeling the i-th image sample x_i in the training set of task T_n;
Step A2, if num > |B|, generating a random integer rand_num with minimum value 0 and maximum value num-1; if rand_num < |B|, replacing the buffer sample (x_rand_num, y_rand_num) with the sample (x_i, y_i), where x_rand_num denotes the image sample at index rand_num in buffer B and y_rand_num denotes its label.
Further, in step 4, the quality score of the teacher model's classification output data is calculated as follows:
Let B = {(x_j, y_j)}_{j=1}^{|B|} denote the buffer of capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; r_t(x_j) denotes the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the teacher model; ω(x_j) is the quality score of the teacher model's classification output data for sample x_j. The formula for ω(x_j) is as follows:
ω(x_j) = exp(r_t^{y_j}(x_j)/ρ) / Σ_{c=1}^{C} exp(r_t^{c}(x_j)/ρ);
Wherein:
ρ denotes a temperature coefficient;
C denotes the number of all possible categories;
exp(·) denotes the exponential function with base e;
r_t^{y_j}(x_j) is the entry of the classification output data r_t(x_j) for the labeled category y_j;
r_t^{c}(x_j) is the entry of the classification output data r_t(x_j) for category c.
Further, in step 4, let r_s(x_j) denote the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the student model; the online distillation loss L_od of the teacher model and the student model is calculated as:
L_od = E_{(x_j, y_j)∼B} [ ω(x_j)·‖r_t(x_j) − r_s(x_j)‖2² ];
wherein ‖·‖2 denotes the l2 norm and E[·] denotes the mathematical expectation function.
Further, in step 5, it is set that: B = {(x_j, y_j)}_{j=1}^{|B|} denotes the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; z_t^j denotes the feature embedding data of the teacher model obtained through its feature encoder and feature mapper after sample x_j is input into the teacher model; z_s^j denotes the feature embedding data of the student model obtained through its feature encoder and feature mapper after sample x_j is input into the student model; Z_t is the set of all teacher model feature embedding data z_t^j obtained after all samples x_j of the current batch are input into the teacher model; Z_s is the set of all student model feature embedding data z_s^j obtained after all samples x_j of the current batch are input into the student model; ẑ_s denotes feature embedding data sampled from Z_s; Z_t+ denotes the set of teacher model feature embedding data carrying the same class label as ẑ_s; ẑ_t+ denotes feature embedding data sampled from Z_t+; ẑ_t denotes feature embedding data sampled from Z_t. The contrastive relation distillation loss L_crd of the teacher model and the student model is calculated as:
L_crd = −E [ log ( h(ẑ_t+, ẑ_s) / Σ_{ẑ_t∈Z_t} h(ẑ_t, ẑ_s) ) ], with h(ẑ, ẑ′) = exp( ẑᵀẑ′ / (‖ẑ‖2 ‖ẑ′‖2 τ) );
Wherein:
E[·] denotes the mathematical expectation function;
‖·‖2 denotes the l2 norm;
log(·) denotes the natural logarithm function with base e;
h(ẑ_t+, ẑ_s) is a judging function judging whether the feature embedding data ẑ_t+ and ẑ_s are derived from their joint distribution p(ẑ_t+, ẑ_s);
h(ẑ_t, ẑ_s) is a judging function judging whether the feature embedding data ẑ_t and ẑ_s are derived from their joint distribution p(ẑ_t, ẑ_s);
(·)ᵀ denotes the transpose;
exp(·) denotes the exponential function with base e;
τ denotes a temperature coefficient.
Further, step 6 comprises the following sub-steps:
Step B1, let Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs denote the feature encoder, classifier and feature mapper of the student model; each training sample (x, y) of the student model undergoes one random geometric transformation to obtain the augmented training sample (x̃, ỹ), where x denotes the image sample, y the category labeling the image sample x, x̃ the geometrically transformed image sample, and ỹ the label of the geometric transformation applied; the augmented training sample x̃ is input into the student model, and the corresponding student model feature data F_s and feature embedding data z̃_s are obtained through the student model's feature encoder and feature mapper, wherein:
F_s = Θs(x̃), z̃_s = Ψs(F_s);
Step B2, inputting the obtained student model feature data F_s into a multi-layer perceptron g(·) to judge the kind of geometric transformation the training sample x̃ has undergone; with the output of the multi-layer perceptron denoted S_s, the calculation formula of S_s is as follows:
S_s = g(F_s);
Step B3, calculating the self-supervised loss L_ss; its calculation formula is as follows:
L_ss = E[ℓ(softmax(S_s), ỹ)];
wherein E[·] denotes the mathematical expectation function;
softmax(·) denotes the softmax function;
ℓ(·) denotes the cross-entropy loss function;
Step B4, it is set that: B = {(x_j, y_j)}_{j=1}^{|B|} denotes the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; z_s^j denotes the feature embedding data of the student model obtained through its feature encoder and feature mapper after sample x_j is input into the student model; Z_all denotes the set of all student model feature embedding data z_s^j and z̃_s; ẑ denotes feature embedding data sampled from Z_all; Z_+ denotes the set of student model feature embedding data carrying the same class label as ẑ; ẑ_+ denotes feature embedding data sampled from Z_+; ẑ′ denotes feature embedding data sampled from Z_all; based on the original feature embedding data and the augmented feature embedding data, supervised contrastive learning is performed with the feature embedding data in the student model, and the loss function L_sc of supervised contrastive learning is calculated as follows:
L_sc = E_ẑ [ −(1/|Z_+|) Σ_{ẑ_+∈Z_+} log ( exp(d(ẑ, ẑ_+)) / Σ_{ẑ′∈Z_all∖{ẑ}} exp(d(ẑ, ẑ′)) ) ];
Wherein:
E denotes the mathematical expectation;
‖·‖2 denotes the l2 norm;
log(·) denotes the natural logarithm function with base e;
d(ẑ, ẑ_+) = ẑᵀẑ_+ / (‖ẑ‖2 ‖ẑ_+‖2 τ) denotes the distance between the feature embedding data ẑ and ẑ_+;
d(ẑ, ẑ′) denotes the distance between the feature embedding data ẑ and ẑ′, defined likewise;
exp(·) denotes the exponential function with base e;
(·)ᵀ denotes the transpose;
τ denotes a temperature coefficient;
Step B5, combining the self-supervised loss L_ss with the supervised contrastive loss L_sc to obtain the collaborative contrastive loss L_col, which helps the student model better extract discriminative features; the calculation formula of L_col is as follows:
L_col = L_ss + L_sc.
Further, in step B1, the geometric transformation includes rotating, scaling and adjusting the aspect ratio of the image.
Further, in step 7, assuming that the non-stationary data stream consists of n sample-disjoint tasks {T_1, T_2, ..., T_n}, let x denote an image sample from task T_n and from the buffer B, and y the category labeling the image sample x; the cross-entropy classification loss L_ce of the student model is calculated as:
L_ce = E_{(x, y)} [ ℓ( softmax(r_s(x)), y ) ];
Wherein:
E[·] denotes the mathematical expectation function;
softmax(·) denotes the softmax function;
ℓ(·) denotes the cross-entropy loss function;
r_s(x) denotes the classification output data of the image sample x after passing in sequence through the feature encoder and classifier of the student model.
Further, in step 9, the specific method for updating the parameters of the teacher model with the parameters of the student model is as follows:
Let Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs denote the feature encoder, classifier and feature mapper of the student model; the teacher model parameters are updated as follows:
Θt←mΘt+(1-m)[(1-X)Θt+XΘs];
Φt←mΦt+(1-m)[(1-X)Φt+XΦs];
Ψt←mΨt+(1-m)[(1-X)Ψt+XΨs];
where m represents a momentum factor and X obeys a Bernoulli distribution (also referred to as a 0-1 distribution), defined as:
P(X=k)=pk(1-p)1-k,k={0,1};
The value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model is controlled through the Bernoulli probability p.
Further, the calculation formula of the momentum factor m is as follows:
m=min(itera/(itera+1),η);
wherein itera is the current iteration number of the student model; min(itera/(itera+1), η) takes the smaller of itera/(itera+1) and η; η is a constant, generally set to 0.999.
The invention has the following advantages and positive effects: the generalized continuous classification method based on an online contrastive distillation network uses the teacher-student framework of online knowledge distillation to effectively consolidate old-task knowledge, so that the model attains good classification accuracy on both new tasks and old tasks. In the training stage, the training strategy of contrastive learning is introduced into online knowledge distillation: the teacher model accumulates knowledge by integrating the weights of the student model at all moments, while the student model alleviates catastrophic forgetting by distilling the classification output data and contrastive relations of the teacher model. Teacher and student cooperate: because the student model retains old-task performance, the weights accumulated by the teacher model stay better balanced between old and new tasks, and the teacher model can in turn better guide the student model to consolidate old-task knowledge while the student trains on a new task. In the test stage, the teacher model is used for testing, because it integrates the strengths of the student models at different moments in distinguishing different categories and therefore classifies all seen categories well. The invention thus effectively integrates the advantages of the student network and improves the classification accuracy of the teacher network at test time.
Drawings
FIG. 1 is a workflow diagram of the generalized continuous classification method based on an online contrastive distillation network according to the present invention.
Detailed Description
For a further understanding of the invention, its features and advantages, reference is now made to the following examples, which are illustrated in the accompanying drawings in which:
Referring to FIG. 1, a generalized continuous classification method based on an online contrastive distillation network comprises the following steps:
Step 1, establishing a classification model based on knowledge distillation, wherein the classification model comprises a teacher model and a student model; the teacher model and the student model are respectively provided with a feature encoder, a classifier and a feature mapper; setting an optimization target of a student model; parameters of the teacher model and the student model are initialized and a buffer of a fixed size is given.
Step 2, when a batch data stream containing R samples arrives, the number of samples encountered so far is counted and the buffer is updated by a reservoir sampling method.
Step 3, S samples are randomly sampled from the buffer and input into the teacher model and the student model respectively; the classification output data of the teacher model and of the student model for the S samples are obtained through each model's feature encoder and classifier, and the feature embedding data of the teacher model and of the student model for the S samples are obtained through each model's feature encoder and feature mapper. That is: the S samples are processed in sequence by the teacher model's feature encoder and classifier to obtain the classification output data set of the teacher model; by the student model's feature encoder and classifier to obtain the classification output data set of the student model; by the teacher model's feature encoder and feature mapper to obtain the feature embedding data set of the teacher model; and by the student model's feature encoder and feature mapper to obtain the feature embedding data set of the student model.
Step 4, the quality scores of the teacher model's classification output data are calculated, the online knowledge distillation loss weights of different samples are adjusted according to these quality scores, and the online distillation loss L_od of the teacher model and the student model is then calculated.
Step 5, the feature embedding data of the teacher model and the student model are contrasted, and the contrastive relation distillation loss L_crd of the teacher model and the student model is calculated.
Step 6, self-supervised learning and supervised contrastive learning are used to help the student model extract discriminative features, and the self-supervised loss L_ss and the supervised contrastive loss L_sc of the student model are calculated.
Step 7, the cross-entropy classification loss L_ce of the student model is calculated based on experience replay.
Step 8, the total optimization objective of the student model, L = L_ce + α1·L_od + α2·L_crd + α3·L_col with L_col = L_ss + L_sc, is calculated, where α1 to α3 are the hyper-parameters of the corresponding loss terms; the parameters of the student model are optimized with a stochastic gradient descent algorithm.
Step 9, the parameters of the teacher model are directly updated with the parameters of the student model.
Preferably, in step 2, it may be assumed that the non-stationary data stream consists of n sample-disjoint tasks {T_1, T_2, ..., T_n}, and that the training set of each task T_n consists of labeled data D_n = {(x_i, y_i)}_{i=1}^{m}, where m is the number of samples in the training set of task T_n, x_i is the i-th image sample in that training set, and y_i is the category labeling the i-th image sample x_i; the buffer B = {(x_j, y_j)}_{j=1}^{|B|} has capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j. The reservoir sampling method may comprise the following steps:
Step A1, comparing the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, storing the sample (x_i, y_i) directly into the buffer B, where x_i is the i-th image sample in the training set of task T_n and y_i is the category labeling the i-th image sample x_i in the training set of task T_n.
Step A2, if num > |B|, generating a random integer rand_num with minimum value 0 and maximum value num-1; if rand_num < |B|, replacing the buffer sample (x_rand_num, y_rand_num) with the sample (x_i, y_i), where x_rand_num denotes the image sample at index rand_num in buffer B and y_rand_num denotes its label.
Preferably, in step 4, the quality score of the teacher model's classification output data may be calculated as follows:
Let B = {(x_j, y_j)}_{j=1}^{|B|} denote the buffer of capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; r_t(x_j) denotes the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the teacher model; ω(x_j) is the quality score of the teacher model's classification output data for sample x_j. The formula for ω(x_j) may be as follows:
ω(x_j) = exp(r_t^{y_j}(x_j)/ρ) / Σ_{c=1}^{C} exp(r_t^{c}(x_j)/ρ);
where ρ denotes a temperature coefficient; C denotes the number of all possible categories; exp(·) denotes the exponential function with base e; r_t^{y_j}(x_j) is the entry of the classification output data r_t(x_j) for the labeled category y_j; and r_t^{c}(x_j) is the entry of r_t(x_j) for category c.
Preferably, in step 4, let r_s(x_j) denote the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the student model; the online distillation loss L_od of the teacher model and the student model may be calculated as:
L_od = E_{(x_j, y_j)∼B} [ ω(x_j)·‖r_t(x_j) − r_s(x_j)‖2² ];
where ‖·‖2 denotes the l2 norm; E[·] denotes the mathematical expectation function; exp(·) denotes the exponential function with base e.
Preferably, in step 5, it may be set that: B = {(x_j, y_j)}_{j=1}^{|B|} denotes the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; z_t^j denotes the feature embedding data of the teacher model obtained through its feature encoder and feature mapper after sample x_j is input into the teacher model; z_s^j denotes the feature embedding data of the student model obtained through its feature encoder and feature mapper after sample x_j is input into the student model; Z_t is the set of all teacher model feature embedding data z_t^j obtained after all samples x_j of the current batch are input into the teacher model; Z_s is the set of all student model feature embedding data z_s^j obtained after all samples x_j of the current batch are input into the student model; ẑ_s denotes feature embedding data sampled from Z_s; Z_t+ denotes the set of teacher model feature embedding data carrying the same class label as ẑ_s; ẑ_t+ denotes feature embedding data sampled from Z_t+; ẑ_t denotes feature embedding data sampled from Z_t. The contrastive relation distillation loss L_crd of the teacher model and the student model may be calculated as:
L_crd = −E [ log ( h(ẑ_t+, ẑ_s) / Σ_{ẑ_t∈Z_t} h(ẑ_t, ẑ_s) ) ], with h(ẑ, ẑ′) = exp( ẑᵀẑ′ / (‖ẑ‖2 ‖ẑ′‖2 τ) );
where E[·] denotes the mathematical expectation function; ‖·‖2 denotes the l2 norm; log(·) denotes the natural logarithm function with base e; h(ẑ_t+, ẑ_s) is a judging function judging whether the feature embedding data ẑ_t+ and ẑ_s are derived from their joint distribution p(ẑ_t+, ẑ_s); h(ẑ_t, ẑ_s) is a judging function judging whether the feature embedding data ẑ_t and ẑ_s are derived from their joint distribution p(ẑ_t, ẑ_s); (·)ᵀ denotes the transpose; exp(·) denotes the exponential function with base e; τ denotes a temperature coefficient.
Preferably, step 6 may comprise the following sub-steps:
Step B1, let Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs denote the feature encoder, classifier and feature mapper of the student model; each training sample (x, y) of the student model undergoes one random geometric transformation to obtain the augmented training sample (x̃, ỹ), where x denotes the image sample, y the category labeling the image sample x, x̃ the geometrically transformed image sample, and ỹ the label of the geometric transformation applied; the augmented training sample x̃ is input into the student model, and the corresponding student model feature data F_s and feature embedding data z̃_s may be obtained through the student model's feature encoder and feature mapper, where:
F_s = Θs(x̃), z̃_s = Ψs(F_s);
Step B2, the obtained student model feature data F_s may be input into a multi-layer perceptron g(·) to judge the kind of geometric transformation the training sample x̃ has undergone; with the output of the multi-layer perceptron denoted S_s, the calculation formula of S_s may be:
S_s = g(F_s);
Step B3, the self-supervised loss L_ss may be calculated as:
L_ss = E[ℓ(softmax(S_s), ỹ)];
where E[·] denotes the mathematical expectation function; softmax(·) denotes the softmax function; ℓ(·) denotes the cross-entropy loss function;
Step B4, it is set that: B = {(x_j, y_j)}_{j=1}^{|B|} denotes the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; z_s^j denotes the feature embedding data of the student model obtained through its feature encoder and feature mapper after sample x_j is input into the student model; Z_all denotes the set of all student model feature embedding data z_s^j and z̃_s; ẑ denotes feature embedding data sampled from Z_all; Z_+ denotes the set of student model feature embedding data carrying the same class label as ẑ; ẑ_+ denotes feature embedding data sampled from Z_+; ẑ′ denotes feature embedding data sampled from Z_all; based on the original feature embedding data and the augmented feature embedding data, supervised contrastive learning may be performed with the feature embedding data in the student model, and the loss function L_sc of supervised contrastive learning may be calculated as:
L_sc = E_ẑ [ −(1/|Z_+|) Σ_{ẑ_+∈Z_+} log ( exp(d(ẑ, ẑ_+)) / Σ_{ẑ′∈Z_all∖{ẑ}} exp(d(ẑ, ẑ′)) ) ];
where E denotes the mathematical expectation; ‖·‖2 denotes the l2 norm; log(·) denotes the natural logarithm function with base e; d(ẑ, ẑ_+) = ẑᵀẑ_+ / (‖ẑ‖2 ‖ẑ_+‖2 τ) denotes the distance between the feature embedding data ẑ and ẑ_+; d(ẑ, ẑ′) denotes the distance between ẑ and ẑ′, defined likewise; exp(·) denotes the exponential function with base e; (·)ᵀ denotes the transpose; τ denotes a temperature coefficient;
Step B5, the self-supervised loss L_ss and the supervised contrastive loss L_sc may be combined into the collaborative contrastive loss L_col, which helps the student model better extract discriminative features; the calculation formula of L_col may be:
L_col = L_ss + L_sc.
Preferably, in step B1, the geometric transformation may comprise rotating, scaling and adjusting the aspect ratio of the image.
Preferably, in step 7, it may be assumed that the non-stationary data stream consists of n sample-disjoint tasks {T_1, T_2, ..., T_n}; let x denote an image sample from task T_n and from the buffer B, and y the category labeling the image sample x; the cross-entropy classification loss L_ce of the student model may be calculated as:
L_ce = E_{(x, y)} [ ℓ( softmax(r_s(x)), y ) ];
where E[·] denotes the mathematical expectation function; softmax(·) denotes the softmax function; ℓ(·) denotes the cross-entropy loss function; r_s(x) denotes the classification output data of the image sample x after passing in sequence through the feature encoder and classifier of the student model.
Preferably, in step 9, the specific method for updating the parameters of the teacher model with the parameters of the student model may be as follows:
Let Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs denote the feature encoder, classifier and feature mapper of the student model; the teacher model parameters may be updated as follows:
Θt←mΘt+(1-m)[(1-X)Θt+XΘs];
Φt←mΦt+(1-m)[(1-X)Φt+XΦs];
Ψt←mΨt+(1-m)[(1-X)Ψt+XΨs];
where m represents a momentum factor and X obeys a Bernoulli distribution (also referred to as a 0-1 distribution), which can be defined as:
P(X=k)=pk(1-p)1-k,k={0,1};
wherein the value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model can be controlled through the Bernoulli probability p.
Preferably, the calculation formula of the momentum factor m can be as follows:
m=min(itera/(itera+1),η);
where itera is the current iteration number of the student model; min(itera/(itera+1), η) takes the smaller of itera/(itera+1) and η; η is a constant and can be set to 0.999.
The workflow and working principle of the invention are further described in the following with a preferred embodiment of the invention:
Generalized continuous learning consolidates old knowledge from a non-stationary data stream while accumulating new knowledge, and finally completes classification prediction for images of all seen categories. Assume the non-stationary data stream consists of N sample-disjoint tasks {T_1, T_2, ..., T_N}; the training set of each task T_n consists of labeled data D_n = {(x_i, y_i)}_{i=1}^{m}, where m is the number of samples in the training set of task T_n, x_i is the i-th image sample in that training set, and y_i is the category labeling the i-th image sample x_i. In the test stage, the generalized continuous learning method must complete the classification task for all categories seen so far. The test set of each task T_n consists of labeled data {(x_q, y_q)}_{q=1}^{p}, where p is the number of samples in the test set of task T_n, x_q is the q-th image sample in that test set, and y_q is the category labeling the q-th image sample x_q. The generalized continuous learning task is to perform category prediction on the test sets of all tasks {T_1, T_2, ..., T_n} trained so far.
FIG. 1 is a workflow diagram of the generalized continuous classification method based on an online contrastive distillation network according to the present invention. Herein, B = {(x_j, y_j)}_{j=1}^{|B|} denotes the buffer of capacity |B|, x_j is the j-th image sample in the buffer, and y_j is the category labeling the j-th image sample x_j. L_od denotes the online distillation loss and L_crd the contrastive relation distillation loss. Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs the feature encoder, classifier and feature mapper of the student model.
The generalized continuous classification method based on an online contrastive distillation network of the invention comprises the following steps:
Step 1, before any task starts, the parameters of the teacher model and the student model are first initialized and a buffer of fixed size is given; the teacher model is initialized from the student model: Θt = Θs, Φt = Φs, Ψt = Ψs.
Step 2, when a batch data stream containing bsz samples arrives, the number num of samples encountered so far is counted and the buffer B is updated by reservoir sampling, which guarantees that every sample seen so far has an equal probability of being stored in the buffer. For a particular sample, the reservoir sampling proceeds as follows:
(1) Compare the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, store the sample (x_i, y_i) directly into the buffer;
(2) If num > |B|, generate a random integer rand_num with minimum value 0 and maximum value num-1. If rand_num < |B|, replace the buffer sample (x_rand_num, y_rand_num) with (x_i, y_i); x_rand_num denotes the image sample at index rand_num in buffer B and y_rand_num denotes its label.
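The reservoir update of steps (1) and (2) is a standard algorithm and translates directly into code; a minimal Python sketch follows. The Buffer class and its method names are illustrative, labels are assumed to be stored as integers, and the replay draw of step 3 is included as Buffer.sample:

```python
import random
import torch

class Buffer:
    """Fixed-capacity replay buffer updated by reservoir sampling."""

    def __init__(self, capacity):
        self.capacity = capacity   # |B|
        self.samples = []          # list of (image tensor, int label) pairs
        self.num_seen = 0          # num: samples encountered so far

    def reservoir_update(self, x, y):
        self.num_seen += 1
        if self.num_seen <= self.capacity:
            # step (1): num <= |B|, store (x, y) directly
            self.samples.append((x, y))
        else:
            # step (2): rand_num uniform on [0, num - 1]
            rand_num = random.randint(0, self.num_seen - 1)
            if rand_num < self.capacity:
                self.samples[rand_num] = (x, y)

    def sample(self, s):
        # draw S distinct samples for replay (step 3)
        idx = random.sample(range(len(self.samples)),
                            min(s, len(self.samples)))
        xs, ys = zip(*(self.samples[i] for i in idx))
        return torch.stack(xs), torch.tensor(ys)
```

After num samples have streamed past, every one of them resides in the buffer with the same probability |B|/num, which is exactly the equal-probability property the patent requires.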
Step 3, S samples x_j are randomly sampled from the buffer B to consolidate old knowledge and input into the teacher model and the student model respectively. The classification output data of the teacher model and the student model obtained through feature encoder and classifier are, respectively:
r_t(x_j) = Φt(Θt(x_j)) (1);
r_s(x_j) = Φs(Θs(x_j)) (2);
and the feature embedding data of the teacher model and the student model obtained through feature encoder and feature mapper are, respectively:
z_t^j = Ψt(Θt(x_j)) (3);
z_s^j = Ψs(Θs(x_j)) (4).
Step 4, let r_t(x_j) denote the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the teacher model, r_t^{c}(x_j) its entry for category c, and r_t^{y_j}(x_j) its entry for the labeled category y_j; let r_s(x_j) denote the classification output data obtained by passing x_j in sequence through the feature encoder and classifier of the student model; ω(x_j) is the quality score of the teacher model's classification output data for sample x_j.
The quality score ω(x_j) of the classification output data of each sample is calculated as:
ω(x_j) = exp(r_t^{y_j}(x_j)/ρ) / Σ_{c=1}^{C} exp(r_t^{c}(x_j)/ρ) (5);
where ρ is a temperature coefficient, C denotes the number of all possible categories, and exp(·) denotes the exponential function with base e.
The online distillation loss L_od is calculated according to formulas (1), (2) and (5):
L_od = E_{(x_j, y_j)∼B} [ ω(x_j)·‖r_t(x_j) − r_s(x_j)‖2² ];
where ‖·‖2 denotes the l2 norm and E[·] denotes the mathematical expectation function. By giving the gap between the outputs of the teacher model and the student model the weight ω(x_j), the student model is made to focus more on samples with high quality scores.
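A PyTorch sketch of the quality score and the online distillation loss follows. It assumes, as in the reconstruction above, that ω(x_j) is the temperature-softmax probability the teacher assigns to the labeled class and that the output gap is the squared l2 distance; the teacher side is detached so that only the student receives gradients:

```python
import torch
import torch.nn.functional as F

def quality_scores(r_t, y, rho=1.0):
    # omega(x_j): temperature softmax of the teacher output, evaluated
    # at the labeled class y_j (eq. (5); rho is the temperature)
    probs = F.softmax(r_t / rho, dim=1)               # shape (S, C)
    return probs.gather(1, y.view(-1, 1)).squeeze(1)  # shape (S,)

def online_distillation_loss(r_t, r_s, y, rho=1.0):
    # L_od: quality-weighted squared l2 gap between the classification
    # outputs of teacher and student
    omega = quality_scores(r_t.detach(), y, rho)
    gap = (r_t.detach() - r_s).pow(2).sum(dim=1)      # ||r_t - r_s||_2^2
    return (omega * gap).mean()
```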
Step 5, the feature embedding data of the teacher model and the student model are contrasted, and the contrastive relation distillation loss L_crd is calculated according to formulas (3) and (4):
L_crd = −E [ log ( h(ẑ_t+, ẑ_s) / Σ_{ẑ_t∈Z_t} h(ẑ_t, ẑ_s) ) ];
where E[·] denotes the mathematical expectation function and log(·) the natural logarithm function with base e; z_t^j is the feature embedding data of the teacher model obtained through the feature encoder and feature mapper after sample x_j is input into the teacher model; z_s^j is the feature embedding data of the student model obtained through the feature encoder and feature mapper after sample x_j is input into the student model; Z_t is the set of all teacher model feature embedding data z_t^j of the current batch; Z_s is the set of all student model feature embedding data z_s^j of the current batch; ẑ_s denotes feature embedding data sampled from Z_s; Z_t+ denotes the set of teacher model feature embedding data carrying the same class label as ẑ_s; ẑ_t+ denotes feature embedding data sampled from Z_t+; ẑ_t denotes feature embedding data sampled from Z_t.
h(ẑ_t+, ẑ_s) is a judging function that judges whether the feature embedding data ẑ_t+ and ẑ_s are derived from their joint distribution p(ẑ_t+, ẑ_s), calculated as:
h(ẑ_t+, ẑ_s) = exp( (ẑ_t+)ᵀ ẑ_s / (‖ẑ_t+‖2 ‖ẑ_s‖2 τ) );
where exp(·) denotes the exponential function with base e, ‖·‖2 the l2 norm, (·)ᵀ the transpose, and τ a temperature coefficient.
h(ẑ_t, ẑ_s) is a judging function that judges whether the feature embedding data ẑ_t and ẑ_s are derived from their joint distribution p(ẑ_t, ẑ_s), calculated as:
h(ẑ_t, ẑ_s) = exp( (ẑ_t)ᵀ ẑ_s / (‖ẑ_t‖2 ‖ẑ_s‖2 τ) );
where exp(·) denotes the exponential function with base e, ‖·‖2 the l2 norm, (·)ᵀ the transpose, and τ a temperature coefficient.
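A PyTorch sketch of the contrastive relation distillation loss follows. The judging function h implements the exp/cosine formula above exactly; the way h enters the loss is an InfoNCE-style assumption (same-label teacher embeddings as positives, the whole teacher batch as the contrast set), since the precise combination is not recoverable from the source:

```python
import torch
import torch.nn.functional as F

def judging_function(z_a, z_b, tau=0.1):
    # h(z_a, z_b) = exp(z_a^T z_b / (||z_a||_2 ||z_b||_2 tau)),
    # computed for all pairs at once: entry [i, j] pairs z_a[i] with z_b[j]
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    return torch.exp(z_a @ z_b.t() / tau)

def contrastive_relation_distillation_loss(z_t, z_s, y, tau=0.1):
    # L_crd: each student embedding is pulled toward teacher embeddings
    # sharing its class label (Z_t+) and contrasted against the whole
    # teacher set Z_t; the teacher side carries no gradient
    h = judging_function(z_t.detach(), z_s, tau)           # (S, S)
    pos_mask = (y.view(-1, 1) == y.view(1, -1)).float()    # same-label pairs
    pos = (h * pos_mask).sum(dim=0) / pos_mask.sum(dim=0)  # mean h over Z_t+
    return -torch.log(pos / h.sum(dim=0)).mean()
```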
Step 6, self-supervised learning and supervised contrastive learning are used to help the student model extract discriminative features; the specific steps are as follows:
(1) Each training sample (x, y) of the student model undergoes one random geometric transformation to obtain the augmented training sample (x̃, ỹ), where x denotes the image sample, y the category labeling the image sample x, x̃ the geometrically transformed image sample, and ỹ the label of the geometric transformation applied. The geometric transformations include rotating, scaling and adjusting the aspect ratio of the image; the number of training images of the student model is thereby doubled. The randomly transformed images x̃ are input into the student network, and the corresponding student model feature data and feature embedding data are obtained:
F_s = Θs(x̃);  z̃_s = Ψs(F_s);
(2) The obtained student model feature data are input into a multi-layer perceptron g(·) to judge the kind of geometric transformation applied to the training sample x̃:
S_s = g(F_s);
(3) The self-supervised loss L_ss is calculated:
L_ss = E[ℓ(softmax(S_s), ỹ)];
where E[·] denotes the mathematical expectation function, softmax(·) the softmax function, and ℓ(·) the cross-entropy loss function.
(4) Let z_s^j denote the feature embedding data of the student model obtained through its feature encoder and feature mapper after sample x_j is input into the student model; Z_all denotes the set of all student model feature embedding data z_s^j and z̃_s; ẑ denotes feature embedding data sampled from Z_all; Z_+ denotes the set of student model feature embedding data carrying the same class label as ẑ; ẑ_+ denotes feature embedding data sampled from Z_+; ẑ′ denotes feature embedding data sampled from Z_all. Based on the original feature embedding data and the augmented feature embedding data, supervised contrastive learning is performed with the feature embedding data in the student model, and the loss function L_sc of supervised contrastive learning is calculated:
L_sc = E_ẑ [ −(1/|Z_+|) Σ_{ẑ_+∈Z_+} log ( exp(d(ẑ, ẑ_+)) / Σ_{ẑ′∈Z_all∖{ẑ}} exp(d(ẑ, ẑ′)) ) ];
where E denotes the mathematical expectation; log(·) denotes the natural logarithm function with base e; d(ẑ, ẑ_+) = ẑᵀẑ_+ / (‖ẑ‖2 ‖ẑ_+‖2 τ) denotes the distance between the feature embedding data ẑ and ẑ_+; d(ẑ, ẑ′) denotes the distance between ẑ and ẑ′, defined likewise; exp(·) denotes the exponential function with base e; ‖·‖2 denotes the l2 norm; (·)ᵀ denotes the transpose; τ denotes a temperature coefficient.
(5) The self-supervised loss L_ss and the supervised contrastive loss L_sc are combined into the collaborative contrastive loss L_col, helping the student model better extract discriminative features:
L_col = L_ss + L_sc.
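A PyTorch sketch of step 6 follows. The choice of rotations by multiples of 90 degrees as the geometric transformation, the attribute names encoder, mapper and ssl_head, and the unweighted sum L_col = L_ss + L_sc are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def random_geometric_transform(x):
    # one random transformation per image; rotations by k*90 degrees stand
    # in for the rotate / scale / aspect-ratio family named in the patent
    t_labels = torch.randint(0, 4, (x.size(0),), device=x.device)   # y~
    x_aug = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                         for img, k in zip(x, t_labels)])           # x~
    return x_aug, t_labels

def supervised_contrastive_loss(z, y, tau=0.1):
    # L_sc over original + augmented student embeddings (SupCon form)
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                  # d(z_i, z_j)/tau
    not_self = 1.0 - torch.eye(z.size(0), device=z.device)
    pos_mask = (y.view(-1, 1) == y.view(1, -1)).float() * not_self
    log_prob = sim - torch.log((torch.exp(sim) * not_self).sum(1, keepdim=True))
    mean_log_prob_pos = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

def collaborative_contrastive_loss(student, x, y, tau=0.1):
    # L_col = L_ss + L_sc (step (5), assumed unweighted sum)
    x_aug, t_labels = random_geometric_transform(x)
    f_aug = student.encoder(x_aug)                         # F_s
    loss_ss = F.cross_entropy(student.ssl_head(f_aug), t_labels)   # L_ss
    z = student.mapper(student.encoder(x))                 # original views
    z_aug = student.mapper(f_aug)                          # augmented views
    loss_sc = supervised_contrastive_loss(
        torch.cat([z, z_aug]), torch.cat([y, y]), tau)     # L_sc
    return loss_ss + loss_sc
```

Note that the augmented view keeps the class label y of its source image for L_sc, while L_ss predicts the transformation label instead.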
Step 7, based on experience replay, the cross-entropy classification loss of the student model is calculated:
L_ce = E_{(x, y)} [ ℓ( softmax(r_s(x)), y ) ];
where x denotes an image sample from task T_n and from the buffer B, and y is the category labeling the image sample x; E[·] denotes the mathematical expectation function; softmax(·) denotes the softmax function; ℓ(·) denotes the cross-entropy loss function.
r_s(x) represents the output of the image sample x through the feature encoder Θs and classifier Φs of the student model:
r_s(x) = Φs(Θs(x)).
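A short PyTorch sketch of the experience-replay classification loss, with assumed encoder and classifier module attributes:

```python
import torch
import torch.nn.functional as F

def replay_cross_entropy(student, x_new, y_new, x_buf, y_buf):
    # L_ce computed jointly on the incoming batch and the S samples
    # replayed from the buffer (experience replay)
    x = torch.cat([x_new, x_buf])
    y = torch.cat([y_new, y_buf])
    logits = student.classifier(student.encoder(x))   # r_s(x)
    return F.cross_entropy(logits, y)   # l(softmax(r_s(x)), y)
```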
Step 8, the total optimization objective L of the student model is calculated, and the parameters of the student model are optimized with a stochastic gradient descent algorithm:
L = L_ce + α1·L_od + α2·L_crd + α3·L_col;
where α1, α2 and α3 denote hyper-parameters.
Step 9, the teacher model directly uses the parameters of the student model to update its own parameters, without any gradient backpropagation. Θt, Φt, Ψt denote the feature encoder, classifier and feature mapper of the teacher model, and Θs, Φs, Ψs denote the feature encoder, classifier and feature mapper of the student model. The updating method is as follows:
Θt←mΘt+(1-m)[(1-X)Θt+XΘs] (21);
Φt←mΦt+(1-m)[(1-X)Φt+XΦs] (22);
Ψt←mΨt+(1-m)[(1-X)Ψt+XΨs] (23);
where m represents a momentum factor and X obeys a Bernoulli distribution (also referred to as a 0-1 distribution), defined as:
P(X=k)=pk(1-p)1-k,k={0,1} (24);
The value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model is controlled through the Bernoulli probability p.
In order for the teacher model to learn new knowledge quickly at an early stage of model training, the momentum factor m is designed as:
m=min(itera/(itera+1),η) (25);
where itera is the current iteration number of the student model; min(itera/(itera+1), η) takes the smaller of itera/(itera+1) and η; η is a constant, generally set to 0.999.
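The update rules (21) to (25) translate directly into code; a PyTorch sketch (function and argument names are illustrative):

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, itera, p=0.5, eta=0.999):
    # Stochastic momentum update of eqs. (21)-(25): sample X ~ Bernoulli(p)
    # once per iteration; if X = 1 the teacher moves toward the student by
    # an exponential moving average, otherwise it stays unchanged.
    m = min(itera / (itera + 1.0), eta)   # momentum factor, eq. (25)
    if torch.rand(1).item() < p:          # X = 1 with probability p
        for t_param, s_param in zip(teacher.parameters(),
                                    student.parameters()):
            # theta_t <- m * theta_t + (1 - m) * theta_s
            t_param.mul_(m).add_((1.0 - m) * s_param)
```

Early in training m is small, so the teacher absorbs new knowledge quickly; as itera grows, m saturates at η and the teacher changes slowly, consolidating the accumulated weights.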
The generalized continuous classification method based on an online contrastive distillation network of the invention can be tested at any time. In the test stage, the teacher model is used for testing, because student models at different moments are good at classifying different categories, and the teacher model learned from them cumulatively absorbs their advantages. The teacher model therefore has a stronger ability than the student model to distinguish all of the seen categories.
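A minimal sketch of the anytime test-stage evaluation with the teacher model (names are illustrative):

```python
import torch

@torch.no_grad()
def evaluate(teacher, loader):
    # anytime testing: predictions are taken from the teacher model,
    # which accumulates the strengths of the student across time
    correct, total = 0, 0
    for x, y in loader:
        pred = teacher.classifier(teacher.encoder(x)).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```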
The above-described embodiments are only for illustrating the technical spirit and features of the present invention, and it is intended to enable those skilled in the art to understand the content of the present invention and to implement it accordingly, and the scope of the present invention is not limited to the embodiments, i.e. equivalent changes or modifications to the spirit of the present invention are still within the scope of the present invention.

Claims (10)

1. A generalized continuous classification method based on an online contrastive distillation network, characterized by comprising the following steps:
Step 1, establishing a classification model based on knowledge distillation, wherein the classification model comprises a teacher model and a student model; the teacher model and the student model are respectively provided with a feature encoder, a classifier and a feature mapper; setting an optimization target of a student model; initializing parameters of a teacher model and a student model and giving a buffer zone with a fixed size;
Step 2, assuming that the non-stationary data stream consists of n sample-disjoint tasks {T_1, T_2, ..., T_n}, the training set of each task T_n consisting of labeled data D_n = {(x_i, y_i)}_{i=1}^{m}, where m is the number of samples in the training set of task T_n, x_i is the i-th image sample in that training set, and y_i is the category labeling the i-th image sample x_i; the buffer B = {(x_j, y_j)}_{j=1}^{|B|} having capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; when a batch data stream containing R samples arrives, counting the number of samples encountered so far and updating the buffer by a reservoir sampling method;
Step 3, randomly sampling S samples from the buffer and inputting them into the teacher model and the student model respectively; obtaining the classification output data of the teacher model and of the student model for the S samples through each model's feature encoder and classifier, and obtaining the feature embedding data of the teacher model and of the student model for the S samples through each model's feature encoder and feature mapper;
Step 4, calculating the quality scores of the teacher model's classification output data, adjusting the online knowledge distillation loss weights of different samples according to these quality scores, and then calculating the online distillation loss L_od of the teacher model and the student model;
Step 5, contrasting the feature embedding data of the teacher model and the student model, and calculating the contrastive relation distillation loss L_crd of the teacher model and the student model;
Step 6, using self-supervised learning and supervised contrastive learning to help the student model extract discriminative features, and calculating the self-supervised loss L_ss and the supervised contrastive loss L_sc of the student model;
Step 7, calculating the cross-entropy classification loss L_ce of the student model based on experience replay;
Step 8, calculating the total optimization objective of the student model, L = L_ce + α1·L_od + α2·L_crd + α3·L_col, where L_col = L_ss + L_sc and α1 to α3 are the hyper-parameters of the corresponding loss terms; optimizing the parameters of the student model by a stochastic gradient descent algorithm;
Step 9, directly updating the parameters of the teacher model with the parameters of the student model.
2. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step 2, the reservoir sampling method comprises the following steps:
Step A1, comparing the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, storing the sample (x_i, y_i) directly into the buffer B, where x_i is the i-th image sample in the training set of task T_n and y_i is the category labeling the i-th image sample x_i in the training set of task T_n;
Step A2, if num > |B|, generating a random integer rand_num with minimum value 0 and maximum value num-1; if rand_num < |B|, replacing the buffer sample (x_rand_num, y_rand_num) with the sample (x_i, y_i), where x_rand_num denotes the image sample at index rand_num in buffer B and y_rand_num denotes its label.
3. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step 4, the quality score of the teacher model's classification output data is calculated as follows:
let B = {(x_j, y_j)}_{j=1}^{|B|} denote the buffer of capacity |B|, where x_j is the j-th image sample in the buffer and y_j is the category labeling the j-th image sample x_j; r_t(x_j) denotes the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the teacher model; ω(x_j) is the quality score of the teacher model's classification output data for sample x_j; the formula for ω(x_j) is as follows:
ω(x_j) = exp(r_t^{y_j}(x_j)/ρ) / Σ_{c=1}^{C} exp(r_t^{c}(x_j)/ρ);
Wherein:
ρ denotes a temperature coefficient;
C denotes the number of all possible categories;
exp(·) denotes the exponential function with base e;
r_t^{y_j}(x_j) is the entry of the classification output data r_t(x_j) for the labeled category y_j;
r_t^{c}(x_j) is the entry of the classification output data r_t(x_j) for category c.
4. The generalized continuous classification method based on an online contrastive distillation network according to claim 3, wherein in step 4, r_s(x_j) denotes the classification output data obtained by passing sample x_j in sequence through the feature encoder and classifier of the student model; the online distillation loss L_od of the teacher model and the student model is calculated as:
L_od = E_{(x_j, y_j)∼B} [ ω(x_j)·‖r_t(x_j) − r_s(x_j)‖2² ];
wherein ‖·‖2 denotes the l2 norm and E[·] denotes the mathematical expectation function.
5. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step 5: $\mathcal{M}$ denotes a buffer of capacity $|\mathcal{M}|$; $x_j$ is the $j$-th image sample in the buffer, and $y_j$ is the class label of the $j$-th image sample $x_j$ in the buffer; $e_t(x_j)$ denotes the feature-embedded data obtained by passing sample $x_j$ through the feature encoder and feature mapper of the teacher model; $e_s(x_j)$ denotes the feature-embedded data obtained by passing sample $x_j$ through the feature encoder and feature mapper of the student model; $z_t$ is the set of all teacher feature embeddings $e_t(x_j)$ obtained from the samples $x_j$ of the current batch; $z_s$ is the set of all student feature embeddings $e_s(x_j)$ obtained from the samples $x_j$ of the current batch; $\tilde{z}_s$ denotes feature-embedded data sampled from $z_s$; $z_{t+}$ denotes the set of teacher feature embeddings having the same class label as $\tilde{z}_s$; $\tilde{z}_{t+}$ denotes feature-embedded data sampled from $z_{t+}$; and $\tilde{z}_t$ denotes feature-embedded data sampled from $z_t$. The contrastive relation distillation loss $\mathcal{L}_{crd}$ between the teacher model and the student model is computed as:
$$\mathcal{L}_{crd}=-\,\mathbb{E}\left[\log\frac{h(\tilde{z}_s,\tilde{z}_{t+})}{\sum_{\tilde{z}_t\in z_t} h(\tilde{z}_s,\tilde{z}_t)}\right],\qquad h(a,b)=\exp\!\left(\frac{a^{\top}b}{\|a\|_2\,\|b\|_2\,\tau}\right)$$
where $\mathbb{E}[\cdot]$ denotes the mathematical expectation function; $\|\cdot\|_2$ denotes the $\ell_2$ norm; $\log(\cdot)$ denotes the natural logarithm with base the natural constant $e$; $h(\tilde{z}_s,\tilde{z}_{t+})$ is a judging function that judges whether the feature embeddings $\tilde{z}_s$ and $\tilde{z}_{t+}$ derive from their joint distribution $p(\tilde{z}_s,\tilde{z}_{t+})$, and $h(\tilde{z}_s,\tilde{z}_t)$ likewise judges whether $\tilde{z}_s$ and $\tilde{z}_t$ derive from their joint distribution $p(\tilde{z}_s,\tilde{z}_t)$; $\top$ denotes the transpose; $\exp(\cdot)$ denotes the exponential function with base the natural constant $e$; and $\tau$ denotes a temperature coefficient.
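The following sketch realizes the contrastive relation distillation loss with an InfoNCE-style critic over one batch; it aggregates over all same-class teacher embeddings rather than sampling a single positive, which is one way to approximate the expectation in claim 5, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_relation_distillation(e_s, e_t, labels, tau=0.1):
    """Contrastive relation distillation (claim 5, a sketch).

    e_s, e_t: (B, D) student / teacher feature embeddings of the current batch
    labels:   (B,)   class labels y_j
    tau:      temperature coefficient (value is an assumption)
    """
    zs = F.normalize(e_s, dim=1)                   # divide by the l2 norm
    zt = F.normalize(e_t, dim=1)
    # critic h(a, b) = exp(a^T b / (||a||_2 ||b||_2 tau)) for every (s, t) pair
    sim = torch.exp(zs @ zt.t() / tau)             # (B, B)

    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()  # z_{t+}
    numerator = (sim * pos_mask).sum(dim=1)        # same-class teacher embeddings
    denominator = sim.sum(dim=1)                   # all teacher embeddings z_t
    return -torch.log(numerator / denominator).mean()
```

Each anchor always has at least one positive (the teacher embedding of the same sample), so the logarithm is well defined.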
6. The generalized continuous classification method based on an online contrastive distillation network according to claim 5, wherein step 6 comprises the following sub-steps:
Step B1: let $\Theta_t$, $\Phi_t$, $\Psi_t$ denote the feature encoder, the classifier, and the feature mapper of the teacher model, and let $\Theta_s$, $\Phi_s$, $\Psi_s$ denote the feature encoder, the classifier, and the feature mapper of the student model. Each training sample $(x, y)$ of the student model undergoes one random geometric transformation to obtain an augmented training sample $(\tilde{x}, \tilde{y})$, where $x$ denotes the image sample, $y$ is the class label of the image sample $x$, $\tilde{x}$ is the geometrically transformed image sample, and $\tilde{y}$ is the label of the applied geometric transformation. The augmented training sample $(\tilde{x}, \tilde{y})$ is input into the student model and processed by the student model's feature encoder and feature mapper to obtain the corresponding student feature data $F_s$ and feature-embedded data $\tilde{e}_s$, where:
$$F_s=\Theta_s(\tilde{x}),\qquad \tilde{e}_s=\Psi_s\big(\Theta_s(\tilde{x})\big)$$
Step B2: the obtained student feature data $F_s$ is input into a multi-layer perceptron $g(\cdot)$ to judge which kind of geometric transformation was applied to the training sample $(\tilde{x}, \tilde{y})$; denoting the output of the multi-layer perceptron as $S_s$, its calculation formula is:
$$S_s=g(F_s)$$
Step B3: compute the self-supervised loss $\mathcal{L}_{self}$ according to:
$$\mathcal{L}_{self}=\mathbb{E}\big[\ell_{ce}\big(\operatorname{softmax}(S_s),\,\tilde{y}\big)\big]$$
where $\mathbb{E}[\cdot]$ denotes the mathematical expectation function; $\operatorname{softmax}(\cdot)$ denotes the softmax function; and $\ell_{ce}(\cdot)$ denotes the cross-entropy loss function.
Step B4: let $\mathcal{M}$ denote a buffer of capacity $|\mathcal{M}|$; $x_j$ is the $j$-th image sample in the buffer, and $y_j$ is the class label of the $j$-th image sample $x_j$ in the buffer; $e_s(x_j)$ denotes the feature-embedded data obtained by passing sample $x_j$ through the feature encoder and feature mapper of the student model; $\hat{Z}_s$ denotes the set of all student feature-embedded data, i.e., the union of the original embeddings $e_s$ and the augmented embeddings $\tilde{e}_s$; $\hat{z}_s$ denotes feature-embedded data sampled from $\hat{Z}_s$; $\hat{Z}_{s+}$ denotes the set of student feature embeddings having the same class label as $\hat{z}_s$; $\hat{z}_{s+}$ denotes feature-embedded data sampled from $\hat{Z}_{s+}$; and $\hat{z}'_s$ denotes feature-embedded data sampled from $\hat{Z}_s$. Based on the original feature-embedded data and the augmented feature-embedded data, supervised contrastive learning is performed on the feature embeddings within the student model, with the supervised contrastive loss function $\mathcal{L}_{sc}$ computed as:
$$\mathcal{L}_{sc}=-\,\mathbb{E}\left[\log\frac{d(\hat{z}_s,\hat{z}_{s+})}{\sum_{\hat{z}'_s\in\hat{Z}_s} d(\hat{z}_s,\hat{z}'_s)}\right],\qquad d(a,b)=\exp\!\left(\frac{a^{\top}b}{\|a\|_2\,\|b\|_2\,\tau}\right)$$
where $\mathbb{E}[\cdot]$ denotes the mathematical expectation; $\|\cdot\|_2$ denotes the $\ell_2$ norm; $\log(\cdot)$ denotes the natural logarithm with base the natural constant $e$; $d(\hat{z}_s,\hat{z}_{s+})$ measures the distance between the feature embeddings $\hat{z}_s$ and $\hat{z}_{s+}$, and $d(\hat{z}_s,\hat{z}'_s)$ the distance between $\hat{z}_s$ and $\hat{z}'_s$; $\exp(\cdot)$ denotes the exponential function with base the natural constant $e$; $\top$ denotes the transpose; and $\tau$ denotes a temperature coefficient.
Step B5: the self-supervised loss $\mathcal{L}_{self}$ is combined with the supervised contrastive loss $\mathcal{L}_{sc}$ to obtain the collaborative contrast loss $\mathcal{L}_{cc}$, which helps the student model extract more discriminative features:
$$\mathcal{L}_{cc}=\mathcal{L}_{self}+\mathcal{L}_{sc}$$
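A minimal sketch of steps B2-B5 in PyTorch; the helper names, the unweighted sum in step B5, and the temperature default are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def self_supervised_loss(feat_s, transform_labels, mlp_head):
    """Steps B2-B3: predict which geometric transform produced x~."""
    s_s = mlp_head(feat_s)                          # S_s = g(F_s)
    # F.cross_entropy applies softmax + cross-entropy, matching step B3
    return F.cross_entropy(s_s, transform_labels)

def supervised_contrastive_loss(emb, emb_aug, labels, tau=0.1):
    """Step B4: supervised contrast over original + augmented embeddings."""
    z = F.normalize(torch.cat([emb, emb_aug], dim=0), dim=1)    # pooled set Z^_s
    y = torch.cat([labels, labels], dim=0)
    sim = torch.exp(z @ z.t() / tau)                # d(a, b) for every pair
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, 0.0)           # exclude trivial self-pairs
    pos_mask = y.unsqueeze(1).eq(y.unsqueeze(0)) & ~self_mask
    pos = (sim * pos_mask.float()).sum(dim=1)       # same-label positives Z^_{s+}
    return -torch.log(pos / sim.sum(dim=1)).mean()

def collaborative_contrast_loss(feat_s, t_labels, mlp_head, emb, emb_aug, labels):
    """Step B5: L_cc = L_self + L_sc (unweighted sum assumed)."""
    return (self_supervised_loss(feat_s, t_labels, mlp_head)
            + supervised_contrastive_loss(emb, emb_aug, labels))
```

Because every original embedding is paired with its augmented counterpart under the same class label, each anchor is guaranteed at least one positive.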
7. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step B1 the geometric transformation comprises rotating the image, scaling the image, and adjusting the aspect ratio of the image.
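For reference, one way such a labeled geometric augmentation could look in PyTorch; the set of three transforms matches claim 7, while the specific angles and scale factors are illustrative assumptions:

```python
import random
import torch
import torch.nn.functional as F

def random_geometric_transform(img):
    """Apply one transform named in claim 7 (rotate / scale / change aspect
    ratio) and return the transform-type label y~.  img: (C, H, W) tensor."""
    kind = random.randrange(3)
    if kind == 0:                                   # rotation by a multiple of 90 deg
        img = torch.rot90(img, k=random.randint(1, 3), dims=(1, 2))
    elif kind == 1:                                 # isotropic rescale
        s = random.choice([0.75, 1.25])
        img = F.interpolate(img.unsqueeze(0), scale_factor=s,
                            mode="bilinear", align_corners=False).squeeze(0)
    else:                                           # aspect-ratio change (W only)
        img = F.interpolate(img.unsqueeze(0), scale_factor=(1.0, 0.75),
                            mode="bilinear", align_corners=False).squeeze(0)
    return img, kind                                # (x~, y~)
```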
8. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step 7, assuming the non-stationary data stream consists of $n$ tasks with mutually disjoint samples $\{T_1, T_2, \ldots, T_n\}$, let $x$ denote an image sample drawn from task $T_n$ and from the buffer $\mathcal{M}$, and let $y$ be the class label of the image sample $x$; the cross-entropy classification loss $\mathcal{L}_{ce}$ of the student model is computed as:
$$\mathcal{L}_{ce}=\mathbb{E}_{(x,y)}\big[\ell_{ce}\big(\operatorname{softmax}(r_s(x)),\,y\big)\big]$$
where $\mathbb{E}[\cdot]$ denotes the mathematical expectation function; $\operatorname{softmax}(\cdot)$ denotes the softmax function; $\ell_{ce}(\cdot)$ denotes the cross-entropy loss function; and $r_s(x)$ denotes the classification output data of the image sample $x$ after sequentially passing through the feature encoder and the classifier of the student model.
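A short sketch of the classification term, with the joint sampling from the current task and the buffer made explicit; `model_s` and the argument names are placeholders:

```python
import torch
import torch.nn.functional as F

def classification_loss(model_s, x_task, y_task, x_buf, y_buf):
    """Claim 8 (a sketch): cross-entropy on samples drawn jointly from the
    current task T_n and the buffer M; model_s(x) returns the logits r_s(x)."""
    x = torch.cat([x_task, x_buf], dim=0)
    y = torch.cat([y_task, y_buf], dim=0)
    return F.cross_entropy(model_s(x), y)   # softmax + cross-entropy, averaged
```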
9. The generalized continuous classification method based on an online contrastive distillation network according to claim 1, wherein in step 9 the specific method for updating the parameters of the teacher model from the parameters of the student model is as follows:
Let $\Theta_t$, $\Phi_t$, $\Psi_t$ denote the feature encoder, the classifier, and the feature mapper of the teacher model, and let $\Theta_s$, $\Phi_s$, $\Psi_s$ denote the feature encoder, the classifier, and the feature mapper of the student model. The teacher model parameters are updated by:
$$\Theta_t \leftarrow m\,\Theta_t + (1-m)\big[(1-X)\,\Theta_t + X\,\Theta_s\big]$$
$$\Phi_t \leftarrow m\,\Phi_t + (1-m)\big[(1-X)\,\Phi_t + X\,\Phi_s\big]$$
$$\Psi_t \leftarrow m\,\Psi_t + (1-m)\big[(1-X)\,\Psi_t + X\,\Psi_s\big]$$
where $m$ denotes a momentum factor and $X$ obeys the Bernoulli distribution, defined as:
$$P(X=k)=p^{k}(1-p)^{1-k},\qquad k\in\{0,1\}$$
The Bernoulli probability $p$ takes values in the range $(0, 1)$ and controls the update frequency of the teacher model.
10. The generalized continuous classification method based on an online contrastive distillation network according to claim 9, wherein the momentum factor $m$ is calculated as:
$$m=\min\big(itera/(itera+1),\ \eta\big)$$
where $itera$ is the current number of training iterations of the student model; $\min(itera/(itera+1),\eta)$ takes the smaller of $itera/(itera+1)$ and $\eta$; and $\eta$ is a constant, set to 0.999.
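A sketch of the stochastic momentum update of claims 9 and 10 combined; `p=0.5` is an illustrative choice, and the per-parameter loop assumes the teacher and student share an architecture:

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, itera, p=0.5, eta=0.999):
    """Claims 9-10 (a sketch): Bernoulli-gated momentum update of the teacher."""
    m = min(itera / (itera + 1.0), eta)            # claim 10: momentum factor
    x = float(torch.bernoulli(torch.tensor(p)))    # X ~ Bernoulli(p)
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        # theta_t <- m*theta_t + (1-m)*[(1-X)*theta_t + X*theta_s]
        pt.copy_(m * pt + (1.0 - m) * ((1.0 - x) * pt + x * ps))
```

When $X=0$ the bracketed term equals the current teacher parameters and the update leaves the teacher unchanged, so $p$ directly sets how often the teacher actually drifts toward the student.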
CN202210326319.8A 2022-03-30 2022-03-30 Generalized continuous classification method based on online comparison distillation network Active CN114972839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210326319.8A CN114972839B (en) 2022-03-30 2022-03-30 Generalized continuous classification method based on online comparison distillation network

Publications (2)

Publication Number Publication Date
CN114972839A CN114972839A (en) 2022-08-30
CN114972839B (en) 2024-06-25

Family

ID=82976151

Country Status (1)

Country Link
CN (1) CN114972839B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511059B (en) * 2022-10-12 2024-02-09 北华航天工业学院 Network light-weight method based on convolutional neural network channel decoupling
CN115457042B (en) * 2022-11-14 2023-03-24 四川路桥华东建设有限责任公司 Method and system for detecting surface defects of thread bushing based on distillation learning
CN116502621B (en) * 2023-06-26 2023-10-17 北京航空航天大学 Network compression method and device based on self-adaptive comparison knowledge distillation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN111767711B (en) * 2020-09-02 2020-12-08 之江实验室 Compression method and platform of pre-training language model based on knowledge distillation
CN116171446A (en) * 2020-09-09 2023-05-26 华为技术有限公司 Method and system for training neural network model through countermeasure learning and knowledge distillation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610173A (en) * 2021-08-13 2021-11-05 天津大学 Knowledge distillation-based multi-span domain few-sample classification method
CN113869512A (en) * 2021-10-09 2021-12-31 北京中科智眼科技有限公司 Supplementary label learning method based on self-supervision and self-distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant