CN116797850A - Class-incremental image classification method based on knowledge distillation and consistency regularization


Info

Publication number
CN116797850A
Authority
CN
China
Prior art keywords
module, class, old, feature, image
Prior art date
Legal status
Pending
Application number
CN202310875218.0A
Other languages
Chinese (zh)
Inventor
史殿习
史燕燕
杨绍武
杨焕焕
李林
刘哲
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310875218.0A priority Critical patent/CN116797850A/en
Publication of CN116797850A publication Critical patent/CN116797850A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/042 - Knowledge-based neural networks; logical representations of neural networks
                • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
              • G06N 3/08 - Learning methods
                • G06N 3/084 - Backpropagation, e.g. using gradient descent
                • G06N 3/096 - Transfer learning
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 - Arrangements for image or video recognition or understanding
            • G06V 10/70 - using pattern recognition or machine learning
              • G06V 10/764 - using classification, e.g. of video objects
              • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
                • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
              • G06V 10/82 - using neural networks


Abstract

The invention discloses a class-incremental image classification method based on knowledge distillation and consistency regularization, aiming to improve classification accuracy. First, an old-class image classification system oriented to the class-incremental learning scenario is constructed and trained with an old-class image dataset, yielding a trained old-class image classification system. Then an all-class image classification system oriented to the class-incremental learning scenario is constructed, in which the classification prediction module is obtained by expanding the classification prediction module of the trained old-class system. The all-class system is incrementally trained with the new-class image dataset, yielding a trained all-class image classification system for the class-incremental learning scenario. Finally, images are recognized with the trained all-class system to obtain the recognition result. The invention retains old knowledge to the maximum extent, effectively overcomes catastrophic forgetting, and improves recognition accuracy.

Description

Class-incremental image classification method based on knowledge distillation and consistency regularization
Technical Field
The invention relates to the field of image classification in computer vision, and in particular to a class-incremental image classification method based on knowledge distillation and consistency regularization.
Background
Computer vision, an important research field of artificial intelligence, has achieved remarkable results in recent years. As one of the key technologies of new infrastructure, artificial intelligence has been widely developed and applied, and computer vision in particular has advanced rapidly. Image classification is an important application scenario of computer vision. Existing image classification methods generally train a neural network on a large-scale dataset to classify all targets. Most of them share an inherent assumption: all examples of the target classes are available during the training phase, and conventional training strategies require all samples of both old and new tasks to be available so that the model can be retrained on all data. However, in open-world scenarios, especially the Internet, new data and knowledge emerge continually. In practical applications, new-class data may be encountered, requiring the classification model to learn new target classes incrementally so as to obtain stronger classification capability. Existing image classification methods cannot meet this incremental learning requirement: when fine-tuned on the data of a new task, their generalization on old tasks drops drastically, a phenomenon known as catastrophic forgetting (i.e., the recognition accuracy on old classes drops sharply).
Aiming at the catastrophic forgetting problem, existing image classification methods oriented to the class-incremental learning scenario can be divided into the following four categories:
(1) Image classification method based on parameter isolation
This type of method assumes that, for old tasks, some parameters in the model are important and should not be adjusted, while unimportant parameters can be adjusted to accommodate new tasks. Evaluating the importance of parameters is therefore an important subtask. Parameter-isolation methods assume that model parameters are mutually independent, so that their importance can be evaluated and constrained separately. When training the incremental model, different model parameters are allocated to each task by constraining the importance weights of the old model parameters, thereby improving classification accuracy across all tasks. Methods of this type differ in how they estimate the importance of model parameters; a representative example is EWC (see J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," arXiv:1612.00796, 2017, http://arxiv.org/abs/1612.00796). However, such methods need to introduce additional parameters, and it is difficult to design reasonable indexes to measure parameter importance.
(2) Image classification method based on examples
The core idea of example-based image classification methods is to store a small amount of data from previous tasks, combine the stored old-class examples with new-task data, and use them jointly as input to optimize the parameters of the incremental model, so as to recognize all classes. The main challenge is that old-class data are few while new-class data are many, so there is a severe data imbalance between new and old classes. Typical methods include IL2M (see E. Belouadah and A. Popescu, "IL2M: Class Incremental Learning With Dual Memory," in ICCV, Seoul, Korea (South), Oct. 2019, pp. 583-592), BiC (see Y. Wu et al., "Large Scale Incremental Learning," arXiv:1905.13260, 2019, http://arxiv.org/abs/1905.13260), LUCIR (see S. Hou, X. Pan, C. C. Loy, Z. Wang, and D. Lin, "Learning a Unified Classifier Incrementally via Rebalancing," in CVPR, Long Beach, CA, USA, Jun. 2019, pp. 831-839) and WA (see B. Zhao, X. Xiao, G. Gan, B. Zhang, and S.-T. Xia, "Maintaining Discrimination and Fairness in Class Incremental Learning," in CVPR, Seattle, WA, USA, 2020, pp. 13208-13217), all of which use this idea to address data imbalance in class-incremental image classification. Although these methods can effectively improve the recognition accuracy of old classes, they may not be suitable for memory-limited scenarios: if old-class data cannot be stored, they may not work at all. Moreover, as learning tasks keep arriving, the stored old-class data grow huge and the storage requirements during training keep increasing, which is impractical for applications with limited memory and computing resources (such as mobile phones or robots).
(3) Image classification method based on generation model
Another effective approach to catastrophic forgetting is to generate pseudo samples of previous classes with Generative Adversarial Networks (GANs), use the generated samples to represent the distribution of old data, and input them together with new-class images into the incremental model to classify all classes. This avoids storing old-class images, but requires training an additional generation network to produce old-class pseudo samples, which costs more computation and memory, so the training efficiency of the incremental model is low.
(4) Regularization strategy-based image classification method
The core idea of regularization-based image classification methods is to protect the weights of the old model from being overwritten by the weights of the incremental model by applying constraints to the loss function of the incremental model, mainly by designing distillation losses that penalize changes in network parameters. Knowledge distillation (see G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv:1503.02531, 2015, doi:10.48550/arXiv.1503.02531) was initially widely used in model compression; its core is to migrate the key knowledge learned by a teacher model into a student model. Later, the technique was used to address catastrophic forgetting in class-incremental learning; the earliest work is LwF (see Z. Li and D. Hoiem, "Learning without Forgetting," arXiv:1606.09282, 2017, http://arxiv.org/abs/1606.09282), which designs a knowledge distillation loss so that the incremental model's predictions on the new task stay similar to those of the old model, thereby improving the recognition accuracy of old classes. This work shows that knowledge distillation can effectively preserve old-class information.
Thanks to the rapid development of incremental learning, the training requirement of traditional image classification, namely that all samples of old and new tasks be available so that the model can be retrained on all data, has been relaxed. In practical applications new-class data may be encountered; class-incremental image classification methods can learn new training samples and target classes at each data increment while reducing forgetting of old classes. The first three types of methods need to store a small number of old-class images or generate old-class pseudo samples with a generative model and feed them into the model together with new-class data, which causes low training efficiency and class imbalance. Regularization-based methods can recognize all classes without storing old-class images or complex generative models, reducing the dependence of traditional image classification models on large amounts of old-class labeled data. Meanwhile, PASS (see F. Zhu, X.-Y. Zhang, C. Wang, F. Yin, and C.-L. Liu, "Prototype Augmentation and Self-Supervision for Incremental Learning," in CVPR, Nashville, TN, USA, Jun. 2021, pp. 5133-5141) uses feature distillation to retain old-class knowledge and memorizes one representative prototype per old class, thereby achieving image classification in class-incremental learning scenarios. However, this method makes the stored old-class prototypes unstable and causes classification bias. The current regularization-based image classification methods therefore still face the following dilemmas:
(1) Regularization-based methods distill only the output layer and intermediate features of the old model, so it is difficult to retain past knowledge completely; catastrophic forgetting of old classes is severe, i.e., the classification accuracy of old classes is not high.
(2) Most existing regularization-based methods store no old-class images; the PASS method improves old-class classification accuracy by storing old-class prototypes. However, the stored prototypes are unstable and classification bias may occur, so the recognition accuracy over all classes is poor.
In summary, the above four kinds of class-incremental image classification methods all suffer, to a greater or lesser extent, from catastrophic forgetting or classification bias on old classes, so their classification accuracy over all classes is poor. To date, no published technical solution uses knowledge distillation together with consistency regularization to solve these problems and thereby classify images of all classes.
Disclosure of Invention
Aiming at the problem that existing class-incremental image classification methods, when classifying images without storing old-class images, suffer from catastrophic forgetting and classification bias on old classes and hence poor classification accuracy over all classes, the invention provides a class-incremental image classification method based on knowledge distillation and consistency regularization. Built on the existing teacher-student architecture and the ideas of knowledge distillation and consistency regularization, it requires no stored old-class training images: a student model obtained by fine-tuning the teacher model on new-class data recognizes images of all classes, thereby improving recognition accuracy over all classes.
To solve the technical problem, the technical scheme of the invention is as follows. First, an old-class image recognition system oriented to the class-incremental learning scenario is constructed, consisting of a first input preprocessing module, a first self-supervised augmentation module, a first feature learning module and a first classification prediction module. Then the dataset required for training is prepared: the images of the first 50 classes in the dataset serve as the old-class image dataset, with which the old-class system is trained to obtain the trained old-class image classification system. Next, an all-class image recognition system oriented to the class-incremental learning scenario is constructed, consisting of a second input preprocessing module, a second self-supervised augmentation module, a second feature learning module and a second classification prediction module; the second classification prediction module expands the output of the first classification prediction module to the number of classes of all images, and the all-class system is initialized with the parameters of the trained old-class classification system. From the remaining 50 classes, 10 classes are added at a time as the new-class image dataset, with which the initialized all-class system is trained to obtain a trained all-class image classification system. The construction of the all-class system is then repeated: each time, 10 classes are added on top of the previous number of classes, the newly added 10 classes of images serve as the new-class dataset, and the newly constructed all-class system is trained with them. After five rounds of construction and training, all 100 classes in the dataset have participated in training, and the finally trained all-class image recognition system is obtained. Finally, the images to be recognized, containing all classes, are recognized with the finally trained system to obtain the image classification result.
By combining knowledge distillation and consistency regularization, the method can recognize images of all classes without relying on stored old-class images, effectively improving the recognition accuracy of old classes and alleviating their catastrophic forgetting. The all-class image classification system is initialized from the parameters of the trained old-class classification system, and only new-class training data are used in each of the repeated training rounds, so that the trained all-class system attains high recognition accuracy on the images under test.
The technical scheme of the invention is as follows:
First, an old-class image classification system for recognizing old classes in the class-incremental learning scenario is constructed. It consists of the old-class image dataset X_old, a first input preprocessing module, a first self-supervised augmentation module, a first feature learning module and a first classification prediction module. The first input preprocessing module, first self-supervised augmentation module, first feature learning module and first classification prediction module are each implemented in the deep learning framework PyTorch as multi-layer convolutional neural networks (CNNs).
The first input preprocessing module is connected to the old-class image dataset, the first self-supervised augmentation module and the first feature learning module. It reads an old-class image set X from X_old, preprocesses the images in X by random cropping, horizontal flipping, brightness change and normalization to obtain the preprocessed old-class image set X_1 with label set Y_1, and sends X_1 and Y_1 to the first self-supervised augmentation module.
The first self-supervised augmentation module is connected to the first input preprocessing module and the first feature learning module. It receives X_1 and Y_1 from the first input preprocessing module and augments them with the self-supervised label augmentation method (see H. Lee, S. J. Hwang, and J. Shin, "Self-supervised Label Augmentation via Input Transformations," ICML 2020, pp. 5714-5724, doi:10.48550/arXiv.1910.05872), producing the augmented old-class image set X_1' and, on its basis, the augmented label set Y_1'. X_1' and Y_1' are sent to the first feature learning module.
The first feature learning module is connected to the first self-supervised augmentation module and the first classification prediction module. It receives the augmented X_1' and Y_1' from the first self-supervised augmentation module, extracts a set of high-dimensional semantic feature representations F_1' from X_1', and sends F_1' and Y_1' to the first classification prediction module. The first feature learning module is a ResNet18 network (see K. He, X. Zhang, S. Ren, et al., "Deep Residual Learning for Image Recognition," CVPR 2016, pp. 770-778) divided into six modules. The first module consists of a first convolution layer, a first normalization layer, an activation function layer and a first downsampling layer; the first convolution layer has a 3x3 kernel, stride 1 and padding 1. The second to fifth modules each consist of two residual units, and each residual unit consists of one convolution layer, one normalization layer and one activation function layer. The convolution layers of the residual units have 3x3 kernels and padding 1; the stride is 1 in the second module and 2 in the third, fourth and fifth modules. The sixth module consists of a second downsampling layer with stride 1 and no padding. All activation function layers in the first feature learning module use the ReLU function (see Jiang Angbo and Wang Weiwei, "ReLU activation function optimization study," Sensors and Microsystems, 2018, 37(02): 50-52).
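For concreteness, the following is a minimal PyTorch sketch of the six-module feature learning backbone described above, built on the standard torchvision ResNet-18; the module grouping and the name FeatureLearningModule are illustrative assumptions, and torchvision's 7x7 stride-2 stem stands in for the patent's 3x3 stride-1 first convolution.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FeatureLearningModule(nn.Module):
    """Six-module backbone: stem, four residual stages, global pooling."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # Module 1: first convolution + normalization + ReLU + downsampling.
        self.module1 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        # Modules 2-5: the four residual stages, two residual units each.
        self.module2, self.module3 = net.layer1, net.layer2
        self.module4, self.module5 = net.layer3, net.layer4
        # Module 6: the second downsampling layer (global average pooling).
        self.module6 = net.avgpool

    def forward(self, x):
        f2 = self.module2(self.module1(x))
        f3 = self.module3(f2)
        f4 = self.module4(f3)
        f5 = self.module5(f4)
        f6 = torch.flatten(self.module6(f5), 1)  # 512-d deep feature F6
        return f2, f3, f4, f5, f6                # F2..F6 in the text's notation
```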
The first classification prediction module is connected to the first feature learning module and consists of one fully connected layer. It receives the high-dimensional semantic feature representation set F_1' from the first feature learning module, reduces the dimension of F_1' to the number of old-class image categories, then computes the difference between the predicted category and the true label as a loss value with the cross-entropy loss function (see S. Mannor, D. Peleg, and R. Rubinstein, "The cross entropy method for classification," in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 561-568), and optimizes the first feature learning module by back-propagating the loss value.
Second, the old-class image dataset X_old for training the old-class image recognition system oriented to the class-incremental learning scenario is constructed. The open-source dataset CIFAR100 (https://www.cs.toronto.edu/~kriz/cifar.html, 2009), collected by Alex Krizhevsky, Vinod Nair and Geoffrey Hinton, is used as the training set. CIFAR100 contains 100 classes, each with 600 color images of size 32x32: 500 training images and 100 test images per class. All training images of the first 50 classes are selected as the old-class image dataset X_old, and X_old is assigned the label set Y_old.
Third, the first input preprocessing module reads from X_old an old-class image set X of size N, X = {x_1, x_2, ..., x_n, ..., x_N}, N = 64, 1 <= n <= N, where x_n denotes the n-th image in X. The first input preprocessing module preprocesses each x_n in X by random cropping, horizontal flipping, brightness change and normalization to obtain the preprocessed old-class image set X_1 with label set Y_1, and sends X_1 and the corresponding Y_1 to the first self-supervised augmentation module, where y_n is the label of x_n. The method is:
3.1 Let variable n = 1;
3.2 Convert x_n to the RGB color space to obtain the 3-channel x_n;
3.3 Normalize the size of the 3-channel x_n to 32 x 32 to obtain the normalized x_n;
3.4 Convert the normalized x_n from vector form into tensor (Tensor) form, denote the tensor-form image by x_n^1, and put x_n^1 into the preprocessed image set X_1;
3.5 Put the label y_n of x_n into the label set Y_1, y_n ∈ Y_old;
3.6 If n < N, let n = n + 1 and go to 3.2; if n = N, the preprocessed image set X_1 = {x_1^1, x_2^1, ..., x_n^1, ..., x_N^1} and the label set Y_1 = {y_1, y_2, ..., y_n, ..., y_N} are obtained; go to 3.7;
3.7 Send X_1 and Y_1 to the first self-supervised augmentation module.
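As an illustration of the third step, a minimal preprocessing sketch using standard torchvision transforms follows; the crop padding, jitter strength and normalization statistics are assumed values, not parameters fixed by the patent.

```python
from torchvision import transforms

# Preprocessing for one CIFAR100 image: crop, flip, brightness change,
# conversion from vector (PIL) form to tensor form, and normalization.
preprocess = transforms.Compose([
    transforms.RandomCrop(32, padding=4),        # random image cropping
    transforms.RandomHorizontalFlip(),           # horizontal flipping
    transforms.ColorJitter(brightness=0.2),      # brightness change
    transforms.ToTensor(),                       # vector form -> tensor form
    transforms.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR100
                         (0.2673, 0.2564, 0.2762)),  # channel statistics
])
```

Applied to a PIL image read from CIFAR100, preprocess yields the tensor-form x_n^1 placed into X_1.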
Fourth, the first self-supervised augmentation module receives X_1 and Y_1 from the first input preprocessing module, augments X_1 with the self-supervised label augmentation method (see H. Lee, S. J. Hwang, and J. Shin, "Self-supervised Label Augmentation via Input Transformations," ICML 2020, pp. 5714-5724), producing the augmented old-class image set X_1' and label set Y_1', and sends X_1' and Y_1' to the first feature learning module. The method is:
4.1 Let variable n = 1;
4.2 Rotate x_n^1 in X_1 by 90, 180 and 270 degrees respectively to obtain the rotated images x_n^2, x_n^3 and x_n^4, and put x_n^1, x_n^2, x_n^3 and x_n^4 into the augmented old-class image set X_1';
4.3 Compute the corresponding labels for x_n^2, x_n^3 and x_n^4: take y_n + 1 as the label of x_n^2, y_n + 2 as the label of x_n^3 and y_n + 3 as the label of x_n^4, and put these 3 labels into the augmented label set Y_1';
4.4 If n < N, let n = n + 1 and go to 4.2; if n = N, the augmented old-class image set X_1' = {{x_1^1, x_1^2, x_1^3, x_1^4}, {x_2^1, x_2^2, x_2^3, x_2^4}, ..., {x_N^1, x_N^2, x_N^3, x_N^4}} and the label set Y_1' = {{y_1, y_1+1, y_1+2, y_1+3}, {y_2, y_2+1, y_2+2, y_2+3}, ..., {y_N, y_N+1, y_N+2, y_N+3}} are obtained; go to 4.5;
4.5 Send X_1' and Y_1' to the first feature learning module.
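A minimal sketch of the fourth step's rotation-based self-supervised label augmentation follows. The joint label 4·y + r (rotation index r ∈ {0, 1, 2, 3}) is one common realization of the scheme and is an assumption here; the patent itself writes the augmented labels as y, y+1, y+2, y+3.

```python
import torch

def self_supervised_augment(images, labels):
    """images: (N, 3, 32, 32) tensor; labels: (N,) tensor of class ids."""
    views, view_labels = [], []
    for r in range(4):                                  # 0, 90, 180, 270 degrees
        views.append(torch.rot90(images, k=r, dims=(2, 3)))
        view_labels.append(4 * labels + r)              # assumed joint label
    return torch.cat(views), torch.cat(view_labels)     # X_1', Y_1'
```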
Fifth, X_1' is used to train the first feature learning module and the first classification prediction module of the old-class image recognition system oriented to the class-incremental learning scenario, obtaining their optimal network weight parameters. The method is as follows:
5.1 Initialize the weight parameters of the first feature learning module and the first classification prediction module; set the initial learning rate to 0.01, the batch size to N (N = 64, 1 <= n <= N), the total number of training iteration rounds epoch_max to 100, and the current training round number epoch_cur to 1.
5.2 The first feature learning module receives X_1' and Y_1' from the first self-supervised augmentation module, extracts features from X_1' with the feature extraction method to obtain the high-dimensional semantic feature set F_1' of X_1', where F_1' = {F_2, F_3, F_4, F_5, F_6}; F_2 denotes the second high-dimensional semantic feature set, F_3 the third, F_4 the fourth, F_5 the fifth and F_6 the sixth. F_1' and Y_1' are sent to the first classification prediction module. The specific method is:
5.2.1 Initialize n = 1;
5.2.2 The first module of the first feature learning module applies one convolution operation to the n-th image group x_n' in X_1' to obtain the result O_n^1 of the first module. The method is:
5.2.2.1 The first convolution layer of the first module in the first feature learning module performs a two-dimensional convolution on x_n' (each image has 3 input channels) to obtain the 64-channel two-dimensional convolution result C_n, which is sent to the first normalization layer;
5.2.2.2 The first normalization layer of the first module in the first feature learning module normalizes C_n to obtain the normalized result B_n, which is sent to the activation function layer of the first module;
5.2.2.3 The activation function layer of the first module in the first feature learning module applies nonlinear activation to B_n to obtain the nonlinear activation result A_n, which is sent to the first downsampling layer;
5.2.2.4 The first downsampling layer of the first module in the first feature learning module applies max pooling to A_n to obtain the 64-channel result O_n^1 of the first module, which is sent to the second module of the first feature learning module;
5.2.3 The second module of the first feature learning module receives O_n^1 from the first module and applies 2 convolution operations to it with residual unit operations (see K. He, X. Zhang, S. Ren, et al., "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778) to obtain the 64-channel result O_n^2 of the second module; O_n^2 is sent to the third module of the first feature learning module and put into the second high-dimensional semantic feature set F_2.
5.2.4 The third module of the first feature learning module receives O_n^2 from the second module and applies 2 convolution operations to it with residual unit operations to obtain the 128-channel result O_n^3 of the third module; O_n^3 is sent to the fourth module and put into the third high-dimensional semantic feature set F_3.
5.2.5 The fourth module of the first feature learning module receives O_n^3 from the third module and applies 2 convolution operations to it with residual unit operations to obtain the 256-channel result O_n^4 of the fourth module; O_n^4 is sent to the fifth module and put into the fourth high-dimensional semantic feature set F_4.
5.2.6 The fifth module of the first feature learning module receives O_n^4 from the fourth module and applies 2 convolution operations to it with residual unit operations to obtain the 512-channel result O_n^5 of the fifth module; O_n^5 is sent to the sixth module and put into the fifth high-dimensional semantic feature set F_5.
5.2.7 The sixth module of the first feature learning module receives O_n^5 from the fifth module; the second downsampling layer in the sixth module downsamples O_n^5 to obtain the 512-channel result O_n^6 of the sixth module; O_n^6 is sent to the first classification prediction module and put into the sixth high-dimensional semantic feature set F_6.
5.2.8 If n < N, let n = n + 1 and go to 5.2.2; if n = N, the five high-dimensional semantic feature sets F_2, F_3, F_4, F_5 and F_6 are obtained and put into the high-dimensional semantic feature set F_1' of X_1', where F_1' = {F_2, F_3, F_4, F_5, F_6}.
5.2.9 The sixth module of the first feature learning module sends F_1' and Y_1' to the first classification prediction module.
5.3 The first classification prediction module receives F_1' and Y_1' from the sixth module of the feature learning module and computes the cross-entropy loss L_1 between F_6 in F_1' and Y_1' with formula (1), where f denotes the fully connected layer classifier in the first classification prediction module, f(F_6) denotes the prediction category obtained by passing F_6 through the classifier, and L_ce(f(F_6), Y_1') denotes the cross-entropy loss between the prediction category f(F_6) and the true labels Y_1':

L_1 = L_ce(f(F_6), Y_1')    formula (1)
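A minimal sketch of formula (1) follows: the fully connected classifier head applied to the deep feature F_6 plus a cross-entropy loss over the augmented labels; the 50 x 4 output size (50 old classes times 4 rotations) is an assumption following the label augmentation above.

```python
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(512, 50 * 4)   # f: 512-d F6 -> logits over augmented labels

def loss_L1(F6, Y1_aug):
    """F6: (B, 512) deep features; Y1_aug: (B,) augmented labels."""
    return F.cross_entropy(classifier(F6), Y1_aug)   # L1 = Lce(f(F6), Y1')
```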
5.4 Let the current training round number epoch_cur = epoch_cur + 1; if epoch_cur <= epoch_max, go to 5.2; if epoch_cur > epoch_max, training ends; go to 5.5.
5.5 Compute the old-class prototype set P_old with formula (2), where the prototype p_k of old class k is the mean of the deep features F_6 of the training images of class k, N_k is the number of training images of class k, and P_old contains K_old elements:

p_k = (1/N_k) Σ_{n: y_n = k} F_6^n,  P_old = {p_1, p_2, ..., p_{K_old}}    formula (2)
5.6 Save the weight parameters of the first feature learning module and the first classification prediction module of the trained old-class image recognition system in .pth format, obtaining the trained old-class image classification system oriented to the class-incremental learning scenario.
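A minimal sketch of formula (2) as given above follows: per-class means of the 512-d deep features F_6 over the old training set; the helper name compute_prototypes is illustrative.

```python
import torch

def compute_prototypes(features, labels, num_classes):
    """features: (M, 512) F6 features of the old training set; labels: (M,)."""
    protos = torch.stack([features[labels == k].mean(dim=0)
                          for k in range(num_classes)])   # P_old: (K_old, 512)
    return protos
```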
Sixth, the all-class image recognition system for recognizing all classes in the class-incremental learning scenario is constructed with the all-class image recognition system construction method. It consists of the new-class training image dataset X_new, a second input preprocessing module, a second self-supervised augmentation module, a second feature learning module and a second classification prediction module. X_new contains classes 51 to 60 of the CIFAR100 dataset, and its labels are denoted Y_new. The second input preprocessing module has the same structure and function as the first input preprocessing module, the second self-supervised augmentation module the same as the first self-supervised augmentation module, and the second feature learning module the same as the first feature learning module; the second classification prediction module differs in structure from the first classification prediction module. The weight parameters of the second feature learning module are initialized with the parameters of the first feature learning module, and the weight parameters of the second classification prediction module are initialized with the parameters of the first classification prediction module. The second input preprocessing module, second self-supervised augmentation module, second feature learning module and second classification prediction module are likewise each implemented in the deep learning framework PyTorch as multi-layer convolutional neural networks (CNNs).
The second input preprocessing module is connected to the new-class image dataset X_new, the second self-supervised augmentation module and the second feature learning module. It reads the new-class image set XX = {xx_1, xx_2, ..., xx_n, ..., xx_N} from X_new, preprocesses each xx_n in XX by random cropping, horizontal flipping, brightness change and normalization to obtain the preprocessed new-class image set X_2, and sends X_2 and the corresponding label set Y_2 (Y_2 = {yy_1, yy_2, ..., yy_n, ..., yy_N}) to the second self-supervised augmentation module, where yy_n is the label of xx_n.
The second self-supervised augmentation module is connected to the second input preprocessing module and the second feature learning module. It receives X_2 and Y_2 from the second input preprocessing module, augments X_2 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_2' and label set Y_2', and sends X_2' and Y_2' to the second feature learning module.
The second feature learning module is connected to the second self-supervised augmentation module and the second classification prediction module. It receives X_2' and Y_2' from the second self-supervised augmentation module, extracts the high-dimensional semantic feature representations of the new-class images from X_2', and sends them to the second classification prediction module. The second feature learning module is a ResNet18 network with six modules having the same structure and function as the six modules of the first feature learning module.
The second classification prediction module is connected to the second feature learning module and consists of one fully connected layer, whose output is expanded from the output of the first classification prediction module to the number of categories of all classes (the sum of the numbers of old and new classes). The second classification prediction module receives the high-dimensional semantic feature representations of the new-class images from the second feature learning module and reduces their dimension to the number of all classes (new + old). It then computes the difference between the predicted category and the true label as a loss value with the cross-entropy loss function, computes the difference between the old-class prototypes P_old and the augmented prototypes P_aug with a consistency regularization loss function, and computes the difference between the old and new models with a knowledge distillation loss function; the sum of the three differences is taken as the loss value, and the second feature learning module is optimized by back-propagating it.
Seventh, the new-class image dataset X_new for training the all-class image recognition system is constructed: the images of classes 51 to 60 are selected from the CIFAR100 dataset as X_new.
Eighth, the second input preprocessing module reads from X_new an image set XX of size N, XX = {xx_1, xx_2, ..., xx_n, ..., xx_N}, N = 64, 1 <= n <= N. The second input preprocessing module preprocesses each xx_n in XX with the preprocessing method of the third step to obtain the preprocessed new-class image set X_2, and sends X_2 and the corresponding label set Y_2 (Y_2 = {yy_1, yy_2, ..., yy_n, ..., yy_N}) to the second self-supervised augmentation module, where yy_n is the label of xx_n.
Ninth, the second self-supervised augmentation module receives X_2 and Y_2 from the second input preprocessing module, augments X_2 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_2' and labels Y_2', and sends X_2' and Y_2' to the second feature learning module. Let the augmentation round number t = 1;
Tenth, initialize the weight parameters of the second feature learning module of the all-class image classification system with the weight parameters of the first feature learning module of the old-class image classification system trained in the fifth step.
Eleventh, let the dimension of the fully connected layer parameters in the second classification prediction module be [512, old_classes + 10], where the total number of categories num_classes = 100 and old_classes = the number of old classes K_old = 50. The part of the fully connected layer parameters with dimension [512, old_classes] is initialized with the fully connected layer weight parameters of the first classification prediction module, and the remaining [512, 10] part is initialized by random assignment; the number of new classes K_new = 10;
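A minimal sketch of the eleventh step's classifier expansion follows: the old 50-way fully connected layer grows to 60 outputs, the [512, old_classes] part is copied and the new [512, 10] part is left randomly initialized; the extra rotation outputs of the self-supervised augmentation are omitted for clarity.

```python
import torch
import torch.nn as nn

def expand_classifier(old_fc: nn.Linear, num_new: int) -> nn.Linear:
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + num_new)
    with torch.no_grad():                       # copy the [512, old_classes] part
        new_fc.weight[:old_fc.out_features] = old_fc.weight
        new_fc.bias[:old_fc.out_features] = old_fc.bias
    return new_fc                               # remaining [512, 10] stays random

old_fc = nn.Linear(512, 50)
new_fc = expand_classifier(old_fc, num_new=10)  # 60-way head for 50 + 10 classes
```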
Twelfth, train the second feature learning module and the second classification prediction module of the all-class image recognition system with the augmented new-class dataset X_2' obtained in the ninth step, obtaining their optimal network weight parameters and the all-class image classification system after the first training. The method is as follows:
12.1 Let the initial learning rate be 0.01, the total number of training iteration rounds epoch_max be 100, and the current training round number epoch_cur be 1.
12.2 Input each image in X_2' into the second feature learning module of the all-class image classification system, then train the second feature learning module and the second classification prediction module with the knowledge distillation loss, the cross-entropy loss and the consistency regularization loss to obtain their optimal network weight parameters. The method is:
12.2.1 The second feature learning module receives X_2' and Y_2' from the second self-supervised augmentation module, extracts features from X_2' with the feature extraction method of step 5.2 to obtain the new-class high-dimensional semantic feature set F_2' of X_2', and sends F_2' and Y_2' to the second classification prediction module. F_2' = {FF_2, FF_3, FF_4, FF_5, FF_6}, where FF_2 is the second new-class high-dimensional semantic feature set of X_2', FF_3 the third, FF_4 the fourth, FF_5 the fifth and FF_6 the sixth.
12.2.2 The first feature learning module receives X_2' and Y_2' from the second self-supervised augmentation module, extracts features from X_2' with the feature extraction method of step 5.2 to obtain the old-class high-dimensional semantic feature set F_1'' of X_2', and sends F_1'' to the second classification prediction module. F_1'' = {FFF_2, FFF_3, FFF_4, FFF_5, FFF_6}, where FFF_2 is the second old-class high-dimensional semantic feature set of X_2', FFF_3 the third, FFF_4 the fourth, FFF_5 the fifth and FFF_6 the sixth.
12.2.3 The second classification prediction module of the all-class image classification system receives F_2' and Y_2' from the second feature learning module and F_1'' from the first feature learning module.
12.2.4 Train the second feature learning module and the second classification prediction module with the sum of the cross-entropy loss, the knowledge distillation loss and the consistency regularization loss. The method is:
12.2.4.1 Compute the difference between the sixth high-dimensional semantic feature set FF_6 and Y_2' with formula (1) of step 5.3 as the new-class cross-entropy classification loss L_2: L_2 = L_ce(f(FF_6), Y_2').
12.2.4.2 For the old-class prototype set P_old computed in step 5.5, Gaussian noise is used (see A. Aghajanyan, A. Shrivastava, A. Gupta, N. Goyal, L. Zettlemoyer, and S. Gupta, "Better Fine-Tuning by Reducing Representational Collapse," arXiv:2008.03156, 2020, doi:10.48550/arXiv.2008.03156) to compute the augmented prototypes P_aug with formula (3):

P_aug = P_old + e × r_t    formula (3)
In formula (3), e denotes Gaussian noise obeying a normal distribution, and r_t denotes the uncertainty scale index controlling the augmented prototypes, computed as r_t = sqrt((1/(d·K)) Σ_k Tr(Σ_{t,k})), where d denotes the dimension of the feature space (512), Σ_{t,k} denotes the covariance matrix of the features of class k at incremental round t, and Tr denotes the trace of a matrix.
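A minimal sketch of formula (3) with the scale r_t follows; it assumes the per-class feature covariance matrices were saved alongside P_old, which the patent does not spell out.

```python
import torch

def augment_prototypes(protos, class_covs):
    """protos: (K, 512) old-class prototypes; class_covs: (K, 512, 512)."""
    d = protos.shape[1]
    # r_t = sqrt(mean over classes of Tr(cov_k) / d)
    r_t = torch.sqrt(torch.stack([c.trace() for c in class_covs]).mean() / d)
    e = torch.randn_like(protos)          # e ~ N(0, I)
    return protos + e * r_t               # P_aug = P_old + e * r_t
```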
12.2.4.3 A symmetric Kullback-Leibler (KL) divergence (see B. Zheng et al., "Consistency Regularization for Cross-Lingual Fine-Tuning," arXiv:2106.08226, 2021, http://arxiv.org/abs/2106.08226) is used in formula (4) to compute the consistency loss between the old-class prototype set P_old computed in step 5.5 and the augmented prototypes P_aug computed in step 12.2.4.2, taken as the prototype consistency loss L_p, where f(·) denotes the fully connected layer classifier of the classification prediction module of the all-class image classification system and KL_S is the symmetric KL divergence:

L_p = KL_S(f(P_old), f(P_aug))    formula (4)
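A minimal sketch of formula (4) follows, implementing the symmetric KL divergence between the classifier's predictions on the stored and the augmented prototypes.

```python
import torch.nn.functional as F

def symmetric_kl(logits_p, logits_q):
    p, q = F.log_softmax(logits_p, dim=1), F.log_softmax(logits_q, dim=1)
    # 0.5 * (KL(p || q) + KL(q || p)) over the batch
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

def prototype_consistency_loss(classifier, P_old, P_aug):
    return symmetric_kl(classifier(P_old), classifier(P_aug))   # L_p, formula (4)
```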
12.2.4.4 Compute the knowledge distillation loss with the knowledge distillation method; it consists of the following three parts: the multi-scale self-attention distillation loss L_MSFD, the feature similarity probability distillation loss L_FSPD and the global feature distillation loss L_GFD:
12.2.4.4.1 Compute the multi-scale self-attention distillation loss L_MSFD. The method is:
12.2.4.4.1.1 Take {FF_2, FF_3, FF_4, FF_5} from F_2' and compute a self-attention feature representation for each feature representation in F_2': the self-attention representation of FF_2 is S_2 (channel number C = 64), that of FF_3 is S_3 (C = 128), that of FF_4 is S_4 (C = 256) and that of FF_5 is S_5 (C = 512), where C is the channel number of the self-attention feature representation.
12.2.4.4.1.2 Take {FFF_2, FFF_3, FFF_4, FFF_5} from F_1'' and compute a self-attention feature representation for each feature representation in F_1'': the self-attention representation of FFF_2 is T_2 (C = 64), that of FFF_3 is T_3 (C = 128), that of FFF_4 is T_4 (C = 256) and that of FFF_5 is T_5 (C = 512).
12.2.4.4.1.3 Compute the multi-scale self-attention distillation loss L_MSFD = Σ_{j=2}^{5} ||S_j - T_j||_2, where ||q||_2 denotes the L2 norm of q.
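A minimal sketch of L_MSFD follows; the channel-averaged, L2-normalized activation map is one common choice of self-attention operator and is an assumption here, since the patent does not define the operator explicitly.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """feat: (B, C, H, W) -> normalized spatial attention map (B, H*W)."""
    a = feat.pow(2).mean(dim=1).flatten(1)        # average squared activations over channels
    return F.normalize(a, dim=1)

def msfd_loss(student_feats, teacher_feats):
    """student_feats/teacher_feats: lists [FF2..FF5] and [FFF2..FFF5]."""
    return sum(torch.norm(attention_map(s) - attention_map(t), p=2)
               for s, t in zip(student_feats, teacher_feats))     # L_MSFD
```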
12.2.4.4.2 Compute the feature similarity probability distillation loss L_FSPD. The method is:
12.2.4.4.2.1 Take FF_6 from F_2' and FFF_6 from F_1''. With the similarity probability distribution technique (see N. Passalis and A. Tefas, "Learning Deep Representations with Probabilistic Knowledge Transfer," arXiv:1803.10837, 2019, http://arxiv.org/abs/1803.10837), compute the first feature similarity probability distribution p' from FF_6 with formula (5) and the second feature similarity probability distribution p'' from FFF_6 with formula (6); formulas (5) and (6) normalize the pairwise similarities of the features within a batch into conditional probability distributions.
12.2.4.4.2.2 The Kullback-Leibler (KL) divergence (see J. M. Joyce, "Kullback-Leibler divergence," in International Encyclopedia of Statistical Science, Springer, Berlin, Heidelberg, 2011, pp. 720-722) is used to compute the feature similarity probability distillation loss L_FSPD = KL(p'', p'), i.e., the KL divergence between p'' and p'.
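A minimal sketch of L_FSPD follows, using the probabilistic knowledge transfer formulation assumed for formulas (5) and (6): pairwise cosine similarities within the batch are normalized into conditional probabilities, and the KL divergence between the teacher's and student's distributions is taken.

```python
import torch
import torch.nn.functional as F

def similarity_probs(feats, eps=1e-8):
    """feats: (B, 512) -> (B, B) row-normalized cosine-similarity matrix."""
    z = F.normalize(feats, dim=1)
    sim = (z @ z.t() + 1.0) / 2.0                 # map cosine similarity to [0, 1]
    sim.fill_diagonal_(0)                         # ignore self-similarity
    return sim / (sim.sum(dim=1, keepdim=True) + eps)

def fspd_loss(FF6, FFF6, eps=1e-8):
    p_student, p_teacher = similarity_probs(FF6), similarity_probs(FFF6)
    # KL(p'' || p'): teacher distribution against student distribution
    return (p_teacher * torch.log((p_teacher + eps) / (p_student + eps))).sum(1).mean()
```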
12.2.4.4.3 Take FF_6 from F_2' and FFF_6 from F_1'', and compute the global feature distillation loss L_GFD with formula (7):

L_GFD = ||FF_6 - FFF_6||_2    formula (7)
12.2.4.4.4 Combining the above three distillation losses, compute the total knowledge distillation loss L_kd, L_kd = L_MSFD + L_FSPD + L_GFD.
12.2.4.4.5 Update the parameters of the second feature learning module and the second classification prediction module of the all-class image classification system with the overall optimization objective L_total = L_clf + λ_1 L_p + λ_2 L_kd, where L_clf denotes the classification loss of the all-class image classification system over the new-class feature representations, the old-class prototypes and the augmented prototypes: L_clf = L_ce(f(P_old; P_aug), Y_1') + L_2, in which L_ce(f(P_old; P_aug), Y_1') denotes the classification loss of the old-class prototypes and the augmented prototypes against the old-class labels Y_1', and L_2 = L_ce(f(FF_6), Y_2') denotes the cross-entropy classification loss of the new-class feature representation FF_6 against the new-class labels Y_2'. λ_1 is the first loss weight factor and λ_2 the second; in the invention, λ_1 = λ_2 = 10.
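A minimal sketch of the overall objective L_total follows, wiring together the losses sketched above; the helper names (prototype_consistency_loss, msfd_loss, fspd_loss) refer to the earlier sketches and, like Y_old_protos (the class indices of the prototypes), are assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(classifier, FF6, Y2_aug, P_old, P_aug, Y_old_protos,
               student_feats, teacher_feats, FFF6, lam1=10.0, lam2=10.0):
    L2 = F.cross_entropy(classifier(FF6), Y2_aug)               # new-class CE loss
    L_proto = F.cross_entropy(classifier(P_aug), Y_old_protos)  # prototype CE loss
    L_clf = L_proto + L2                                        # L_clf
    L_p = prototype_consistency_loss(classifier, P_old, P_aug)  # formula (4)
    L_kd = (msfd_loss(student_feats, teacher_feats)             # L_MSFD
            + fspd_loss(FF6, FFF6)                              # L_FSPD
            + torch.norm(FF6 - FFF6, p=2))                      # L_GFD, formula (7)
    return L_clf + lam1 * L_p + lam2 * L_kd                     # lambda1 = lambda2 = 10
```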
12.2.5 Let the current training round number epoch_cur = epoch_cur + 1; if epoch_cur <= epoch_max, go to 12.2.4.4; if epoch_cur > epoch_max, the all-class image classification system after the first training is obtained; go to the thirteenth step.
Thirteenth, construct new all-class image recognition systems for the class-incremental learning scenario with the construction method of the sixth step, keep constructing new-class datasets, and keep training the newly constructed all-class systems with them until all 100 classes of the CIFAR100 dataset have participated in training, obtaining the finally trained all-class image classification system. The method is:
13.1 Let x denote the number of additional training rounds of the all-class image recognition system, x = 1;
13.2 Construct the (x+1)-th all-class image recognition system for the class-incremental learning scenario with the construction method of the sixth step; it consists of the (x+2)-th input preprocessing module, the (x+2)-th self-supervised augmentation module, the (x+2)-th feature learning module and the (x+2)-th classification prediction module. The structures and functions of the (x+2)-th input preprocessing module, self-supervised augmentation module and feature learning module are the same as those of the second input preprocessing module, second self-supervised augmentation module and second feature learning module. The dimension of the fully connected layer parameters of the (x+2)-th classification prediction module is initialized to [512, old_classes + (x+1)×10];
13.3 All training images of classes 10(x+5)+1 to 10(x+6) of the CIFAR100 dataset are taken as the new-class image dataset X_new;
13.4 The (x+2)-th input preprocessing module reads from X_new an image set XXX of size N, XXX = {xxx_1, xxx_2, ..., xxx_n, ..., xxx_N}, N = 64, 1 <= n <= N. It preprocesses each xxx_n in XXX with the preprocessing method of the third step to obtain the preprocessed new-class image set X_3, and sends X_3 and the corresponding label set Y_3 (Y_3 = {yyy_1, yyy_2, ..., yyy_n, ..., yyy_N}) to the (x+2)-th self-supervised augmentation module, where yyy_n is the label of xxx_n.
13.5 The (x+2)-th self-supervised augmentation module receives X_3 and Y_3 from the (x+2)-th input preprocessing module, augments X_3 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_3' and labels Y_3', and sends X_3' and Y_3' to the (x+2)-th feature learning module.
13.6 Initialize the weight parameters of the (x+2)-th feature learning module of the all-class image classification system oriented to the class-incremental learning scene with the weight parameters of the (x+1)-th feature learning module of the all-class image classification system obtained after the x-th training;
13.7 The dimension of the fully connected layer parameters in the (x+2)-th classification prediction module is [512, old_classes + 10(x+1)]; the part with dimension [512, old_classes + 10x] is initialized with the fully connected layer weights of the (x+1)-th classification prediction module, and the remaining [512, 10] part is initialized by random assignment.
13.8 Let t = x+1, K_old = 50 + 10x, K_new = 10. Apply the training method of the twelfth step to the augmented new-class data set X_3' obtained in step 13.5 to train the (x+2)-th feature learning module and the (x+2)-th classification prediction module of the (x+1)-th all-class image recognition system oriented to the class-incremental learning scene, obtaining their optimal network weight parameters and the (x+1)-th trained all-class image classification system.
13.9 If x < 5, let x = x+1 and go to 13.2; if x = 5, the finally (fifth) trained all-class image recognition system is obtained, consisting of the sixth input preprocessing module, the sixth self-supervised augmentation module, the sixth feature learning module and the sixth classification prediction module; go to the fourteenth step.
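The stage schedule of the thirteenth step can be summarized programmatically. Below is a minimal sketch, assuming the helper name incremental_schedule (illustrative, not from the original), of how K_old and the new-class range evolve per stage:

```python
def incremental_schedule(num_stages=5, base=50, step=10):
    """Yield (x, K_old, new-class range) for the five incremental stages."""
    for x in range(1, num_stages + 1):
        k_old = base + step * x                      # K_old = 50 + 10x
        lo, hi = 10 * (x + 5) + 1, 10 * (x + 6)      # classes 10(x+5)+1 .. 10(x+6)
        yield x, k_old, (lo, hi)

for x, k_old, (lo, hi) in incremental_schedule():
    print(f"stage {x}: K_old = {k_old}, new classes {lo}..{hi}")
```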
Fourteenth step: use the finally trained all-class image classification system to classify the test images X_user input by the user, obtaining the predicted image classification results. The method is:
14.1 The trained all-class image classification system receives the user-input test image set X_user, whose images belong to categories of CIFAR100, together with the label set Y_user;
14.2 The sixth input preprocessing module preprocesses X_user with the preprocessing method of the third step to obtain the preprocessed test image set, and sends it together with the corresponding label set to the sixth self-supervised augmentation module; each label corresponds to its preprocessed image.
14.3 The sixth self-supervised augmentation module receives the preprocessed test image set and labels from the sixth input preprocessing module, augments the image set with the self-supervised label augmentation method of the fourth step to generate the augmented user image set and labels, and sends them to the sixth feature learning module;
14.4 The sixth feature learning module receives the augmented user image set and labels from the sixth self-supervised augmentation module, extracts features with the feature extraction method of step 5.2 to obtain the high-dimensional semantic feature set F'_user (composed of the second through sixth high-dimensional semantic feature sets), and sends F'_user and the labels to the sixth classification prediction module.
14.5 The sixth classification prediction module receives the high-dimensional semantic feature representation F'_user from the sixth feature learning module, reduces F'_user through the fully connected layer from dimension [256, 512] to dimension [256, 100] (100 being the number of classes involved), and for each row of the [256, 100] output takes the class number with the maximum probability as the classification result of the test image.
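For step 14.5, a minimal inference sketch is given below; backbone (returning the 512-dimensional module-six feature) and fc (the fully connected classifier) are illustrative names, not the original implementation:

```python
import torch

@torch.no_grad()
def predict(backbone, fc, images):
    """images: preprocessed batch [B, 3, 32, 32]; returns predicted class ids."""
    feats = backbone(images)           # [B, 512] output of the sixth module
    logits = fc(feats)                 # [B, 100] after dimension reduction
    probs = torch.softmax(logits, dim=1)
    return probs.argmax(dim=1)         # class with maximum probability per image
```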
Fifteenth step, ending.
The invention can achieve the following technical effects:
1. The invention improves recognition accuracy over all categories. The knowledge distillation method adopted in step 12.2.4.4 computes the knowledge distillation loss as the sum of three parts, the multi-scale self-attention feature distillation loss L_MSFD, the feature similarity probability distillation loss L_FSPD and the global feature distillation loss L_GFD; compared with traditional knowledge distillation methods based on the output layer and intermediate features, this improves the classification accuracy of all categories.
2. The invention classifies the user-input test images X_user without storing any old-class training data, which effectively alleviates catastrophic forgetting of the old classes and improves the classification accuracy on user-input images.
The foregoing has been presented in some detail to facilitate understanding of the core concepts of the invention. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and such modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
The invention has been validated by experiments on the CIFAR100 data set; the experimental results show that it not only alleviates catastrophic forgetting but also reduces classification bias.
Drawings
FIG. 1 is a logic structure diagram of an old class image classification system facing a class increment learning scene and all class image recognition systems facing the class increment learning scene, which are constructed in the first step;
fig. 2 is a general flow chart of the present invention.
Detailed Description
FIG. 2 is a general flow chart of the present invention; as shown in fig. 2, the present invention includes the steps of:
First step, an old-class image classification system for recognizing old classes in a class-incremental learning scene is constructed. As shown in FIG. 1, the old-class image recognition system oriented to the class-incremental learning scene consists of the old-class image data set X_old, a first input preprocessing module, a first self-supervised augmentation module, a first feature learning module and a first classification prediction module. The first input preprocessing module, the first self-supervised augmentation module, the first feature learning module and the first classification prediction module are each implemented as multi-layer convolutional neural networks (CNN, Convolutional Neural Network) in the deep learning framework PyTorch.
The first input preprocessing module is connected with the old-class image data set, the first self-supervised augmentation module and the first feature learning module. It reads old-class images from X_old, applies preprocessing such as random cropping, horizontal flipping, brightness change and normalization to the old-class image set X read from X_old to obtain the preprocessed old-class image set X_1, denotes the label set of X_1 by Y_1, and sends the preprocessed old-class image set X_1 and label set Y_1 to the first self-supervised augmentation module.
The first self-supervised augmentation module is connected with the first input preprocessing module and the first feature learning module. It receives X_1 and Y_1 from the first input preprocessing module and augments X_1 and Y_1 with the self-supervised label augmentation method (H. Lee, S. J. Hwang, and J. Shin, "Self-supervised Label Augmentation via Input Transformations," arXiv:1910.05872, 2020, pp. 5714-5724), generating the augmented old-class image set X_1' and, on the basis of X_1', the augmented label set Y_1'; X_1' and Y_1' are sent to the first feature learning module.
The first feature learning module is connected with the first self-supervised augmentation module and the first classification prediction module. It receives the augmented X_1' and Y_1' from the first self-supervised augmentation module, extracts the high-dimensional semantic feature representation set F_1' from X_1', and sends F_1' and Y_1' to the first classification prediction module. The first feature learning module is a ResNet18 network (K. He, X. Zhang, S. Ren, et al., "Deep Residual Learning for Image Recognition," IEEE CVPR, 2016, pp. 770-778), divided into six modules. The first module consists of a first convolution layer, a first normalization layer, an activation function layer and a first downsampling layer; the convolution kernel of the first convolution layer is 3×3 with stride 1 and padding 1. The second to fifth modules each consist of two residual units, and each residual unit consists of 1 convolution layer, 1 normalization layer and an activation function layer. The convolution layers of the residual units have 3×3 kernels with padding 1; the stride is 1 in the second module and 2 in the third, fourth and fifth modules. The sixth module consists of a second downsampling layer with stride 1 and no padding. All activation function layers of the first feature learning module use the ReLU function (Jiang Angbo, Wang Weiwei, "ReLU activation function optimization study," Transducer and Microsystem Technologies, 2018, 37(02): 50-52).
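The six-module division above maps naturally onto the layer names of torchvision's ResNet18. The following is a sketch under that assumption (the CIFAR-style 3×3 stem follows the description; class and variable names are illustrative), returning the per-module outputs F_2 to F_6 that later serve as distillation inputs:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiScaleResNet18(nn.Module):
    """ResNet18 split into the six modules described above; returns F2..F6."""
    def __init__(self):
        super().__init__()
        net = resnet18(num_classes=10)
        # Module 1: 3x3 conv (stride 1, padding 1) for 32x32 inputs, BN, ReLU, pooling.
        net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.module1 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.module2, self.module3 = net.layer1, net.layer2   # 64- and 128-channel
        self.module4, self.module5 = net.layer3, net.layer4   # 256- and 512-channel
        self.module6 = net.avgpool                            # second downsampling layer

    def forward(self, x):
        f2 = self.module2(self.module1(x))
        f3 = self.module3(f2)
        f4 = self.module4(f3)
        f5 = self.module5(f4)
        f6 = torch.flatten(self.module6(f5), 1)               # [B, 512]
        return f2, f3, f4, f5, f6
```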
The first classification prediction module is connected with the first feature learning module and consists of 1 fully connected layer. It receives the high-dimensional semantic feature representation set F_1' from the first feature learning module, reduces the dimension of F_1' to the number of old-class image categories, computes the difference between the predicted category and the true label as a loss value with the cross-entropy loss function (S. Mannor, D. Peleg, and R. Rubinstein, "The Cross Entropy Method for Classification," Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 561-568), and optimizes the first feature learning module by back-propagating the loss value.
Second step, construct the old-class image data set X_old for training the old-class image recognition system oriented to the class-incremental learning scene. The open-source data set CIFAR100 (https://www.cs.toronto.edu/~kriz/cifar.html, 2009) is adopted as the training set. The CIFAR100 data set has 100 classes, each containing 600 color images of size 32×32, with 500 training images and 100 test images per class. All training images of the first 50 categories are selected as the old-class image data set X_old, X_old is assigned labels Y_old, and the set of test images of all categories is denoted X_test.
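A sketch of building X_old from the first 50 CIFAR100 classes with torchvision is given below; the data root and transform are placeholders:

```python
from torchvision.datasets import CIFAR100
from torch.utils.data import Subset

def old_class_subset(root="./data", num_old=50, transform=None):
    """Keep only training images whose label is among the first num_old classes."""
    train = CIFAR100(root=root, train=True, download=True, transform=transform)
    keep = [i for i, y in enumerate(train.targets) if y < num_old]
    return Subset(train, keep)   # X_old: 50 classes x 500 images
```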
Third step, the first input preprocessing module reads from X_old an old-class image set X of size N, X = {x_1, x_2, ..., x_n, ..., x_N}, N = 64, 1 ≤ n ≤ N, where x_n denotes the n-th image in X. The first input preprocessing module preprocesses each x_n in X, including random cropping, horizontal flipping, brightness change and normalization, to obtain the preprocessed old-class image set X_1, whose label set is denoted Y_1. The preprocessed old-class image set X_1 and the corresponding label set Y_1 are sent to the first self-supervised augmentation module; y_n is the label of x_n. The method is:
3.1 let variable n=1;
3.2 Convert x_n to the RGB color space to obtain the 3-channel x_n;
3.3 Normalize the size of the 3-channel x_n to 32×32 to obtain the normalized x_n;
3.4 Convert the normalized x_n from vector form to tensor (Tensor) form, and put the tensor-form x_n into the preprocessed image set X_1;
3.5 Put the label y_n of x_n into the label set Y_1, y_n ∈ Y_old;
3.6 If n < N, let n = n+1 and go to 3.2; if n = N, the preprocessed image set X_1 and the label set Y_1 = {y_1, y_2, ..., y_n, ..., y_N} are obtained; go to 3.7;
3.7 Send X_1 and Y_1 to the first self-supervised augmentation module.
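The preprocessing of the third step corresponds to a standard torchvision pipeline. A sketch follows; the crop padding, jitter strength and normalization statistics are assumptions, since the text does not state them:

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.RandomCrop(32, padding=4),            # random cropping to 32x32
    transforms.RandomHorizontalFlip(),               # horizontal flipping
    transforms.ColorJitter(brightness=0.2),          # brightness change
    transforms.ToTensor(),                           # vector -> tensor form
    transforms.Normalize((0.5071, 0.4865, 0.4409),   # CIFAR100 channel means
                         (0.2673, 0.2564, 0.2762)),  # CIFAR100 channel stds
])
```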
Fourth step, the first self-supervised augmentation module receives X_1 and Y_1 from the first input preprocessing module, augments X_1 with the self-supervised label augmentation method (H. Lee, S. J. Hwang, and J. Shin, "Self-supervised Label Augmentation via Input Transformations," arXiv:1910.05872, 2020, pp. 5714-5724), generates the augmented old-class image set X_1' and labels Y_1', and sends X_1' and Y_1' to the first feature learning module. The method is:
4.1 let variable n=1;
4.2 Rotate the n-th image of X_1 by 90°, 180° and 270° respectively to obtain the rotated images, and put the original image and the three rotated images into the augmented old-class image set X_1';
4.3 Compute the corresponding labels for the rotated images: take y_n+1 as the label of the 90° rotation, y_n+2 as the label of the 180° rotation and y_n+3 as the label of the 270° rotation, and put these 3 labels into the augmented label set Y_1';
4.4 If n < N, let n = n+1 and go to 4.2; if n = N, the augmented old-class image set X_1' and label set Y_1' = {{y_1, y_1+1, y_1+2, y_1+3}, {y_2, y_2+1, y_2+2, y_2+3}, ..., {y_n, y_n+1, y_n+2, y_n+3}, ..., {y_N, y_N+1, y_N+2, y_N+3}} are obtained; go to 4.5;
4.5 Send X_1' and Y_1' to the first feature learning module.
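A sketch of the rotation-based label augmentation of steps 4.2 to 4.4 is given below, producing four views per image with labels y_n, y_n+1, y_n+2, y_n+3 as described; the batch layout is an illustrative choice:

```python
import torch

def self_supervised_augment(x, y):
    """x: [N, C, H, W], y: [N]; returns 4N rotated images and labels per 4.2-4.3."""
    views, labels = [x], [y]
    for k in (1, 2, 3):                        # 90, 180, 270 degree rotations
        views.append(torch.rot90(x, k, dims=(2, 3)))
        labels.append(y + k)                   # label y_n + k as in the text
    return torch.cat(views), torch.cat(labels)
```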
Fifth step, use X_1' to train the first feature learning module and the first classification prediction module of the old-class image recognition system oriented to the class-incremental learning scene, obtaining their optimal network weight parameters. The method is:
5.1 initializing weight parameters in a first feature learning module and a first classification prediction module, enabling an initial learning rate to be 0.01, enabling a batch processing size to be N, enabling N=64, enabling N to be more than or equal to 1 and less than or equal to N, enabling a total training iteration round number epoch_max to be 100, and enabling a current training round number epoch_cur to be 1.
5.2 The first feature learning module receives X_1' and Y_1' from the first self-supervised augmentation module, extracts features from X_1' with the feature extraction method to obtain the high-dimensional semantic feature set F_1' of X_1', where F_1' = {F_2, F_3, F_4, F_5, F_6}, F_2 denoting the second high-dimensional semantic feature set, F_3 the third, F_4 the fourth, F_5 the fifth and F_6 the sixth high-dimensional semantic feature set, and sends F_1' and Y_1' to the first classification prediction module. The specific method is:
5.2.1 initializing n=1;
5.2.2 The first module of the first feature learning module applies 1 convolution operation to the n-th image group of X_1' to obtain the result of the first module, which is sent to the second module. The method is:
5.2.2.1 The first convolution layer of the first module of the first feature learning module performs a two-dimensional convolution on the n-th image group (each image having 3 input channels) to obtain a 64-channel two-dimensional convolution result, which is sent to the first normalization layer;
5.2.2.2 The first normalization layer of the first module of the first feature learning module normalizes the convolution result to obtain the normalized result, which is sent to the activation function layer of the first module;
5.2.2.3 The activation function layer of the first module of the first feature learning module applies a nonlinear activation to the normalized result to obtain the nonlinear activation result, which is sent to the first downsampling layer;
5.2.2.4 The first downsampling layer of the first module of the first feature learning module applies a max-pooling operation to the activation result to obtain the 64-channel result of the first module of the feature learning module, which is sent to the second module of the first feature learning module;
5.2.3 The second module of the first feature learning module receives the result of the first module, applies 2 convolution operations to it by the residual unit operation method (K. He, X. Zhang, S. Ren, et al., "Deep Residual Learning for Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778) to obtain the 64-channel result of the second module, sends it to the third module of the first feature learning module, and puts it into the second high-dimensional semantic feature set F_2.
5.2.4 The third module of the first feature learning module receives the result of the second module, applies 2 convolution operations to it by the residual unit operation method to obtain the 128-channel result of the third module, sends it to the fourth module of the first feature learning module, and puts it into the third high-dimensional semantic feature set F_3.
5.2.5 The fourth module of the first feature learning module receives the result of the third module, applies 2 convolution operations to it by the residual unit operation method to obtain the 256-channel result of the fourth module, sends it to the fifth module of the first feature learning module, and puts it into the fourth high-dimensional semantic feature set F_4.
5.2.6 The fifth module of the first feature learning module receives the result of the fourth module, applies 2 convolution operations to it by the residual unit operation method to obtain the 512-channel result of the fifth module, sends it to the sixth module of the feature learning module, and puts it into the fifth high-dimensional semantic feature set F_5.
5.2.7 The sixth module of the first feature learning module receives the result of the fifth module, downsamples it with the second downsampling layer of the sixth module to obtain the 512-channel result of the sixth module, sends it to the first classification prediction module, and puts it into the sixth high-dimensional semantic feature set F_6.
5.2.8 If n < N, let n = n+1 and go to 5.2.2; if n = N, the five high-dimensional semantic feature sets F_2, F_3, F_4, F_5, F_6 are obtained and put into the high-dimensional semantic feature set F_1' of X_1', at which point F_1' = {F_2, F_3, F_4, F_5, F_6}.
5.2.9 The sixth module of the first feature learning module sends F_1' and Y_1' to the first classification prediction module.
5.3 The first classification prediction module receives F_1' and Y_1' from the sixth module of the feature learning module and computes the cross-entropy loss L_1 between F_6 in F_1' and Y_1' with formula (1), where f denotes the fully connected layer classifier in the first classification prediction module, f(F_6) denotes the prediction category obtained by processing F_6 with the classifier f, and L_ce(f(F_6), Y_1') denotes the cross-entropy loss between the prediction f(F_6) and the true labels Y_1':
L_1 = L_ce(f(F_6), Y_1')    formula (1)
5.4, the current training round number epoch_cur=epoch_cur+1; if the epoch_cur is less than or equal to the training iteration total round number epoch_max, turning to 5.2; if the epoch_cur is greater than the total number of training iterations epoch_max, the training is ended, and the training is turned to 5.5.
5.5 Compute the old-class prototype set P_old with formula (2), each prototype in P_old being the mean of the high-dimensional semantic feature representations of the training samples of the corresponding old class.
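A sketch of step 5.5 under the assumption that each prototype is the per-class mean of the module-six features (consistent with prototype-based methods; backbone is assumed to return the 512-dimensional feature):

```python
import torch

@torch.no_grad()
def class_prototypes(backbone, loader, num_classes, feat_dim=512, device="cpu"):
    """Mean 512-d feature (module-six output) per class over the training set."""
    sums = torch.zeros(num_classes, feat_dim, device=device)
    counts = torch.zeros(num_classes, device=device)
    for x, y in loader:
        f = backbone(x.to(device))                 # [B, 512]
        sums.index_add_(0, y.to(device), f)
        counts += torch.bincount(y.to(device), minlength=num_classes).float()
    return sums / counts.clamp(min=1).unsqueeze(1)  # P_old: [num_classes, 512]
```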
5.6 Save the weight parameters obtained by the first feature learning module and the first classification prediction module of the trained old-class image recognition system in .pth format, obtaining the trained old-class image classification system oriented to the class-incremental learning scene.
Sixth step, construct the all-class image recognition system for recognizing all classes in the class-incremental learning scene with the all-class image recognition system construction method. As shown in FIG. 1, the all-class image recognition system oriented to the class-incremental learning scene consists of the new-class training image data set X_new, a second input preprocessing module, a second self-supervised augmentation module, a second feature learning module and a second classification prediction module. X_new comprises classes 51 to 60 of the CIFAR100 data set, and the labels of X_new are denoted Y_new. The second input preprocessing module has the same structure and function as the first input preprocessing module, the second self-supervised augmentation module the same as the first self-supervised augmentation module, and the second feature learning module the same as the first feature learning module; the second classification prediction module differs from the first classification prediction module. The weight parameters of the second feature learning module are initialized with the parameters of the first feature learning module, and the weight parameters of the second classification prediction module are initialized with the parameters of the first classification prediction module. The second input preprocessing module, second self-supervised augmentation module, second feature learning module and second classification prediction module are likewise implemented as multi-layer convolutional neural networks (CNN) in the deep learning framework PyTorch.
The second input preprocessing module is connected with the new-class image data set X_new, the second self-supervised augmentation module and the second feature learning module. It reads the new-class image set XX from X_new, XX = {xx_1, xx_2, ..., xx_n, ..., xx_N}, preprocesses each xx_n in XX (random cropping, horizontal flipping, brightness change, normalization and the like) to obtain the preprocessed new-class image set X_2, and sends X_2 and the corresponding labels Y_2 (Y_2 = {yy_1, yy_2, ..., yy_n, ..., yy_N}) to the second self-supervised augmentation module; yy_n is the label of xx_n.
The second self-supervised augmentation module is connected with the second input preprocessing module and the second feature learning module; it receives X_2 and Y_2 from the second input preprocessing module, augments X_2 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_2' and label set Y_2', and sends X_2' and Y_2' to the second feature learning module.
The second feature learning module is connected with the second self-supervised augmentation module and the second classification prediction module; it receives X_2' and Y_2' from the second self-supervised augmentation module, extracts the high-dimensional semantic feature representation of the new-class images from X_2', and sends it to the second classification prediction module. The second feature learning module is a ResNet18 network with six modules whose structure and function are the same as the six modules of the first feature learning module.
The second classification prediction module is connected with the second feature learning module and consists of 1 fully connected layer. The output of the fully connected layer is expanded on the basis of the output of the first classification prediction module to the number of categories of all-class images (the sum of the numbers of old and new categories). The second classification prediction module receives the high-dimensional semantic feature representation of the new-class images from the second feature learning module, reduces its dimension to the number of all-class categories (new classes + old classes), and computes the difference between the predicted category and the true label as a loss value with the cross-entropy loss function. The difference between P_old and P_aug is computed with a consistency regularization loss function, the difference between the old and new models is computed with a knowledge distillation loss function, the sum of the three difference values is taken as the loss value, and the second feature learning module is optimized by back-propagating the loss value.
Seventh step, construct the new-class image data set X_new for training the all-class image recognition system. The method is: select the images of classes 51 to 60 from the CIFAR100 data set as the new-class image data set X_new.
Eighth step, the second input preprocessing module reads from X_new an image set XX of size N, XX = {xx_1, xx_2, ..., xx_n, ..., xx_N}, N = 64, 1 ≤ n ≤ 64; it preprocesses each xx_n in XX with the preprocessing method of the third step to obtain the preprocessed new-class image set X_2, and sends X_2 and the corresponding labels Y_2 (Y_2 = {yy_1, yy_2, ..., yy_n, ..., yy_N}) to the second self-supervised augmentation module; yy_n is the label of xx_n.
Ninth step, the second self-supervised augmentation module receives X_2 and Y_2 from the second input preprocessing module, augments X_2 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_2' and labels Y_2', and sends X_2' and Y_2' to the second feature learning module. Let the incremental stage number t = 1;
Tenth step, initialize the weight parameters of the second feature learning module of the all-class image classification system oriented to the class-incremental learning scene with the weight parameters of the first feature learning module of the old-class image classification system trained in the fifth step.
Eleventh step, let the dimension of the fully connected layer parameters in the second classification prediction module be [512, old_classes + 10], where the total number of categories num_classes = 100 and old_classes = the number of old classes K_old = 50. The part of the fully connected layer parameters with dimension [512, old_classes] is initialized with the fully connected layer weights of the first classification prediction module, and the remaining [512, 10] part is initialized by random assignment. The number of new classes K_new = 10;
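A sketch of the fully connected layer expansion of the eleventh step: the [512, old_classes] block is copied from the previous classifier and the new [512, 10] block keeps its random initialization (function name illustrative):

```python
import torch
import torch.nn as nn

def expand_classifier(old_fc: nn.Linear, num_new: int = 10) -> nn.Linear:
    """Grow a [512, old_classes] classifier to [512, old_classes + num_new]."""
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + num_new)
    with torch.no_grad():
        new_fc.weight[: old_fc.out_features] = old_fc.weight   # copied old block
        new_fc.bias[: old_fc.out_features] = old_fc.bias
    return new_fc  # the remaining [512, 10] rows keep their random init
```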
Twelfth step, use the augmented new-class data set X_2' obtained in the ninth step to train the second feature learning module and the second classification prediction module of the all-class image recognition system oriented to the class-incremental learning scene, obtaining their optimal network weight parameters and the all-class image classification system after the first training. The method is:
12.1 let the initial learning rate learning be 0.01, the training iteration total number epoch_max be 100, and the current training round number epoch_cur be 1.
12.2 Input each image of X_2' into the second feature learning module of the all-class image classification system, then train the second feature learning module and the second classification prediction module with the knowledge distillation loss, cross-entropy loss and consistency regularization loss to obtain their optimal network weight parameters. The method is:
12.2.1 The second feature learning module receives X_2' and Y_2' from the second self-supervised augmentation module, extracts features from X_2' with the feature extraction method of step 5.2 to obtain the new-class high-dimensional semantic feature set F_2' of X_2', and sends F_2' and Y_2' to the second classification prediction module; F_2' = {FF_2, FF_3, FF_4, FF_5, FF_6}, where FF_2 is the second new-class high-dimensional semantic feature set of X_2', FF_3 the third, FF_4 the fourth, FF_5 the fifth and FF_6 the sixth new-class high-dimensional semantic feature set of X_2'.
12.2.2 The first feature learning module receives X_2' and Y_2' from the second self-supervised augmentation module, extracts features from X_2' with the feature extraction method of step 5.2 to obtain the old-class high-dimensional semantic feature set F_1'' of X_2', and sends F_1'' to the second classification prediction module; F_1'' = {FFF_2, FFF_3, FFF_4, FFF_5, FFF_6}, where FFF_2 is the second old-class high-dimensional semantic feature set of X_2', FFF_3 the third, FFF_4 the fourth, FFF_5 the fifth and FFF_6 the sixth old-class high-dimensional semantic feature set of X_2'.
12.2.3 The second classification prediction module of the all-class image classification system receives F_2' and Y_2' from the second feature learning module and F_1'' from the first feature learning module.
12.2.4 training the second feature learning module and the second classification prediction module with a sum of cross entropy loss, knowledge distillation loss, and consistency regularization loss. The method comprises the following steps:
12.2.4.1 Compute, with formula (1) of step 5.3, the difference between the sixth high-dimensional semantic feature set FF_6 and Y_2' as the new-class cross-entropy classification loss L_2, L_2 = L_ce(f(FF_6), Y_2').
12.2.4.2 Augment the old-class prototype set P_old computed in step 5.5 with Gaussian noise, computing the augmented prototype P_aug by formula (3):
P_aug = P_old + e × r_t    formula (3)
In formula (3), e denotes Gaussian noise following a normal distribution, and r_t denotes an uncertainty scale controlling the augmented prototypes; r_t is obtained by taking the square root of r_t², which is computed from the traces of the class feature covariance matrices, where D denotes the dimension of the feature space (512), Σ_{t,k} denotes the covariance matrix of the class-k features at stage t, and Tr denotes the trace of a matrix.
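A sketch of formula (3); since the scale formula is not fully recoverable, r_t is computed here as the square root of the average per-dimension variance of the old-class features, a PASS-style reading that should be treated as an assumption:

```python
import torch

def augment_prototypes(p_old, class_covs):
    """p_old: [K, D] prototypes; class_covs: [K, D, D] covariance per old class."""
    d = p_old.size(1)
    # r_t^2: average of Tr(Sigma_{t,k}) / D over the K old classes (assumed form).
    r2 = torch.stack([torch.trace(c) for c in class_covs]).mean() / d
    e = torch.randn_like(p_old)          # Gaussian noise e ~ N(0, I)
    return p_old + e * r2.sqrt()         # P_aug = P_old + e x r_t
```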
12.2.4.3 Using the symmetric Kullback-Leibler (KL) technique (B. Zheng et al., "Consistency Regularization for Cross-Lingual Fine-Tuning," arXiv:2106.08226, 2021), compute by formula (4) the consistency loss between the old-class prototype set P_old computed in step 5.5 and the augmented prototype P_aug obtained in step 12.2.4.2 as the prototype consistency loss L_p, where f(·) denotes the fully connected layer classifier of the classification prediction module of the all-class image classification system and KL_S is the symmetric KL divergence:
L_p = KL_S(f(P_old), f(P_aug))    formula (4)
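A sketch of formula (4), taking KL_S as the sum of the two directed KL divergences between the classifier outputs on P_old and P_aug; the symmetrization choice is an assumption:

```python
import torch.nn.functional as F

def symmetric_kl(logits_a, logits_b):
    """KL_S(f(P_old), f(P_aug)) as KL(a||b) + KL(b||a) over softmax outputs."""
    pa, pb = F.log_softmax(logits_a, dim=1), F.log_softmax(logits_b, dim=1)
    kl_ab = F.kl_div(pb, pa.exp(), reduction="batchmean")   # KL(a || b)
    kl_ba = F.kl_div(pa, pb.exp(), reduction="batchmean")   # KL(b || a)
    return kl_ab + kl_ba
```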
12.2.4.4 Compute the knowledge distillation loss with a knowledge distillation method; the knowledge distillation loss comprises three parts, the multi-scale self-attention feature distillation loss L_MSFD, the feature similarity probability distillation loss L_FSPD and the global feature distillation loss L_GFD:
12.2.4.4.1 Compute the multi-scale self-attention feature distillation loss L_MSFD. The method is:
12.2.4.4.1.1 Take {FF_2, FF_3, FF_4, FF_5} from F_2' and compute a self-attention feature representation for each feature representation; the self-attention representation of FF_2 has channel number C = 64, that of FF_3 has C = 128, that of FF_4 has C = 256 and that of FF_5 has C = 512, where C is the number of channels of the self-attention feature representation.
12.2.4.4.1.2 Take {FFF_2, FFF_3, FFF_4, FFF_5} from F_1'' and compute a self-attention feature representation for each feature representation; the channel numbers are likewise C = 64 for FFF_2, C = 128 for FFF_3, C = 256 for FFF_4 and C = 512 for FFF_5.
12.2.4.4.1.3 Compute the multi-scale self-attention feature distillation loss L_MSFD by accumulating, over the four scales, the distance between the L2-normalized self-attention representations of the new model and the old model, where ||q||_2 denotes the L2 norm of q.
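A sketch of L_MSFD using activation-based attention maps (sum of squared activations over the C channels, then ||q||_2 normalization); the exact map in the original formulas is not recoverable, so this form is an assumption:

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """[B, C, H, W] -> flattened, L2-normalized spatial attention [B, H*W]."""
    a = feat.pow(2).sum(dim=1).flatten(1)   # sum of squared activations over C
    return F.normalize(a, p=2, dim=1)       # ||q||_2 normalization

def msfd_loss(new_feats, old_feats):
    """Accumulate attention distances over the scales FF_2..FF_5 / FFF_2..FFF_5."""
    return sum((attention_map(fn) - attention_map(fo)).pow(2).sum(dim=1).mean()
               for fn, fo in zip(new_feats, old_feats))
```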
12.2.4.4.2 Compute the feature similarity probability distillation loss L_FSPD. The method is:
12.2.4.4.2.1 Take FF_6 from F_2' and FFF_6 from F_1''; with the similarity probability distribution technique (N. Passalis and A. Tefas, "Learning Deep Representations with Probabilistic Knowledge Transfer," arXiv:1803.10837, 2019), compute the first feature similarity probability distribution p' by formula (5) and the second feature similarity probability distribution p'' by formula (6).
12.2.4.4.2.2 Compute the feature similarity probability distillation loss with the Kullback-Leibler (KL) divergence formula (J. M. Joyce, "Kullback-Leibler Divergence," International Encyclopedia of Statistical Science, Springer, Berlin, Heidelberg, 2011, pp. 720-722): L_FSPD = KL(p'', p'), i.e., the KL divergence between p'' and p'.
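A sketch of L_FSPD in the spirit of the cited probabilistic knowledge transfer: pairwise cosine similarities within the batch are turned into probability distributions p' and p'' and matched with KL(p'', p'); the kernel choice is an assumption, since formulas (5) and (6) are not recoverable:

```python
import torch
import torch.nn.functional as F

def similarity_probs(feats, eps=1e-8):
    """[B, 512] -> row-stochastic matrix of pairwise cosine-similarity probabilities."""
    f = F.normalize(feats, p=2, dim=1)
    sim = (f @ f.t() + 1.0) / 2.0           # cosine similarity scaled to [0, 1]
    sim.fill_diagonal_(0)                   # ignore self-similarity
    return sim / (sim.sum(dim=1, keepdim=True) + eps)

def fspd_loss(ff6, fff6, eps=1e-8):
    p_new, p_old = similarity_probs(ff6), similarity_probs(fff6)   # p', p''
    return (p_old * ((p_old + eps).log() - (p_new + eps).log())).sum(dim=1).mean()
```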
12.2.4.4.3 Take FF_6 from F_2' and FFF_6 from F_1'', and compute the global feature distillation loss L_GFD with formula (7):
L_GFD = ||FF_6 - FFF_6||_2    formula (7)
12.2.4.4.4 Combine the above three distillation losses to obtain the total knowledge distillation loss L_kd: L_kd = L_MSFD + L_FSPD + L_GFD.
12.2.4.4.5 Update the parameters of the second feature learning module and the second classification prediction module of the all-class image classification system with the overall optimization objective L_total = L_clf + λ_1 L_p + λ_2 L_kd. Here L_clf denotes the classification loss of the all-class image classification system on the new-class feature representations, the old-class prototypes and the augmented prototypes, L_clf = L_ce(f(P_old; P_aug), Y_1') + L_2, where L_ce(f(P_old; P_aug), Y_1') is the classification loss between the old-class prototypes plus augmented prototypes and the old-class labels Y_1', and L_2 = L_ce(f(FF_6), Y_2') is the cross-entropy classification loss between the new-class feature representation FF_6 and the new-class labels Y_2'. λ_1 is the first loss weight factor and λ_2 the second; in the present invention λ_1 = λ_2 = 10.
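A sketch assembling the overall objective L_total = L_clf + λ_1 L_p + λ_2 L_kd from the illustrative functions sketched above (symmetric_kl, msfd_loss, fspd_loss); it is a composition under those assumptions, not the original implementation:

```python
import torch
import torch.nn.functional as F

def total_loss(fc, ff6, y_new, fff6, new_feats, old_feats,
               p_old, p_aug, y_proto, lam1=10.0, lam2=10.0):
    """L_total = L_clf + lam1 * L_p + lam2 * L_kd (lambda_1 = lambda_2 = 10)."""
    # L_clf: cross entropy on new-class features plus old/augmented prototypes.
    l_clf = F.cross_entropy(fc(ff6), y_new) + F.cross_entropy(
        fc(torch.cat([p_old, p_aug])), torch.cat([y_proto, y_proto]))
    l_p = symmetric_kl(fc(p_old), fc(p_aug))            # prototype consistency L_p
    l_kd = (msfd_loss(new_feats, old_feats)             # L_MSFD
            + fspd_loss(ff6, fff6)                      # L_FSPD
            + (ff6 - fff6).norm(p=2, dim=1).mean())     # L_GFD
    return l_clf + lam1 * l_p + lam2 * l_kd
```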
12.2.5 Let the current training round epoch_cur = epoch_cur + 1; if epoch_cur ≤ the total number of training iterations epoch_max, go to 12.2.4.4; if epoch_cur > epoch_max, the all-class image classification system after the first training is obtained; go to the thirteenth step.
Thirteenth step: using the construction method for the class-incremental learning scene in the sixth step, construct a new all-class image recognition system oriented to the class-incremental learning scene, keep constructing new-class data sets, and keep training the newly constructed system with them until all 100 classes of the CIFAR100 data set have participated in training, obtaining the finally trained all-class image classification system. The method is:
13.1 Let x denote the number of additional trainings of the all-class image recognition system, x = 1;
13.2 Construct the (x+1)-th all-class image recognition system oriented to the class-incremental learning scene with the construction method of the sixth step; it consists of the (x+2)-th input preprocessing module, the (x+2)-th self-supervised augmentation module, the (x+2)-th feature learning module and the (x+2)-th classification prediction module. The structure and function of the (x+2)-th input preprocessing, self-supervised augmentation and feature learning modules are the same as those of the second input preprocessing, second self-supervised augmentation and second feature learning modules. The dimension of the fully connected layer parameters of the (x+2)-th classification prediction module is initialized to [512, old_classes + 10(x+1)];
13.3 Take all training images of classes 10(x+5)+1 through 10(x+6) of the CIFAR100 data set as the new-class image data set X_new;
13.4 The (x+2)-th input preprocessing module reads from X_new a new-class image set XXX of size N, XXX = {xxx_1, xxx_2, ..., xxx_n, ..., xxx_N}, N = 64, 1 ≤ n ≤ 64; it preprocesses each xxx_n in XXX with the preprocessing method of the third step to obtain the preprocessed new-class image set X_3, and sends X_3 and the corresponding labels Y_3 (Y_3 = {yyy_1, yyy_2, ..., yyy_n, ..., yyy_N}) to the (x+2)-th self-supervised augmentation module; yyy_n is the label of xxx_n.
13.5 The (x+2)-th self-supervised augmentation module receives X_3 and Y_3 from the (x+2)-th input preprocessing module, augments X_3 with the self-supervised label augmentation method of the fourth step to generate the augmented new-class image set X_3' and labels Y_3', and sends X_3' and Y_3' to the (x+2)-th feature learning module.
13.6 Initialize the weight parameters of the (x+2)-th feature learning module of the all-class image classification system oriented to the class-incremental learning scene with the weight parameters of the (x+1)-th feature learning module of the all-class image classification system obtained after the x-th training;
13.7 The dimension of the fully connected layer parameters in the (x+2)-th classification prediction module is [512, old_classes + 10(x+1)]; the part with dimension [512, old_classes + 10x] is initialized with the fully connected layer weights of the (x+1)-th classification prediction module, and the remaining [512, 10] part is initialized by random assignment.
13.8 Let t = x+1, K_old = 50 + 10x, K_new = 10. Apply the training method of the twelfth step to the augmented new-class data set X_3' obtained in step 13.5 to train the (x+2)-th feature learning module and the (x+2)-th classification prediction module of the (x+1)-th all-class image recognition system oriented to the class-incremental learning scene, obtaining their optimal network weight parameters and the (x+1)-th trained all-class image classification system.
13.9 If x < 5, let x = x+1 and go to 13.2; if x = 5, the finally (fifth) trained all-class image recognition system is obtained, consisting of the sixth input preprocessing module, the sixth self-supervised augmentation module, the sixth feature learning module and the sixth classification prediction module; go to the fourteenth step.
Fourteenth step: use the finally trained all-class image classification system to classify the test images X_user input by the user, obtaining the predicted image classification results. The method is:
14.1 The trained all-class image classification system receives the user-input test image set X_user, whose images belong to categories of CIFAR100, together with the label set Y_user;
14.2 The sixth input preprocessing module preprocesses X_user with the preprocessing method of the third step to obtain the preprocessed test image set, and sends it together with the corresponding label set to the sixth self-supervised augmentation module; each label corresponds to its preprocessed image.
14.3 The sixth self-supervised augmentation module receives the preprocessed test image set and labels from the sixth input preprocessing module, augments the image set with the self-supervised label augmentation method of the fourth step to generate the augmented user image set and labels, and sends them to the sixth feature learning module;
14.4 The sixth feature learning module receives the augmented user image set and labels from the sixth self-supervised augmentation module, extracts features with the feature extraction method of step 5.2 to obtain the high-dimensional semantic feature set F'_user (composed of the second through sixth high-dimensional semantic feature sets), and sends F'_user and the labels to the sixth classification prediction module.
14.5 The sixth classification prediction module receives the high-dimensional semantic feature representation F'_user from the sixth feature learning module, reduces F'_user through the fully connected layer from dimension [256, 512] to dimension [256, 100] (100 being the number of classes involved), and for each row of the [256, 100] output takes the class number with the maximum probability as the classification result of the test image.
Fifteenth step, ending.
In order to verify the image classification accuracy of the invention in the class-incremental learning scene, the open-source data set CIFAR100 (https://www.cs.toronto.edu/~kriz/cifar.html, 2009), collected by Alex Krizhevsky and Geoffrey Hinton, is selected as the test data set; all test images of all categories in the data set serve as the final test set, and the invention is used to classify the test-set images during the experiments. In each training the initial learning rate is 0.01, the batch size is N, N = 64, 1 ≤ n ≤ 64, the total number of training iterations epoch_max is set to 100, and the learning rate is divided by 10 after rounds 45 and 90. The initial number of old categories is 50; the remaining 50 categories are divided into 5 incremental stages with an equal number of categories per stage, all set to 10. The finally trained all-class image classification system classifies the images of the test data set, and the evaluation index is the average classification accuracy over all categories of the test data set. All experiments were run on a server under the Linux system, with 1 GPU used for the whole experiment. The experimental results are shown in Table 1.
The image classification method provided by the invention improves image classification accuracy in the class-incremental learning scene, reaching a recognition performance of 66.97%, while the recognition performance of the other class-incremental image classification methods is lower. As shown in Table 1, with the same feature extraction network, the parameter-isolation image classification method EWC obtains a classification accuracy of 25.33%; the regularization-strategy image classification methods LwF and PASS obtain 33.97% and 65.46% respectively; and the exemplar-based image classification method UCIR obtains 63.62%.
TABLE 1
Method          Average classification accuracy (%)
EWC             25.33
LwF             33.97
UCIR            63.62
PASS            65.46
The invention   66.97
The finally trained all-class image classification system does not need to store any old-class images when classifying, which effectively alleviates catastrophic forgetting of the old classes; without storing old-class images it reaches recognition accuracy on the same level as the methods that store old-class images (the exemplar-based image classification methods), so the invention avoids the data privacy problem while guaranteeing accuracy.
The class-incremental image classification method based on knowledge distillation and consistency regularization provided by the invention has been described in detail above. The principles and embodiments of the invention have been explained herein to assist in understanding its core concept. It should be noted that those skilled in the art can make various modifications and adaptations to the invention without departing from its principles, and such modifications and adaptations also fall within the scope of the appended claims.

Claims (6)

1. A class increment image classification method based on knowledge distillation and consistency regularization is characterized by comprising the following steps:
firstly, constructing an old-class image classification system for recognizing old classes in a class-incremental learning scene; the old-class image recognition system oriented to the class-incremental learning scene consists of the old-class image data set X_old, a first input preprocessing module, a first self-supervised augmentation module, a first feature learning module and a first classification prediction module;
the first input preprocessing module is connected with the old-class image data set, the first self-supervised augmentation module and the first feature learning module; the first input preprocessing module reads old-class images from X_old, preprocesses the old-class images in the old-class image set X read from X_old to obtain the preprocessed old-class image set X_1, denotes the label set of X_1 by Y_1, and sends the preprocessed old-class image set X_1 and label set Y_1 to the first self-supervised augmentation module;
the first self-supervision enhancing module is connected with the first input preprocessing module and the first characteristic learning module, and receives X from the first input preprocessing module 1 And Y 1 For X 1 And Y 1 Enhancement is carried out by adopting a self-supervision tag enhancement method, and an enhanced old class image set X is generated 1 ' at X 1 Generating enhanced tag set Y on' basis 1 ' X is as follows 1 ' and Y 1 ' send to the first feature learning module;
the first feature learning module is connected with the first self-supervised augmentation module and the first classification prediction module; it receives the augmented X_1' and Y_1' from the first self-supervised augmentation module, extracts the high-dimensional semantic feature representation set F_1' from X_1', and sends F_1' and Y_1' to the first classification prediction module; the first feature learning module is divided into six modules, the first module consisting of a first convolution layer, a first normalization layer, an activation function layer and a first downsampling layer; the second to fifth modules each consist of two residual units, each residual unit consisting of 1 convolution layer, 1 normalization layer and an activation function layer; the sixth module consists of a second downsampling layer;
the first classification prediction module is connected with the first feature learning module and consists of 1 fully connected layer; the first classification prediction module receives the high-dimensional semantic feature representation set F_1' from the first feature learning module, reduces the dimension of F_1' to the number of old-class image categories, then computes the difference between the predicted category and the true label as a loss value with a cross-entropy loss function, and optimizes the first feature learning module by back-propagating the loss value;
second step, constructing the old-class image data set X_old for training the old-class image recognition system oriented to the class-incremental learning scene; the open-source data set CIFAR100 is adopted as the training set; the CIFAR100 data set has 100 classes, each class containing 600 color images of size 32×32, with 500 training images and 100 test images per class; all training images of the first 50 categories are selected as the old-class image data set X_old, and X_old is assigned the labels Y_old;
third step, the first input preprocessing module reads from X_old an old-class image set X of size N, X = {x_1, x_2, ..., x_n, ..., x_N}, N = 64, 1 ≤ n ≤ N, x_n denoting the n-th image in X; the first input preprocessing module preprocesses each x_n in X, including random cropping, horizontal flipping, brightness change and normalization of the image, to obtain the preprocessed old-class image set X_1; the label set of X_1 is denoted Y_1, Y_1 = {y_1, y_2, ..., y_n, ..., y_N}; the preprocessed old-class image set X_1 and the corresponding label set Y_1 are sent to the first self-supervised augmentation module, y_n being the label of x_n;
fourth step, the first self-supervised augmentation module receives X_1 and Y_1 from the first input preprocessing module, augments X_1 with the self-supervised label augmentation method to generate the augmented old-class image set X_1' and labels Y_1', and sends X_1' and Y_1' to the first feature learning module, the method being:
4.1 let variable n=1;
4.2 rotating the n-th image of X_1 by 90°, 180° and 270° respectively to obtain the rotated images, and putting the original image and the three rotated images into the augmented old-class image set X_1';
4.3 computing the corresponding labels for the rotated images: taking y_n+1 as the label of the 90° rotation, y_n+2 as the label of the 180° rotation and y_n+3 as the label of the 270° rotation, and putting these 3 labels into the augmented label set Y_1';
4.4 if n < N, letting n = n+1 and going to 4.2; if n = N, obtaining the augmented old-class image set X_1' and label set Y_1' = {{y_1, y_1+1, y_1+2, y_1+3}, {y_2, y_2+1, y_2+2, y_2+3}, ..., {y_n, y_n+1, y_n+2, y_n+3}, ..., {y_N, y_N+1, y_N+2, y_N+3}}, and going to 4.5;
4.5 sending X_1' and Y_1' to the first feature learning module;
fifth step, using X_1' to train the first feature learning module and the first classification prediction module of the old-class image recognition system oriented to the class-incremental learning scene, obtaining their optimal network weight parameters; the method is:
5.1 initializing weight parameters in a first feature learning module and a first classification prediction module, enabling an initial learning rate to be 0.01, enabling a batch processing size to be N, enabling a total training iteration round number epoch_max to be 100, and enabling a current training round number epoch_cur to be 1;
5.2 the first feature learning module receives X_1' and Y_1' from the first self-supervised augmentation module, extracts features from X_1' with the feature extraction method to obtain the high-dimensional semantic feature set F_1' of X_1', F_1' = {F_2, F_3, F_4, F_5, F_6}, F_2 denoting the second high-dimensional semantic feature set, F_3 the third, F_4 the fourth, F_5 the fifth and F_6 the sixth high-dimensional semantic feature set, and sends F_1' and Y_1' to the first classification prediction module; the specific method is:
5.2.1 initializing n=1;
5.2.2 the first module of the first feature learning module applies 1 convolution operation to the n-th image group of X_1' to obtain the result of the first module, which is sent to the second module;
5.2.3 the second module of the first feature learning module receives the result of the first module, applies 2 convolution operations to it by the residual unit operation method to obtain the 64-channel result of the second module, sends it to the third module of the first feature learning module, and puts it into the second high-dimensional semantic feature set F_2;
5.2.4 the third module of the first feature learning module receives the result of the second module, applies 2 convolution operations to it by the residual unit operation method to obtain the 128-channel result of the third module, sends it to the fourth module of the first feature learning module, and puts it into the third high-dimensional semantic feature set F_3;
5.2.5 the fourth module of the first feature learning module receives the result of the third module, applies 2 convolution operations to it by the residual unit operation method to obtain the 256-channel result of the fourth module, sends it to the fifth module of the first feature learning module, and puts it into the fourth high-dimensional semantic feature set F_4;
5.2.6 the fifth module of the first feature learning module receives the result of the fourth module, applies 2 convolution operations to it by the residual unit operation method to obtain the 512-channel result of the fifth module, sends it to the sixth module of the feature learning module, and puts it into the fifth high-dimensional semantic feature set F_5;
5.2.7 the sixth module of the first feature learning module receives the result of the fifth module, downsamples it with the second downsampling layer of the sixth module to obtain the 512-channel result of the sixth module, sends it to the first classification prediction module, and puts it into the sixth high-dimensional semantic feature set F_6;
5.2.8 if n < N, let n = n + 1 and turn to 5.2.2; if n = N, the five high-dimensional semantic feature sets F2, F3, F4, F5, F6 have been obtained; put F2, F3, F4, F5, F6 into the high-dimensional semantic feature set F1' of X1', at which point F1' = {F2, F3, F4, F5, F6};
5.2.9 the sixth module in the first feature learning module sends F1' and Y1' to the first classification prediction module;
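To make the data flow of step 5.2 concrete, the following PyTorch sketch collects the five feature sets F2 to F6 from a ResNet18-style backbone. It is a minimal illustration under stated assumptions, not the patent's actual implementation: the class name is invented, the 3×3 stride-1 stem follows claim 2, and the max-pooling parameters of claim 5 are not specified and are assumed.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiScaleExtractor(nn.Module):
    """Sketch of the first feature learning module: returns F2..F6 (step 5.2)."""
    def __init__(self):
        super().__init__()
        net = resnet18()
        # First module (claim 5): conv -> batch norm -> ReLU -> max pooling.
        # Conv is 3x3 / stride 1 / padding 1 per claim 2; pooling parameters assumed.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        )
        self.layer1 = net.layer1   # second module, 64 channels  -> F2
        self.layer2 = net.layer2   # third module, 128 channels  -> F3
        self.layer3 = net.layer3   # fourth module, 256 channels -> F4
        self.layer4 = net.layer4   # fifth module, 512 channels  -> F5
        self.pool = net.avgpool    # sixth module: downsampling layer

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.layer1(f1)
        f3 = self.layer2(f2)
        f4 = self.layer3(f3)
        f5 = self.layer4(f4)
        f6 = torch.flatten(self.pool(f5), 1)   # 512-d vector -> F6
        return f2, f3, f4, f5, f6
```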
5.3 the first classification prediction module receives F1' and Y1' from the sixth module in the first feature learning module and calculates the cross entropy loss L1 between F6 in F1' and Y1' using formula (1), where f in formula (1) denotes the fully connected layer classifier in the first classification prediction module, f(F6) denotes the prediction category obtained by passing F6 through the classifier, and L_ce(f(F6), Y1') denotes the cross entropy loss between the prediction category f(F6) and the true label Y1':
L1 = L_ce(f(F6), Y1')    formula (1)
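A hedged sketch of this 5.3 optimization step, reusing the extractor above; the 50-way head size and plain SGD at the step-5.1 learning rate are assumptions (the self-supervision label enhancement may enlarge the label space, which is omitted here).

```python
import torch.nn.functional as F
from torch.optim import SGD

extractor = MultiScaleExtractor()
classifier = nn.Linear(512, 50)   # one output per old class (assumed head size)
optimizer = SGD(list(extractor.parameters()) + list(classifier.parameters()),
                lr=0.01)          # initial learning rate from step 5.1

def train_step(x, y):
    """One step of formula (1): L1 = L_ce(f(F6), Y1')."""
    _, _, _, _, f6 = extractor(x)
    loss = F.cross_entropy(classifier(f6), y)   # cross entropy loss L1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```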
5.4 let the current training round number epoch_cur = epoch_cur + 1; if epoch_cur ≤ epoch_max, turn to 5.2; if epoch_cur > epoch_max, end training and turn to 5.5;
5.5 calculate the old class prototype set P_old using formula (2), P_old containing N elements;
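Formula (2) itself did not survive in the source text; a common definition in prototype-based class-incremental methods, assumed here rather than confirmed by the patent, is the per-class mean of the F6 features. The sketch below computes P_old under that assumption.

```python
@torch.no_grad()
def compute_prototypes(extractor, loader, num_classes, dim=512):
    """Assumed formula (2): P_old[k] = mean of the class-k F6 features."""
    sums = torch.zeros(num_classes, dim)
    counts = torch.zeros(num_classes)
    for x, y in loader:
        _, _, _, _, f6 = extractor(x)
        sums.index_add_(0, y, f6)                  # accumulate per-class feature sums
        counts += torch.bincount(y, minlength=num_classes).float()
    return sums / counts.clamp(min=1).unsqueeze(1)  # P_old, one row per class
```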
5.6 save the weight parameters obtained by the first feature learning module and the first classification prediction module of the trained old class image recognition system in .pth format, obtaining the trained old class image classification system oriented to the class-incremental learning scene;
Sixth step, construct the all-class image recognition system oriented to the class-incremental learning scene by the all-class image recognition system construction method; the all-class image recognition system oriented to the class-incremental learning scene is trained on a new class training image dataset X_new and consists of a second input preprocessing module, a second self-supervision enhancement module, a second feature learning module and a second classification prediction module; X_new is the images of classes 51 to 60 in the CIFAR100 dataset, and the label of X_new is denoted Y_new; the structure and function of the second input preprocessing module are the same as those of the first input preprocessing module, the structure and function of the second self-supervision enhancement module are the same as those of the first self-supervision enhancement module, the structure and function of the second feature learning module are the same as those of the first feature learning module, and the second classification prediction module differs from the first classification prediction module; the weight parameters of the second feature learning module are initialized with the parameters of the first feature learning module, and the weight parameters of the second classification prediction module are initialized with the parameters of the first classification prediction module;
the second input preprocessing module is connected with the new class image dataset X_new and the second self-supervision enhancement module; it reads the new class image set XX from X_new, XX = {xx_1, xx_2, …, xx_n, …, xx_N}, preprocesses xx_n in XX to obtain the preprocessed new class image set X2, and sends X2 and the corresponding label set Y2 (Y2 = {yy_1, yy_2, …, yy_n, …, yy_N}, where yy_n is the label of xx_n) to the second self-supervision enhancement module;
the second self-supervision enhancement module is connected with the second input preprocessing module and the second feature learning module; it receives X2 and Y2 from the second input preprocessing module, enhances X2 by the self-supervision label enhancement method to generate the enhanced new class image set X2' and label set Y2', and sends X2' and Y2' to the second feature learning module;
the second feature learning module is connected with the second self-supervision enhancement module and the second classification prediction module; it receives X2' and Y2' from the second self-supervision enhancement module, extracts the high-dimensional semantic feature representation of the new class images from X2', and sends this representation to the second classification prediction module; the structure and function of the six modules of the second feature learning module are the same as those of the six modules of the first feature learning module;
the second classification prediction module is connected with the second feature learning module and consists of 1 fully connected layer; the output of the fully connected layer is expanded relative to that of the first classification prediction module to the total number of image categories, i.e. the number of old classes plus the number of new classes; the second classification prediction module receives the high-dimensional semantic feature representation of the new class images from the second feature learning module and reduces its dimension to the total number of image categories; it calculates the difference between the prediction category and the true label as a loss value with the cross entropy loss function, calculates the difference between P_old and P_aug with the consistency regularization loss function, calculates the difference between the old class and new class image feature representations with the knowledge distillation loss function, takes the sum of the three differences as the loss value, and optimizes the second feature learning module by back-propagating this loss value;
Seventh step, construct the new class image dataset X_new used to train the all-class image recognition system; the method is: select the images of classes 51 to 60 from the CIFAR100 dataset as the new class image dataset X_new;
Eighth step, the second input preprocessing module reads in the image set XX of size N from X_new, XX = {xx_1, xx_2, …, xx_n, …, xx_N}, N = 64, 1 ≤ n ≤ 64; the second input preprocessing module preprocesses xx_n in XX by the preprocessing method described in the third step to obtain the preprocessed new class image set X2, and sends X2 and the corresponding label set Y2 to the second self-supervision enhancement module, Y2 = {yy_1, yy_2, …, yy_n, …, yy_N}, where yy_n is the label of xx_n;
Ninth step, the second self-supervision enhancement module receives X2 and Y2 from the second input preprocessing module, enhances X2 by the self-supervision label enhancement method described in the fourth step to generate the enhanced new class image set X2' and labels Y2', and sends X2' and Y2' to the second feature learning module; let the increment count t = 1;
Tenth step, initialize the weight parameters of the second feature learning module of the all-class image classification system oriented to the class-incremental learning scene with the weight parameters of the first feature learning module of the old class image classification system trained in the fifth step;
Eleventh step, let the dimension of the fully connected layer parameters in the second classification prediction module be [512, old_classes + 10], where the total number of categories num_classes = 100 and the number of old classes old_classes = K_old = 50; the part of the fully connected layer parameters with dimension [512, old_classes] is initialized with the fully connected layer weight parameters of the first classification prediction module, and the remaining [512, 10] part is initialized by random assignment; the number of new classes K_new = 10;
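This head growth can be sketched as below: copy the trained [512, old_classes] weights into a wider fully connected layer and leave the ten new outputs randomly initialized. The helper name is illustrative; note that nn.Linear stores its weight as [out_features, in_features], the transpose of the patent's [512, old_classes] notation.

```python
def expand_classifier(old_fc: nn.Linear, num_new: int) -> nn.Linear:
    """Grow the head from old_classes to old_classes + num_new outputs (step 11)."""
    old_out, in_dim = old_fc.out_features, old_fc.in_features
    new_fc = nn.Linear(in_dim, old_out + num_new)   # new rows keep random init
    with torch.no_grad():
        new_fc.weight[:old_out] = old_fc.weight     # reuse old-class weights
        new_fc.bias[:old_out] = old_fc.bias
    return new_fc

classifier2 = expand_classifier(classifier, num_new=10)   # K_new = 10
```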
Twelfth step, use the enhanced new class dataset X2' obtained in the ninth step to train the second feature learning module and the second classification prediction module of the all-class image recognition system oriented to the class-incremental learning scene, obtaining the optimal network weight parameters of the second feature learning module and the second classification prediction module and the all-class image classification system after the first training; the method is:
12.1 set the initial learning rate to 0.01, the total number of training iteration rounds epoch_max to 100, and the current training round number epoch_cur to 1;
12.2 input each image in X2' into the second feature learning module of the all-class image classification system, then train the second feature learning module and the second classification prediction module with the knowledge distillation loss, the cross entropy loss and the consistency regularization loss to obtain the optimal network weight parameters in the second feature learning module and the second classification prediction module; the method is:
12.2.1 the second feature learning module receives X2' and Y2' from the second self-supervision enhancement module, extracts features from X2' by the feature extraction method in step 5.2 to obtain the new class high-dimensional semantic feature set F2' of X2', and sends F2' and Y2' to the second classification prediction module; F2' = {FF2, FF3, FF4, FF5, FF6}, where FF2 is the new class second high-dimensional semantic feature set of X2', FF3 the new class third high-dimensional semantic feature set, FF4 the new class fourth high-dimensional semantic feature set, FF5 the new class fifth high-dimensional semantic feature set and FF6 the new class sixth high-dimensional semantic feature set;
12.2.2 the first feature learning module receives X2' and Y2' from the second self-supervision enhancement module, extracts features from X2' by the feature extraction method in step 5.2 to obtain the old class high-dimensional semantic feature set F1'' of X2', and sends F1'' to the second classification prediction module; F1'' = {FFF2, FFF3, FFF4, FFF5, FFF6}, where FFF2 is the old class second high-dimensional semantic feature set of X2', FFF3 the old class third high-dimensional semantic feature set, FFF4 the old class fourth high-dimensional semantic feature set, FFF5 the old class fifth high-dimensional semantic feature set and FFF6 the old class sixth high-dimensional semantic feature set;
12.2.3 the second classification prediction module of the all-class image classification system receives F2' and Y2' from the second feature learning module and receives F1'' from the first feature learning module;
12.2.4 train the second feature learning module and the second classification prediction module with the sum of the cross entropy loss, the knowledge distillation loss and the consistency regularization loss, by the following method:
12.2.4.1 calculate the difference between the sixth high-dimensional semantic feature set FF6 and Y2' as the new class cross entropy classification loss L2, L2 = L_ce(f(FF6), Y2');
12.2.4.2 enhance the old class prototype set P_old computed in step 5.5 with Gaussian noise, and calculate the enhanced prototype P_aug by formula (3):
P_aug = P_old + e × r_t    formula (3)
in formula (3), e denotes Gaussian noise and follows a normal distribution, and r_t denotes the uncertainty scale index controlling the enhanced prototype; r_t is obtained by first computing r_t² = (1/(K_old × d)) × Σ_{k=1}^{K_old} Tr(Σ_{t,k}) and then taking its square root, where d denotes the dimension of the feature space, 512, Σ_{t,k} denotes the covariance matrix of the class-k features, and Tr denotes the trace of the matrix;
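Under the r_t reconstruction above (the mean covariance trace over the old classes, divided by the feature dimension d, then square-rooted), formula (3) can be sketched as follows; the function name and tensor layout are assumptions.

```python
def augment_prototypes(p_old, class_covs, d=512):
    """Formula (3): P_aug = P_old + e * r_t with e ~ N(0, I).

    class_covs: list of [d, d] covariance matrices Sigma_{t,k}, one per old class.
    r_t follows the assumed reconstruction sqrt(mean_k Tr(Sigma_{t,k}) / d).
    """
    traces = torch.stack([torch.trace(c) for c in class_covs])
    r_t = torch.sqrt(traces.mean() / d)   # uncertainty scale index
    e = torch.randn_like(p_old)           # Gaussian noise e
    return p_old + e * r_t
```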
12.2.4.3 calculate the consistency loss between the old class prototype set P_old and the enhanced prototype P_aug as the consistency loss L_p using the symmetric KL technique according to formula (4), where f(·) denotes the fully connected layer classifier of the classification prediction module in the all-class image classification system and KL_S is the symmetric KL divergence:
L_p = KL_S(f(P_old), f(P_aug))    formula (4)
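A sketch of formula (4); the softmax temperature of 1 and the batch-mean reduction are assumptions not fixed by the text.

```python
def consistency_loss(fc, p_old, p_aug):
    """Formula (4): L_p = KL_S(f(P_old), f(P_aug)), symmetric KL divergence."""
    log_q = F.log_softmax(fc(p_old), dim=1)
    log_p = F.log_softmax(fc(p_aug), dim=1)
    # KL(q || p) + KL(p || q), averaged over the prototype batch
    return (F.kl_div(log_p, log_q.exp(), reduction="batchmean")
            + F.kl_div(log_q, log_p.exp(), reduction="batchmean"))
```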
12.2.4.4 calculate the knowledge distillation loss by the knowledge distillation method; the knowledge distillation loss consists of the following three parts: the multi-scale self-attention distillation loss L_MSFD, the feature similarity probability distillation loss L_FSPD and the global feature distillation loss L_GFD:
12.2.4.4.1 calculate the multi-scale self-attention distillation loss L_MSFD; the method is:
12.2.4.4.1.1 take {FF2, FF3, FF4, FF5} from F2' and compute a self-attention feature representation for each feature representation: the self-attention feature representation of FF2 (channel number C = 64), of FF3 (C = 128), of FF4 (C = 256) and of FF5 (C = 512), where C denotes the channel number of the corresponding feature representation;
12.2.4.4.1.2 take {FFF2, FFF3, FFF4, FFF5} from F1'' and compute a self-attention feature representation for each feature representation: the self-attention feature representation of FFF2 (channel number C = 64), of FFF3 (C = 128), of FFF4 (C = 256) and of FFF5 (C = 512);
12.2.4.4.1.3 calculate the multi-scale self-attention distillation loss L_MSFD as the sum over the four scales of the L2 distance between the L2-normalized new class and old class self-attention feature representations, where ‖q‖₂ denotes the L2 norm of q;
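Since the exact self-attention formula was lost, the sketch below assumes the standard attention-transfer form: a spatial attention map from the channel-wise mean of squared activations, L2-normalized per sample, compared across the old and new networks at the four scales.

```python
def attention_map(feat):
    """Assumed self-attention representation: channel-mean of squares, L2-normalized."""
    a = feat.pow(2).mean(dim=1).flatten(1)   # [B, H*W] spatial attention
    return F.normalize(a, p=2, dim=1)        # q / ||q||_2

def msfd_loss(new_feats, old_feats):
    """L_MSFD over FF2..FF5 (new model) vs FFF2..FFF5 (old model)."""
    return sum((attention_map(fn) - attention_map(fo)).norm(p=2, dim=1).mean()
               for fn, fo in zip(new_feats, old_feats))
```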
12.2.4.4.2 calculate the feature similarity probability distillation loss L_FSPD; the method is:
12.2.4.4.2.1 take FF6 from F2' and FFF6 from F1''; calculate the first feature similarity probability distribution p' by the similarity probability distribution technique using formula (5), and calculate the second feature similarity probability distribution p'' using formula (6);
12.2.4.4.2.2 calculate the feature similarity probability distillation loss with the KL divergence formula, L_FSPD = KL(p'', p');
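Formulas (5) and (6) are likewise lost to extraction; a plausible reading of the "similarity probability distribution technique", assumed here rather than confirmed, is a row-wise softmax over pairwise cosine similarities within the batch.

```python
def similarity_distribution(f):
    """Assumed formulas (5)/(6): softmax over pairwise cosine similarities."""
    f = F.normalize(f, dim=1)
    return F.softmax(f @ f.t(), dim=1)

def fspd_loss(ff6, fff6):
    """L_FSPD = KL(p'', p'): p' from the new FF6, p'' from the old FFF6."""
    p_new = similarity_distribution(ff6)    # p'
    p_old = similarity_distribution(fff6)   # p''
    return F.kl_div(p_new.log(), p_old, reduction="batchmean")
```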
12.2.4.4.3 take FF6 from F2' and FFF6 from F1'', and calculate the global feature distillation loss L_GFD using formula (7):
L_GFD = ‖FF6 − FFF6‖₂    formula (7)
12.2.4.4.4 calculate the total knowledge distillation loss L_kd, L_kd = L_MSFD + L_FSPD + L_GFD;
12.2.4.4.5 update the parameters of the second feature learning module and the second classification prediction module in the all-class image classification system with the overall optimization objective L_total, L_total = L_clf + λ1 × L_p + λ2 × L_kd, where L_clf denotes the classification loss of the all-class image classification system on the new class feature representations and on the old class and enhanced prototypes, L_clf = L_ce(f(P_old; P_aug), Y1') + L2; L_ce(f(P_old; P_aug), Y1') denotes the classification loss of the old class prototypes and enhanced prototypes against the old class labels Y1', and L2 = L_ce(f(FF6), Y2') denotes the cross entropy classification loss of the new class feature representation FF6 against the new class labels Y2'; λ1 denotes the first loss weight factor and λ2 the second loss weight factor;
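Pulling the 12.2.4 pieces together, a hedged sketch of L_total with λ1 = λ2 = 10 as in claim 6; the prototype-label handling (classifying P_old and P_aug against the old class labels) is simplified, and all names are the illustrative ones introduced above.

```python
def total_loss(fc, new_feats, old_feats, p_old, p_aug, y_new, proto_labels,
               lam1=10.0, lam2=10.0):
    """L_total = L_clf + lam1 * L_p + lam2 * L_kd (step 12.2.4.4.5)."""
    ff6, fff6 = new_feats[-1], old_feats[-1]
    l2 = F.cross_entropy(fc(ff6), y_new)                    # new-class CE loss L2
    proto_logits = fc(torch.cat([p_old, p_aug]))            # old + augmented prototypes
    l_proto = F.cross_entropy(proto_logits,
                              torch.cat([proto_labels, proto_labels]))
    l_clf = l_proto + l2
    l_p = consistency_loss(fc, p_old, p_aug)                # formula (4)
    l_gfd = (ff6 - fff6).norm(p=2, dim=1).mean()            # formula (7)
    l_kd = msfd_loss(new_feats[:-1], old_feats[:-1]) + fspd_loss(ff6, fff6) + l_gfd
    return l_clf + lam1 * l_p + lam2 * l_kd
```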
12.2.5 let the current training round number epoch_cur = epoch_cur + 1; if epoch_cur ≤ epoch_max, turn to 12.2.1; if epoch_cur > epoch_max, the all-class image classification system after the first training is obtained; turn to the thirteenth step;
Thirteenth step, construct a new all-class image recognition system oriented to the class-incremental learning scene by the construction method of the sixth step, keep constructing new class datasets, and keep training each newly constructed all-class image recognition system with its new class dataset until all 100 classes of the CIFAR100 dataset have participated in training, obtaining the finally trained all-class image classification system; the method is:
13.1 let x denote the number of times the all-class image recognition system has been retrained, x = 1;
13.2 construct the (x+1)-th all-class image recognition system oriented to the class-incremental learning scene by the all-class image recognition system construction method of the sixth step; the system consists of the (x+2)-th input preprocessing module, the (x+2)-th self-supervision enhancement module, the (x+2)-th feature learning module and the (x+2)-th classification prediction module; the structures and functions of the (x+2)-th input preprocessing module, the (x+2)-th self-supervision enhancement module and the (x+2)-th feature learning module are the same as those of the second input preprocessing module, the second self-supervision enhancement module and the second feature learning module; the dimension of the fully connected layer parameters of the (x+2)-th classification prediction module is initialized to [512, old_classes + (x+1) × 10];
13.3 take all training images of classes 10(x+5)+1 to 10(x+6) of the CIFAR100 dataset as the new class image dataset X_new;
13.4 the (x+2)-th input preprocessing module reads the image set XXX of size N from X_new, XXX = {xxx_1, xxx_2, …, xxx_n, …, xxx_N}; the (x+2)-th input preprocessing module preprocesses xxx_n in XXX by the preprocessing method described in the third step to obtain the preprocessed new class image set X3, and sends X3 and the corresponding label set Y3 to the (x+2)-th self-supervision enhancement module, Y3 = {yyy_1, yyy_2, …, yyy_n, …, yyy_N}, where yyy_n is the label of xxx_n;
13.5 the (x+2)-th self-supervision enhancement module receives X3 and Y3 from the (x+2)-th input preprocessing module, enhances X3 by the self-supervision label enhancement method described in the fourth step to generate the enhanced new class image set X3' and labels Y3', and sends X3' and Y3' to the (x+2)-th feature learning module;
13.6 initialize the weight parameters of the (x+2)-th feature learning module of the all-class image classification system oriented to the class-incremental learning scene with the weight parameters of the (x+1)-th feature learning module of the all-class image classification system after the x-th training;
13.7 let the dimension of the fully connected layer parameters in the (x+2)-th classification prediction module be [512, old_classes + 10(x+1)], where the part with dimension [512, old_classes + 10x] is initialized with the fully connected layer weight parameters of the (x+1)-th classification prediction module and the remaining [512, 10] part is initialized by random assignment;
13.8 let t = x + 1, K_old = 50 + 10x and K_new = 10; train the (x+2)-th feature learning module and the (x+2)-th classification prediction module of the (x+1)-th all-class image recognition system oriented to the class-incremental learning scene on X3' with the training method described in the twelfth step, obtaining the optimal network weight parameters of the (x+2)-th feature learning module and the (x+2)-th classification prediction module and the all-class image classification system after the (x+1)-th training;
13.9 if x < 5, let x = x + 1 and turn to 13.2; if x = 5, the finally trained all-class image recognition system, consisting of the sixth input preprocessing module, the sixth self-supervision enhancement module, the sixth feature learning module and the sixth classification prediction module, is obtained; turn to the fourteenth step;
Fourteenth step, classify the image set X_user input by the user with the finally trained all-class image classification system to obtain the predicted image classification result; the method is:
14.1 the trained all-class image classification system receives the test image set X_user, whose images belong to the categories of CIFAR100, and the label set Y_user input by the user;
14.2 the sixth input preprocessing module preprocesses X_user by the preprocessing method described in the third step to obtain the preprocessed test image set, and sends the preprocessed test image set and the corresponding label set to the sixth self-supervision enhancement module, where yy_n is the label of xx_n;
14.3 the sixth self-supervision enhancement module receives the preprocessed test image set and its label set from the sixth input preprocessing module, enhances the preprocessed test image set by the self-supervision label enhancement method described in the fourth step to generate the enhanced user image set and its labels, and sends them to the sixth feature learning module;
14.4 the sixth feature learning module receives the enhanced user image set and its labels from the sixth self-supervision enhancement module, extracts features from the enhanced user image set by the feature extraction method described in step 5.2 to obtain its high-dimensional semantic feature set F'_user, consisting of the second to sixth high-dimensional semantic feature sets, and sends F'_user and the labels to the sixth classification prediction module;
14.5 the sixth classification prediction module receives the high-dimensional semantic feature representation F'_user from the sixth feature learning module, reduces the dimension of F'_user to the number of classes involved (i.e., from [256, 512] to [256, 100] through the weight parameters of the fully connected layer), and selects the class number corresponding to the maximum-probability entry of each [256, 100]-dimensional vector as the classification result of the test image;
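Under the names assumed above, step 14.5 reduces to a forward pass and an arg max:

```python
@torch.no_grad()
def predict(extractor, fc, x):
    """Step 14.5: pass F6 through the 100-way head and take the arg max."""
    _, _, _, _, f6 = extractor(x)   # [B, 512]
    logits = fc(f6)                 # [B, 100] after all increments
    return logits.argmax(dim=1)     # predicted class index per test image
```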
Fifteenth step, end.
2. The knowledge distillation and consistency regularization-based class-incremental image classification method of claim 1, wherein in the first step the first input preprocessing module, the first self-supervision enhancement module, the first feature learning module and the first classification prediction module are each implemented in the deep learning framework PyTorch by a multi-layer convolutional neural network CNN; the first feature learning module is a ResNet18 network; the convolution kernel size of the first convolution layer of the first module of the first feature learning module is 3×3, with stride 1 and padding 1; each residual unit of the second to fifth modules of the first feature learning module consists of 1 convolution layer, 1 normalization layer and an activation function layer; the convolution kernel size of the convolution layer of the residual units in the second module is 3×3, with stride 1 and padding 1; the convolution kernel size of the convolution layer of the residual units in the third module of the first feature learning module is 3×3, with stride 2 and padding 1; the convolution kernel size of the convolution layer of the residual units in the fourth module of the first feature learning module is 3×3, with stride 2 and padding 1; the convolution kernel size of the convolution layer of the residual units in the fifth module of the first feature learning module is 3×3, with stride 2 and padding 1; the second downsampling layer of the sixth module of the first feature learning module has stride 1 and no padding; the activation function layers of the first feature learning module all use the ReLU function.
3. The knowledge distillation and consistency regularization-based class-incremental image classification method of claim 1, wherein the second input preprocessing module, the second self-supervision enhancement module, the second feature learning module and the second classification prediction module in the sixth step are all implemented in the deep learning framework PyTorch by a multi-layer convolutional neural network CNN; the second feature learning module is a ResNet18 network.
4. The knowledge distillation and consistency regularization-based class-incremental image classification method of claim 1, wherein in the third step the method by which the first input preprocessing module preprocesses x_n in X to obtain the preprocessed old class image set X1, denotes the label set of X1 by Y1, and sends X1 and Y1 to the first self-supervision enhancement module is:
3.1 let variable n=1;
3.2 convert x_n to the RGB color space to obtain the 3-channel x_n;
3.3 normalize the size of the 3-channel x_n to 32×32 to obtain the normalized x_n;
3.4 convert the normalized x_n from vector form to tensor form to obtain x_n in tensor form, and put the tensor-form x_n into the preprocessed image set X1;
3.5 put the label y_n of x_n into the label set Y1, y_n ∈ Y_old;
3.6 if n < N, let n = n + 1 and turn to 3.2; if n = N, the preprocessed image set X1 and the label set Y1 are obtained, where Y1 = {y_1, y_2, …, y_n, …, y_N}; turn to 3.7;
3.7 send X1 and Y1 to the first self-supervision enhancement module.
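Steps 3.2 to 3.4 map naturally onto a torchvision transform pipeline; this is a sketch, with the interpolation mode and the file path as assumptions.

```python
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Lambda(lambda im: im.convert("RGB")),   # 3.2: RGB, 3 channels
    transforms.Resize((32, 32)),                       # 3.3: normalize size to 32x32
    transforms.ToTensor(),                             # 3.4: to tensor form
])

x_n = preprocess(Image.open("example.png"))   # hypothetical input image
```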
5. The knowledge distillation and consistency regularization-based class-incremental image classification method of claim 1, wherein the method by which the first module of the first feature learning module in step 5.2.2 performs one convolution operation on the nth image in X1' by the convolution method, obtains the output of the first module and sends it to the second module is:
5.2.2.1 the first convolution layer of the first module in the first feature learning module performs two-dimensional convolution on the nth image to obtain the two-dimensional convolution result and sends it to the first normalization layer; the number of input channels of each image is 3;
5.2.2.2 the first normalization layer of the first module in the first feature learning module normalizes the two-dimensional convolution result to obtain the normalized result and sends it to the activation function layer of the first module;
5.2.2.3 the activation function layer of the first module in the first feature learning module applies a nonlinear activation to the normalized result to obtain the nonlinear activation result and sends it to the first downsampling layer;
5.2.2.4 the first downsampling layer of the first module in the first feature learning module performs a max pooling operation on the nonlinear activation result to obtain the 64-channel output of the first module, and sends it to the second module in the first feature learning module.
6. The knowledge distillation and consistency regularization-based class-incremental image classification method of claim 1, wherein in step 12.2.4.4.5 λ1 = λ2 = 10.