CN114463605A - Continuous learning image classification method and device based on deep learning - Google Patents

Continuous learning image classification method and device based on deep learning Download PDF

Info

Publication number
CN114463605A
Authority
CN
China
Prior art keywords
task
continuous learning
classification
learning model
batch normalization
Prior art date
Legal status
Granted
Application number
CN202210381239.2A
Other languages
Chinese (zh)
Other versions
CN114463605B (en)
Inventor
许俊杰
王瑞轩
谢旭辰
黄钰竣
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210381239.2A priority Critical patent/CN114463605B/en
Publication of CN114463605A publication Critical patent/CN114463605A/en
Application granted granted Critical
Publication of CN114463605B publication Critical patent/CN114463605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 — Pattern recognition; Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/047 — Neural networks; Probabilistic or stochastic networks
    • G06N3/048 — Neural networks; Activation functions
    • G06N3/08 — Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a continuous learning image classification method and device based on deep learning, wherein the method comprises the following steps: constructing a task continuous learning model with task-specific batch normalization, wherein the parameters of all convolution kernels in the feature extractor of the task continuous learning model are fixed across all tasks, and when each new task is learned, the parameters of the batch normalization layer BN corresponding to each convolution kernel are learned together with a task-specific classification head; performing incremental training on the task continuous learning model, and adding a new task-specific batch normalization layer and classification head when a new task comes; and after the incremental training is finished, obtaining a trained task continuous learning model, and inputting the image task to be classified into the trained task continuous learning model to finish the classification task. The invention effectively solves the problem of catastrophic forgetting by using the task-specific batch normalization layers BN in the task continuous learning model.

Description

Continuous learning image classification method and device based on deep learning
Technical Field
The invention relates to the technical field of image classification, in particular to a continuous learning image classification method and device based on deep learning.
Background
Artificial intelligence, and deep learning in particular, has achieved great success in image classification. However, such models suffer from the catastrophic forgetting problem: after learning new knowledge, they forget almost all of the content of previous training. In short, during continuous learning the model is updated with different data for each task, and learning a new task greatly reduces the performance on old tasks. It is therefore crucial to give deep learning models the capability of continuous learning, i.e. the ability to continuously learn new tasks without forgetting how to perform previously trained tasks.
In recent years, various methods have been proposed to give classifiers the ability of task-incremental learning. One class of methods is regularization-based: the change of the model parameters is regularized, the importance of each parameter is weighted in different ways to preserve old knowledge, and the knowledge about old classes is preserved by limiting the variation of parameters that are important for old tasks. However, such methods often lead to a rigid model: although old-class knowledge is preserved to some extent, the model's ability to learn new knowledge is limited, and in practice the new knowledge is often not learned well while the knowledge related to old tasks is still not well preserved. Another class of methods is distillation-based: knowledge about old tasks stored in the old classifier is transferred into the new classifier by knowledge distillation. However, as continuous learning progresses and more tasks arrive, both kinds of methods still suffer from severe catastrophic forgetting, and it is difficult to balance learning new knowledge against retaining old knowledge. To make the classifier more flexible in learning new knowledge, there is a third class of methods based on expandable networks, which add new convolution kernels, convolutional layers or even sub-networks when a new task is learned. However, such expansion-based approaches inevitably change the structure of the classifier, especially the feature extractor, and require more and more storage space for the expanded network model. Furthermore, determining when and where to add new network components is extremely challenging.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a continuous learning image classification method and device based on deep learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a continuous learning image classification method based on deep learning, which comprises the following steps:
constructing a task continuous learning model with task specific batch normalization, wherein parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel are learned together with a task specific classification head; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
performing incremental training on the task continuous learning model, and adding a new task-specific batch normalization layer and classification head when a new task comes; when training reaches the t-th stage, each convolutional layer is followed by t batch normalization layers, and the task continuous learning model determines which batch normalization layer and classification head the data flow to according to the current task label, namely, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
and after the incremental training is finished, obtaining a trained task continuous learning model, and inputting the image task to be classified into the trained task continuous learning model to finish the classification task.
As a preferred technical solution, the pre-trained convolution kernel is obtained by:
the feature extractor is trained from scratch using the training set of the first task, that is, in the first task, the parameters of the whole neural network are trained with all currently available training data; when a new task comes, a group of new batch normalization layers and a classification head are added, the convolution kernel parameters trained in the first stage are fixed, and the newly arriving data are used to train the corresponding newly added batch normalization layers and classification head.
As a preferred technical solution, the batch normalization layer BN is defined as follows:
during classifier training, given a batch of input images, for the $j$-th convolution kernel of the $l$-th convolutional layer, let $\mathbf{x}_{l,j}$ denote the convolution output of one image, let $\mathcal{X}_{l,j}$ denote the set of convolution outputs of all images in the batch, and let $\hat{\mathbf{x}}_{l,j} = \mathrm{BN}(\mathbf{x}_{l,j})$ denote the output of the batch normalization layer BN; the batch normalization layer BN operation is defined as:
$$\hat{\mathbf{x}}_{l,j} = \gamma_{l,j}\,\frac{\mathbf{x}_{l,j} - \mu_{l,j}}{\sigma_{l,j}} + \beta_{l,j}$$
wherein $\mu_{l,j}$ and $\sigma_{l,j}$ respectively represent the mean and standard deviation of all elements of $\mathcal{X}_{l,j}$, and $\gamma_{l,j}$ and $\beta_{l,j}$ are the parameters to be learned in BN; the BN is specific to each convolution kernel, and each convolution kernel has its own BN with its own parameters $(\gamma_{l,j}, \beta_{l,j})$.
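For illustration only, the following PyTorch sketch (the module names conv and bn are hypothetical and not part of the invention) shows that a batch normalization layer carries exactly one learnable pair $(\gamma, \beta)$ per convolution kernel, i.e. per output channel, which is why a task-specific BN adds only a negligible number of parameters:

```python
import torch
import torch.nn as nn

# One convolutional layer with 64 kernels and the BN layer that follows it.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(num_features=64)   # gamma = bn.weight, beta = bn.bias

x = torch.randn(8, 3, 32, 32)          # a batch of 8 input images
y = bn(conv(x))                        # normalize each channel over the batch

print(bn.weight.shape, bn.bias.shape)  # torch.Size([64]) torch.Size([64])
```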
Preferably, the task continuous learning model is multi-headed, and the task continuous learning model at the t-th stage comprises the feature extractor and a task-specific classification head.
As a preferable technical solution, after training at the $t$-th stage is finished, since task-incremental learning knows the task label corresponding to the input data, for a test sample $\mathbf{x}$ the task continuous learning model determines the flow direction of the sample according to its task label and obtains a vector $\mathbf{z} = h_t(f(\mathbf{x}))$, wherein the feature extractor $f(\cdot)$ uses the parameters $\{\theta_{conv}, \theta_{bn}^{t}\}$; the prediction label of the sample $\mathbf{x}$ is:
$$\hat{y} = \arg\max_{j \in \{1, \ldots, n_t\}} p_j(\mathbf{x})$$
wherein $p_j(\mathbf{x})$ is the probability that the task continuous learning model predicts the test sample $\mathbf{x}$ as class $j$, and $n_t$ represents the number of categories in task $t$.
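Purely as an illustration of this task-aware prediction (the helper names extractor and heads, and the routing by task_id, are assumptions rather than the patent's implementation), a sketch could look as follows:

```python
import torch

@torch.no_grad()
def predict(extractor, heads, x: torch.Tensor, task_id: int) -> torch.Tensor:
    # The extractor is assumed to combine the shared convolution kernels
    # with the BN set of the given task when task_id is passed in.
    features = extractor(x, task_id)
    probs = torch.softmax(heads[task_id - 1](features), dim=1)
    return probs.argmax(dim=1)  # predicted class within task task_id
```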
As a preferable technical solution, when the task continuous learning model is incrementally trained, the test data of any one task is predicted based on the convolution kernel trained by the first task and the BN corresponding to each task.
As a preferred technical solution, the performance of the task continuous learning model at each stage is evaluated through the change of its class mean recall MCR at different stages, where the MCR is expressed as follows:
$$\mathrm{MCR}_t = \frac{1}{N_t} \sum_{i=1}^{N_t} R_i$$
wherein $N_t$ represents the number of all classes known up to stage $t$, $i$ denotes the $i$-th class, and $R_i$ denotes the recall of class $i$.
The invention provides a continuous learning image classification system based on deep learning, which is applied to the continuous learning image classification method based on deep learning, and comprises a task continuous learning model construction module, an incremental training module and a classification module;
the task continuous learning model building module is used for building a task continuous learning model with task specific batch normalization, parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel and a task specific classification head are learned together; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
the incremental training module is used for performing incremental training on the task continuous learning model; a new task-specific batch normalization layer and classification head are added when a new task comes, and when training reaches the t-th stage, each convolutional layer is followed by t batch normalization layers; the task continuous learning model determines which batch normalization layer and classification head the data flow to according to the current task label, namely, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
and the classification module is used for obtaining a trained task continuous learning model after incremental training is finished, inputting the image task to be classified into the trained task continuous learning model, and finishing the classification task.
Yet another aspect of the present invention provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the method for deep learning based continuous learning image classification.
Still another aspect of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the method for continuous learning image classification based on deep learning.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The knowledge of each task is well preserved and is not forgotten. Since the feature extractor is not changed after the first task, and since the number of BN parameters is very small, the task-specific BN parameters can be stored directly together with the task-specific classification head for each task, with negligible additional memory usage. As a result, the knowledge of old tasks is not forgotten when a new task is learned.
(2) The learning ability of the invention for new tasks is sufficient, because its feature extractor is either a sufficiently good pre-trained initial feature extractor or a feature extractor trained after the first task without pre-training, combined with task-specific BN, which gives it enough capacity to learn new tasks. The remaining methods, for example methods that constrain parameter modification and knowledge-distillation-based methods, limit the model's ability to learn new tasks.
(3) The invention requires little storage space. This is because, after training on the first task, the invention reuses the fixed convolution kernels of the feature extractor, and compared with a convolution kernel, the BN corresponding to each convolution kernel has only two parameters. Other methods that modify the model structure need to add new convolution kernels, new sub-layers and the like, resulting in a large storage requirement.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a method for continuous learning image classification based on deep learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of model training according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a continuous learning image classification system based on deep learning according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings in the present application, and it should be understood that the accompanying drawings are only for illustrative purposes and are not to be construed as limiting the present patent. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The CNN classifier is composed of a feature extractor and a classification head. In the CNN model, the feature extractor is typically composed of multiple convolutional layers, and the classification head h(·) is typically composed of one or two fully-connected layers followed by a final softmax output. Each convolutional layer in the feature extractor includes a plurality of convolution kernels, and each convolution kernel corresponds to a BN (Batch Normalization) layer and a nonlinear activation function.
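Purely for illustration (the class name and layer sizes below are assumptions), such a classifier could be organized as a convolutional feature extractor followed by a fully-connected classification head with a softmax output:

```python
import torch
import torch.nn as nn

class SimpleCNNClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.feature_extractor = nn.Sequential(            # f(.)
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, num_classes)            # h(.)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Return class probabilities via a final softmax.
        return torch.softmax(self.head(self.feature_extractor(x)), dim=1)
```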
In conventional task incremental learning strategies, different tasks share the feature extractor portion of the classifier, with a classification header specific to the task. At test time, for any new test image, the task identity of the image is known, so the shared feature extractor and corresponding task-specific classification header are known, in conjunction with which the class is predicted. This means that all convolution kernel parameters and the BN parameters corresponding to all convolution kernels grow incrementally with the arrival of the task. Changes in the parameters of the feature extractor, especially the convolution kernel portion, are the source of catastrophic forgetting, resulting in a degradation of performance of tasks learned in the past.
Deep learning techniques, particularly convolutional neural networks, have been widely used for intelligent diagnosis of various diseases based on medical images. However, current intelligent diagnostic systems can only help diagnose a specific set of diseases, in part because of the difficulty in collecting training data for all disease diagnostic tasks simultaneously. This results in more and more independent intelligent diagnostic systems, makes it difficult for a medical center to manage multiple separate systems, and results in medical personnel spending more time learning to use various systems. One solution to this problem is to develop a single intelligent system that can learn progressively more and more tasks, each for diagnosing a particular set of diseases. In order to prevent the system from expanding rapidly, it is generally assumed that the underlying classifier has a feature extractor shared by tasks, but has a plurality of task-specific classifier heads, that is, a plurality of tasks extract data features by using the same feature extractor, and after obtaining feature vectors, each task classifies data by using a classification head specific to the task. And assuming that the Task to which the data belongs is known during the test, i.e. the user knows which sort header should be applied for the test data, this problem is called the Task incremental Learning (Task incremental Learning) problem.
Continuous learning is set up such that over time, a classifier learns more and more tasks. Each time the classifier learns a new task, a new set of classes is classified, the classes learned by different tasks are generally considered non-overlapping. In the process of learning a new task, no data of previously learned tasks are retained, only training data of the new task is available.
When one CNN classifier is incrementally updated to process multiple classification tasks, most existing TIL strategies assume that all of these tasks share the same feature extractor but have task-specific classification heads for the different tasks. This means that the parameters of all convolution kernels and the BN parameters in the feature extractor are updated incrementally across tasks, and all tasks share the same set of updated kernel parameters and BN parameters. The parameters in the feature extractor are the root of catastrophic forgetting in TIL: in order to better adapt to the current task, parameters relevant to old tasks are inevitably changed, and because the data of new and old tasks do not overlap and differ greatly, the performance on previously learned tasks is reduced. To alleviate, or ideally avoid, the catastrophic forgetting problem, the feature extractor should be updated as little as possible or not at all.
Referring to fig. 1, the method for classifying continuously learned images based on deep learning of the present embodiment includes the following steps:
s1, constructing a task continuous learning model with task specific batch normalization, wherein parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel and a task specific classification head are learned together; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor; a well-trained convolution kernel is acquired in the initial stage and in the subsequent incremental stage.
It will be appreciated that since the parameters of all convolution kernels in the feature extractor are fixed in all tasks, it is crucial to obtain well-trained convolution kernels. In general, if two input data are visually different, an ideal feature extractor should output two different feature vectors, while two similar inputs should produce two similar feature vectors from the feature extractor, i.e., the picture with the larger difference should have a larger distance in the feature space, and the feature vectors of the similar pictures should have a closer distance in the feature space. A well-trained convolution kernel can be obtained by: (strategy 1) training the feature extractor from scratch using the training set of the first task; (strategy 2) training with a fixed pre-trained feature extractor, especially using a large-scale public dataset (such as ImageNet) or other relevant dataset when the pre-trained feature extractor is good; (strategy 3) the pre-trained feature extractor is fine-tuned using the training set of the first task.
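As an illustration of strategy 2 above, a fixed ImageNet-pretrained feature extractor could be prepared as in the following sketch (the torchvision weights and the variable names are assumptions; the embodiment below actually adopts strategy 1):

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Load an ImageNet-pretrained backbone and freeze all convolution kernels,
# so that later only task-specific BN layers and classification heads are trained.
backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()                # drop the ImageNet classification head
for module in backbone.modules():
    if isinstance(module, nn.Conv2d):
        for p in module.parameters():
            p.requires_grad = False
```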
Further, the pre-trained convolution kernel is obtained by:
the feature extractor is trained from scratch using the training set of the first task, that is, in the first task, the parameters of the whole neural network are trained with all currently available training data; when a new task comes, a group of new batch normalization layers and a classification head are added, the convolution kernel parameters trained in the first stage are fixed, and the newly arriving data are used to train the corresponding newly added batch normalization layers and classification head.
In an embodiment of the present application, the above strategy 1 is adopted to train a convolution kernel, specifically:
order to
Figure 435349DEST_PATH_IMAGE019
A set of training data representing different incremental phases,
wherein𝑋𝑡 =
Figure 144679DEST_PATH_IMAGE020
To represent a task𝑡The corresponding training data is then used to generate,
Figure 564159DEST_PATH_IMAGE021
is shown as𝑖The image of one of the samples is taken,
Figure 135080DEST_PATH_IMAGE022
is shown as𝑖One-hot (one-hot) labeling of individual samples,𝑛𝑡representing tasks𝑡The number of corresponding training samples. Since the classifier proposed by the method is multi-headed, each stage𝑡Is usually composed of a feature extractor𝑓(. dash) and a classification header ℎ specific to the task𝑡(. charpy) composition. For each input sample image
Figure 690826DEST_PATH_IMAGE021
Passing through a feature extractor𝑓(. charpy) and classification head ℎ𝑡(+) -making a vector
Figure 351483DEST_PATH_IMAGE023
Wherein𝑐𝑡Representing tasks𝑡The number of classes corresponding to the middle training data. Vector quantity
Figure 727101DEST_PATH_IMAGE024
Finally, a Softmax function is input to obtain a probability vector
Figure 616560DEST_PATH_IMAGE025
In which
Figure 777545DEST_PATH_IMAGE026
Presentation input
Figure 94257DEST_PATH_IMAGE027
Belong to the class𝑗Probability of (c):
Figure 222750DEST_PATH_IMAGE028
thus, the cross-entropy penalty for model optimization is:
Figure 165167DEST_PATH_IMAGE029
wherein, the first and the second end of the pipe are connected with each other,𝜃𝑡representing tasks𝑡Optimized model parameters are needed. When the temperature is higher than the set temperature𝑡Model parameters that need to be optimized when =1, i.e. in the initial phase
Figure 429926DEST_PATH_IMAGE030
Wherein
Figure 183118DEST_PATH_IMAGE031
The parameters of the convolutional layer are represented by,
Figure 815219DEST_PATH_IMAGE032
representing the parameters corresponding to the batch normalization,
Figure 780901DEST_PATH_IMAGE033
representing the parameters corresponding to the classification head.
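For illustration only, the initial-stage (t = 1) optimization described above could be sketched in PyTorch as follows; the arguments extractor, head and train_loader are assumed to be supplied by the caller, and the extractor is assumed to output a flattened feature vector:

```python
import torch
import torch.nn as nn

def train_first_task(extractor: nn.Module, head: nn.Module, train_loader,
                     epochs: int = 70, lr: float = 0.001) -> None:
    """Train all parameters (convolution kernels, BN, classification head) on task 1."""
    criterion = nn.CrossEntropyLoss()  # softmax + cross-entropy, as in the loss above
    params = list(extractor.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=2e-4)
    extractor.train(); head.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            logits = head(extractor(images))
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```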
S2, performing incremental training on the task continuous learning model;
In convolutional neural networks, each convolutional layer is usually followed by a batch normalization layer, and each batch normalization layer corresponds to one convolution kernel. In order for the model to learn new knowledge from new data while keeping the knowledge about old tasks, after the training of the initial stage is finished, when new training data $X_t$ (with $t > 1$) arrive, the model adds a new task-specific batch normalization layer and classification head. Referring to FIG. 2, when training reaches stage $t$, each convolutional layer is followed by $t$ batch normalization layers, and the model decides which batch normalization layer and classification head the data flow to according to the current task. That is, when $t > 1$, the model parameters to be optimized are $\theta_t = \{\theta_{bn}^{t}, \theta_{head}^{t}\}$, i.e. the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task $t$ are updated.
After training at stage $t$ is finished, since task-incremental learning knows the task label corresponding to the input data, for a test sample $\mathbf{x}$ the task continuous learning model determines the flow direction of the sample according to its task label and obtains a vector $\mathbf{z} = h_t(f(\mathbf{x}))$, wherein the feature extractor $f(\cdot)$ uses the parameters $\{\theta_{conv}, \theta_{bn}^{t}\}$; the prediction label of the sample $\mathbf{x}$ is:
$$\hat{y} = \arg\max_{j} p_j(\mathbf{x})$$
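A minimal sketch of this incremental step is given below, assuming the feature extractor routes its data through the BN set of the given task index; all names are illustrative and do not reproduce the patent's code:

```python
import torch
import torch.nn as nn

def train_new_task(extractor: nn.Module, new_bns: nn.ModuleList, new_head: nn.Module,
                   train_loader, task_id: int, epochs: int = 70, lr: float = 0.001) -> None:
    """Stage t > 1: freeze every convolution kernel, train only the new BN set and head."""
    for module in extractor.modules():
        if isinstance(module, nn.Conv2d):
            for p in module.parameters():
                p.requires_grad = False              # theta_conv stays fixed
    new_params = list(new_bns.parameters()) + list(new_head.parameters())
    optimizer = torch.optim.SGD(new_params, lr=lr, momentum=0.9, weight_decay=2e-4)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            # The extractor is assumed to apply the BN layers of task `task_id`.
            logits = new_head(extractor(images, task_id))
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```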
further, the batch normalization layer BN is defined as follows:
during classifier training, given a batch of input images, forlThe first of the convolutional layerjSet convolution kernels with
Figure 887026DEST_PATH_IMAGE001
Representing the convolution output of a certain image,
Figure 443778DEST_PATH_IMAGE002
a set of convolution outputs representing all images in the batch,
Figure 803215DEST_PATH_IMAGE003
representing the output of the batch normalization layer BN, then
Figure 554264DEST_PATH_IMAGE004
The batch normalization layer BN operation is defined as:
Figure 366363DEST_PATH_IMAGE005
wherein the content of the first and second substances,
Figure 743117DEST_PATH_IMAGE036
and
Figure 206329DEST_PATH_IMAGE037
respectively represent
Figure 643126DEST_PATH_IMAGE038
The mean and standard deviation of all elements,
Figure 676941DEST_PATH_IMAGE039
and
Figure 342540DEST_PATH_IMAGE009
is a parameter to be learned in a BN, the BN is specific to each convolution kernel, each convolution kernel has a specific BN, and specific parameters are provided
Figure 676569DEST_PATH_IMAGE010
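As an illustrative sketch of such per-kernel, task-specific batch normalization (class and method names are assumptions, not the patent's implementation), one convolutional block could be organized as follows, with the convolution kernel shared by all tasks and one BatchNorm2d, i.e. one $(\gamma, \beta)$ pair per kernel, owned by each task:

```python
import torch
import torch.nn as nn

class TaskSpecificBNConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_tasks: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # One BN layer per task; bns[t - 1] is used for task t.
        self.bns = nn.ModuleList([nn.BatchNorm2d(out_ch) for _ in range(num_tasks)])
        self.act = nn.ReLU(inplace=True)

    def add_task(self) -> None:
        """Append a fresh BN layer (new gamma and beta) when a new task arrives."""
        self.bns.append(nn.BatchNorm2d(self.conv.out_channels).to(self.conv.weight.device))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Route the data to the BN layer of the given task (task_id starts at 1).
        return self.act(self.bns[task_id - 1](self.conv(x)))
```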
And S3, obtaining a trained task continuous learning model after the incremental training is finished, and inputting the image task to be classified into the trained task continuous learning model to finish the classification task.
Furthermore, in order to eliminate the influence caused by sample imbalance, the metric used in this embodiment to compare the performance of the methods is the class mean recall (MCR). The MCR treats every class as equal regardless of its number of samples, and therefore better reflects the classification performance of the model. The degree to which each model forgets old knowledge can be observed through its MCR change at different stages. For a certain class $i$, the recall is calculated as follows:
$$R_i = \frac{1}{s_i} \sum_{k=1}^{s_i} r_k$$
wherein $s_i$ represents the number of samples of class $i$, and $r_k$ indicates whether the $k$-th sample of class $i$ is correctly classified: when $r_k = 1$, the sample $k$ is correctly classified into class $i$, and when $r_k = 0$, the sample $k$ is incorrectly classified into another class. After model training at incremental stage $t$ is finished, the MCR of the model is expressed as:
$$\mathrm{MCR}_t = \frac{1}{N_t} \sum_{i=1}^{N_t} R_i$$
wherein $N_t$ represents the number of all classes known up to stage $t$.
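The following sketch shows how the recall and the MCR defined above could be computed, assuming NumPy arrays of true and predicted class indices (this helper is an illustration, not part of the patent):

```python
import numpy as np

def class_mean_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    classes = np.unique(y_true)                            # the N_t classes known so far
    recalls = []
    for c in classes:
        mask = (y_true == c)                               # the s_i samples of class i
        recalls.append(float(np.mean(y_pred[mask] == c)))  # recall R_i of class i
    return float(np.mean(recalls))                         # MCR_t = mean of the R_i
```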
In one embodiment of the application, the method is applied to medical image data MedMNIST-v2, and the effectiveness of the method is demonstrated by comparing with other classical continuous learning methods.
(1) Introduction to data set
MedMNIST-v2 is composed of a number of small datasets, including 12 2D medical image datasets and 6 3D medical image datasets; the method of the present application is experimentally evaluated mainly on the 2D image datasets, and each small 2D dataset of MedMNIST-v2 is treated as one task. After excluding one multi-label task (ChestMNIST), one ordinal regression task (RetinaMNIST) and two tasks that share the same raw patient data as OrganAMNIST (OrganCMNIST and OrganSMNIST), 8 tasks were selected for this embodiment, each picture having a size of 28 × 28 pixels. The selected 8 tasks involve multiple image modalities, including pathology, dermatoscopy, X-ray, CT, ultrasound, and microscopy images. PathMNIST, OCTMNIST, and BloodMNIST are three-channel images, and the remaining 5 tasks are grayscale images; during the experiments, this embodiment copies the value of the single channel to the other two channels to generate corresponding three-channel images. Also, each class of PathMNIST, OCTMNIST, TissueMNIST, and OrganAMNIST contains far more images than the other datasets, so only 20% of these datasets is randomly sampled for model training and evaluation. The statistics of each sub-dataset used in the experiments are shown in Table 1; it can be seen that the data from different sub-datasets differ greatly and that the number of categories also varies among the datasets. Therefore, task continuous learning on the MedMNIST-v2 dataset is very challenging.
TABLE 1 MedMNIST-v2 data set statistics
(2) Experimental setup:
incremental setting: for the MedMNIST-v2 dataset, the training data of each increment stage is a subdata set, and the embodiment simulates various increment scenes by randomly disturbing the appearance sequence of different subdata sets. The difference between the sub data sets is large, there are many image modalities, and thus the setting is very complicated. It is worth mentioning that this embodiment is the first to do task incremental work on the dataset. To demonstrate the stability of the method proposed in this example, all experiments on MedMNIST-v2 this example was run through 5 replicates to obtain the mean and standard deviation of performance.
(3) Data enhancement: in the model training phase, the image data are augmented by first resizing the input image to 32 × 32, then padding 4 pixels on each side and randomly cropping back to 32 × 32, then randomly flipping the image horizontally with probability 0.5, then randomly changing the brightness of the image, and finally converting it into a tensor and normalizing it. In the test phase, the input image is likewise first resized to 32 × 32, then converted into a tensor and normalized.
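A possible torchvision realization of this augmentation pipeline is sketched below; the exact brightness range and the normalization statistics are assumptions, since the embodiment does not specify them:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((32, 32)),              # resize the input image to 32 x 32
    transforms.RandomCrop(32, padding=4),     # pad 4 pixels and randomly crop back to 32 x 32
    transforms.RandomHorizontalFlip(p=0.5),   # horizontal flip with probability 0.5
    transforms.ColorJitter(brightness=0.2),   # randomly change the brightness
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

test_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```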
Hyperparameter selection: in the experiments, unless otherwise specified, ResNet18 is used as the backbone network by default, stochastic gradient descent (SGD) is used as the optimizer, and the batch size is 64. The number of training epochs is 70, the initial learning rate is 0.001, and the learning rate is reduced by a factor of 10 at epochs 49 and 63, respectively. In the optimizer, the weight decay coefficient is 0.0002 and the momentum is 0.9.
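For illustration, these settings could be instantiated in PyTorch as follows; the variable trainable_params stands for whichever parameters are being optimized at the current stage and is an assumption:

```python
import torch
from torchvision.models import resnet18

backbone = resnet18()                                    # ResNet18 backbone network
trainable_params = [p for p in backbone.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.001,
                            momentum=0.9, weight_decay=0.0002)
# 70 epochs in total; divide the learning rate by 10 at epochs 49 and 63.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[49, 63], gamma=0.1)
```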
The comparison methods are as follows: the classical task continuous learning methods compared with this embodiment include LwF, MAS, EWC, and RWalk. For a more comprehensive comparison, iCaRL and UCIR, methods originally proposed for class continuous learning, are also adapted to the task continuous learning scenario. It is particularly emphasized that, in all experiments, the method proposed in this embodiment and the other classical methods do not retain old samples during the continuous learning phase, whereas iCaRL and UCIR retain 20 samples per old class according to the settings of their original papers.
(4) Experimental analysis:
in order to show the performance of different models on the MedMNIST-v2 data set as much as possible, the present embodiment randomly shuffles the order of occurrence of the selected 8 sub-data sets, and experiments are performed in 3 of the orders. In order to compare all the methods fairly, the present embodiment performs hyper-parameter adjustment on all the methods with the best effort, strives for the accuracy of all the methods on the training set to be as high as possible, and screens out the optimal model by using the verification set. In addition, to compare the stability of all methods fairly, each set of experiments in this example was repeated with 5 identical random seeds, preserving the accuracy under different random seeds, and calculating the average MCR and the standard deviation of MCR for each stage.
The Joint method establishes an upper bound on model performance at the different stages. Because the difficulty of different tasks is inconsistent, the performance of the Joint method at different stages is also inconsistent across the three incremental orders; however, after learning of all 8 tasks is completed, the final performance of the Joint method is about 80% regardless of the order. Meanwhile, among all methods, the performance of the method provided by this embodiment still far exceeds that of most other methods, without storing any old-class samples. In the experiment with incremental order 1, the performance of all methods is about 93% in the first stage because no increment is yet involved; the final performance of the method provided by this embodiment is 74.80%, only 5.55% below the Joint upper bound, which is very close. In contrast, the best of the other methods is UCIR, whose final performance of 65.95% is 8.85% lower than that of the method proposed in this embodiment. The worst method is LwF, with a final performance of only 20.33%, which means that, without storing old-class samples, knowledge distillation cannot preserve the old knowledge well when the domain shift between tasks is too large. The classical method iCaRL, which also uses knowledge distillation, finally reaches a performance of 24.83%, an improvement of 4.5% over LwF; this improvement should be attributed to the partially stored old-class samples. This shows that, for this type of method, retaining part of the old-class samples can effectively improve the performance of the model during continuous learning, but because the number of stored old-class samples is limited, the relief of catastrophic forgetting is also limited.
After learning the second task DE, the performance of all models degrades to some extent, because the second task DE is relatively difficult. Thanks to the partially stored old samples, iCaRL and UCIR show no obvious degradation on the first three tasks and no significant gap with the method proposed in this embodiment. Other methods such as RWalk, MAS and EWC degrade dramatically on the second task, perhaps because these methods, although originally proposed for task-incremental learning, target settings without a large domain shift from task to task, unlike the current MedMNIST-v2 dataset; in such settings the model can mitigate catastrophic forgetting to some extent by limiting the variation of important parameters without storing part of the old-class data. However, in the present, more realistic task-incremental scenario, the domain shift between tasks is too large, so the limitation on parameter change not only fails to preserve the knowledge related to old tasks well but also hinders the model from learning new knowledge. In incremental order 1, when the model learns the task BL, the performance of all methods improves substantially, because this task is relatively simple and has a large number of categories; the models therefore perform well on it, which can compensate to some extent for the performance gaps caused by performance differences on other tasks and finally improves the overall performance of the models.
Referring to fig. 3, in another embodiment of the present application, a continuous learning image classification system 100 based on deep learning is provided, which includes a task continuous learning model building module 101, an incremental training module 102, and a classification module 103;
the task continuous learning model building module 101 is configured to build a task continuous learning model with task-specific batch normalization, parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel are learned together with a task-specific classification header; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
the incremental training module 102 is configured to perform incremental training on the task continuous learning model; a new task-specific batch normalization layer and classification head are added when a new task comes, and when training reaches the t-th stage, each convolutional layer is followed by t batch normalization layers; the task continuous learning model determines to which batch normalization layer and classification head the data flow according to the current task label, that is, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
the classification module 103 is configured to obtain a trained task continuous learning model after incremental training is completed, and input an image task to be classified into the trained task continuous learning model to complete a classification task.
It should be noted that, the continuous learning image classification system based on deep learning of the present invention corresponds to the continuous learning image classification method based on deep learning of the present invention one-to-one, and the technical features and the beneficial effects thereof described in the embodiment of the continuous learning image classification method based on deep learning are all applicable to the embodiment of the continuous learning image classification based on deep learning, and specific contents can be referred to the description in the embodiment of the method of the present invention, which is not described herein again, and thus is stated herein.
In addition, in the implementation of the continuous learning image classification system based on deep learning of the above embodiment, the logical division of each program module is only an example, and in practical applications, the above function distribution may be performed by different program modules according to needs, for example, due to the configuration requirements of corresponding hardware or the convenience of implementation of software, that is, the internal structure of the continuous learning image classification system based on deep learning is divided into different program modules to perform all or part of the above described functions.
Referring to fig. 4, in an embodiment, an electronic device for implementing a method for continuous learning image classification based on deep learning is provided, where the electronic device 200 may include a first processor 201, a first memory 202, and a bus, and may further include a computer program, such as a continuous learning image classification program 203 based on deep learning, stored in the first memory 202 and executable on the first processor 201.
The first memory 202 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The first memory 202 may in some embodiments be an internal storage unit of the electronic device 200, such as a removable hard disk of the electronic device 200. The first memory 202 may also be an external storage device of the electronic device 200 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 200. Further, the first memory 202 may also include both an internal storage unit and an external storage device of the electronic device 200. The first memory 202 may be used to store not only application software installed in the electronic device 200 and various types of data, such as codes of the continuous learning image classification program 203 based on deep learning, but also data that has been output or is to be output temporarily.
The first processor 201 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The first processor 201 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions of the electronic device 200 and processes data by running or executing programs or modules stored in the first memory 202 and calling data stored in the first memory 202.
Fig. 4 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 4 does not constitute a limitation of the electronic device 200, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
The deep learning based continuous learning image classification program 203 stored in the first memory 202 of the electronic device 200 is a combination of a plurality of instructions, which when executed in the first processor 201, can implement:
constructing a task continuous learning model with task specific batch normalization, wherein parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel are learned together with a task specific classification head; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
performing incremental training on the task continuous learning model, adding a new task-specific batch normalization layer and classification head when a new task comes; when training reaches the t-th stage, each convolutional layer is followed by t batch normalization layers, and the task continuous learning model determines which batch normalization layer and classification head the data flow to according to the current task label, namely, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
and after the incremental training is finished, obtaining a trained task continuous learning model, and inputting the image task to be classified into the trained task continuous learning model to finish the classification task.
Further, the modules/units integrated with the electronic device 200, if implemented in the form of software functional units and sold or used as independent products, may be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. The continuous learning image classification method based on deep learning is characterized by comprising the following steps of:
constructing a task continuous learning model with task specific batch normalization, wherein parameters of all convolution kernels in a feature extractor of the task continuous learning model are fixed in all tasks, and when each new task is learned, parameters of a batch normalization layer BN corresponding to each convolution kernel are learned together with a task specific classification head; the convolution kernel is a convolution kernel pre-trained in a first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
performing incremental training on the task continuous learning model, adding a new task-specific batch normalization layer and classification head when a new task comes; when training reaches the t-th stage, each convolutional layer is followed by t batch normalization layers, and the task continuous learning model determines which batch normalization layer and classification head the data flow to according to the current task label, namely, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
and after the incremental training is finished, obtaining a trained task continuous learning model, and inputting the image task to be classified into the trained task continuous learning model to finish the classification task.
2. The continuous learning image classification method based on deep learning of claim 1, wherein the pre-trained convolution kernel is obtained by:
the feature extractor is trained from scratch using the training set of the first task, that is, in the first task, the parameters of the whole neural network are trained with all currently available training data; when a new task comes, a group of new batch normalization layers and a classification head are added, the convolution kernel parameters trained in the first stage are fixed, and the newly arriving data are used to train the corresponding newly added batch normalization layers and classification head.
3. The continuous learning image classification method based on deep learning according to claim 1, wherein the batch normalization layer BN is defined as follows:
during classifier training, given a batch of input images, for the $j$-th convolution kernel of the $l$-th convolutional layer, let $\mathbf{x}_{l,j}$ denote the convolution output of one image, let $\mathcal{X}_{l,j}$ denote the set of convolution outputs of all images in the batch, and let $\hat{\mathbf{x}}_{l,j} = \mathrm{BN}(\mathbf{x}_{l,j})$ denote the output of the batch normalization layer BN; the batch normalization layer BN operation is defined as:
$$\hat{\mathbf{x}}_{l,j} = \gamma_{l,j}\,\frac{\mathbf{x}_{l,j} - \mu_{l,j}}{\sigma_{l,j}} + \beta_{l,j}$$
wherein $\mu_{l,j}$ and $\sigma_{l,j}$ respectively represent the mean and standard deviation of all elements of $\mathcal{X}_{l,j}$, and $\gamma_{l,j}$ and $\beta_{l,j}$ are the parameters to be learned in BN; the BN is specific to each convolution kernel, and each convolution kernel has its own BN with its own parameters $(\gamma_{l,j}, \beta_{l,j})$.
4. The method for continuously learning and classifying images based on deep learning of claim 1, wherein the task continuous learning model is multi-headed, and the task continuous learning model at the t-th stage comprises a feature extractor and a task-specific classification head.
5. The continuous learning image classification method based on deep learning of claim 1, wherein, after the training of the t-th stage is finished, because task-incremental learning knows the task label corresponding to the input data, for a test sample x the task continuous learning model determines the flow direction of the sample according to its task label and obtains an output vector, wherein the feature extractor f(·) corresponds to the parameters \theta; the prediction label of the sample x is:

\hat{y} = \arg\max_{j \in \{1, \ldots, n_t\}} p_j

wherein p_j denotes the probability that the task continuous learning model predicts the test sample x as the j-th class, and n_t represents the number of categories in task t.
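(For illustration only: a sketch of task-incremental inference with the hypothetical TaskBNNet above. Because the task label of the test sample is known, it routes the sample to the BN layers and classification head of that task, and the predicted label is the class with the largest probability.)

import torch

@torch.no_grad()
def predict(model, image, task_id):
    model.eval()
    logits = model(image.unsqueeze(0), task_id)   # flow direction chosen by the task label
    probs = torch.softmax(logits, dim=1)          # p_1, ..., p_{n_t}
    return probs.argmax(dim=1).item()             # arg max over the n_t classes of task t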
6. The continuous learning image classification method based on deep learning of claim 1, wherein, when the task continuous learning model is incrementally trained, the test data of any task is predicted based on the convolution kernels trained on the first task and the BN corresponding to that task.
7. The continuous learning image classification method based on deep learning of claim 1, wherein the performance of the task continuous learning model at each stage is evaluated through the change of the class mean recall (MCR) of the model at different stages, and the MCR is expressed as follows:

\mathrm{MCR}_t = \frac{1}{N_t} \sum_{i=1}^{N_t} R_i

wherein N_t denotes the number of all classes known up to stage t, i denotes the i-th class, and R_i denotes the recall rate of the i-th class.
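(For illustration only: a plain-Python sketch of the class mean recall, computed as the average over all known classes of each class's recall; the label encoding is an assumption.)

from collections import defaultdict

def mean_class_recall(y_true, y_pred):
    """MCR = (1 / N_t) * sum of per-class recalls over all N_t known classes."""
    total = defaultdict(int)     # number of test samples per class
    correct = defaultdict(int)   # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if p == t:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Two classes with recalls 1.0 and 0.5 give MCR = 0.75.
print(mean_class_recall([0, 0, 1, 1], [0, 0, 1, 0]))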
8. The continuous learning image classification system based on deep learning is applied to the continuous learning image classification method based on deep learning of any one of claims 1 to 7, and is characterized by comprising a task continuous learning model building module, an incremental training module and a classification module;
the task continuous learning model building module is used for building a task continuous learning model with task-specific batch normalization, wherein the parameters of all convolution kernels in the feature extractor of the task continuous learning model are fixed across all tasks, and when each new task is learned, the parameters of the batch normalization layer BN corresponding to each convolution kernel and of a task-specific classification head are learned together; the convolution kernels are pre-trained in the first stage so that visually similar inputs produce similar feature vectors from the feature extractor;
the incremental training module is used for performing incremental training on the task continuous learning model, adding a new task-specific batch normalization layer and classification head when a new task arrives; when the training reaches the t-th stage, t batch normalization layers are arranged behind each convolutional layer, and the task continuous learning model determines which batch normalization layer and classification head the data flow to according to the current task label, namely, when the current task label is greater than 1, the parameters of the convolution kernels in the feature extractor are not updated, and only the parameters of the batch normalization layer and classification head corresponding to the newly added task are updated;
and the classification module is used for obtaining a trained task continuous learning model after the incremental training is finished, inputting the image to be classified into the trained task continuous learning model, and completing the classification task.
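(For illustration only: a sketch of how the three modules above could be orchestrated, reusing the hypothetical TaskBNNet, train_stage, and predict helpers from the earlier sketches; task_loaders, task_num_classes, test_image, and test_task_id are assumed inputs.)

def run_continual_classification(task_loaders, task_num_classes, test_image, test_task_id):
    # Model building module: task continuous learning model with task-specific BN.
    model = TaskBNNet(task_num_classes[0])
    # Incremental training module: one stage per task, adding a BN/head pair per new task.
    for task_id, loader in enumerate(task_loaders):
        if task_id > 0:
            model.add_task(task_num_classes[task_id])
        train_stage(model, loader, task_id)
    # Classification module: classify the image with the trained model and its task label.
    return predict(model, test_image, test_task_id)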
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the method of deep learning based continuous learning image classification according to any one of claims 1-7.
10. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the method for continuous learning image classification based on deep learning according to any one of claims 1 to 7.
CN202210381239.2A 2022-04-13 2022-04-13 Continuous learning image classification method and device based on deep learning Active CN114463605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210381239.2A CN114463605B (en) 2022-04-13 2022-04-13 Continuous learning image classification method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210381239.2A CN114463605B (en) 2022-04-13 2022-04-13 Continuous learning image classification method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN114463605A true CN114463605A (en) 2022-05-10
CN114463605B CN114463605B (en) 2022-08-12

Family

ID=81418654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210381239.2A Active CN114463605B (en) 2022-04-13 2022-04-13 Continuous learning image classification method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN114463605B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150345A1 (en) * 2019-11-14 2021-05-20 Qualcomm Incorporated Conditional Computation For Continual Learning
CN112884142A (en) * 2019-11-29 2021-06-01 北京市商汤科技开发有限公司 Neural network training method, neural network training device, target detection method, target detection device, equipment and storage medium
CN111507408A (en) * 2020-04-17 2020-08-07 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112101417A (en) * 2020-08-14 2020-12-18 清华大学 Continuous learning method and device based on condition batch normalization
WO2022040963A1 (en) * 2020-08-26 2022-03-03 Intel Corporation Methods and apparatus to dynamically normalize data in neural networks
US20220108149A1 (en) * 2020-10-02 2022-04-07 Google Llc Neural networks with pre-normalized layers or regularization normalization layers
CN113095194A (en) * 2021-04-02 2021-07-09 北京车和家信息技术有限公司 Image classification method and device, storage medium and electronic equipment
CN113688894A (en) * 2021-08-19 2021-11-23 匀熵科技(无锡)有限公司 Fine-grained image classification method fusing multi-grained features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Junyu et al.: "A Survey of Convolutional Neural Networks in Image Classification and Object Detection Applications", Computer Engineering and Applications *
Qiu Yaoru et al.: "Person Re-identification Method Based on Generative Adversarial Network Combined with Spatio-temporal Model", Journal of Computer Applications *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099401A (en) * 2022-05-13 2022-09-23 清华大学 Learning method, device and equipment of continuous learning framework based on world modeling
CN115099401B (en) * 2022-05-13 2024-04-26 清华大学 Learning method, device and equipment of continuous learning framework based on world modeling
WO2023231887A1 (en) * 2022-06-01 2023-12-07 华为技术有限公司 Tensor-based continual learning method and device
WO2024007105A1 (en) * 2022-07-04 2024-01-11 Robert Bosch Gmbh Method and apparatus for continual learning of tasks

Also Published As

Publication number Publication date
CN114463605B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN114463605B (en) Continuous learning image classification method and device based on deep learning
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
Wu et al. Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN111507993A (en) Image segmentation method and device based on generation countermeasure network and storage medium
US20150012472A1 (en) Systems, methods, and media for updating a classifier
CN111127390B (en) X-ray image processing method and system based on transfer learning
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN111275175A (en) Neural network training method, neural network training device, image classification method, image classification equipment and medium
CN112529146B (en) Neural network model training method and device
CN111967495A (en) Classification recognition model construction method
Xiang et al. Towards interpretable skin lesion classification with deep learning models
CN114387486A (en) Image classification method and device based on continuous learning
CN110210543B (en) Image classification system, method, apparatus and storage medium
Popescu et al. Retinal blood vessel segmentation using pix2pix gan
Feng et al. SSN: A stair-shape network for real-time polyp segmentation in colonoscopy images
CN114565620B (en) Fundus image blood vessel segmentation method based on skeleton prior and contrast loss
DE102018114799A1 (en) SEMINAR-LEANED LEARNING FOR ORIENTATION LOCALIZATION
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN114492581A (en) Method for classifying small sample pictures based on transfer learning and attention mechanism element learning application
CN111612739B (en) Deep learning-based cerebral infarction classification method
CN115578593B (en) Domain adaptation method using residual attention module
CN112329867A (en) MRI image classification method based on task-driven hierarchical attention network
CN116883432A (en) Method and device for segmenting focus image, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant