CN113610106B - Feature compatible learning method and device between models, electronic equipment and medium - Google Patents

Feature compatible learning method and device between models, electronic equipment and medium

Info

Publication number
CN113610106B
Authority
CN
China
Prior art keywords
model
feature
loss
module
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110750264.9A
Other languages
Chinese (zh)
Other versions
CN113610106A (en)
Inventor
段凌宇
白燕
吴生森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202110750264.9A
Publication of CN113610106A
Application granted
Publication of CN113610106B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/24: Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a feature-compatible learning method and device between models, an electronic device and a medium. The method comprises the following steps: determining feature-compatible learning information of a first model and a second model according to the model parameters determined by the first model, the model parameters to be learned of the second model and a new training dataset, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model; and taking the feature-compatible learning information and the target loss for supervising the second model together as the final optimization target of the second model, so as to complete feature-compatible learning between the first model and the second model.

Description

Feature compatible learning method and device between models, electronic equipment and medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a feature compatible learning method and device between models, electronic equipment and a computer readable storage medium.
Background
In object re-identification systems, deployed models need to be updated frequently to achieve higher performance. A new model may be trained on a larger dataset, or use a more advanced network architecture or loss function. Once the model is updated, the features of the entire database need to be re-extracted to ensure feature compatibility. Because the database may contain millions or even tens of millions of images, feature re-extraction is very time-consuming and computationally expensive. Furthermore, in practical application systems computing resources are very limited, and users often do not have large amounts of spare GPU resources available.
Disclosure of Invention
The application aims to provide a feature compatibility learning method and device between models, electronic equipment and a computer readable storage medium.
The first aspect of the present application provides a feature compatible learning method between models, including:
determining feature-compatible learning information of the first model and the second model according to the model parameters determined by the first model, the model parameters to be learned of the second model and a new training dataset, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
and taking the feature compatibility learning information and target loss for supervising the second model together as a final optimization target of the second model so as to complete feature compatibility learning between the first model and the second model.
A second aspect of the present application provides an inter-model feature-compatible learning apparatus, comprising:
the determining module is used for determining feature-compatible learning information of the first model and the second model according to the model parameters determined by the first model, the model parameters to be learned of the second model and a new training dataset, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
and the feature compatibility module is used for taking the feature compatibility learning information and the target loss for supervising the second model together as the final optimization target of the second model so as to complete feature compatibility learning between the first model and the second model.
A third aspect of the present application provides an electronic apparatus, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to perform the method according to the first aspect of the application.
A fourth aspect of the application provides a computer readable medium having stored thereon computer readable instructions executable by a processor to implement the method according to the first aspect of the application.
Compared with the prior art, the inter-model feature-compatible learning method provided by the application determines the feature-compatible learning information of the first model and the second model according to the model parameters of the first and second models and a new training dataset, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model; the feature-compatible learning information and the target loss for supervising the second model are then used together as the final optimization target of the second model, so that feature-compatible learning between the first model and the second model is completed.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of feature compatibility learning between new and old versions of models;
FIG. 2 is a flow chart of a method for feature compatibility learning between models provided by the application;
FIG. 3 is a schematic diagram of splitting the network structure of a model;
FIG. 4 is a schematic diagram of the representative feature migration loss based on the metric space;
FIG. 5 illustrates a structural regularization diagram at the level of inter-model network components;
FIG. 6 is a schematic diagram of an inter-model feature-compatible learning device according to the present application;
FIG. 7 shows a schematic diagram of an electronic device provided by the present application;
fig. 8 shows a schematic diagram of a computer readable storage medium provided by the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
In addition, the terms "first" and "second" etc. are used to distinguish different objects and are not used to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
At present, large-scale video data analysis systems require frequent model updates and deployments, and the feature descriptors extracted by different model versions face feature interoperability problems. Every time the system is updated, all features in the database need to be re-extracted, which consumes computational and storage resources.
As shown in fig. 1, for the features in the new space and the old space to interoperate (i.e., be compatible), feature-compatible learning between the new and old model versions is required. Some existing feature interoperation methods learn an additional mapping model to transform the features of model A into the features of model B. However, an additional feature re-extraction procedure is still required before the new model features can be compared with the database features. Other works introduce an additional regularization strategy that uses the old classifier to assist the feature learning of the new model; with such a compatibility strategy, compatible features are obtained and the additional feature re-extraction procedure can be avoided. However, when the classifier or supervision loss of the old model differs from that of the new model, classifier-based supervision is of limited use and faces serious performance degradation. In addition, such work requires the categories of the new model's training data to overlap with those of the old model's training data, which limits its application scenarios.
In view of the foregoing, embodiments of the present application provide a method and apparatus for learning feature compatibility between models, an electronic device, and a computer-readable storage medium, which are described below with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a method for learning feature compatibility between models according to some embodiments of the present application is shown, and the method may include the following steps S101 to S102:
step S101: determining feature compatible learning information of the first model and the second model according to the model parameters determined by the first model, the model parameters to be learned of the second model and the new training data set;
wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
in practical application, the first model may be an old model, the model parameters of which are determined, and the second model may be a new model, the model parameters of which need to be learned.
Fig. 3 is a schematic diagram illustrating the splitting of a model's network structure. As shown in fig. 3, the network components of each model may include a feature extraction module and a task head module, or may include only one of them. The feature extraction module is used for feature extraction, and the task head module performs tasks such as classification and detection based on the extracted features. Both the metric space of the feature extraction module and the discrimination space of the task head module contain a large amount of information and can be used for feature-compatible learning.
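As an illustration of this split, the following is a minimal sketch assuming a PyTorch/torchvision ResNet-style classifier; the variable names are illustrative and not taken from the patent.

```python
# Minimal sketch of splitting a model into its two network components
# (assumption: a torchvision ResNet classifier; names are illustrative).
import torch.nn as nn
import torchvision

resnet = torchvision.models.resnet50(weights=None)
# feature extraction module: all layers up to global pooling, flattened to a vector (metric space)
feature_extractor = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())
# task head module: the final classification layer acting on the extracted features (discrimination space)
task_head = resnet.fc
```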
The feature-compatible learning strategy of the present application comprises a feature-level representative feature migration loss and a network-component-level structural regularization.
Specifically, samples belonging to the same class are clustered together in the metric space. Therefore, the application uses the mean feature of the samples to represent each category, describes the whole manifold structure with all the representative features, and aligns the feature spaces by transferring the representative features between the old and new models. Determining the representative feature migration loss of the first model and the second model in S101 includes:
extracting the features of all samples in the new training dataset by using the first model to obtain first model features;
extracting the representative feature of each category corresponding to the first model based on the first model features;
extracting the features of all samples in the new training dataset by using the second model to obtain second model features, and calculating the similarity between each second model feature and each representative feature;
and calculating the representative feature migration loss based on the similarities, so as to realize feature compatibility between the first model and the second model.
In the application, global regulation of the new and old feature spaces is achieved by the representative feature migration loss based on the metric space. As shown in fig. 4, the old model in the figure is the first model and the new model is the second model.
In fig. 4, the old model parameters are fixed and the new training dataset participates in training the new model. Since the old feature space is fixed and is not modified during training of the new model, the old model can be used to extract the features of all samples in the new training dataset, yielding fixed old model features (the first model features).
The representative feature of each class of old model features is designed to represent the manifold structure of the old model feature space. By transferring the knowledge of the representative features, the new model feature space can be explicitly embedded into the old model feature space to achieve feature alignment.
A representative feature for each category is obtained using the mean feature, where the representative feature for each category can be expressed as:
$$\hat{\phi}^{o}_{c} = \frac{1}{|P(c)|}\sum_{x \in P(c)} \phi_O(x)$$

where $P(c)$ represents the sample set of class $c$, $|P(c)|$ is the number of samples of class $c$, and $\phi_O(x)$ is the feature of sample $x$ extracted by the old model $\phi_O$. For the $C$ categories in the new training dataset, $C$ representative features are obtained, which can robustly describe the entire embedding space.
The present application designs a prediction strategy based on the representative features, using all representative features of the old model to perform global optimization. The prediction does not classify features with an additional classifier; instead, it performs explicit distance optimization based on similarity metrics. Given a new model feature, the similarities between it and all representative features of the old model are computed, and the old-model category to which the feature belongs is predicted from these similarities. Formally, this is described as follows:
$$\mathcal{L}_{R}(\phi_N) = -\sum_{x_c \in T_N} \log \frac{\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{c}\rangle\big)}{\sum_{k=1}^{C}\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{k}\rangle\big)}$$

wherein $\phi_N$ is the second model parameter, $M_o = \{\hat{\phi}^{o}_{1},\dots,\hat{\phi}^{o}_{C}\}$ represents the set of representative features of the first model, $T_N$ is the new training dataset, $\langle\cdot,\cdot\rangle$ denotes the cosine distance between sample features, $x_c$ is a sample with category label $c$, and $\hat{\phi}^{o}_{c}$ is the representative feature of category $c$. Minimizing this loss maximizes the similarity between $\phi_N(x_c)$ and $\hat{\phi}^{o}_{c}$ while driving the similarities between $\phi_N(x_c)$ and the representative features of all other classes towards 0. Thus, all old representative features bridge the old and new feature spaces and support global optimization well.
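For concreteness, a minimal PyTorch sketch of the representative features and the migration loss is given below; the function names, the data-loader interface and the omission of a temperature factor are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_representative_features(old_backbone, loader, num_classes, feat_dim):
    """Mean (representative) feature of every class under the fixed old model."""
    sums = torch.zeros(num_classes, feat_dim)
    counts = torch.zeros(num_classes)
    for images, labels in loader:                        # new training dataset T_N
        feats = F.normalize(old_backbone(images), dim=1) # fixed first-model features
        sums.index_add_(0, labels, feats)
        counts.index_add_(0, labels, torch.ones(labels.shape[0]))
    return F.normalize(sums / counts.clamp(min=1).unsqueeze(1), dim=1)  # C x feat_dim

def representative_migration_loss(new_feats, labels, rep_feats):
    """Cross-entropy over cosine similarities between new features and old class representatives."""
    new_feats = F.normalize(new_feats, dim=1)
    logits = new_feats @ rep_feats.t()                   # cosine similarity to each class representative
    return F.cross_entropy(logits, labels)
```

Minimizing this loss pulls each new feature towards the old representative of its own class and away from the representatives of all other classes, which is the global alignment described above.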
The application also designs a structural regularization at the network-component level. A network may be divided into a feature extraction module and a task head module. Taking the classifier task head as an example, the classification criteria of the feature structure are contained in the classifier, so if the features of two models are compatible with each other, the features of one model should also satisfy the classification criteria of the other model. Based on this, the application designs a module-level interoperation mechanism to realize structural regularization and further improve compatibility.
Specifically, the network component of the first model comprises a first feature extraction module and a first task head module; the network component of the second model includes a second feature extraction module and a second task head module.
In step S101, determining the structural regularization of the network components of the first model and the second model includes:
determining a first supervision loss of a first recombined network obtained by recombining the first feature extraction module with the second task head module;
determining a second supervision loss of a second recombined network obtained by recombining the second feature extraction module with the first task head module;
and determining an optimization target for the mutual structural regularization of the first model and the second model according to the first supervision loss and the second supervision loss.
Specifically, as shown in fig. 5, the model is divided into two parts: a feature extraction module and a task head module for the target task. The task head module contains the intrinsic structural information of the feature space and can be regarded as the rules of that feature space. Taking the classifier head as an example, it contains the classification rules of the features: according to these rules, an embedded feature in the feature space F is mapped to a class probability in the classifier hypothesis space P. Therefore, if the features of the two models match each other, features derived from one model should also yield good predictions when passed through the classifier of the other model. In particular, the new classifier should obtain correct predictions from old model features; that is, after the old model feature extraction module $\phi_O$ and the new model task head module $h_N$ are recombined, the recombined network should still obtain good prediction results. Its supervision loss can be expressed as $L_{CE\_N}(\phi_O, h_N; T_N)$, which denotes using the first feature extraction module $\phi_O$ and the second task head module $h_N$ to perform the specific target task on the dataset $T_N$, where $T_N$ is the new training dataset and $L_{CE\_N}$ is the supervision loss of the new model. The task type is not limited to classification.
Further, the distribution structure of the new features should satisfy the rules of the old classifier; that is, after the new model feature extraction module $\phi_N$ and the old model task head module $h_O$ are recombined, the recombined network should still obtain good prediction results. Its supervision loss can be expressed as $L_{CE\_O}(\phi_N, h_O; T_{N'})$, which denotes using the second feature extraction module $\phi_N$ and the first task head module $h_O$ to perform the specific target task on the dataset $T_{N'}$, where $T_{N'}$ is the set of samples of the new training dataset whose classes also appear in the old training dataset used for training the first model, and $L_{CE\_O}$ is the supervision loss of the old model.
Thus, the optimization target $\mathcal{L}_{SR}$ for the mutual structural regularization can be expressed as:

$$\mathcal{L}_{SR} = L_{CE\_N}\big(\phi_O,\, h_N;\, T_N\big) + L_{CE\_O}\big(\phi_N,\, h_O;\, T_{N'}\big)$$

As can be seen, network component interoperation provides dual supervision: the old rules guide the training of the new feature extraction module, and at the same time the old feature extraction module helps to shape the feature rules of the new task head module. Here, $L_{CE\_O}$ is the supervision loss of the old model and $L_{CE\_N}$ is the supervision loss of the new model. The application does not restrict the loss types of the new and old models, nor does it require them to be the same; each loss is determined by the original new or old model.
Step S102: and taking the feature compatibility learning information and target loss for supervising the second model together as a final optimization target of the second model so as to complete feature compatibility learning between the first model and the second model.
Specifically, the feature-compatible learning information includes the representative feature migration loss $\mathcal{L}_{R}$ and/or the structural regularization of network components $\mathcal{L}_{SR}$. The application is not limited to using the representative feature migration loss and the structural regularization of network components simultaneously; feature compatibility can also be realized by using either of them alone.
Thus, the final optimization target $L_{all}$ of the second model can cover three cases:

First case: $L_{all} = L_{CE\_N} + \mathcal{L}_{R} + \mathcal{L}_{SR}$

Second case: $L_{all} = L_{CE\_N} + \mathcal{L}_{R}$

Third case: $L_{all} = L_{CE\_N} + \mathcal{L}_{SR}$
wherein $L_{CE\_N}$ is the target loss for supervising the new model and is not limited to a classification loss.
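Building on the sketches above, the first case of the final objective could be assembled as follows; the unweighted sum and the cross-entropy target loss are assumptions, since the patent does not prescribe particular weights or loss types.

```python
import torch.nn.functional as F

def final_objective(new_backbone, new_head, old_backbone, old_head, rep_feats,
                    images, labels, images_overlap, labels_overlap):
    new_feats = new_backbone(images)
    target_loss = F.cross_entropy(new_head(new_feats), labels)          # L_CE_N (target loss)
    migration = representative_migration_loss(new_feats, labels, rep_feats)
    structure = structural_regularization(old_backbone, old_head, new_backbone, new_head,
                                          images, labels, images_overlap, labels_overlap)
    return target_loss + migration + structure                          # L_all, first case
```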
In summary, the method of the present application comprises feature-compatibility strategies of representative feature migration and structural regularization. The representative feature migration learning includes: obtaining the network and network parameters of the old model; obtaining the old model features on the new training dataset; extracting the representative feature of the old features of each category; during training of the new model, calculating the similarity between each new model feature and all representative features of the old model; and calculating the classification loss over the representative features based on the similarities, thereby realizing compatibility between the new and old model features. The structural regularization recombines the feature extraction modules and task head modules of the new and old models, and specifically includes: recombining the new model feature extraction module with the old model task head module, so that the rules of the old model task head module constrain the new model features; and constraining the new model task head module with the features of the old model. The method realizes feature-compatible learning, so that new model features compatible with the old model features are obtained without constraining the new model's training data, supervision loss or network structure, and the features of the whole old database do not need to be re-extracted after the old model is updated to the new model, which saves time and reduces the consumption of computing resources.
In the above embodiment, a method for learning feature compatibility between models is provided, and correspondingly, the application also provides a device for learning feature compatibility between models. Referring to fig. 6, a schematic diagram of an inter-model feature-compatible learning apparatus according to some embodiments of the application is shown. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
As shown in fig. 6, the inter-model feature compatibility learning device 10 may include:
a determining module 101, configured to determine feature-compatible learning information of the first model and the second model according to the model parameters determined by the first model, the model parameters to be learned by the second model, and a new training dataset, where the feature-compatible learning information includes a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
and the feature compatibility module 102 is used for taking the feature compatibility learning information and the target loss for supervising the second model together as the final optimization target of the second model so as to complete feature compatibility learning between the first model and the second model.
According to some embodiments of the application, the determining module 101 is specifically configured to:
extracting the features of all samples in the new training dataset by using the first model to obtain first model features;
extracting the representative feature of each category corresponding to the first model based on the first model features;
extracting the features of all samples in the new training dataset by using the second model to obtain second model features, and calculating the similarity between each second model feature and each representative feature;
calculating the representative feature migration loss based on the similarities to realize feature compatibility between the first model and the second model;
the expression of the representative feature migration loss is as follows:

$$\mathcal{L}_{R}(\phi_N) = -\sum_{x_c \in T_N} \log \frac{\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{c}\rangle\big)}{\sum_{k=1}^{C}\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{k}\rangle\big)}$$

wherein $\phi_N$ is the second model parameter, $M_o = \{\hat{\phi}^{o}_{1},\dots,\hat{\phi}^{o}_{C}\}$ represents the set of representative features of the first model, $T_N$ is the new training dataset, $\langle\cdot,\cdot\rangle$ denotes the cosine distance between sample features, $x_c$ is a sample with category label $c$, and $\hat{\phi}^{o}_{c}$ is the representative feature of category $c$.
According to some embodiments of the application, the network component of the first model includes a first feature extraction module and a first task head module; the network component of the second model includes a second feature extraction module and a second task head module.
According to some embodiments of the application, the determining module 101 is specifically configured to:
determining a first supervision loss of a first recombined network obtained by recombining the first feature extraction module with the second task head module;
determining a second supervision loss of a second recombined network obtained by recombining the second feature extraction module with the first task head module;
determining an optimization target for the mutual structural regularization of the first model and the second model according to the first supervision loss and the second supervision loss;
the expression of the optimization target is as follows:

$$\mathcal{L}_{SR} = L_{CE\_N}\big(\phi_O,\, h_N;\, T_N\big) + L_{CE\_O}\big(\phi_N,\, h_O;\, T_{N'}\big)$$

wherein $L_{CE\_O}$ is the first supervision loss and $L_{CE\_N}$ is the second supervision loss;

$L_{CE\_N}(\phi_O, h_N; T_N)$ represents using the first feature extraction module $\phi_O$ and the second task head module $h_N$ to perform the specific target task on the dataset $T_N$, where $T_N$ is the new training dataset;

$L_{CE\_O}(\phi_N, h_O; T_{N'})$ represents using the second feature extraction module $\phi_N$ and the first task head module $h_O$ to perform the specific target task on the dataset $T_{N'}$, where $T_{N'}$ is the set of samples whose classes are shared between the new training dataset and the old training dataset, the old training dataset being used for training the first model.
Since it is based on the same inventive concept, the inter-model feature-compatible learning device 10 provided in this embodiment of the application has the same beneficial effects as the inter-model feature-compatible learning method provided in the foregoing embodiments.
Corresponding to the inter-model feature-compatible learning method provided in the foregoing embodiments, an embodiment of the application further provides an electronic device, such as a mobile phone, a notebook computer, a tablet computer or a desktop computer, for executing the inter-model feature-compatible learning method.
Referring to fig. 7, a schematic diagram of an electronic device according to some embodiments of the present application is shown. As shown in fig. 7, the electronic device 20 includes: a processor 200, a memory 201, a bus 202 and a communication interface 203, the processor 200, the communication interface 203 and the memory 201 being connected by the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the inter-model feature compatibility learning method provided in any of the foregoing embodiments of the present application when executing the computer program.
Since it is based on the same inventive concept, the electronic device provided in this embodiment of the application has the same beneficial effects as the inter-model feature-compatible learning method that it adopts, runs or implements.
Corresponding to the inter-model feature-compatible learning method provided in the foregoing embodiments, an embodiment of the present application further provides a computer readable storage medium. Referring to fig. 8, the computer readable storage medium is shown as an optical disc 30 on which a computer program (i.e. a program product) is stored; when executed by a processor, the computer program performs the inter-model feature-compatible learning method provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
Since it is based on the same inventive concept, the computer readable storage medium provided in the above embodiment of the application has the same beneficial effects as the method adopted, run or implemented by the application program stored thereon.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application, and are intended to be included within the scope of the appended claims and description.

Claims (4)

1. A method of feature-compatible learning between models for object re-identification, comprising:
determining feature-compatible learning information of a first model and a second model according to the model parameters determined by the first model, the model parameters to be learned by the second model and a new training dataset of target images, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
the determining representative feature migration loss of the first model and the second model comprises:
extracting the features of all samples in the new training dataset by using the first model to obtain first model features;
extracting the representative feature of each category corresponding to the first model based on the first model features;
extracting the features of all samples in the new training dataset by using the second model to obtain second model features, and calculating the similarity between each second model feature and each representative feature;
calculating the representative feature migration loss based on the similarities to realize feature compatibility between the first model and the second model;
the expression of the representative feature migration loss is as follows:

$$\mathcal{L}_{R}(\phi_N) = -\sum_{x_c \in T_N} \log \frac{\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{c}\rangle\big)}{\sum_{k=1}^{C}\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{k}\rangle\big)}$$

wherein $\phi_N$ is the second model parameter, $M_o = \{\hat{\phi}^{o}_{1},\dots,\hat{\phi}^{o}_{C}\}$ represents the set of representative features of the first model, $T_N$ is the new training dataset, $\langle\cdot,\cdot\rangle$ denotes the cosine distance between sample features, $x_c$ is a sample with category label $c$, and $\hat{\phi}^{o}_{c}$ is the representative feature of category $c$;
the determining structural regularization of the network components of the first model and the second model includes:
determining a first supervision loss of a first recombined network obtained by recombining the first feature extraction module with the second task head module;
determining a second supervision loss of a second recombined network obtained by recombining the second feature extraction module with the first task head module;
determining an optimization target for the mutual structural regularization of the first model and the second model according to the first supervision loss and the second supervision loss;
the expression of the optimization target is as follows:

$$\mathcal{L}_{SR} = L_{CE\_N}\big(\phi_O,\, h_N;\, T_N\big) + L_{CE\_O}\big(\phi_N,\, h_O;\, T_{N'}\big)$$

wherein $L_{CE\_O}$ is the first supervision loss and $L_{CE\_N}$ is the second supervision loss;

$L_{CE\_N}(\phi_O, h_N; T_N)$ represents using the first feature extraction module $\phi_O$ and the second task head module $h_N$ to perform the specific target task on the dataset $T_N$, where $T_N$ is the new training dataset;

$L_{CE\_O}(\phi_N, h_O; T_{N'})$ represents using the second feature extraction module $\phi_N$ and the first task head module $h_O$ to perform the specific target task on the dataset $T_{N'}$, where $T_{N'}$ is the set of samples whose classes are shared between the new training dataset and the old training dataset, the old training dataset being used for training the first model;
taking the feature compatible learning information and target loss for supervising the second model as a final optimization target of the second model together so as to complete feature compatible learning between the first model and the second model;
the network component of the first model comprises a first feature extraction module and a first task head module; the network component of the second model includes a second feature extraction module and a second task head module.
2. An inter-model feature-compatible learning device for object re-identification, comprising:
a determining module, configured to determine feature-compatible learning information of a first model and a second model according to the model parameters determined by the first model, the model parameters to be learned by the second model and a new training dataset of target images, wherein the feature-compatible learning information comprises a representative feature migration loss and/or a structural regularization of network components, and the new training dataset is used for training the second model;
the determining module is specifically configured to:
extracting the features of all samples in the new training dataset by using the first model to obtain first model features;
extracting the representative feature of each category corresponding to the first model based on the first model features;
extracting the features of all samples in the new training dataset by using the second model to obtain second model features, and calculating the similarity between each second model feature and each representative feature;
calculating the representative feature migration loss based on the similarities to realize feature compatibility between the first model and the second model;
the expression of the representative feature migration loss is as follows:

$$\mathcal{L}_{R}(\phi_N) = -\sum_{x_c \in T_N} \log \frac{\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{c}\rangle\big)}{\sum_{k=1}^{C}\exp\big(\langle \phi_N(x_c),\, \hat{\phi}^{o}_{k}\rangle\big)}$$

wherein $\phi_N$ is the second model parameter, $M_o = \{\hat{\phi}^{o}_{1},\dots,\hat{\phi}^{o}_{C}\}$ represents the set of representative features of the first model, $T_N$ is the new training dataset, $\langle\cdot,\cdot\rangle$ denotes the cosine distance between sample features, $x_c$ is a sample with category label $c$, and $\hat{\phi}^{o}_{c}$ is the representative feature of category $c$;
the determining module is specifically configured to:
determining a first supervision loss of a first recombined network obtained by recombining the first feature extraction module with the second task head module;
determining a second supervision loss of a second recombined network obtained by recombining the second feature extraction module with the first task head module;
determining an optimization target for the mutual structural regularization of the first model and the second model according to the first supervision loss and the second supervision loss;
the expression of the optimization target is as follows:

$$\mathcal{L}_{SR} = L_{CE\_N}\big(\phi_O,\, h_N;\, T_N\big) + L_{CE\_O}\big(\phi_N,\, h_O;\, T_{N'}\big)$$

wherein $L_{CE\_O}$ is the first supervision loss and $L_{CE\_N}$ is the second supervision loss;

$L_{CE\_N}(\phi_O, h_N; T_N)$ represents using the first feature extraction module $\phi_O$ and the second task head module $h_N$ to perform the specific target task on the dataset $T_N$, where $T_N$ is the new training dataset;

$L_{CE\_O}(\phi_N, h_O; T_{N'})$ represents using the second feature extraction module $\phi_N$ and the first task head module $h_O$ to perform the specific target task on the dataset $T_{N'}$, where $T_{N'}$ is the set of samples whose classes are shared between the new training dataset and the old training dataset, the old training dataset being used for training the first model;
the feature compatibility module is used for taking the feature compatibility learning information and target loss for supervising the second model together as a final optimization target of the second model so as to complete feature compatibility learning between the first model and the second model;
the network component of the first model comprises a first feature extraction module and a first task head module; the network component of the second model includes a second feature extraction module and a second task head module.
3. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when running the computer program, implements the method as claimed in claim 1.
4. A computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method as recited in claim 1.
CN202110750264.9A 2021-07-01 2021-07-01 Feature compatible learning method and device between models, electronic equipment and medium Active CN113610106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110750264.9A CN113610106B (en) 2021-07-01 2021-07-01 Feature compatible learning method and device between models, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110750264.9A CN113610106B (en) 2021-07-01 2021-07-01 Feature compatible learning method and device between models, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN113610106A CN113610106A (en) 2021-11-05
CN113610106B (en) 2023-10-24

Family

ID=78303923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110750264.9A Active CN113610106B (en) 2021-07-01 2021-07-01 Feature compatible learning method and device between models, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113610106B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565807A (en) * 2022-03-03 2022-05-31 腾讯科技(深圳)有限公司 Method and device for training target image retrieval model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device
CN112149601A (en) * 2020-09-30 2020-12-29 北京澎思科技有限公司 Occlusion-compatible face attribute identification method and device and electronic equipment
CN112396231A (en) * 2020-11-18 2021-02-23 京东数字科技控股股份有限公司 Modeling method and device for spatio-temporal data, electronic equipment and readable medium
CN112417947A (en) * 2020-09-17 2021-02-26 重庆紫光华山智安科技有限公司 Method and device for optimizing key point detection model and detecting face key points

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647920A (en) * 2019-08-29 2020-01-03 北京百度网讯科技有限公司 Transfer learning method and device in machine learning, equipment and readable medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device
CN112417947A (en) * 2020-09-17 2021-02-26 重庆紫光华山智安科技有限公司 Method and device for optimizing key point detection model and detecting face key points
CN112149601A (en) * 2020-09-30 2020-12-29 北京澎思科技有限公司 Occlusion-compatible face attribute identification method and device and electronic equipment
CN112396231A (en) * 2020-11-18 2021-02-23 京东数字科技控股股份有限公司 Modeling method and device for spatio-temporal data, electronic equipment and readable medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Simultaneous Feature Learning and Hash Coding with Deep Neural Networks; Hanjiang Lai et al.; IEEE Xplore; pp. 3270-3278 *

Also Published As

Publication number Publication date
CN113610106A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
EP3757905A1 (en) Deep neural network training method and apparatus
US11804069B2 (en) Image clustering method and apparatus, and storage medium
CN110188210B (en) Cross-modal data retrieval method and system based on graph regularization and modal independence
CN113657087B (en) Information matching method and device
Zhao et al. Joint face alignment and segmentation via deep multi-task learning
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN110929119A (en) Data annotation method, device, equipment and computer storage medium
KR20190094068A (en) Learning method of classifier for classifying behavior type of gamer in online game and apparatus comprising the classifier
CN114781611A (en) Natural language processing method, language model training method and related equipment
CN113255354A (en) Search intention recognition method, device, server and storage medium
CN115080749A (en) Weak supervision text classification method, system and device based on self-supervision training
CN113610106B (en) Feature compatible learning method and device between models, electronic equipment and medium
JP2010009517A (en) Learning equipment, learning method and program for pattern detection device
CN114255381A (en) Training method of image recognition model, image recognition method, device and medium
CN113704534A (en) Image processing method and device and computer equipment
KR20160128869A (en) Method for visual object localization using privileged information and apparatus for performing the same
CN110704650B (en) OTA picture tag identification method, electronic equipment and medium
CN112668633A (en) Adaptive graph migration learning method based on fine granularity field
CN109614581B (en) Non-negative matrix factorization clustering method based on dual local learning
CN115482436B (en) Training method and device for image screening model and image screening method
Pang et al. Salient object detection via effective background prior and novel graph
CN113505861B (en) Image classification method and system based on meta-learning and memory network
CN113743448B (en) Model training data acquisition method, model training method and device
CN115576789A (en) Method and system for identifying lost user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant