CN114743041B - Method and apparatus for constructing a pre-training model extraction framework - Google Patents
- Publication number
- CN114743041B (granted from application CN202210225051.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- trained
- training model
- training
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
- G06N3/045: Neural networks; combinations of networks
- G06N3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention provides a method and apparatus for constructing a pre-training model extraction framework. The method comprises: selecting an image dataset and a self-supervised contrastive learning framework; training a constructed supernet pre-training model on the image dataset under the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream transfer task and a downstream transfer dataset; screening, from the sampling space obtained on the basis of the self-supervised contrastive learning framework, a first model that satisfies preset conditions, and computing the similarity between the first model and the trained supernet on the basis of the downstream transfer task and dataset; and determining, from the similarity results, a target pre-training model that shares weights with the trained supernet, thereby obtaining the pre-training model extraction framework. The method enables efficient, customized extraction for downstream tasks, and the extracted models generalize well.
Description
Technical Field
The invention relates to the field of contrastive self-supervised learning, and in particular to a method and apparatus for constructing a pre-training model extraction framework.
Background
Self-supervised pre-training of models is an important and challenging computer vision task. It can be realized by extracting a self-supervised pre-training model from a corresponding extraction framework, and it is widely applied in fields where labeled data is scarce, such as medical image diagnosis and image segmentation.
In each application scenario, self-supervised pre-training based on existing extraction frameworks would require annotation on the order of magnitude of the COCO dataset, which is prohibitively expensive; in practice, only a small amount of low-cost labeled data can usually be obtained, making model training very difficult, so a pre-trained model is typically selected and its weights fine-tuned on the downstream dataset. Meanwhile, available hardware resources differ markedly across application scenarios, so the models that can be deployed differ as well; each model must be pre-trained separately, model reusability is very poor, and hardware resources are wasted.
Disclosure of Invention
The invention provides a method and apparatus for constructing pre-training model extraction frameworks, which address two defects of existing contrastive self-supervised learning methods: slow model convergence and excessive resource consumption when selecting a downstream model. The invention improves convergence speed, reduces wasted resources, and improves model reusability.
In a first aspect, the invention provides a method for constructing a pre-training model extraction framework, comprising the following steps: selecting an image dataset and a self-supervised contrastive learning framework; training a constructed supernet pre-training model on the image dataset under the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream transfer task and a downstream transfer dataset; screening, in the sampling space obtained on the basis of the self-supervised contrastive learning framework, a first model satisfying preset conditions, and computing the similarity between the first model and the trained supernet pre-training model on the basis of the downstream transfer task and dataset; and determining, from the similarity results, a target pre-training model that shares weights with the trained supernet, thereby obtaining the pre-training model extraction framework.
Further, training the constructed supernet pre-training model according to the image dataset and the self-supervised contrastive learning framework comprises: feeding the image dataset through the self-supervised contrastive learning framework to compute a loss function for the constructed supernet; and training the constructed supernet with this loss function to obtain the trained supernet pre-training model.
Further, computing the loss function of the constructed supernet comprises: in each training round, dividing the images of the dataset into a preset number of batches and applying data augmentation twice, independently, to each image of each batch, yielding two augmented views of every batch; setting the sampling space over the feature-extraction backbone of the self-supervised contrastive learning framework, randomly selecting a model structure from the sampling space, and modifying the gradient-updated branch of the framework to match the selected structure; and feeding the two augmented views of each batch into the modified gradient-updated branch and the non-gradient-updated branch, respectively, to compute the loss function.
Further, applying data augmentation twice to each image of each batch comprises: scaling, flipping, color-converting, and cropping each image twice, independently, to obtain the two augmented views of each batch.
Further, computing the similarity between the first model and the trained supernet pre-training model based on the downstream transfer task and dataset comprises: running inference with the first model and the trained supernet on the downstream transfer dataset and, according to the downstream transfer task, selecting an intermediate-feature similarity map for each; and computing the similarity between the two similarity maps, which is taken as the similarity between the first model and the trained supernet.
Further, determining the target pre-training model that shares weights with the trained supernet comprises taking, as the target, the first model with the highest similarity to the trained supernet pre-training model.
In a second aspect, the invention also provides an apparatus for constructing a pre-training model extraction framework, comprising: a first selection module for selecting an image dataset and a self-supervised contrastive learning framework; a training module for training the constructed supernet pre-training model according to the image dataset and the framework to obtain the trained supernet; a second selection module for selecting a downstream transfer task and a downstream transfer dataset; a calculation module for screening a first model satisfying preset conditions from the sampling space and computing its similarity to the trained supernet based on the downstream transfer task and dataset; and a determining module for determining, from the similarity results, a target pre-training model sharing weights with the trained supernet, thereby obtaining the pre-training model extraction framework.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the construction method of the first aspect.
In a fourth aspect, the invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the construction method of the first aspect.
In a fifth aspect, an embodiment of the invention provides a computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the steps of the construction method of the first aspect.
The method and apparatus provided by the invention select an image dataset and a self-supervised contrastive learning framework; train the constructed supernet pre-training model to obtain a trained supernet; select a downstream transfer task and dataset; screen a first model satisfying preset conditions from the sampling space and compute its similarity to the trained supernet; and determine a target pre-training model sharing weights with the trained supernet, obtaining the pre-training model extraction framework. The resulting framework supports efficient, task-customized extraction: the extracted model generalizes well, adapts well to the downstream task, and requires no supervised downstream training. In practical scenarios where the downstream task has ample labeled data, conventional model selection must run gradient-update training on the downstream task at great GPU cost; this method only runs inference on the downstream dataset and can select the optimal model with no GPU requirement at all.
Drawings
To describe the technical solutions of the present invention and of the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings illustrate some embodiments of the invention; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the method for constructing a pre-training model extraction framework provided by the invention;
FIG. 2 is a flowchart of an embodiment of the method for obtaining a trained supernet pre-training model provided by the invention;
FIG. 3 is a flowchart of an embodiment of the method for obtaining a loss function provided by the invention;
FIG. 4 is a schematic diagram of the sampling space provided by the invention;
FIG. 5 is a flowchart of an embodiment of the similarity calculation method provided by the invention;
FIG. 6 is a pipeline diagram for constructing the pre-training model extraction framework provided by the invention;
FIG. 7 is a flowchart for constructing the pre-training model extraction framework provided by the invention;
FIG. 8 is a schematic structural diagram of an embodiment of the apparatus for constructing a pre-training model extraction framework provided by the invention;
FIG. 9 is a schematic diagram of the physical structure of an electronic device.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions are described completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention; all other embodiments derived from them by those skilled in the art without creative effort fall within the scope of protection of the invention.
Fig. 1 is a flowchart of an embodiment of the method for constructing a pre-training model extraction framework according to the present invention. As shown in Fig. 1, the method may include the following steps:
S101, selecting an image dataset and a self-supervised contrastive learning framework.
In step S101, the image dataset may be ImageNet or COCO, which the embodiments of the invention do not limit. The self-supervised contrastive learning framework may be MoCo, SimCLR, BYOL, or DINO, which the embodiments likewise do not limit.
S102, training the constructed supernet pre-training model according to the image dataset and the self-supervised contrastive learning framework to obtain the trained supernet pre-training model.
S103, selecting a downstream transfer task and a downstream transfer dataset.
In step S103, the downstream transfer task may be image classification, image detection, or image segmentation, which the embodiments of the invention do not limit. The downstream transfer dataset may be a dataset used for final deployment of the model, such as PASCAL VOC, COCO, or Pets, which the embodiments likewise do not limit.
S104, screening, in the sampling space obtained on the basis of the self-supervised contrastive learning framework, a first model satisfying preset conditions, and computing the similarity between the first model and the trained supernet pre-training model on the basis of the downstream transfer task and dataset.
In step S104, the preset conditions may be pre-specified limits on FLOPs, parameter count, or maximum memory, which the embodiments of the invention do not limit.
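The screening step of S104 can be sketched as a simple filter over sampled model configurations. The configurations and the parameter-count formula below are illustrative stand-ins, not the patent's actual sampling space or cost model:

```python
# Sketch of screening candidate subnets against a preset resource budget.
# The depth/width configs and the size estimate are hypothetical.

def param_count(config):
    """Rough parameter estimate for a toy stack of dense layers:
    each of `depth` layers is a width x width weight matrix."""
    return config["depth"] * config["width"] ** 2

def screen(candidates, max_params):
    """Keep only candidates whose estimated size fits the budget."""
    return [c for c in candidates if param_count(c) <= max_params]

candidates = [
    {"depth": 2, "width": 64},   # 2 * 64^2  =  8192 params
    {"depth": 4, "width": 128},  # 4 * 128^2 = 65536 params
    {"depth": 3, "width": 96},   # 3 * 96^2  = 27648 params
]
feasible = screen(candidates, max_params=30000)
```

The same shape of filter applies to a FLOPs or memory budget; only the cost function changes.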
S105, determining, based on the computed similarity, a target pre-training model that shares weights with the trained supernet pre-training model, obtaining the pre-training model extraction framework.
In step S105, the first model with the highest similarity to the trained supernet may be taken as the target pre-training model; the process of obtaining the target pre-training model completes the construction of the pre-training model extraction framework.
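The selection rule of S105 reduces to an argmax over similarity scores. A minimal sketch, with made-up candidate names and scores:

```python
# Sketch of S105: pick the screened candidate whose features are most
# similar to the trained supernet's. Scores are illustrative numbers.

def select_target(similarities):
    """Return the candidate id with the highest similarity score."""
    return max(similarities, key=similarities.get)

scores = {"subnet_a": 0.81, "subnet_b": 0.93, "subnet_c": 0.77}
best = select_target(scores)
```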
The construction method provided by this embodiment of the invention selects an image dataset and a self-supervised contrastive learning framework; trains the constructed supernet pre-training model to obtain a trained supernet; selects a downstream transfer task and dataset; screens a first model satisfying preset conditions from the sampling space and computes its similarity to the trained supernet; and determines a target pre-training model sharing weights with the trained supernet, obtaining the pre-training model extraction framework. The framework supports efficient, task-customized extraction: the extracted model generalizes well, adapts well to the downstream task, and can be extracted without any supervised downstream training. In practice, where conventional model selection must run gradient-update training on the downstream task at great GPU cost, this method only runs inference on the downstream dataset and selects the optimal model with no GPU hardware requirement.
Fig. 2 is a flowchart of an embodiment of the method for obtaining a trained supernet pre-training model according to the present invention. As shown in Fig. 2, the method may include the following steps:
S201, feeding the image dataset through the self-supervised contrastive learning framework to compute the loss function of the constructed supernet pre-training model.
S202, training the constructed supernet pre-training model with the loss function to obtain the trained supernet pre-training model.
In steps S201 and S202, the loss function may be minimized with the back-propagation algorithm and an SGD optimizer, and the trained supernet is obtained through multiple training iterations.
Through the design of the loss function and of the dynamic network structure, this method completes a prunable dynamic supernet in a single training run, from which pre-trained subnets meeting a variety of operational requirements can be extracted.
Fig. 3 is a flowchart of an embodiment of the method for obtaining the loss function according to the present invention.
As shown in Fig. 3, the method may include the following steps:
S301, in each training round, dividing the images of the dataset into a preset number of batches and applying data augmentation twice, independently, to each image of each batch, obtaining two augmented views of every batch.
In step S301, the preset number of batches may be 4, 16, or 48, which the embodiments of the invention do not limit. For example, with 256 images in the dataset and 4 batches, each batch holds 64 images; augmenting each of those 64 images twice yields two groups of 64 augmented images per batch, i.e., every batch has two views of 64 images each.
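The batching and two-view construction of S301 can be sketched with numpy. The additive-noise "augmentation" is a placeholder for the real scale/flip/color/crop pipeline, and the array sizes mirror the 256-image, 4-batch example:

```python
import numpy as np

# Sketch of S301: split 256 images into 4 batches of 64 and produce two
# independently augmented views of every batch.

rng = np.random.default_rng(0)
dataset = rng.standard_normal((256, 3, 32, 32))  # 256 RGB 32x32 images

num_batches = 4
batches = np.split(dataset, num_batches)         # 4 batches of 64 images

def augment(batch, rng):
    # Placeholder augmentation: independent noise per call, so the two
    # views of a batch differ while sharing the same source images.
    return batch + 0.1 * rng.standard_normal(batch.shape)

views = [(augment(b, rng), augment(b, rng)) for b in batches]
```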
S302, setting the sampling space over the feature-extraction backbone of the self-supervised contrastive learning framework, randomly selecting a model structure from the sampling space, and modifying the gradient-updated branch of the framework to match the selected structure.
In step S302, a residual neural network serves as the feature-extraction backbone of the self-supervised contrastive learning framework. The sampling space for this backbone may be as shown in Fig. 4; it defines the range over which the model's depth and width vary, thereby determining the entire space of possible models.
The self-supervised contrastive learning framework contains a gradient-updated branch and a non-gradient-updated branch; the gradient-updated branch is modified so that it has the same structure as the randomly selected model.
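The per-round sampling of S302 can be sketched as below: the gradient-updated branch draws a random structure from the depth/width space, while the non-gradient-updated branch always keeps the largest structure. The concrete depth and width ranges are illustrative, not the patent's actual space:

```python
import random

# Sketch of S302: a sampling space over backbone depth and width.

SPACE = {"depth": [2, 3, 4], "width": [64, 96, 128]}

def sample_subnet(space, rng):
    """Random structure for the gradient-updated branch, re-drawn each round."""
    return {k: rng.choice(v) for k, v in space.items()}

def max_structure(space):
    """The non-gradient-updated branch is fixed at the maximum structure."""
    return {k: max(v) for k, v in space.items()}

rng = random.Random(0)
gradient_branch = sample_subnet(SPACE, rng)   # changes every round
momentum_branch = max_structure(SPACE)        # always the largest model
```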
S303, feeding the two augmented views of each batch into the modified gradient-updated branch and the non-gradient-updated branch of the self-supervised contrastive learning framework, respectively, to compute the loss function.
In step S303, the two augmented views of each batch are fed into the modified gradient-updated branch and the non-gradient-updated branch respectively, the loss of each branch is computed, and the two losses are recorded jointly as the total loss L.
In the method for obtaining the loss function provided by this embodiment, in each training iteration the network structure of the gradient-updated branch is randomly sampled from the model space, while the non-gradient-updated branch is always kept at the largest structure in that space; this yields a loss function that helps the constructed supernet pre-training model converge.
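The patent does not spell out the exact form of the two-branch loss, so the sketch below uses a BYOL-style negative-cosine agreement term as a stand-in: each view is embedded by one branch, and the loss rewards agreement between the two embeddings, summed over both view orders. The linear "branches" are placeholders for real networks:

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def branch_loss(p, z):
    """Mean of 2 - 2*cos(p_i, z_i): 0 when aligned, 4 when opposite."""
    return float(np.mean(np.sum((normalize(p) - normalize(z)) ** 2, axis=1)))

rng = np.random.default_rng(0)
v1, v2 = rng.standard_normal((2, 8, 16))   # two augmented views, 8 samples
W_grad = rng.standard_normal((16, 4))      # gradient-updated branch (toy)
W_mom = rng.standard_normal((16, 4))       # non-gradient-updated branch (toy)

# Symmetric total loss, matching "count the two loss functions as L".
L = branch_loss(v1 @ W_grad, v2 @ W_mom) + branch_loss(v2 @ W_grad, v1 @ W_mom)
```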
In some optional embodiments, applying data augmentation twice to each image of each batch may include: scaling, flipping, color-converting, and cropping each image twice, independently, to obtain the two augmented views of each batch, where the scaling of each image is random multi-scale scaling.
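A reduced numpy sketch of such an augmentation: random horizontal flip, a per-channel color jitter standing in for color conversion, and a random crop. The random multi-scale resizing is omitted to keep the sketch short, and the crop size and jitter range are arbitrary choices:

```python
import numpy as np

def augment(img, rng):
    c, h, w = img.shape
    if rng.random() < 0.5:                         # random horizontal flip
        img = img[:, :, ::-1]
    img = img * rng.uniform(0.8, 1.2, (c, 1, 1))   # per-channel color jitter
    top = rng.integers(0, h - 24 + 1)              # random 24x24 crop
    left = rng.integers(0, w - 24 + 1)
    return img[:, top:top + 24, left:left + 24]

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))
view1, view2 = augment(image, rng), augment(image, rng)  # two views
```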
Fig. 5 is a flowchart of an embodiment of the similarity calculation method according to the present invention. As shown in Fig. 5, the method may include the following steps:
S501, based on the downstream transfer task, selecting the intermediate-feature similarity maps of the first model and of the trained supernet pre-training model, obtained by running both models' inference on the downstream transfer dataset.
In step S501, the intermediate feature map is chosen according to the task: a classification task selects the model's final output feature map, a detection task selects the feature map output by each stage of the backbone, and a segmentation task selects the feature map output by the last stage of the backbone.
The first model and the trained supernet are run on the downstream transfer dataset to obtain their sets of intermediate feature maps, from which the intermediate-feature similarity maps of the two models are selected according to the downstream transfer task.
S502, computing the similarity between the intermediate-feature similarity map of the first model and that of the trained supernet pre-training model, which gives the similarity between the two models.
In step S502, the similarity between the first model and the trained supernet is determined by computing the similarity of their intermediate-feature similarity maps.
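One plausible reading of S501-S502, sketched below under assumptions: each model's "intermediate-feature similarity map" is the matrix of pairwise cosine similarities between the features it produces on the transfer set, and two models are scored by correlating their maps (a CKA-like comparison). The random features are stand-ins for real inference outputs, and the Pearson-correlation scoring is an assumption, not the patent's stated formula:

```python
import numpy as np

def similarity_map(features):
    """n x n map of pairwise cosine similarities between sample features."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return f @ f.T

def map_similarity(m1, m2):
    """Pearson correlation between the two flattened maps."""
    return float(np.corrcoef(m1.ravel(), m2.ravel())[0, 1])

rng = np.random.default_rng(0)
supernet_feats = rng.standard_normal((32, 64))             # 32 transfer images
candidate_feats = supernet_feats @ rng.standard_normal((64, 64)) * 0.5
score = map_similarity(similarity_map(supernet_feats),
                       similarity_map(candidate_feats))
```

Comparing similarity maps rather than raw features lets models of different widths be scored against the supernet, since both maps have the same n x n shape.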
Figure 6 is a pipeline diagram for constructing the pre-training model extraction framework provided by the invention. As shown in Fig. 6, the steps may include: augmenting the input images twice to obtain two augmented groups; feeding one group into the dynamic supernet and the other into the fixed network structure to obtain two losses, recorded jointly as the loss function; training the constructed supernet with this loss to obtain the trained supernet pre-training model; then, based on downstream tasks such as image classification and image segmentation, obtaining the intermediate feature map of a model screened from the sampling space and that of the trained supernet, and computing the similarity between the two to obtain the target pre-training model, which completes the construction of the pre-training model extraction framework.
Figure 7 is a flowchart for constructing the pre-training model extraction framework provided by the invention. As shown in Fig. 7, the steps may include: selecting an image dataset and a self-supervised learning framework; determining the sampling space of the constructed supernet pre-training model; preprocessing the image dataset; randomly sampling a model from the sampling space and modifying the gradient branch of the self-supervised framework accordingly; feeding the preprocessed images through the modified gradient branch and the non-gradient branch for forward propagation and computing the loss function; training the constructed supernet with this loss; screening models satisfying the defined conditions from the sampling space and determining the downstream transfer task and dataset; running the trained supernet and the screened models on the downstream transfer dataset; selecting, according to the downstream transfer task, the inferred intermediate feature maps of the trained supernet and of each screened model; and computing their similarity to screen out the optimal model, i.e., the target pre-training model.
Fig. 8 is a schematic structural diagram of an embodiment of the apparatus for constructing a pre-trained model decimation framework provided by the present invention. As shown in fig. 8, the apparatus includes:
a first selection module 801, configured to select an image data set and a self-supervised contrastive learning framework;
a training module 802, configured to train the constructed supernet pre-training model on the image data set with the self-supervised contrastive learning framework, obtaining a trained supernet pre-training model;
a second selection module 803, configured to select a downstream migration task and a downstream migration data set;
a calculation module 804, configured to screen, from the sampling space obtained based on the self-supervised contrastive learning framework, a first model meeting a preset condition, and to calculate the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set;
and a determining module 805, configured to determine, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model, obtaining the pre-training model decimation framework.
Optionally, the training module 802 includes:
a first calculation unit, configured to input the image data set into the self-supervised contrastive learning framework for calculation, obtaining the loss function of the constructed supernet pre-training model;
and a training unit, configured to train the constructed supernet pre-training model based on the loss function, obtaining the trained supernet pre-training model.
Optionally, the first calculation unit includes:
an enhancement unit, configured to divide the picture data of the image data set into a preset number of batches in each training round and to perform data augmentation twice on each picture of each batch, obtaining two groups of augmented picture data corresponding to each picture of each batch;
a modification subunit, configured to set the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly select a model structure in the sampling space, and modify the gradient-updated branch network in the self-supervised contrastive learning framework based on the selected model structure;
and a calculation subunit, configured to input the two groups of augmented picture data corresponding to each batch into the modified gradient-updated branch network and the non-gradient-updated branch network in the self-supervised contrastive learning framework, respectively, for calculation, obtaining the loss function.
Optionally, the enhancement unit is further configured to perform scaling, flipping, color conversion, and cropping on each picture of each batch twice respectively, obtaining the two groups of augmented picture data corresponding to each picture of each batch.
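The four augmentations above can be sketched in toy form on a nested-list "image". A real pipeline would use an image-processing library; every operation below is a simplified stand-in chosen for illustration:

```python
import random

def augment_once(image, rng):
    """One pass of scaling, flipping, color conversion, and cropping,
    in toy form (real pipelines would use an image library)."""
    # Scaling: multiply every pixel (stand-in for spatial rescaling).
    scale = rng.uniform(0.9, 1.1)
    out = [[p * scale for p in row] for row in image]
    # Flipping: mirror each row horizontally, at random.
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    # Color conversion: shift intensities by a random offset.
    shift = rng.uniform(-0.05, 0.05)
    out = [[p + shift for p in row] for row in out]
    # Cropping: drop the outermost border of pixels.
    return [row[1:-1] for row in out[1:-1]]

rng = random.Random(42)
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
view_1 = augment_once(image, rng)  # first augmented group
view_2 = augment_once(image, rng)  # second augmented group
print(len(view_1), len(view_1[0]))  # 2 2 — the 4x4 picture cropped to 2x2
```

Applying the pipeline twice with independent randomness yields the two distinct views that the two branches of the framework consume.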
Optionally, the calculation module 804 includes:
a screening unit, configured to select, based on the downstream migration task, an intermediate feature map of the first model and an intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set;
and a second calculation unit, configured to calculate the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model, obtaining the similarity between the first model and the trained supernet pre-training model.
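As a sketch, the similarity between two intermediate feature maps of equal shape can be computed as cosine similarity over the flattened maps. The patent does not fix a particular metric, so this choice, and the sample values, are assumptions:

```python
import math

def flatten(feature_map):
    """Flatten a 2-D feature map into one vector."""
    return [p for row in feature_map for p in row]

def cosine_similarity(map_a, map_b):
    """Cosine similarity between two same-shape intermediate feature maps."""
    a, b = flatten(map_a), flatten(map_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

subnet_map = [[1.0, 2.0], [3.0, 4.0]]      # first-model feature map
supernet_map = [[2.0, 4.0], [6.0, 8.0]]    # trained-supernet feature map
score = cosine_similarity(subnet_map, supernet_map)
print(round(score, 4))  # 1.0 — the maps are proportional in this example
```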
Optionally, the determining module 805 includes:
a determining unit, configured to take the first model with the greatest similarity to the trained supernet pre-training model as the target pre-training model.
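The determining unit reduces to an argmax over per-model similarity scores; the model names and scores below are illustrative:

```python
def select_target_model(similarities):
    """Pick the screened model most similar to the trained supernet."""
    return max(similarities, key=similarities.get)

# Hypothetical similarity scores for three screened candidate models.
similarities = {"subnet_a": 0.87, "subnet_b": 0.93, "subnet_c": 0.78}
print(select_target_model(similarities))  # subnet_b
```

Because the candidates were sampled from the supernet's sampling space, the selected model inherits (shares) the trained supernet's weights directly, which is what makes it usable as the target pre-training model without retraining.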
Fig. 9 illustrates a schematic physical structure diagram of an electronic device. As shown in fig. 9, the electronic device may include: a processor (processor) 901, a communication interface (communication interface) 902, a memory (memory) 903 and a communication bus 904, where the processor 901, the communication interface 902 and the memory 903 communicate with one another via the communication bus 904. The processor 901 may call logic instructions in the memory 903 to perform the following method of constructing a pre-trained model decimation framework:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting preset conditions in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model to obtain a pre-training model decimation framework.
In addition, the logic instructions in the memory 903 may be implemented in the form of a software functional unit and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method for constructing a pre-trained model decimation framework provided in the above embodiments:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting preset conditions in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model to obtain a pre-training model decimation framework.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for constructing a pre-trained model decimation framework provided in the above embodiments:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting preset conditions in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model to obtain a pre-training model decimation framework.
The above-described embodiments of the apparatus are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A method for constructing a pre-trained model decimation framework, comprising:
selecting an image data set and a self-supervised contrastive learning framework;
training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model;
selecting a downstream migration task and a downstream migration data set;
screening a first model meeting preset conditions in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set;
determining, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model to obtain a pre-training model decimation framework;
wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model comprises:
inputting the image data set into the self-supervised contrastive learning framework for calculation to obtain a loss function of the constructed supernet pre-training model, wherein the picture data in the image data set are divided into a preset number of batches in each training round and data augmentation is performed twice on each picture of each batch, obtaining two groups of augmented picture data corresponding to each picture of each batch;
setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure in the sampling space, and modifying a gradient-updated branch network in the self-supervised contrastive learning framework based on the selected model structure;
inputting the two groups of augmented picture data corresponding to each batch into the modified gradient-updated branch network and the non-gradient-updated branch network in the self-supervised contrastive learning framework respectively for calculation to obtain the loss function;
and wherein calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set comprises:
screening, based on the downstream migration task, an intermediate feature map of the first model and an intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set;
and calculating the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model to obtain the similarity between the first model and the trained supernet pre-training model.
2. The method for constructing a pre-trained model decimation framework according to claim 1, wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model further comprises:
training the constructed supernet pre-training model based on the loss function to obtain the trained supernet pre-training model.
3. The method for constructing a pre-trained model decimation framework according to claim 2, wherein performing data augmentation twice on each picture of each batch to obtain two groups of augmented picture data corresponding to each picture of each batch comprises:
performing scaling, flipping, color conversion, and cropping on each picture of each batch twice respectively, obtaining the two groups of augmented picture data corresponding to each picture of each batch.
4. The method for constructing a pre-trained model decimation framework according to claim 1, wherein determining a target pre-training model sharing weights with the trained supernet pre-training model based on the calculation result of the similarity comprises:
taking the first model with the greatest similarity to the trained supernet pre-training model as the target pre-training model.
5. An apparatus for constructing a pre-trained model decimation framework, comprising:
a first selection module, configured to select an image data set and a self-supervised contrastive learning framework;
a training module, configured to train the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model;
a second selection module, configured to select a downstream migration task and a downstream migration data set;
a calculation module, configured to screen a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and to calculate the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set;
a determining module, configured to determine, based on the calculation result of the similarity, a target pre-training model sharing weights with the trained supernet pre-training model to obtain a pre-training model decimation framework;
wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model comprises:
inputting the image data set into the self-supervised contrastive learning framework for calculation to obtain a loss function of the constructed supernet pre-training model, wherein the picture data in the image data set are divided into a preset number of batches in each training round and data augmentation is performed twice on each picture of each batch, obtaining two groups of augmented picture data corresponding to each picture of each batch;
setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure in the sampling space, and modifying a gradient-updated branch network in the self-supervised contrastive learning framework based on the selected model structure;
inputting the two groups of augmented picture data corresponding to each batch into the modified gradient-updated branch network and the non-gradient-updated branch network in the self-supervised contrastive learning framework respectively for calculation to obtain the loss function;
and wherein calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set comprises:
screening, based on the downstream migration task, an intermediate feature map of the first model and an intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set;
and calculating the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model to obtain the similarity between the first model and the trained supernet pre-training model.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for constructing a pre-trained model decimation framework as claimed in any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a pre-trained model decimation framework as claimed in any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210225051.9A CN114743041B (en) | 2022-03-09 | 2022-03-09 | Construction method and device of pre-training model decimation frame |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210225051.9A CN114743041B (en) | 2022-03-09 | 2022-03-09 | Construction method and device of pre-training model decimation frame |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114743041A CN114743041A (en) | 2022-07-12 |
CN114743041B true CN114743041B (en) | 2023-01-03 |
Family
ID=82274841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210225051.9A Active CN114743041B (en) | 2022-03-09 | 2022-03-09 | Construction method and device of pre-training model decimation frame |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114743041B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116245141B (en) * | 2023-01-13 | 2024-06-04 | 清华大学 | Transfer learning architecture, method, electronic device and storage medium |
CN116912623B (en) * | 2023-07-20 | 2024-04-05 | 东北大学 | Contrast learning method and system for medical image dataset |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596138A (en) * | 2018-05-03 | 2018-09-28 | 南京大学 | A kind of face identification method based on migration hierarchical network |
CN110516095A (en) * | 2019-08-12 | 2019-11-29 | 山东师范大学 | Weakly supervised depth Hash social activity image search method and system based on semanteme migration |
CN112016531A (en) * | 2020-10-22 | 2020-12-01 | 成都睿沿科技有限公司 | Model training method, object recognition method, device, equipment and storage medium |
CN113781518A (en) * | 2021-09-10 | 2021-12-10 | 商汤集团有限公司 | Neural network structure searching method and device, electronic device and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084245B (en) * | 2019-04-04 | 2020-12-25 | 中国科学院自动化研究所 | Weak supervision image detection method and system based on visual attention mechanism reinforcement learning |
CN113344016A (en) * | 2020-02-18 | 2021-09-03 | 深圳云天励飞技术有限公司 | Deep migration learning method and device, electronic equipment and storage medium |
CN113705276A (en) * | 2020-05-20 | 2021-11-26 | 武汉Tcl集团工业研究院有限公司 | Model construction method, model construction device, computer apparatus, and medium |
CN111783951B (en) * | 2020-06-29 | 2024-02-20 | 北京百度网讯科技有限公司 | Model acquisition method, device, equipment and storage medium based on super network |
CN113344089B (en) * | 2021-06-17 | 2022-07-01 | 北京百度网讯科技有限公司 | Model training method and device and electronic equipment |
2022
- 2022-03-09 CN CN202210225051.9A patent/CN114743041B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596138A (en) * | 2018-05-03 | 2018-09-28 | 南京大学 | A kind of face identification method based on migration hierarchical network |
CN110516095A (en) * | 2019-08-12 | 2019-11-29 | 山东师范大学 | Weakly supervised depth Hash social activity image search method and system based on semanteme migration |
CN112016531A (en) * | 2020-10-22 | 2020-12-01 | 成都睿沿科技有限公司 | Model training method, object recognition method, device, equipment and storage medium |
CN113781518A (en) * | 2021-09-10 | 2021-12-10 | 商汤集团有限公司 | Neural network structure searching method and device, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114743041A (en) | 2022-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107330956B (en) | Cartoon hand drawing unsupervised coloring method and device | |
US10552737B2 (en) | Artificial neural network class-based pruning | |
CN114743041B (en) | Construction method and device of pre-training model decimation frame | |
CN108229526B (en) | Network training method, network training device, image processing method, image processing device, storage medium and electronic equipment | |
JP6848071B2 (en) | Repetitive multi-scale image generation using neural networks | |
EP3951702A1 (en) | Method for training image processing model, image processing method, network device, and storage medium | |
CN110189260B (en) | Image noise reduction method based on multi-scale parallel gated neural network | |
US11449707B2 (en) | Method for processing automobile image data, apparatus, and readable storage medium | |
CN111523640A (en) | Training method and device of neural network model | |
CN109615614B (en) | Method for extracting blood vessels in fundus image based on multi-feature fusion and electronic equipment | |
CN112614072B (en) | Image restoration method and device, image restoration equipment and storage medium | |
CN111047563A (en) | Neural network construction method applied to medical ultrasonic image | |
CN112836820B (en) | Deep convolution network training method, device and system for image classification task | |
CN112598110B (en) | Neural network construction method, device, equipment and medium | |
CN113239875A (en) | Method, system and device for acquiring human face features and computer readable storage medium | |
CN116363261A (en) | Training method of image editing model, image editing method and device | |
CN116128036A (en) | Incremental learning method, device, equipment and medium based on cloud edge collaborative architecture | |
CN111860465A (en) | Remote sensing image extraction method, device, equipment and storage medium based on super pixels | |
CN111353577B (en) | Multi-task-based cascade combination model optimization method and device and terminal equipment | |
CN116777732A (en) | Image generation method, device, equipment and storage medium based on random noise | |
CN111784726A (en) | Image matting method and device | |
CN116011550A (en) | Model pruning method, image processing method and related devices | |
CN116128044A (en) | Model pruning method, image processing method and related devices | |
CN113657136B (en) | Identification method and device | |
CN114139720A (en) | Government affair big data processing method and device based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20221028 Address after: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES Applicant after: Huawei Cloud Computing Technology Co.,Ltd. Address before: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES |
|
GR01 | Patent grant | ||
GR01 | Patent grant |