CN114743041B - Method and device for constructing a pre-trained model extraction framework - Google Patents


Info

Publication number
CN114743041B
CN114743041B (application CN202210225051.9A)
Authority
CN
China
Prior art keywords
model, trained, training model, training, data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210225051.9A
Other languages
Chinese (zh)
Other versions
CN114743041A (en)
Inventor
张兆翔
常清
彭君然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science and Huawei Cloud Computing Technologies Co Ltd
Priority claimed from application CN202210225051.9A
Publication of CN114743041A
Application granted
Publication of CN114743041B

Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22 — Matching criteria, e.g. proximity measures
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a device for constructing a pre-trained model extraction framework. The method comprises the following steps: selecting an image dataset and a self-supervised contrastive learning framework; training a constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream transfer task and a downstream transfer dataset; screening, in the sampling space obtained from the self-supervised contrastive learning framework, a first model that meets preset conditions, and calculating the similarity between the first model and the trained supernet pre-training model on the downstream transfer task and the downstream transfer dataset; and determining, from the similarity results, a target pre-training model that shares weights with the trained supernet pre-training model, thereby obtaining the pre-trained model extraction framework. The method achieves efficient extraction customized to the downstream task, and the extracted model has excellent generalization capability.

Description

Method and device for constructing a pre-trained model extraction framework
Technical Field
The invention relates to the field of contrastive self-supervised learning, and in particular to a method and a device for constructing a pre-trained model extraction framework.
Background
Self-supervised model pre-training is an important and challenging computer vision task. It can be realized by extracting a self-supervised pre-training model from a corresponding extraction framework, and is widely applied in fields where labeled data are scarce, such as medical image diagnosis and image segmentation.
In each application scenario, model self-supervised pre-training based on existing extraction frameworks would require annotation on the order of the COCO dataset, which is too costly to be practical; usually only a small amount of cheaply labeled data can be obtained. With such limited data, training a model from scratch is very difficult, so a pre-trained model is generally selected and its weights fine-tuned on the downstream dataset. Meanwhile, the hardware resources available in different application scenarios differ greatly, so the deployable models differ as well; separate pre-training is required for each model, model reusability is very poor, and hardware resources are wasted.
Disclosure of Invention
The invention provides a method and a device for constructing a pre-trained model extraction framework, to overcome the defects in the prior art that existing contrastive self-supervised learning methods converge slowly and that selecting a downstream model consumes too many resources; the invention can improve model convergence speed, reduce resource waste, and improve model reusability.
In a first aspect, the invention provides a method for constructing a pre-trained model extraction framework, comprising the following steps: selecting an image dataset and a self-supervised contrastive learning framework; training a constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream transfer task and a downstream transfer dataset; screening, in the sampling space obtained from the self-supervised contrastive learning framework, a first model that meets preset conditions, and calculating the similarity between the first model and the trained supernet pre-training model on the downstream transfer task and the downstream transfer dataset; and determining, from the similarity results, a target pre-training model that shares weights with the trained supernet pre-training model, thereby obtaining the pre-trained model extraction framework.
Further, training the constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain the trained supernet pre-training model comprises: inputting the image dataset into the self-supervised contrastive learning framework for calculation to obtain a loss function for the constructed supernet pre-training model; and training the constructed supernet pre-training model based on the loss function to obtain the trained supernet pre-training model.
Further, inputting the image dataset into the self-supervised contrastive learning framework for calculation to obtain the loss function for the constructed supernet pre-training model comprises: in each training epoch, dividing the images in the image dataset into a preset number of batches, and performing data augmentation twice on each image of each batch to obtain two groups of augmented images for each batch; setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure in the sampling space, and modifying the gradient-updated branch network of the self-supervised contrastive learning framework according to the selected model structure; and inputting the two groups of augmented images of each batch into the modified gradient-updated branch network and the non-gradient-updated branch network of the self-supervised contrastive learning framework, respectively, for calculation to obtain the loss function.
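The per-epoch procedure above can be sketched in plain Python. This is an illustrative skeleton, not the patent's implementation: `augment` is a stand-in for the real image augmentations, and the (depth, width) sampling space is invented for the example.

```python
import random

# Hypothetical sampling space: each candidate structure is a (depth, width) pair.
SAMPLING_SPACE = [(d, w) for d in (8, 14, 20, 26) for w in (32, 48, 64)]

def augment(image, seed):
    """Stand-in for one random augmentation pass (scale/flip/color/crop)."""
    rng = random.Random(seed)
    return [px * rng.uniform(0.9, 1.1) for px in image]

def training_epoch(dataset, num_batches):
    """Split the dataset into batches, build two augmented views per image,
    and sample a random structure for the gradient-updated branch."""
    batch_size = len(dataset) // num_batches
    batches = [dataset[i * batch_size:(i + 1) * batch_size]
               for i in range(num_batches)]
    iterations = []
    for b, batch in enumerate(batches):
        view1 = [augment(img, seed=b * 10000 + 2 * i) for i, img in enumerate(batch)]
        view2 = [augment(img, seed=b * 10000 + 2 * i + 1) for i, img in enumerate(batch)]
        structure = random.choice(SAMPLING_SPACE)  # structure for the gradient branch
        iterations.append((view1, view2, structure))
    return iterations

epoch = training_epoch([[1.0, 2.0]] * 256, num_batches=4)
```

With 256 images and 4 batches this produces, per batch, two view groups of 64 images each plus one sampled subnet structure, matching the worked example given later in the description.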
Further, performing data augmentation twice on each image of each batch to obtain two groups of augmented images for each batch comprises: applying scaling, flipping, color transformation, and cropping to each image of each batch twice, thereby obtaining two groups of augmented images for each batch.
Further, calculating the similarity between the first model and the trained supernet pre-training model based on the downstream transfer task and the downstream transfer dataset comprises: running inference with the first model and the trained supernet pre-training model on the downstream transfer dataset, and screening, according to the downstream transfer task, an intermediate feature similarity map of the first model and an intermediate feature similarity map of the trained supernet pre-training model; and computing the similarity between these two intermediate feature similarity maps to obtain the similarity between the first model and the trained supernet pre-training model.
Further, determining the target pre-training model that shares weights with the trained supernet pre-training model based on the similarity results comprises: taking the first model with the highest similarity to the trained supernet pre-training model as the target pre-training model.
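The selection step above reduces to an argmax over the computed similarities; a minimal sketch (candidate names and scores are invented):

```python
def select_target_model(candidates, similarity):
    """Return the candidate whose similarity to the trained supernet is highest;
    its weights are then taken directly from the shared supernet weights."""
    return max(candidates, key=similarity)

# Invented similarity scores for three screened candidate subnets.
scores = {"subnet_a": 0.71, "subnet_b": 0.88, "subnet_c": 0.64}
target = select_target_model(scores, scores.get)
```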
In a second aspect, an embodiment of the present invention further provides a device for constructing a pre-trained model extraction framework, comprising: a first selection module for selecting an image dataset and a self-supervised contrastive learning framework; a training module for training a constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; a second selection module for selecting a downstream transfer task and a downstream transfer dataset; a calculation module for screening, in the sampling space obtained from the self-supervised contrastive learning framework, a first model that meets preset conditions, and calculating the similarity between the first model and the trained supernet pre-training model on the downstream transfer task and the downstream transfer dataset; and a determining module for determining, from the similarity results, a target pre-training model that shares weights with the trained supernet pre-training model, so as to obtain the pre-trained model extraction framework.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for constructing a pre-trained model extraction framework according to the first aspect.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a pre-trained model extraction framework according to the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the steps of the method for constructing a pre-trained model extraction framework according to the first aspect.
With the method and device for constructing a pre-trained model extraction framework provided by the invention, an image dataset and a self-supervised contrastive learning framework are selected; a constructed supernet pre-training model is trained with the image dataset and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; a downstream transfer task and a downstream transfer dataset are selected; a first model meeting preset conditions is screened in the sampling space obtained from the self-supervised contrastive learning framework, and the similarity between the first model and the trained supernet pre-training model is calculated on the downstream transfer task and the downstream transfer dataset; and a target pre-training model sharing weights with the trained supernet pre-training model is determined from the similarity results, so as to obtain the pre-trained model extraction framework. The extraction framework obtained by determining the target pre-training model enables efficient extraction customized to the downstream task; the extracted model has excellent generalization capability, adapts well to the downstream task, and can be extracted without any supervised training on the downstream task. In practical application scenarios where the downstream task has abundant labeled data, conventional model selection requires gradient-update training on the downstream task and consumes a large amount of GPU resources; in contrast, the present method only requires the models to run inference on the downstream dataset, so the optimal model can be selected without any GPU hardware.
Drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of an embodiment of the method for constructing a pre-trained model extraction framework provided by the present invention;
FIG. 2 is a schematic flow chart of an embodiment of the method for obtaining a trained supernet pre-training model provided by the present invention;
FIG. 3 is a flowchart of an embodiment of the method for obtaining the loss function provided by the present invention;
FIG. 4 is a schematic diagram of the sampling space provided by the present invention;
FIG. 5 is a schematic flow chart of an embodiment of the similarity calculation method provided by the present invention;
FIG. 6 is a pipeline diagram for constructing the pre-trained model extraction framework provided by the present invention;
FIG. 7 is a flow chart for constructing the pre-trained model extraction framework provided by the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of the device for constructing a pre-trained model extraction framework provided by the present invention;
FIG. 9 is a schematic diagram of the physical structure of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of an embodiment of the method for constructing a pre-trained model extraction framework according to the present invention. As shown in fig. 1, the method may include the following steps:
S101, selecting an image dataset and a self-supervised contrastive learning framework.
In step S101, the image dataset may be ImageNet or COCO, which is not limited in the embodiment of the present invention. The self-supervised contrastive learning framework may be the MoCo, SimCLR, BYOL, or DINO framework, which is likewise not limited.
S102, training the constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain the trained supernet pre-training model.
S103, selecting a downstream transfer task and a downstream transfer dataset.
In step S103, the downstream transfer task may be image classification, image detection, or image segmentation, which is not limited in the embodiment of the present invention. The downstream transfer dataset may be a dataset used for final model deployment, such as PASCAL VOC, COCO, or Pets, which is likewise not limited.
S104, screening, in the sampling space obtained from the self-supervised contrastive learning framework, a first model that meets preset conditions, and calculating the similarity between the first model and the trained supernet pre-training model on the downstream transfer task and the downstream transfer dataset.
In step S104, the preset condition may be a pre-specified limit on FLOPs, parameter count, or maximum memory, which is not limited in the embodiment of the present invention.
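Screening by such a preset condition amounts to a simple filter over the sampling space. The parameter-count formula below is a toy proxy invented for illustration, not the patent's cost model:

```python
def estimated_params(structure):
    """Toy parameter estimate for a (depth, width) structure: depth blocks,
    each with a width-by-width weight matrix (illustrative only)."""
    depth, width = structure
    return depth * width * width

def screen(sampling_space, max_params):
    """Keep only structures whose estimated parameter count fits the budget."""
    return [s for s in sampling_space if estimated_params(s) <= max_params]

space = [(8, 32), (14, 48), (20, 64), (26, 64)]
feasible = screen(space, max_params=40000)
```

A FLOPs or peak-memory constraint would be applied the same way, with a different estimator.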
S105, determining, from the similarity results, a target pre-training model that shares weights with the trained supernet pre-training model, so as to obtain the pre-trained model extraction framework.
In step S105, the first model with the highest similarity to the trained supernet pre-training model may be taken as the target pre-training model; the process of obtaining the target pre-training model is the process of constructing the pre-trained model extraction framework.
The method for constructing a pre-trained model extraction framework provided by the embodiment of the invention selects an image dataset and a self-supervised contrastive learning framework; trains the constructed supernet pre-training model with the image dataset and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selects a downstream transfer task and a downstream transfer dataset; screens, in the sampling space obtained from the self-supervised contrastive learning framework, a first model meeting preset conditions, and calculates the similarity between the first model and the trained supernet pre-training model on the downstream transfer task and the downstream transfer dataset; and determines, from the similarity results, a target pre-training model sharing weights with the trained supernet pre-training model, so as to obtain the pre-trained model extraction framework. The resulting extraction framework enables efficient extraction customized to the downstream task; the extracted model has excellent generalization capability, adapts well to the downstream task, and can be extracted without any supervised training on the downstream task. In practical application scenarios where the downstream task has abundant labeled data, conventional model selection requires gradient-update training on the downstream task and consumes a large amount of GPU resources; in contrast, the present method only requires the models to run inference on the downstream dataset, so the optimal model can be selected without any GPU hardware.
Fig. 2 is a schematic flow chart of an embodiment of the method for obtaining the trained supernet pre-training model according to the present invention. As shown in fig. 2, the method may include the following steps:
S201, inputting the image dataset into the self-supervised contrastive learning framework for calculation to obtain the loss function for the constructed supernet pre-training model.
S202, training the constructed supernet pre-training model based on the loss function to obtain the trained supernet pre-training model.
In steps S201 and S202, the loss function may be minimized with the backpropagation algorithm and an SGD optimizer to train the constructed supernet pre-training model, and the trained supernet pre-training model is obtained through multiple training iterations.
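As a reminder of what the SGD update does, here is a minimal gradient-descent loop on a one-parameter quadratic loss. This is purely illustrative; the patent applies the same update rule to the full supernet's weights via backpropagation:

```python
def sgd(w, grad_fn, lr=0.1, steps=100):
    """Plain SGD: repeatedly apply w <- w - lr * grad(w)."""
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Loss (w - 3)^2 has gradient 2 * (w - 3); SGD drives w toward the minimum at 3.
w_final = sgd(5.0, lambda w: 2.0 * (w - 3.0))
```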
Through the acquisition of the loss function and the design of the dynamic network structure, the method for obtaining the trained supernet pre-training model can complete a prunable dynamic supernet in a single training run, from which pre-trained subnets meeting various computational requirements can be extracted.
Fig. 3 is a flowchart illustrating an embodiment of a method for obtaining a loss function according to the present invention.
As shown in fig. 3, the method for obtaining the loss function may include the following steps:
S301, in each training epoch, dividing the images in the image dataset into a preset number of batches, and performing data augmentation twice on each image of each batch to obtain two groups of augmented images for each batch.
In step S301, the preset number may be 4, 16, or 48, which is not limited in the embodiment of the present invention. Assuming the image dataset contains 256 images in total and the preset number is 4, each batch holds 64 images; augmenting each of the 64 images in a batch twice yields two groups of augmented images per batch, each group containing 64 images.
S302, setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure in the sampling space, and modifying the gradient-updated branch network of the self-supervised contrastive learning framework according to the selected model structure.
In step S302, a residual neural network serves as the feature-extraction backbone network of the self-supervised contrastive learning framework. The sampling space for this backbone may be as shown in fig. 4; the sampling space defines the range over which the depth and width of the model can vary, thereby determining the entire space of possible models.
The self-supervised contrastive learning framework comprises a gradient-updated branch network and a non-gradient-updated branch network; the gradient-updated branch network is modified according to the randomly selected model structure so that it has the same structure as the selected model.
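The asymmetry between the two branches can be made concrete: in every iteration the gradient-updated branch takes a freshly sampled structure, while the non-gradient-updated branch keeps the maximal structure. A sketch with an invented (depth, width) space:

```python
import random

SAMPLING_SPACE = [(d, w) for d in (8, 14, 20) for w in (32, 64)]
MAX_STRUCTURE = max(SAMPLING_SPACE)  # deepest and widest structure: (20, 64)

def branch_structures(rng):
    """Gradient branch: random structure; non-gradient branch: always maximal."""
    return rng.choice(SAMPLING_SPACE), MAX_STRUCTURE

gradient_branch, non_gradient_branch = branch_structures(random.Random(0))
```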
S303, inputting the two groups of augmented images of each batch into the modified gradient-updated branch network and the non-gradient-updated branch network of the self-supervised contrastive learning framework, respectively, for calculation to obtain the loss function.
In step S303, the two groups of augmented images of each batch are fed into the modified gradient-updated branch network and the non-gradient-updated branch network, respectively, the loss of each branch is calculated, and the two losses are recorded jointly as L.
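The patent does not spell the loss out here. As one plausible form for a two-branch framework such as BYOL, the per-branch loss is the negative cosine similarity between the gradient branch's prediction and the non-gradient branch's projection, and L sums the two symmetric terms; all names below are illustrative:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def branch_loss(online_prediction, target_projection):
    """Negative cosine similarity between the two branches' outputs."""
    return -cosine(online_prediction, target_projection)

def total_loss(p1, z2, p2, z1):
    """Symmetrized loss L: each augmented view passes through each branch once."""
    return branch_loss(p1, z2) + branch_loss(p2, z1)

# Perfectly aligned branch outputs reach the minimum value, -2.
L = total_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```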
In the method for obtaining the loss function provided by the embodiment of the invention, in each training iteration the network structure of the gradient-updated branch is randomly sampled from the model space, while the non-gradient-updated branch always keeps the maximal network structure of that model space; this yields a loss function that favors convergence of the constructed supernet pre-training model.
In some optional embodiments, performing data augmentation twice on each image of each batch to obtain two groups of augmented images for each batch may include: applying scaling, flipping, color transformation, and cropping to each image of each batch twice, thereby obtaining two groups of augmented images for each batch, wherein the scaling of each image is random multi-scale scaling.
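One pass of the augmentation described above only needs a few random draws per image. The concrete scale set, jitter range, and crop rule below are invented for illustration, not taken from the patent:

```python
import random

SCALES = (0.5, 0.75, 1.0, 1.25, 1.5)  # illustrative multi-scale factors

def augmentation_params(size, rng):
    """Sample one set of augmentation parameters: random multi-scale resize,
    horizontal flip, colour-jitter strength, and square crop size."""
    scale = rng.choice(SCALES)
    return {
        "scale": scale,
        "flip": rng.random() < 0.5,
        "jitter": rng.uniform(0.0, 0.4),
        "crop": int(size * scale),
    }

def two_views(size, seed):
    """Two independent augmentation passes, one per branch."""
    rng = random.Random(seed)
    return augmentation_params(size, rng), augmentation_params(size, rng)

view1, view2 = two_views(224, seed=7)
```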
Fig. 5 is a schematic flow chart of an embodiment of the similarity calculation method according to the present invention. As shown in fig. 5, the similarity calculation method may include the following steps:
S501, running inference with the screened first model and the trained supernet pre-training model on the downstream transfer dataset, and obtaining, according to the downstream transfer task, an intermediate feature similarity map of the first model and an intermediate feature similarity map of the trained supernet pre-training model.
In step S501, the intermediate feature map may be chosen per task: a classification task selects the model's final output feature map, a detection task selects the feature map output by each stage of the backbone network, and a segmentation task selects the feature map output by the last stage of the backbone network.
Inference with the first model and the trained supernet pre-training model on the downstream transfer dataset yields the corresponding sets of intermediate feature maps, from which the intermediate feature similarity map of the first model and that of the trained supernet pre-training model are screened according to the downstream transfer task.
S502, computing the similarity between the intermediate feature similarity map of the first model and that of the trained supernet pre-training model to obtain the similarity between the first model and the trained supernet pre-training model.
In step S502, the similarity between the first model and the trained supernet pre-training model is determined by calculating the similarity between the two intermediate feature similarity maps.
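A concrete reading of steps S501 and S502: build, for each model, the pairwise similarity matrix of its features over the downstream samples, then compare the two matrices, e.g. by the cosine of their flattened forms. This specific metric is an assumption for illustration, not stated by the patent:

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def feature_similarity_map(features):
    """Pairwise cosine-similarity matrix over a batch of feature vectors."""
    return [[cos(f, g) for g in features] for f in features]

def map_similarity(m1, m2):
    """Cosine similarity between two flattened similarity maps: 1.0 means both
    models relate the downstream samples to one another identically."""
    a = [x for row in m1 for x in row]
    b = [x for row in m2 for x in row]
    return cos(a, b)

subnet_feats = [[1.0, 0.0], [0.0, 1.0]]
supernet_feats = [[2.0, 0.0], [0.0, 3.0]]  # same directions, different scales
score = map_similarity(feature_similarity_map(subnet_feats),
                       feature_similarity_map(supernet_feats))
```

Because only the relations between samples are compared, the two models' feature dimensions and scales need not match, which is what makes the comparison between differently sized subnets and the supernet possible.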
Figure 6 is a pipeline diagram for constructing the pre-trained model extraction framework provided by the present invention. As shown in fig. 6, constructing the pre-trained model extraction framework may include: augmenting each input image twice to obtain two groups of augmented images, feeding one group into the dynamic supernet and the other into the fixed network structure to obtain two losses, which are jointly recorded as the loss function; training the constructed supernet pre-training model based on this loss function to obtain the trained supernet pre-training model; obtaining, for a downstream task such as image classification or image segmentation, the intermediate feature map of a model screened from the sampling space and the intermediate feature map of the trained supernet pre-training model; and performing similarity calculation between the two, whereby the target pre-training model is obtained and the construction of the pre-trained model extraction framework is completed.
Figure 7 is a flow chart for constructing the pre-trained model extraction framework provided by the invention. As shown in fig. 7, constructing the pre-trained model extraction framework may include: selecting an image dataset and a self-supervised learning framework; determining the sampling space of the constructed supernet pre-training model; preprocessing the image dataset; randomly sampling a model in the sampling space and modifying the gradient branch network of the self-supervised framework; feeding the preprocessed image dataset into the modified gradient branch network and the non-gradient branch network for forward propagation and calculating the loss function; training the constructed supernet pre-training model based on the loss function; screening models meeting the defined conditions in the sampling space, and determining the downstream transfer task and the downstream transfer dataset; running inference with the trained supernet pre-training model and the screened models on the downstream transfer dataset; selecting, according to the downstream transfer task, the intermediate feature map of the trained supernet pre-training model and that of each screened model obtained by inference; and performing similarity calculation between them to screen out the optimal model, i.e., the target pre-training model.
Fig. 8 is a schematic structural component diagram of an embodiment of a construction apparatus for a pre-trained model decimation frame provided by the present invention. As shown in fig. 8, the apparatus for constructing a pre-trained model decimation frame includes:
a first selection module 801, configured to select an image data set and a self-supervised contrastive learning framework;
a training module 802, configured to train the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework, obtaining a trained supernet pre-training model;
a second selection module 803, configured to select a downstream migration task and a downstream migration data set;
a calculation module 804, configured to screen, in a sampling space obtained based on the self-supervised contrastive learning framework, a first model meeting a preset condition, and to calculate the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set;
and a determination module 805, configured to determine, based on the similarity calculation result, a target pre-training model that shares weights with the trained supernet pre-training model, thereby obtaining the pre-trained model decimation framework.
Optionally, the training module 802 includes:
a first calculation unit, configured to input the image data set into the self-supervised contrastive learning framework for calculation, obtaining the loss function of the constructed supernet pre-training model;
and a training unit, configured to train the constructed supernet pre-training model based on the loss function, obtaining the trained supernet pre-training model.
Optionally, the first calculation unit includes:
an enhancement subunit, configured to divide the image data in the image data set into a preset number of batches in each training epoch, and to perform data enhancement twice on each image datum of each batch, obtaining two groups of data-enhanced images corresponding to each image datum of each batch;
a modification subunit, configured to set the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly select a model structure from the sampling space, and modify the gradient-updated branch network of the self-supervised contrastive learning framework based on the selected model structure;
and a calculation subunit, configured to input the two groups of data-enhanced images corresponding to each batch into, respectively, the modified gradient-updated branch network and the non-gradient-updated branch network of the self-supervised contrastive learning framework for calculation, obtaining the loss function.
Optionally, the enhancement subunit is further configured to perform scaling, flipping, color conversion, and cropping twice on each image datum of each batch, obtaining two groups of data-enhanced images corresponding to each image datum of each batch.
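The four operations named above (scaling, flipping, color conversion, cropping) can be sketched with plain numpy; the specific scale factor, jitter range, and crop policy below are illustrative assumptions rather than parameters specified by the patent:

```python
import numpy as np

def enhance(img, rng):
    """One pass of the four enhancement operations, in toy numpy form:
    scale up, random horizontal flip, color (brightness) jitter, random crop
    back to the original size."""
    h, w, _ = img.shape
    img = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)   # scaling (x2 upsample)
    if rng.random() < 0.5:
        img = img[:, ::-1]                                  # horizontal flip
    img = img * rng.uniform(0.8, 1.2)                       # simple color conversion
    top = rng.integers(0, img.shape[0] - h + 1)
    left = rng.integers(0, img.shape[1] - w + 1)
    return img[top:top + h, left:left + w]                  # crop back to h x w

rng = np.random.default_rng(0)
img = rng.random((16, 16, 3))
# Two independent enhancements of the same image -> the two "groups" of views.
view_a, view_b = enhance(img, rng), enhance(img, rng)
assert view_a.shape == img.shape == view_b.shape
```

In a real training pipeline these would typically be composed library transforms (e.g. random resized crop, flip, color jitter) applied per batch, with each image enhanced twice to feed the two branches.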
Optionally, the calculation module 804 includes:
a screening unit, configured to select, according to the downstream migration task, the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set;
and a second calculation unit, configured to calculate the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model, obtaining the similarity between the first model and the trained supernet pre-training model.
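The patent does not name a specific metric for comparing the two sets of intermediate feature maps; linear CKA (centered kernel alignment) is one reasonable choice for measuring representational similarity, sketched below on flattened feature matrices. The shapes and noise level are illustrative assumptions:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices (n samples x features).
    Returns a value in [0, 1]; higher means more similar representations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(1)
feats_first = rng.normal(size=(32, 10))                       # first-model features
feats_supernet = feats_first + 0.05 * rng.normal(size=(32, 10))  # supernet features

score = linear_cka(feats_first, feats_supernet)
assert 0.0 <= score <= 1.0
```

Under the framework's selection rule, the candidate model whose feature maps score highest against the trained supernet would be taken as the target pre-training model.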
Optionally, the determination module 805 includes:
a determination unit, configured to take the first model with the greatest similarity to the trained supernet pre-training model as the target pre-training model.
Fig. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 901, a communication interface 902, a memory 903, and a communication bus 904, wherein the processor 901, the communication interface 902, and the memory 903 communicate with one another via the communication bus 904. The processor 901 may call logic instructions in the memory 903 to perform the following method of constructing a pre-trained model decimation framework:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result, to obtain the pre-trained model decimation framework.
In addition, the logic instructions in the memory 903 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as an independent product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, enable the computer to execute the method for constructing a pre-trained model decimation framework provided in the above embodiments:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result, to obtain the pre-trained model decimation framework.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the method for constructing a pre-trained model decimation framework provided in the above embodiments:
selecting an image data set and a self-supervised contrastive learning framework; training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model; selecting a downstream migration task and a downstream migration data set; screening a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and determining a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result, to obtain the pre-trained model decimation framework.
The above-described apparatus embodiments are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or certainly by hardware alone. Based on this understanding, the above technical solutions may be embodied in the form of a software product stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disk, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods of the various embodiments or parts thereof.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for constructing a pre-trained model decimation framework, comprising:
selecting an image data set and a self-supervised contrastive learning framework;
training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model;
selecting a downstream migration task and a downstream migration data set;
screening a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and
determining a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result, to obtain the pre-trained model decimation framework;
wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model comprises:
inputting the image data set into the self-supervised contrastive learning framework for calculation to obtain a loss function of the constructed supernet pre-training model, wherein the image data in the image data set are divided into a preset number of batches in each training epoch, and data enhancement is performed twice on each image datum of each batch to obtain two groups of data-enhanced images corresponding to each image datum of each batch;
setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure from the sampling space, and modifying a gradient-updated branch network of the self-supervised contrastive learning framework based on the selected model structure; and
inputting the two groups of data-enhanced images corresponding to each batch into, respectively, the modified gradient-updated branch network and a non-gradient-updated branch network of the self-supervised contrastive learning framework for calculation, to obtain the loss function;
wherein calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set comprises:
selecting, according to the downstream migration task, an intermediate feature map of the first model and an intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set; and
calculating the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model, to obtain the similarity between the first model and the trained supernet pre-training model.
2. The method for constructing a pre-trained model decimation framework according to claim 1, wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model further comprises:
training the constructed supernet pre-training model based on the loss function to obtain the trained supernet pre-training model.
3. The method for constructing a pre-trained model decimation framework according to claim 2, wherein performing data enhancement twice on each image datum of each batch to obtain two groups of data-enhanced images corresponding to each image datum of each batch comprises:
performing scaling, flipping, color conversion, and cropping twice on each image datum of each batch, to obtain two groups of data-enhanced images corresponding to each image datum of each batch.
4. The method for constructing a pre-trained model decimation framework according to claim 1, wherein determining a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result comprises:
taking the first model with the greatest similarity to the trained supernet pre-training model as the target pre-training model.
5. An apparatus for constructing a pre-trained model decimation framework, comprising:
a first selection module, configured to select an image data set and a self-supervised contrastive learning framework;
a training module, configured to train the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model;
a second selection module, configured to select a downstream migration task and a downstream migration data set;
a calculation module, configured to screen a first model meeting a preset condition in a sampling space obtained based on the self-supervised contrastive learning framework, and to calculate the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set; and
a determination module, configured to determine a target pre-training model sharing weights with the trained supernet pre-training model based on the similarity calculation result, to obtain the pre-trained model decimation framework;
wherein training the constructed supernet pre-training model according to the image data set and the self-supervised contrastive learning framework to obtain a trained supernet pre-training model comprises:
inputting the image data set into the self-supervised contrastive learning framework for calculation to obtain a loss function of the constructed supernet pre-training model, wherein the image data in the image data set are divided into a preset number of batches in each training epoch, and data enhancement is performed twice on each image datum of each batch to obtain two groups of data-enhanced images corresponding to each image datum of each batch;
setting the sampling space based on the feature-extraction backbone network of the self-supervised contrastive learning framework, randomly selecting a model structure from the sampling space, and modifying a gradient-updated branch network of the self-supervised contrastive learning framework based on the selected model structure; and
inputting the two groups of data-enhanced images corresponding to each batch into, respectively, the modified gradient-updated branch network and a non-gradient-updated branch network of the self-supervised contrastive learning framework for calculation, to obtain the loss function;
wherein calculating the similarity between the first model and the trained supernet pre-training model based on the downstream migration task and the downstream migration data set comprises:
selecting, according to the downstream migration task, an intermediate feature map of the first model and an intermediate feature map of the trained supernet pre-training model, obtained by running inference with the first model and the trained supernet pre-training model on the downstream migration data set; and
calculating the similarity between the intermediate feature map of the first model and the intermediate feature map of the trained supernet pre-training model, to obtain the similarity between the first model and the trained supernet pre-training model.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for constructing a pre-trained model decimation framework according to any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a pre-trained model decimation framework according to any one of claims 1 to 4.
CN202210225051.9A 2022-03-09 2022-03-09 Construction method and device of pre-training model decimation frame Active CN114743041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210225051.9A CN114743041B (en) 2022-03-09 2022-03-09 Construction method and device of pre-training model decimation frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210225051.9A CN114743041B (en) 2022-03-09 2022-03-09 Construction method and device of pre-training model decimation frame

Publications (2)

Publication Number Publication Date
CN114743041A CN114743041A (en) 2022-07-12
CN114743041B true CN114743041B (en) 2023-01-03

Family

ID=82274841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210225051.9A Active CN114743041B (en) 2022-03-09 2022-03-09 Construction method and device of pre-training model decimation frame

Country Status (1)

Country Link
CN (1) CN114743041B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245141B (en) * 2023-01-13 2024-06-04 清华大学 Transfer learning architecture, method, electronic device and storage medium
CN116912623B (en) * 2023-07-20 2024-04-05 东北大学 Contrast learning method and system for medical image dataset

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596138A (en) * 2018-05-03 2018-09-28 南京大学 A kind of face identification method based on migration hierarchical network
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN112016531A (en) * 2020-10-22 2020-12-01 成都睿沿科技有限公司 Model training method, object recognition method, device, equipment and storage medium
CN113781518A (en) * 2021-09-10 2021-12-10 商汤集团有限公司 Neural network structure searching method and device, electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084245B (en) * 2019-04-04 2020-12-25 中国科学院自动化研究所 Weak supervision image detection method and system based on visual attention mechanism reinforcement learning
CN113344016A (en) * 2020-02-18 2021-09-03 深圳云天励飞技术有限公司 Deep migration learning method and device, electronic equipment and storage medium
CN113705276A (en) * 2020-05-20 2021-11-26 武汉Tcl集团工业研究院有限公司 Model construction method, model construction device, computer apparatus, and medium
CN111783951B (en) * 2020-06-29 2024-02-20 北京百度网讯科技有限公司 Model acquisition method, device, equipment and storage medium based on super network
CN113344089B (en) * 2021-06-17 2022-07-01 北京百度网讯科技有限公司 Model training method and device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596138A (en) * 2018-05-03 2018-09-28 南京大学 A kind of face identification method based on migration hierarchical network
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN112016531A (en) * 2020-10-22 2020-12-01 成都睿沿科技有限公司 Model training method, object recognition method, device, equipment and storage medium
CN113781518A (en) * 2021-09-10 2021-12-10 商汤集团有限公司 Neural network structure searching method and device, electronic device and storage medium

Also Published As

Publication number Publication date
CN114743041A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN107330956B (en) Cartoon hand drawing unsupervised coloring method and device
US10552737B2 (en) Artificial neural network class-based pruning
CN114743041B (en) Construction method and device of pre-training model decimation frame
CN108229526B (en) Network training method, network training device, image processing method, image processing device, storage medium and electronic equipment
JP6848071B2 (en) Repetitive multi-scale image generation using neural networks
EP3951702A1 (en) Method for training image processing model, image processing method, network device, and storage medium
CN110189260B (en) Image noise reduction method based on multi-scale parallel gated neural network
US11449707B2 (en) Method for processing automobile image data, apparatus, and readable storage medium
CN111523640A (en) Training method and device of neural network model
CN109615614B (en) Method for extracting blood vessels in fundus image based on multi-feature fusion and electronic equipment
CN112614072B (en) Image restoration method and device, image restoration equipment and storage medium
CN111047563A (en) Neural network construction method applied to medical ultrasonic image
CN112836820B (en) Deep convolution network training method, device and system for image classification task
CN112598110B (en) Neural network construction method, device, equipment and medium
CN113239875A (en) Method, system and device for acquiring human face features and computer readable storage medium
CN116363261A (en) Training method of image editing model, image editing method and device
CN116128036A (en) Incremental learning method, device, equipment and medium based on cloud edge collaborative architecture
CN111860465A (en) Remote sensing image extraction method, device, equipment and storage medium based on super pixels
CN111353577B (en) Multi-task-based cascade combination model optimization method and device and terminal equipment
CN116777732A (en) Image generation method, device, equipment and storage medium based on random noise
CN111784726A (en) Image matting method and device
CN116011550A (en) Model pruning method, image processing method and related devices
CN116128044A (en) Model pruning method, image processing method and related devices
CN113657136B (en) Identification method and device
CN114139720A (en) Government affair big data processing method and device based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221028

Address after: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District

Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District

Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

GR01 Patent grant
GR01 Patent grant