CN113705276A - Model construction method, model construction device, computer apparatus, and medium - Google Patents

Model construction method, model construction device, computer apparatus, and medium

Info

Publication number
CN113705276A
Authority
CN
China
Prior art keywords
network
trained
training
model
super
Prior art date
Legal status
Pending
Application number
CN202010431405.6A
Other languages
Chinese (zh)
Inventor
李叶伟
陈浩鹏
熊宇龙
李渊
向少雄
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202010431405.6A
Publication of CN113705276A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application, which falls within the technical field of model construction, provides a model construction method, a model construction device, computer equipment, and a medium. The model construction method trains a super network constructed with a search space to obtain a trained super network; searches the trained super network for a network framework to be trained according to a preset search condition; and finally trains and optimizes the network framework to be trained on a training sample set to obtain a target model. Because no structure optimization or content optimization of an original model framework is required, the efficiency of model construction is improved.

Description

Model construction method, model construction device, computer apparatus, and medium
Technical Field
The present application relates to a model building method, a model building apparatus, a computer device, and a computer-readable storage medium.
Background
In recent years, with the development of artificial intelligence technology, more and more fields have adopted mathematical operation models to recognize and judge real-world objects and thereby reduce manual labor. For example, face recognition technology uses a face recognition model to perform feature recognition on a face image and then determines whether the source of the face image is legitimate, that is, whether the user's identity is legitimate.
In the related art, a face recognition model is constructed by first building an original model based on a neural network and then training and verifying that model with a constructed training sample set and verification set. However, building the original model requires selecting a suitable model framework according to actual requirements and performing structure optimization and content optimization on it, for example deleting or adding hierarchical levels of the model framework, or deleting or adding channels at a given level of the framework. The model building process in existing model construction schemes is therefore cumbersome, and model construction efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a model building method, a model building apparatus, a computer device, and a computer-readable storage medium, so as to solve the problem of low model construction efficiency in existing model construction schemes.
A first aspect of an embodiment of the present application provides a model building method, including:
training a super network constructed with a search space to obtain a trained super network, wherein the search space contains a plurality of candidate network frameworks;
searching the trained super network for a network framework to be trained according to a preset search condition;
and training the network framework to be trained with a training sample set to obtain a target model.
In the foregoing scheme, before the step of training the network frame to be trained by using the training sample set to obtain the target model, the method further includes:
acquiring a sample image set containing a human face;
intercepting a face area from each sample image in the sample image set to obtain a face image sample set;
and scaling and labeling each face image sample in the face image sample set to obtain the training sample set.
In the above solution, a plurality of candidate network frames in the search space are connected to each other, and each candidate network frame includes a plurality of substructures;
the training of the super network with the search space to obtain the trained super network comprises the following steps:
performing structure search based on all the substructures in the search space to determine a single-path supernet;
and carrying out sampling training on the super network according to the single-path super network to obtain the trained super network.
In the foregoing solution, the performing a structure search based on all the substructures in the search space to determine a single-path supernet includes:
performing a structure search on all the substructures in the search space according to preset super-network attribute information to obtain a single-path super network; wherein the transition probability between any two adjacent substructures in the single-path super network is the largest.
In the scheme, the preset search condition corresponds to a target deployment platform;
the searching from the trained super network to obtain the network frame to be trained according to the preset search condition comprises the following steps:
searching a network frame to be trained from the trained hyper-network based on the following formula contained in the preset search condition;
max ACC_val(a), s.t. a ∈ A
Latency(a, h) ≤ LatC_h
wherein a denotes a candidate network framework and A denotes the search space containing all the candidate network frameworks (a ∈ A, A non-empty); ACC_val(a) is the validation accuracy of a; h denotes the target deployment platform; Latency(a, h) is the objective function giving the latency of a on h; and LatC_h is the latency constraint of the target deployment platform.
In the above scheme, obtaining a training sample set and training the network framework to be trained with the training sample set to obtain a target model includes:
training the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model.
In the above scheme, training the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model includes:
identifying the objective function Latency(a, h) as a convergence condition, wherein Latency(a, h) ≤ LatC_h and LatC_h is the latency constraint of the target deployment platform;
and training and optimizing the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model meeting the convergence condition.
A second aspect of an embodiment of the present application provides a model building apparatus, including:
the sampling training unit is used for training the super network constructed with the search space to obtain a trained super network; wherein the search space contains a plurality of candidate network frameworks;
the searching unit is used for searching the trained super network to obtain a network frame to be trained according to a preset searching condition;
and the model training unit is used for training the network frame to be trained by utilizing a training sample set to obtain a target model.
In the foregoing solution, the model building apparatus further includes:
the image acquisition unit is used for acquiring a sample image set containing a human face;
the image intercepting unit is used for intercepting a face area from each sample image in the sample image set to obtain a face image sample set;
and the sample generating unit is used for scaling and labeling each face image sample in the face image sample set to obtain the training sample set.
In the above solution, a plurality of candidate network frames in the search space are connected to each other, and each candidate network frame includes a plurality of substructures;
the sampling training unit is specifically configured to perform structure search based on all the substructures in the search space, and determine a single-path supernet; and carrying out sampling training on the super network according to the single-path super network to obtain the trained super network.
In the above scheme, the sampling training unit is further specifically configured to perform structure search on all the substructures in the search space according to preset attribute information of the super-network, so as to obtain a single-path super-network; wherein a transition probability between two adjacent substructures in the single-path super-network is the largest.
In the scheme, the preset search condition corresponds to a target deployment platform;
the searching unit is specifically configured to search a network frame to be trained from the trained super network based on the following formula included in the preset searching condition;
max ACC_val(a), s.t. a ∈ A
Latency(a, h) ≤ LatC_h
wherein a denotes a candidate network framework and A denotes the search space containing all the candidate network frameworks (a ∈ A, A non-empty); ACC_val(a) is the validation accuracy of a; h denotes the target deployment platform; Latency(a, h) is the objective function giving the latency of a on h; and LatC_h is the latency constraint of the target deployment platform.
In the above scheme, the model training unit is specifically configured to train the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model.
In the above scheme, the model training unit is specifically configured to identify the objective function Latency(a, h) as a convergence condition, wherein Latency(a, h) ≤ LatC_h and LatC_h is the latency constraint of the target deployment platform; and to train and optimize the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model meeting the convergence condition.
A third aspect of embodiments of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the model building method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the model construction method provided by the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a computer device, causes the computer device to perform the steps of the model building method according to any one of the first aspect.
The model construction method, the model construction device, the computer equipment and the computer readable storage medium provided by the embodiment of the application have the following beneficial effects:
according to the model construction method provided by the embodiment of the application, a trained hyper-network is obtained by training the hyper-network with a search space, and the search space is pre-constructed in the hyper-network, so that all candidate network frames needing to be searched can be contained in the hyper-network, and when the hyper-network is trained, all internal substructures can share parameters when different sub-networks are constructed, so that the hyper-network can be trained to a certain degree, and the sub-networks can be sampled and indexes can be evaluated; according to the preset search conditions, the network frame to be trained is searched from the trained hyper-network, and finally the network frame to be trained is trained and optimized based on the training sample set to obtain the target model, so that the structure optimization and the content optimization of the original model frame are not needed, the steps of model construction are simplified, and the efficiency of model construction is improved.
In addition, the model searching condition corresponds to the target deployment platform, and the network frame to be trained is searched out from the trained hyper-network and is the optimal network path, so that the network frame to be trained is the network frame which is searched out to be matched with the model deployment limiting condition best, and finally the target model is obtained by training and optimizing the network frame to be trained, and the matching degree between the computing capabilities of the target model and the target deployment platform can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a flow chart of an implementation of a model building method provided in an embodiment of the present application;
FIG. 2 is a flow chart of an implementation of a model building method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a candidate network framework in an embodiment of the present application;
fig. 4 is a block diagram of a model building apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a computer device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that, in the model building methods provided in all embodiments of the present application, the execution subject is a computer device for building a model, such as a server for model deployment, a computer node for model deployment in a distributed system, and the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a model building method according to an embodiment of the present disclosure.
The model construction method shown in fig. 1 includes the following steps:
S11: training a super network constructed with a search space to obtain a trained super network; wherein the search space contains a plurality of candidate network frameworks.
In step S11, a search space is pre-constructed in the super network, and the search space includes a plurality of candidate network frames. Each candidate network frame is composed of a plurality of substructures, a part of the substructures among the candidate network frames are connected to form a sub-network, and all the sub-networks form a super network.
It should be noted that the super network is constructed based on the Neural Architecture Search (NAS) method. In NAS, an algorithm replaces the manual design of model frameworks and automatically searches a massive search space for the optimal neural network framework, that is, it searches a plurality of candidate network frameworks for the candidate to be optimized. Here, integrating a plurality of candidate network frameworks means connecting part or all of the substructures in each candidate network framework according to certain integration logic, so as to form a plurality of sub-networks, which together constitute the super network.
The search space is constructed by taking the substructures contained in all the sub-networks of the super network as the structure search objects and, on the basis of the super network's structure, integrating substructures with different channel numbers (that is, different widths) into the super network. Because integrating substructures of different widths into a super network is difficult with conventional NAS methods, constructing a search space that contains all the structures to be searched greatly reduces the search cost.
Before the search space is constructed, each candidate network framework can be regarded as an independent module (block), and the blocks differ from one another in width and spatial resolution. Based on all the blocks in the super network, that is, all the candidate network frameworks, constructing the search space consists of connecting blocks of different widths and spatial resolutions with one another: the hierarchical levels inside each block are treated as the substructures of the corresponding candidate network framework and are then connected in a purposeful, directed manner.
In this embodiment, training the super network in which the search space is constructed actually means performing sampling training on it. The core idea of the super network is to train a large number of network structures, that is, a large number of candidate network frameworks, simultaneously in a parameter-sharing manner. In order to treat all candidate network frameworks equally, the sampling training of this embodiment uses no hyper-parameters and samples the super network uniformly, thereby obtaining the trained super network.
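To make this concrete, the following is a minimal single-path supernet sketch in Python (PyTorch). It illustrates the uniform-sampling idea only and is not the patent's implementation: the layer count, the three candidate kernel sizes per layer, and all class and function names are assumptions.

import random

import torch
import torch.nn as nn

class ChoiceLayer(nn.Module):
    # One supernet layer holding several candidate substructures that share
    # the layer's position but differ in structure (kernel size here).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.choices = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                          nn.BatchNorm2d(out_ch), nn.ReLU())
            for k in (1, 3, 5))

    def forward(self, x, idx):
        return self.choices[idx](x)

class Supernet(nn.Module):
    def __init__(self, num_layers=4, ch=16, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.layers = nn.ModuleList(ChoiceLayer(ch, ch) for _ in range(num_layers))
        self.head = nn.Linear(ch, num_classes)

    def forward(self, x, path):
        x = self.stem(x)
        for layer, idx in zip(self.layers, path):
            x = layer(x, idx)
        return self.head(x.mean(dim=(2, 3)))  # global average pooling

def train_step(net, opt, images, labels):
    # Uniform sampling: one substructure per layer, drawn uniformly at
    # random, with no hyper-parameter guiding the selection.
    path = [random.randrange(len(layer.choices)) for layer in net.layers]
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(images, path), labels)
    loss.backward()
    opt.step()
    return loss.item()

net = Supernet()
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
# train_step(net, opt, images, labels)  # one uniformly sampled path per step

Because a fresh path is drawn at every step, all candidate substructures share and update the same supernet weights, which is what later allows sub-networks to be evaluated without retraining.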
S12: and searching a network frame to be trained from the trained super network according to a preset search condition.
In step S12, the preset search condition is related to the predicted performance index of the network framework to be trained.
In this embodiment, since all the substructures in the super network share parameters when constructing different sub-networks, the sub-networks can be sampled and their indexes evaluated once the super network has been trained to a certain degree, without retraining each sub-network. The optimal candidate network framework is selected by evaluating the performance of the candidate network frameworks on the super network, that is, by searching the trained super network for the network framework to be trained.
It should be noted that, because blocks of different widths and spatial resolutions are interconnected in the search space, searching the trained super network for the network framework to be trained under the pre-configured model deployment limiting conditions amounts to optimizing the transition probabilities between blocks so as to select an optimal path; the searched network framework to be trained is thus the optimal candidate network framework under the operational capability of the target deployment platform represented by those conditions.
It can be understood that, in practical application, the corresponding model deployment limiting conditions may be set according to the specific performance or speed requirement of the target deployment platform, so as to search and deploy the optimal candidate network framework of the target deployment platform.
S13: and training the network frame to be trained by utilizing a training sample set to obtain a target model.
In step S13, the training sample set includes a plurality of training samples, where each training sample includes a sample feature.
Taking the target model for face recognition as an example, the training samples are a plurality of image samples containing faces, and in each image sample containing a face, the face region is a sample feature of the image sample.
The network framework to be trained is trained on the training sample set to obtain the target model; for example, a target model obtained by training the network framework on a plurality of image samples containing faces can be used to recognize the face region in an image containing a face.
It should be understood that, in the model construction process, different training sample sets can be selected and obtained according to different purposes or purposes of the target model.
As can be seen from the above, in the model construction method provided in this embodiment, a trained super network is obtained by training a super network constructed with a search space. Because the search space is pre-constructed inside the super network, all candidate network frameworks that need to be searched are contained in the super network, and during training all internal substructures share parameters when constructing different sub-networks; the super network therefore only needs to be trained to a certain degree before sub-networks can be sampled and their indexes evaluated. A network framework to be trained is then searched from the trained super network according to the preset search condition, and finally trained and optimized on the training sample set to obtain the target model. No structure optimization or content optimization of an original model framework is required, which simplifies the steps of model construction and improves its efficiency.
Referring to fig. 2, fig. 2 is a flowchart illustrating an implementation of a model building method according to another embodiment of the present application. With respect to the embodiment corresponding to fig. 1, the model building method provided in this embodiment may further include steps S21 to S22 before step S11, and further include steps S23 to S25 before step S13. The details are as follows:
s21: a plurality of candidate network frameworks are obtained.
S22: and constructing a search space of the super network based on a plurality of the candidate network frameworks.
In this embodiment, the plurality of candidate network frames may be candidate network frames selected by the model deployment tool, or candidate network frames obtained from the target database.
In practical application, in the process of model deployment, a network framework meeting the model construction requirement can be selected from a target database as a candidate network framework through a model deployment tool, wherein information in the target database is used for describing a corresponding relation between the network framework and applicable equipment information thereof, and the applicable equipment information is used for distinguishing a deployment platform. When a search space of the hyper-network is constructed, a network frame can be obtained from a target database as a candidate network frame based on the applicable device information corresponding to the target deployment platform.
It should be appreciated that different candidate network frameworks differ in computational logic and memory complexity. Fig. 3 shows a schematic diagram of a candidate network framework in this embodiment. The module blocks shown in fig. 3 may be constructed based on four basic structures of existing effective models, that is, A, B, C, and D in fig. 3. The search space in this embodiment contains 32 candidate blocks.
The candidate network frameworks in all embodiments of the application can be constructed based on at least one of ShuffleNet V2, SPOS, DARTS and MobileNet V3.
In this embodiment, candidate network architectures can be searched for deployment platforms such as a DSP, an ARM CPU, and an NPU, and the deployment platforms are not limited thereto.
After the search space of the super network is constructed, the super network is subjected to sampling training, that is, step S11 is performed.
S11: and training the super network constructed with the search space to obtain the trained super network.
As a possible implementation manner of this embodiment, a plurality of candidate network frames in the search space are connected to each other, each of the candidate network frames includes a plurality of substructures, and step S11 specifically includes: performing structure search based on all the substructures in the search space to determine a single-path supernet; and carrying out sampling training on the super network according to the single-path super network to obtain the trained super network.
In this embodiment, to reduce the weight coupling of the super network, only a single-path super network inside the super network is activated in each iteration of training, which is achieved by determining the single-path super network. The multiple substructures of a single candidate network framework lie at different levels. When sampling training is carried out on the super network via the single-path super network, no hyper-parameters guide the selection of substructures; instead, a uniform sampling mode treats all substructures in the super network equally.
In practical application, the attribute information of the single-path super network can be configured according to actual requirements, and the structure search is carried out according to that attribute information. Specifically, different types of selection units are defined to search different structure variables, which further supports channel-number search in a search space for complex model structures. A selection unit is used to search one substructure, such as the number of channels of a convolutional layer: a weight tensor with the maximum number of channels is pre-allocated, and during super-network training a channel number is randomly selected and the corresponding sub-tensor is sliced out for convolution, thereby realizing the structure search. Substructures located at different levels are then selected from different candidate network frameworks and connected to form the single-path super network.
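A hedged sketch of such a selection unit for channel-number search follows. Pre-allocating the weight tensor at the maximum channel count and slicing a sub-tensor per sampled channel number is the mechanism described above; the concrete candidate channel counts and all names are illustrative assumptions.

import random

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSelectionConv(nn.Module):
    # Selection-unit sketch: the weight tensor is pre-allocated at the
    # maximum channel count; each training step randomly picks a channel
    # number and slices the corresponding sub-tensor for the convolution.
    def __init__(self, in_ch, max_out_ch, kernel_size, candidates):
        super().__init__()
        self.candidates = candidates                 # e.g. (8, 16, 24, 32)
        self.weight = nn.Parameter(
            torch.randn(max_out_ch, in_ch, kernel_size, kernel_size) * 0.01)

    def forward(self, x, out_ch=None):
        if out_ch is None:
            out_ch = random.choice(self.candidates)  # uniform channel choice
        w = self.weight[:out_ch]                     # slice the sub-tensor
        return F.conv2d(x, w, padding=self.weight.shape[-1] // 2)

conv = ChannelSelectionConv(in_ch=16, max_out_ch=32, kernel_size=3,
                            candidates=(8, 16, 24, 32))
y = conv(torch.randn(1, 16, 32, 32))                 # output width varies per call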
It should be noted that the key reason why the model structure search performed with the super network can find the network framework to be trained is that, on the verification set, the accuracy of any substructure using the multiplexed weights is highly reliable; that is, when the weights are required to approximate the optimal weights, the quality of the approximation is proportional to the degree to which the training loss function is minimized. It follows that optimization of the super network's weights should be done simultaneously with optimization of all the substructures in the search space. Therefore, when the super network is uniformly sampled and trained, all internal substructures share parameters when constructing different sub-networks, and the sub-networks can be sampled and their indexes evaluated once the super network has been trained to a certain degree; that is, the sub-networks do not need to be retrained.
In all embodiments of the present application, where the super network has multiple selectable substructures per layer, the super network is typically trained by selecting a single path through uniform path sampling, so that all candidate network frameworks can optimize their weights simultaneously. To reduce weight coupling in the super network, a simple search space containing only single-path architectures, that is, a single-path super network, is used. For training, a method without hyper-parameters is used, and all candidate network architectures are treated equally through uniform sampling, which automates the model search while improving model construction efficiency.
As a possible implementation manner of this embodiment, the performing a structure search based on all the substructures in the search space to determine a single-path supernet includes:
performing a structure search on all the substructures in the search space according to preset super-network attribute information to obtain a single-path super network; wherein the transition probability between any two adjacent substructures in the single-path super network is the largest.
In this embodiment, since the candidate network frameworks in the search space are connected with each other, a corresponding connection relationship also exists between the substructures of each candidate network framework and the substructures of the other candidate network frameworks, and together they form the super network. Preset super-network attribute information is defined to control the number of candidate network frameworks, the transmission logic within each candidate network framework (such as its input channels, output channels, and spatial size), and the stride used in each candidate network framework. A structure search is then performed on all the substructures in the search space, yielding the single-path super network in which the transition probability between every two adjacent substructures is the largest.
It should be understood that the transition probability describes the degree of matching between two adjacent substructures. During the structure search, because the candidate network frameworks in the search space are interconnected, substructures of one candidate network framework are also connected to substructures of other candidate network frameworks; two adjacent substructures therefore need not come from the same candidate network framework, but, based on the preset super-network attribute information, their degree of matching is the highest.
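The patent does not spell out the selection algorithm, but a greedy reading of "the transition probability between two adjacent substructures is the largest" could be sketched as follows, assuming per-layer transition-probability matrices are available:

import numpy as np

def select_single_path(transition_probs):
    # transition_probs[l][i][j]: probability of moving from substructure i
    # of layer l to substructure j of layer l + 1.
    first = transition_probs[0]
    path = [int(np.argmax(first.max(axis=1)))]       # start with the best row
    for trans in transition_probs:
        path.append(int(np.argmax(trans[path[-1]]))) # follow the largest probability
    return path                                      # one substructure index per layer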
The preset super-network attribute information is defined as shown in Table 1, where a candidate network framework to be searched is indicated by the mark TBS in the module Block column. In Table 1, Input shape denotes the shape of the input, Block denotes the module, channels denotes the number of channels, repeat denotes the number of repetitions, and stride denotes the step size.
Table 1: preset super-network attribute information (the rows are reproduced as images in the original publication and are not recoverable here).
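Since the rows of Table 1 are only available as images in the original publication, the following hypothetical snippet merely illustrates how attribute information of this shape (input shape; block type, with TBS marking a block to be searched; channels; repeat; stride) could be encoded. All concrete values are made up.

SUPERNET_CONFIG = [
    # input_shape,    block,     channels, repeat, stride
    ((3, 112, 112),  "Conv3x3",  16,  1, 2),
    ((16, 56, 56),   "TBS",      32,  4, 2),   # TBS: block to be searched
    ((32, 28, 28),   "TBS",      64,  4, 2),
    ((64, 14, 14),   "TBS",     128,  8, 2),
    ((128, 7, 7),    "Conv1x1", 256,  1, 1),
]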
S12: and searching the trained hyper-network to obtain a network frame to be trained according to a preset search condition.
As a possible implementation manner of this embodiment, the preset search condition corresponds to the target deployment platform, and step S12 specifically includes:
searching a network frame to be trained from the trained hyper-network based on the following formula contained in the preset search condition;
max ACC_val(a), s.t. a ∈ A
Latency(a, h) ≤ LatC_h
wherein a denotes a candidate network framework and A denotes the search space containing all the candidate network frameworks (a ∈ A, A non-empty); ACC_val(a) is the validation accuracy of a; h denotes the target deployment platform; Latency(a, h) is the objective function giving the latency of a on h; and LatC_h is the latency constraint of the target deployment platform.
In this embodiment, the preset search condition corresponds to the target deployment platform: it is related to, and can be used to characterize, the operational capability of the target deployment platform, thereby distinguishing the candidate network frameworks suitable for that platform.
It is understood that a target deployment platform refers to a hardware device for deploying and providing computational resources for a target model.
By constraining the objective function with the latency of the target deployment platform, a network framework to be trained that is better suited to the target deployment platform can be searched from the trained super network. In practical application, the latency constraint LatC_h of the target deployment platform may be obtained through a latency lookup table (LUT) or a latency predictor.
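A sketch of this constrained search is given below. Random path sampling stands in for whatever search strategy the patent intends; the per-block latency lookup table follows the LUT approach just mentioned; and evaluate is a hypothetical callable that measures ACC_val(a) on the verification set by reusing the trained supernet weights.

import random

def search_network_to_train(num_choices, latency_lut, lat_constraint,
                            evaluate, num_samples=1000):
    # num_choices[l]: number of candidate substructures in layer l.
    # latency_lut[l][i]: measured latency of choice i in layer l on platform h.
    best_path, best_acc = None, -1.0
    for _ in range(num_samples):
        path = [random.randrange(n) for n in num_choices]
        # Latency(a, h): sum of per-block latencies from the lookup table.
        latency = sum(latency_lut[layer][idx] for layer, idx in enumerate(path))
        if latency > lat_constraint:      # reject paths violating LatC_h
            continue
        acc = evaluate(path)              # ACC_val(a), reusing supernet weights
        if acc > best_acc:
            best_path, best_acc = path, acc
    return best_path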
In this embodiment, the LFW (Labeled Faces in the Wild) dataset may preferably be used as the verification set.
With respect to the embodiment shown in fig. 1, the model building method provided in this embodiment further includes steps S23 to S25 before step S13. The details are as follows:
s23: a sample image set containing a human face is acquired.
S24: and intercepting a face area from each sample image in the sample image set to obtain a face image sample set.
S25: and zooming and labeling each face image sample in the face image sample set to obtain the training sample set.
In this embodiment, the sample image set includes a plurality of sample images, and each sample image includes at least one face region. For each sample image, the face region is identified and intercepted: the face region is first located in the sample image, that is, its position in the sample image is identified, and it is then cropped from the sample image, yielding the face image sample set.
It should be noted that each face image sample in the face image sample set is scaled, which applies a uniform normalization to the samples, and is annotated, so that the samples can be distinguished by finer-grained features.
Because different faces differ in size within the same or different images, scaling each face image sample standardizes the samples, which helps improve the efficiency of model training. The scaling may be long-edge scaling, that is, scaling the face image sample in the horizontal direction.
In order to distinguish the face image samples, each face image sample in the face image sample set is also labeled in this embodiment, where the label may be an identifier for distinguishing a face image, such as at least one of a name, a gender, a race, and a serial number.
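A sketch of this preprocessing pipeline is shown below, using OpenCV's bundled Haar cascade as a stand-in face detector; the patent names no particular detector, and the 112x112 target size is an assumption.

import cv2

def build_training_samples(image_paths, labels, size=112):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    samples = []
    for path, label in zip(image_paths, labels):
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
            face = img[y:y + h, x:x + w]           # intercept the face region
            face = cv2.resize(face, (size, size))  # uniform scaling
            samples.append((face, label))          # labelled training sample
    return samples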
In this embodiment, steps S23 to S25 and steps S11 to S12 need not be performed in a particular order relative to each other, and step S13 may be performed once each face image sample in the face image sample set has been scaled and labeled to obtain the training sample set.
S13: and acquiring a training sample set, and training the network frame to be trained by using the training sample set to obtain a target model.
As a possible implementation manner of this embodiment, step S13 specifically includes:
and training and optimizing the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model.
In this embodiment, a Back Propagation (BP) algorithm and a gradient optimization method are used to train and optimize the network framework to be trained according to the objective function and the training sample set. The gradient optimization method may be, but is not limited to, a known optimization algorithm such as the Adam, RMSprop, or SGD algorithm; the objective function may be, but is not limited to, an AM-Softmax, ArcNegFace, or CosFace function.
As a possible implementation manner of this embodiment, training the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model includes:
identifying the objective function Latency(a, h) as a convergence condition, wherein Latency(a, h) ≤ LatC_h and LatC_h is the latency constraint of the target deployment platform;
and training and optimizing the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model meeting the convergence condition.
In this embodiment, the convergence condition configured for training the network framework to be trained is the same as the objective function in the preset search condition. Using the objective function of the preset search condition as the training convergence condition makes the target model better suited to the target deployment platform, that is, better matched to its latency requirement.
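For illustration, a sketch of this final training stage follows: plain back propagation with a gradient optimizer. SGD is used for concreteness (Adam or RMSprop plug in the same way), CrossEntropyLoss stands in for the margin-based objectives named above, and the latency condition is treated as already satisfied at search time, since it depends on the architecture rather than on the weights.

import torch
import torch.nn as nn

def train_target_model(model, train_loader, epochs=50, lr=0.1):
    criterion = nn.CrossEntropyLoss()   # stand-in for AM-Softmax / ArcNegFace
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()             # back propagation
            optimizer.step()            # gradient-based weight update
    return model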
It should be understood that, since the embodiments of the present application do not concern how to configure an optimization strategy, and training a model with an optimization algorithm and an objective function is prior art in the field of deep neural networks, the details of the optimization scheme and the configuration of the objective function are not described here.
As can be seen from the above, in the model construction method provided in this embodiment, a trained super network is obtained by training a super network constructed with a search space. Because the search space is pre-constructed inside the super network, all candidate network frameworks that need to be searched are contained in the super network, and during training all internal substructures share parameters when constructing different sub-networks; the super network therefore only needs to be trained to a certain degree before sub-networks can be sampled and their indexes evaluated. A network framework to be trained is then searched from the trained super network according to the preset search condition, and finally trained and optimized on the training sample set to obtain the target model. No structure optimization or content optimization of an original model framework is required, which simplifies the steps of model construction and improves its efficiency.
In addition, because the preset search condition corresponds to the target deployment platform, the network framework to be trained searched out of the trained super network is the optimal network path, that is, the framework that best matches the model deployment limiting conditions; training and optimizing it into the target model therefore improves the match between the target model and the computing capability of the target deployment platform.
In addition, by constructing the search space of the super network from a plurality of candidate network frameworks, the method takes width search as its starting point while also searching the positions of network down-sampling and the global depth; it is not limited to a fixed number of layers per candidate network framework, and the number of substructures of each candidate network framework can itself be searched. This improves the flexibility of network structure search and provides a basis for deploying the model on deployment platforms with different requirements.
Referring to fig. 4, fig. 4 is a block diagram illustrating a model building apparatus according to an embodiment of the present disclosure. The model building apparatus in this embodiment includes units for performing the steps in the embodiments corresponding to fig. 1 and fig. 2; please refer to the descriptions of those embodiments for details. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 4, the model building apparatus 30 includes: a sampling training unit 31, a search unit 32, and a model training unit 33. Wherein:
a sampling training unit 31, configured to train a super network in which a search space is constructed, to obtain a trained super network; wherein the search space contains a plurality of candidate network frameworks.
And the searching unit 32 is configured to search the trained super network to obtain a network frame to be trained according to a preset search condition.
And the model training unit 33 is configured to train the network frame to be trained by using a training sample set to obtain a target model.
As an embodiment of the present application, the model building apparatus 30 further includes: an image acquisition unit 36, an image intercepting unit 37, and a sample generating unit 38.
An image acquisition unit 36, configured to acquire a sample image set containing a human face.
An image intercepting unit 37, configured to intercept a face region from each sample image in the sample image set to obtain a face image sample set.
And the sample generating unit 38 is configured to scale and label each facial image sample in the facial image sample set to obtain the training sample set.
As an embodiment of the application, a plurality of candidate network frameworks in a search space are connected with each other, and each candidate network framework comprises a plurality of substructures.
The sampling training unit 31 is specifically configured to perform structure search based on all the substructures in the search space, and determine a single-path supernet; and carrying out sampling training on the super network according to the single-path super network to obtain the trained super network.
As an embodiment of the present application, the sampling training unit 31 is further specifically configured to perform a structure search on all the substructures in the search space according to preset super-network attribute information to obtain a single-path super network; wherein the transition probability between any two adjacent substructures in the single-path super network is the largest.
As an embodiment of the application, a preset search condition corresponds to a target deployment platform; the search unit 32 is used in particular for,
searching a network frame to be trained from the trained hyper-network based on the following formula contained in the preset search condition;
max ACC_val(a), s.t. a ∈ A
Latency(a, h) ≤ LatC_h
wherein a denotes a candidate network framework and A denotes the search space containing all the candidate network frameworks (a ∈ A, A non-empty); ACC_val(a) is the validation accuracy of a; h denotes the target deployment platform; Latency(a, h) is the objective function giving the latency of a on h; and LatC_h is the latency constraint of the target deployment platform.
As an embodiment of the present application, the model training unit 33 is specifically configured to train the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model.
As an embodiment of the present application, the model training unit 33 is specifically configured to identify the objective function Latency(a, h) as a convergence condition, wherein Latency(a, h) ≤ LatC_h and LatC_h is the latency constraint of the target deployment platform; and to train and optimize the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model meeting the convergence condition.
As can be seen from the above, in the scheme provided in this embodiment, a trained super network is obtained by training a super network in which a search space is constructed. Because the search space is constructed in advance inside the super network, all candidate network frameworks that need to be searched are contained in the super network, and during training all internal substructures share parameters when constructing different sub-networks; the super network therefore only needs to be trained to a certain degree before sub-networks can be sampled and their indexes evaluated. A network framework to be trained is then searched from the trained super network according to the preset search condition, and finally trained and optimized on the training sample set to obtain the target model. No structure optimization or content optimization of an original model framework is required, which simplifies the steps of model construction and improves its efficiency.
In addition, because the preset search condition corresponds to the target deployment platform, the network framework to be trained searched out of the trained super network is the optimal network path, that is, the framework that best matches the model deployment limiting conditions; training and optimizing it into the target model therefore improves the match between the target model and the computing capability of the target deployment platform.
In addition, by constructing the search space of the super network from a plurality of candidate network frameworks, the method takes width search as its starting point while also searching the positions of network down-sampling and the global depth; it is not limited to a fixed number of layers per candidate network framework, and the number of substructures of each candidate network framework can itself be searched. This improves the flexibility of network structure search and provides a basis for deploying the model on deployment platforms with different requirements.
Fig. 5 is a block diagram of a computer device according to an embodiment of the present disclosure. As shown in fig. 5, the computer device 4 of this embodiment includes: a processor 40, a memory 41, and a computer program 42, such as a program of a model building method, stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in the embodiments of the model construction methods described above, such as S11 to S13 shown in fig. 1, or S21 to S25 and S11 to S13 shown in fig. 2. Alternatively, when the processor 40 executes the computer program 42, the functions of the units in the embodiment corresponding to fig. 4 are implemented, for example the functions of units 31 to 33 or units 31 to 38 shown in fig. 4; refer to the description of the embodiment corresponding to fig. 4 for details, which are not repeated here.
Illustratively, the computer program 42 may be divided into one or more units, which are stored in the memory 41 and executed by the processor 40 to accomplish the present application. The one or more units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program 42 in the computer device 4. For example, the computer program 42 may be partitioned into a sampling training unit, a search unit, and a model training unit, each unit functioning as described above.
The computer device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 5 is merely an example of a computer device 4 and is not intended to limit computer device 4 and may include more or fewer components than those shown, or some of the components may be combined, or different components, e.g., the computer device may also include input output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. The memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. The memory 41 is used for storing the computer program and other programs and data required by the computer device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of model construction, comprising:
training a super network constructed with a search space to obtain a trained super network, wherein the search space contains a plurality of candidate network frameworks;
searching the trained super network for a network framework to be trained according to a preset search condition;
and training the network framework to be trained with a training sample set to obtain a target model.
2. The model building method according to claim 1, wherein before the step of training the network framework to be trained by using the training sample set to obtain the target model, the method further comprises:
acquiring a sample image set containing a human face;
intercepting a face area from each sample image in the sample image set to obtain a face image sample set;
and scaling and labeling each face image sample in the face image sample set to obtain the training sample set.
3. The model building method of claim 1, wherein a plurality of candidate network frameworks in the search space are interconnected, each of the candidate network frameworks comprising a plurality of substructures;
the training of the super network with the search space to obtain the trained super network comprises the following steps:
performing structure search based on all the substructures in the search space to determine a single-path supernet;
and carrying out sampling training on the super network according to the single-path super network to obtain the trained super network.
4. The model building method of claim 3, wherein said performing a structure search based on all of said substructures in said search space to determine a single-path supernet comprises:
performing a structure search on all the substructures in the search space according to preset super-network attribute information to obtain a single-path super network; wherein the transition probability between any two adjacent substructures in the single-path super network is the largest.
5. The model building method according to claim 1, wherein the preset search condition corresponds to a target deployment platform;
the searching from the trained super network to obtain the network frame to be trained according to the preset search condition comprises the following steps:
searching a network frame to be trained from the trained hyper-network based on the following formula contained in the preset search condition;
max ACC_val(a), s.t. a ∈ A
Latency(a, h) ≤ LatC_h
wherein a denotes a candidate network framework and A denotes the search space containing all the candidate network frameworks (a ∈ A, A non-empty); ACC_val(a) is the validation accuracy of a; h denotes the target deployment platform; Latency(a, h) is the objective function giving the latency of a on h; and LatC_h is the latency constraint of the target deployment platform.
6. The model building method according to claim 5, wherein obtaining a training sample set and training the network framework to be trained with the training sample set to obtain a target model comprises:
training the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model.
7. The model building method according to claim 6, wherein training the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain the target model comprises:
identifying the objective function Latency(a, h) as a convergence condition, wherein Latency(a, h) ≤ LatC_h and LatC_h is the latency constraint of the target deployment platform;
and training and optimizing the network framework to be trained on the training sample set using back propagation and a gradient optimization method to obtain a target model meeting the convergence condition.
8. A model building apparatus, comprising:
the sampling training unit is used for training the super network constructed with the search space to obtain a trained super network; wherein the search space contains a plurality of candidate network frameworks;
the searching unit is used for searching the trained super network to obtain a network frame to be trained according to a preset searching condition;
and the model training unit is used for training the network frame to be trained by utilizing a training sample set to obtain a target model.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the model building method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the model building method according to any one of claims 1 to 7.
CN202010431405.6A 2020-05-20 2020-05-20 Model construction method, model construction device, computer apparatus, and medium Pending CN113705276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010431405.6A CN113705276A (en) 2020-05-20 2020-05-20 Model construction method, model construction device, computer apparatus, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010431405.6A CN113705276A (en) 2020-05-20 2020-05-20 Model construction method, model construction device, computer apparatus, and medium

Publications (1)

Publication Number Publication Date
CN113705276A 2021-11-26

Family

ID=78645633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010431405.6A Pending CN113705276A (en) 2020-05-20 2020-05-20 Model construction method, model construction device, computer apparatus, and medium

Country Status (1)

Country Link
CN (1) CN113705276A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017015390A1 (en) * 2015-07-20 2017-01-26 University Of Maryland, College Park Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition
KR101713891B1 (en) * 2016-01-08 2017-03-09 (주)모자이큐 User Admittance System using Partial Face Recognition and Method therefor
CN108491812A (en) * 2018-03-29 2018-09-04 百度在线网络技术(北京)有限公司 The generation method and device of human face recognition model
CN109615073A (en) * 2018-12-03 2019-04-12 郑州云海信息技术有限公司 A kind of construction method of neural network model, equipment and storage medium
CN110782034A (en) * 2019-10-31 2020-02-11 北京小米智能科技有限公司 Neural network training method, device and storage medium
CN110956262A (en) * 2019-11-12 2020-04-03 北京小米智能科技有限公司 Hyper network training method and device, electronic equipment and storage medium
CN111047563A (en) * 2019-11-26 2020-04-21 深圳度影医疗科技有限公司 Neural network construction method applied to medical ultrasonic image

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743041A (en) * 2022-03-09 2022-07-12 中国科学院自动化研究所 Construction method and device of pre-training model decimation frame
CN116188834A (en) * 2022-12-08 2023-05-30 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model
CN116188834B (en) * 2022-12-08 2023-10-20 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model
CN116051964A (en) * 2023-03-30 2023-05-02 阿里巴巴(中国)有限公司 Deep learning network determining method, image classifying method and device
CN116051964B (en) * 2023-03-30 2023-06-27 阿里巴巴(中国)有限公司 Deep learning network determining method, image classifying method and device

Similar Documents

Publication Publication Date Title
CN111819580A (en) Neural architecture search for dense image prediction tasks
CN110689038A (en) Training method and device of neural network model and medical image processing system
US11625433B2 (en) Method and apparatus for searching video segment, device, and medium
CN113705276A (en) Model construction method, model construction device, computer apparatus, and medium
US20190317965A1 (en) Methods and apparatus to facilitate generation of database queries
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
JP2022543954A (en) KEYPOINT DETECTION METHOD, KEYPOINT DETECTION DEVICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN113361636B (en) Image classification method, system, medium and electronic device
US20230092619A1 (en) Image classification method and apparatus, device, storage medium, and program product
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN113033507B (en) Scene recognition method and device, computer equipment and storage medium
CN114925238B (en) Federal learning-based video clip retrieval method and system
CN114298997B (en) Fake picture detection method, fake picture detection device and storage medium
CN113192175A (en) Model training method and device, computer equipment and readable storage medium
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN113761282B (en) Video duplicate checking method and device, electronic equipment and storage medium
KR102435035B1 (en) The Fake News Video Detection System and Method thereby
CN116152938A (en) Method, device and equipment for training identity recognition model and transferring electronic resources
CN111063000B (en) Magnetic resonance rapid imaging method and device based on neural network structure search
CN117095460A (en) Self-supervision group behavior recognition method and system based on long-short time relation predictive coding
CN116740078A (en) Image segmentation processing method, device, equipment and medium
Lu et al. Siamese graph attention networks for robust visual object tracking
US11755671B2 (en) Projecting queries into a content item embedding space
CN117010480A (en) Model training method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination