CN116502679B - Model construction method and device, storage medium and electronic equipment - Google Patents
- Publication number: CN116502679B (application CN202310543696.1A)
- Authority
- CN
- China
- Prior art keywords
- model
- candidate
- candidate model
- architecture
- framework
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06N3/063 — Physical realisation, i.e. hardware implementation, of neural networks using electronic means
- G06N3/08 — Neural networks; learning methods
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The specification discloses a model construction method and apparatus, a storage medium, and an electronic device. The candidate model architectures to be evaluated are screened so that, for the subset of candidates whose performance parameters the proxy model predicts with low accuracy, real performance parameters are obtained by deploying a test model, while for the remaining candidates the performance parameters are obtained directly from the proxy model. The proxy model is trained online by an active learning method. This improves the efficiency of automatically constructing a deep learning model while preserving the accuracy of the performance evaluation of the candidate model architectures.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for model construction, a storage medium, and an electronic device.
Background
Neural Architecture Search (NAS) is an important component of Automatic Deep Learning (AutoDL). Given data supplied by a user, NAS automatically designs a network architecture for a deep learning model suited to processing that data; a deep learning model is then automatically constructed and deployed according to the designed architecture, enabling automatic processing of the user's data.
When a network architecture is designed automatically through neural architecture search, the performance parameters of each designed architecture in different hardware environments are usually considered in order to select the optimal architecture among the designed ones. To obtain these performance parameters, however, a corresponding model must be deployed on a hardware platform for every designed architecture so that its performance in each hardware environment can be measured, which makes the whole procedure time-consuming.
Therefore, how to improve the efficiency of automatically constructing a deep learning model is an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides a method and apparatus for model construction, a storage medium, and an electronic device, so as to partially solve the foregoing problems in the prior art.
The technical solution adopted in this specification is as follows:
This specification provides a model construction method, comprising the following steps:
Obtaining a model construction request;
determining each candidate model architecture according to the model construction request;
inputting each candidate model architecture into a preset proxy model, and obtaining performance parameters of the candidate model architecture in a specified hardware environment through the proxy model to serve as first performance parameters;
determining a weight of each candidate model architecture according to the first performance parameters, and screening a model architecture to be tested from the candidate model architectures according to the weights;
deploying a test model in the specified hardware environment according to the model architecture to be tested, so as to obtain a second performance parameter corresponding to the model architecture to be tested;
screening a target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures except the model architecture to be tested;
and constructing a target model according to the target model framework, and executing tasks through the target model.
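The claimed steps can be sketched as a screening loop. Every function and variable name below is an illustrative assumption rather than the patent's implementation, and a higher performance value is assumed to be better:

```python
# Illustrative sketch of the claimed flow. All names are hypothetical.
def build_model(candidates, proxy_predict, measure_on_hardware, k):
    """Pick the best architecture while deploying only the k least-trusted ones."""
    # First performance parameters: (predicted_perf, confidence) per candidate.
    preds = {a: proxy_predict(a) for a in candidates}
    # Lower confidence -> higher screening weight -> more worth real testing.
    to_test = sorted(candidates, key=lambda a: preds[a][1])[:k]
    # Second performance parameters from actual deployment of a test model.
    scores = {a: measure_on_hardware(a) for a in to_test}
    # Remaining candidates keep the proxy-predicted first performance parameter.
    for a in candidates:
        scores.setdefault(a, preds[a][0])
    return max(candidates, key=lambda a: scores[a])  # assume higher is better
```

In a full implementation the measured pairs would also be fed back into the proxy model, matching the online active-learning training the method describes.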
Optionally, determining each candidate model architecture according to the model construction request specifically includes:
screening, according to the model construction request, the atomic operations used to form each candidate model architecture from preset atomic operations, as the target atomic operations corresponding to each candidate model architecture, where the atomic operations include at least one of: a standard convolution operation, a separable convolution operation, an average pooling operation, and a maximum pooling operation;
and determining each candidate model architecture according to its corresponding target atomic operations.
Optionally, according to the building request of the model, each atomic operation for forming each candidate model architecture is screened from preset atomic operations, which specifically includes:
determining each optimization target according to the construction request of the model;
and screening each atomic operation for forming each candidate model framework from preset atomic operations according to each optimization target through a preset optimizer.
Optionally, determining a weight of each candidate model architecture according to the first performance parameter specifically includes:
determining a target screening strategy from preset screening strategies according to the task type of the proxy model, where the task type includes a classification task and a regression task;
and determining the weight of each candidate model architecture according to the target screening strategy and the first performance parameters.
Optionally, determining a weight of each candidate model architecture according to the target screening policy and the first performance parameter specifically includes:
determining, for the first performance parameter corresponding to each candidate model architecture, the confidence of that first performance parameter according to the target screening strategy;
and determining the weight of each candidate model architecture according to the confidence, where the lower the confidence of the first performance parameter corresponding to a candidate model architecture, the higher the weight of that candidate model architecture.
Optionally, determining a weight of each candidate model architecture according to the target screening policy and the first performance parameter specifically includes:
determining, according to the target screening strategy, the contribution degree of the first performance parameter corresponding to each candidate model architecture;
and determining the weight of each candidate model architecture according to the contribution degree, where the higher the contribution degree of the first performance parameter corresponding to a candidate model architecture, the higher the weight of that candidate model architecture.
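The two optional screening strategies above (confidence-based and regression-style contribution-based) can be illustrated with minimal weight formulas. The patent only fixes the monotonic relationships, so the concrete computations below are assumptions:

```python
# Hypothetical weight formulas for the two screening strategies; the patent
# specifies only the direction of the relationship, not the exact formula.

def weights_from_confidence(confidences):
    """Lower confidence of the first performance parameter -> higher weight."""
    return [1.0 - c for c in confidences]

def weights_from_contribution(contributions):
    """Higher contribution of the first performance parameter -> higher weight."""
    total = sum(contributions)
    return [c / total for c in contributions]
```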
Optionally, for each candidate model architecture, inputting the candidate model architecture into a preset proxy model to obtain a performance parameter of the candidate model architecture in a specified hardware environment, wherein the performance parameter is used as a first performance parameter and specifically comprises:
inputting each candidate model architecture into each pre-trained proxy model to obtain the performance parameters of that candidate model architecture in each specified hardware environment, as the first performance parameters, where each proxy model is used to predict the performance parameters of a candidate model architecture in at least one specified hardware environment;
Determining a weight of each candidate model architecture according to the first performance parameter, and screening the model architecture to be tested from the candidate model architectures according to the weight, wherein the method specifically comprises the following steps:
determining the weight of each candidate model framework according to the first performance parameters, and screening the model framework to be tested from the candidate model frameworks according to the weight;
deploying a test model in the specified hardware environment according to the to-be-tested model architecture to obtain a second performance parameter corresponding to the to-be-tested model architecture, wherein the method specifically comprises the following steps:
deploying a test model in each specified hardware environment according to the model architecture to be tested, so as to obtain the second performance parameters corresponding to the model architecture to be tested in each hardware environment;
screening a target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures except the model architecture to be tested, wherein the method specifically comprises the following steps:
and screening out a target model framework from the candidate model frameworks according to the second performance parameters corresponding to the model frameworks to be tested and the first performance parameters corresponding to the candidate model frameworks except the model frameworks to be tested.
Optionally, the proxy model includes: the feature extraction layer and each decision layer, wherein different decision layers are used for determining different types of performance parameters of the candidate model architecture in a specified hardware environment;
inputting the candidate model architecture into a preset proxy model to obtain performance parameters of the candidate model architecture in a specified hardware environment, wherein the performance parameters are used as first performance parameters and specifically comprise:
inputting the candidate model architecture into the feature extraction layer of a preset proxy model, and obtaining the feature representation of the candidate model architecture through the feature extraction layer;
inputting the feature representation of the candidate model architecture into each decision layer of the proxy model through the feature extraction layer so as to obtain each sub-first performance parameter of the candidate model architecture in a specified hardware environment through each decision layer;
and obtaining the first performance parameter of the candidate model architecture in the specified hardware environment according to the sub first performance parameters.
Optionally, the method further comprises:
and training the proxy model by minimizing the deviation between the first performance parameter output by the proxy model for the model architecture to be tested and the second performance parameter corresponding to the model architecture to be tested.
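This online training step can be illustrated with a toy proxy: a single-parameter linear model updated by SGD to shrink the squared deviation between its prediction and the measured value. The model form, feature encoding, and learning rate are all assumptions made for illustration:

```python
# Toy online update: a one-parameter linear proxy trained to reduce the
# deviation between its prediction and the measured value. Illustrative only.

def update_proxy(weight, feature, measured, lr=0.1):
    """One gradient step on the squared deviation (predicted - measured)^2."""
    predicted = weight * feature
    grad = 2.0 * (predicted - measured) * feature
    return weight - lr * grad

w = 0.0
for _ in range(200):
    # Each deployed test model yields a (feature, measured) pair to learn from.
    w = update_proxy(w, feature=2.0, measured=4.0)
# w converges toward 2.0, at which point predicted 2.0 * 2.0 matches measured 4.0
```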
The present specification provides a model building apparatus including:
the acquisition module is used for acquiring a model construction request;
the determining module is used for determining each candidate model framework according to the model construction request;
the first evaluation module is used for inputting each candidate model framework into a preset proxy model to obtain the performance parameters of the candidate model framework in a specified hardware environment as first performance parameters;
the screening module is used for determining the weight of each candidate model architecture according to the first performance parameters, and screening the model architecture to be tested from the candidate model architectures according to the weights;
the second evaluation module is used for deploying a test model in the specified hardware environment according to the model architecture to be tested, so as to obtain the second performance parameter corresponding to the model architecture to be tested;
the decision module is used for screening a target model architecture from the candidate model architectures according to the second performance parameter corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architecture to be tested;
and the execution module is used for constructing a target model according to the target model framework and executing tasks through the target model.
Optionally, the determining module is specifically configured to screen, according to the model construction request, the atomic operations used to form each candidate model architecture from preset atomic operations, as the target atomic operations corresponding to each candidate model architecture, where the atomic operations include at least one of: a standard convolution operation, a separable convolution operation, an average pooling operation, and a maximum pooling operation; and to determine each candidate model architecture according to its corresponding target atomic operations.
Optionally, the determining module is specifically configured to determine each optimization target according to a construction request of the model; and screening each atomic operation for forming each candidate model framework from preset atomic operations according to each optimization target through a preset optimizer.
Optionally, the screening module is specifically configured to determine a target screening strategy from preset screening strategies according to the task type of the proxy model, where the task type includes a classification task and a regression task; and to determine the weight of each candidate model architecture according to the target screening strategy and the first performance parameters.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above model building method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above model building method when executing the program.
At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:
according to the model construction method provided by the specification, firstly, a model construction request is acquired, candidate model frameworks are determined according to the model construction request, the candidate model frameworks are input into a preset proxy model for each candidate model framework, the performance parameters of the candidate model frameworks in a specified hardware environment are obtained through the proxy model and serve as first performance parameters, the weight of each candidate model framework is determined according to the first performance parameters, the model frameworks to be tested are screened out from the candidate model frameworks according to the weight, a test model is deployed in a specified hardware environment according to the model frameworks to be tested, so that second performance parameters corresponding to the model frameworks to be tested are acquired, the target model frameworks are screened out from the candidate model frameworks according to the second performance parameters corresponding to the model frameworks to be tested and the first performance parameters corresponding to other candidate model frameworks except the model frameworks to be tested, the target model is constructed according to the target model frameworks, and tasks are executed through the target model.
With this method, the candidate model architectures can be screened so that real performance parameters are measured, by deploying a test model, only for the subset of candidates whose performance parameters the proxy model predicts with low accuracy, while the performance parameters of the remaining candidates are obtained directly from the proxy model. This improves the efficiency of automatically constructing a deep learning model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of it, illustrate exemplary embodiments of the present specification and, together with the description, serve to explain it; they are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a schematic flow chart of a model building method provided in the present specification;
FIG. 2 is a schematic diagram of a candidate model architecture determination process provided in the present specification;
FIG. 3 is a schematic diagram of the proxy model provided in the present specification;
FIG. 4 is a schematic diagram of a determination process of a model architecture to be tested provided in the present specification;
FIG. 5 is a schematic diagram of a model building apparatus provided herein;
Fig. 6 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a model building method provided in the present specification, including the following steps:
s101: a model build request is obtained.
With the development of deep learning, neural architecture search has attracted considerable attention. After providing the required data to an automatic deep learning platform, a user can have a deep learning model constructed automatically from that data based on neural architecture search, and corresponding tasks can then be executed with the constructed model and the user's data (e.g., processing the user's data, fitting its distribution, etc.). At present, however, the long time required to automatically construct a deep learning model limits the development of automatic deep learning technology.
Based on this, in this specification, when a user needs to execute a target task, the user may generate a model construction request according to the data required for executing the target task through a deep learning model, and send the generated request to the service platform. After receiving the request, the service platform can automatically construct the deep learning model. For example, when a user needs commodity recommendation, the data required for constructing a recommendation model can be sent to the service platform, which then automatically constructs the recommendation model based on the provided data.
In the present specification, the execution body of the model construction method may be a designated device such as a server deployed on the service platform, or a terminal device such as a desktop computer, a notebook computer, or a mobile phone. For convenience of description, the method is described below with the server as the execution body.
S102: and determining each candidate model architecture according to the model construction request.
After receiving the model construction request sent by the user, the server can determine candidate architecture types from the predefined architecture types according to the received model construction request, and screen out all the atomic operations matched with the candidate architecture types from preset atomic operations to serve as all the candidate atomic operations according to the determined candidate architecture types.
The candidate architecture types may include: a Convolutional Neural Network (CNN) model architecture, a Transformer model architecture, etc.
The atomic operations described above include: a standard convolution operation, a separable convolution operation, an average pooling operation, a maximum pooling operation, and the like.
Then, through a preset optimizer, the atomic operations used to form each candidate model architecture can be determined from the candidate atomic operations, as the target atomic operations corresponding to each candidate model architecture, as shown in Fig. 2.
Fig. 2 is a schematic diagram of a determination process of a candidate model architecture provided in the present specification.
As can be seen in connection with fig. 2, the server may determine a candidate architecture type from the predefined architecture types according to the received model building request, and screen, according to the determined candidate architecture type, each atomic operation matching with the candidate architecture type from preset atomic operations as each candidate atomic operation, so as to use each candidate atomic operation and an operation parameter included in the candidate architecture type as a search space (which may be understood as a search range) of the neural architecture search.
The operation parameters included in a candidate architecture type may be, for example, the stride and kernel size in a convolutional neural network model architecture, or the hidden size and the number of heads of the multi-head attention mechanism in a Transformer model architecture.
It should be noted that when the server builds a deep learning model, it must configure not only the atomic operations the model contains but also its operation parameters. Therefore, when determining the search space, the server may determine the search space for the neural architecture search according to the candidate atomic operations and the operation parameters included in the candidate architecture type.
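A search space built from candidate atomic operations and operation parameters might look like the following sketch; the concrete operation names and parameter values are assumed for illustration and are not specified by the patent:

```python
from itertools import product

# Hypothetical search space: candidate atomic operations plus the operation
# parameters of the chosen architecture type. All values are assumed.
SEARCH_SPACE = {
    "atomic_ops": ["conv", "separable_conv", "avg_pool", "max_pool"],
    "stride": [1, 2],
    "kernel_size": [3, 5, 7],
}

def enumerate_cells(space):
    """Yield every (operation, stride, kernel_size) combination in the space."""
    yield from product(space["atomic_ops"], space["stride"], space["kernel_size"])
```

In practice a NAS optimizer samples from such a space rather than enumerating it, since realistic spaces are combinatorially large.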
Further, after determining the search space, the server may determine the optimization targets according to the model construction request, and run a preset multi-objective optimization algorithm through a preset optimizer to sample from the preset atomic operations according to those targets, thereby screening out the atomic operations used to form each candidate model architecture (in other words, the atomic operations forming each candidate model architecture are searched out within the search space by the multi-objective optimization algorithm). Here an optimization target is an index used to evaluate the performance of a candidate model architecture, for example: accuracy, precision, recall, or performance parameters in a specified hardware environment (such as latency, throughput, memory access volume, cache overhead, etc.).
The multi-objective optimization algorithm above may employ techniques such as multi-objective simulated annealing, multi-objective reinforcement learning, multi-objective evolutionary algorithms, and the like.
It should be noted that the process in which the server samples through the multi-objective optimization algorithm to determine the target model architecture can be understood as a path-optimizing process. For example, the server may pick one atomic operation as a starting point, then select a further atomic operation to combine with it, yielding a candidate model architecture. It can then evaluate this candidate architecture and, based on the evaluation result, decide whether to add another atomic operation to the combination to obtain a new candidate model architecture, which is evaluated in turn; by analogy, the optimal target model architecture is finally obtained.
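The path-optimizing process described above can be sketched as a greedy loop that extends the architecture one atomic operation at a time while the evaluation keeps improving. This is a simplification of the patent's multi-objective search, and the evaluation function is left abstract:

```python
# Greedy sketch of the path-optimizing loop: grow the architecture one atomic
# operation at a time, keeping an extension only if it improves the score.

def greedy_search(ops, evaluate, max_len):
    """Return (architecture, score); `evaluate` scores a list of operations."""
    arch = []
    best = float("-inf")
    while len(arch) < max_len:
        # Try every op as the next step and keep the best-scoring extension.
        score, op = max((evaluate(arch + [op]), op) for op in ops)
        if score <= best:
            break  # no extension improves the architecture any further
        arch.append(op)
        best = score
    return arch, best
```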
S103: inputting each candidate model architecture into a preset proxy model, and obtaining the performance parameters of the candidate model architecture in a specified hardware environment through the proxy model to serve as first performance parameters.
From the above, it can be seen that in the process of determining the target model architecture, the server needs to determine the performance parameter of each candidate model architecture in the specified hardware environment, and determine, according to the determined performance parameter, the optimal candidate model architecture from the candidate model architectures, as the target model architecture.
Specifically, for each candidate model architecture, the server may input the candidate model architecture into a preset proxy model, so as to obtain, through the proxy model, the performance parameters of the candidate model architecture in a specified hardware environment, where the proxy model may be: a multi-layer perceptron (Multilayer Perceptron, MLP) model, a long short-term memory (Long Short-Term Memory, LSTM) model, a gated recurrent unit (Gated Recurrent Unit, GRU) model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, or the like.
It should be noted that the proxy model may be a multi-task proxy model, and the server may determine, through the multi-task proxy model, different types of performance parameters of the candidate model architecture in the specified hardware environment, where the different types of performance parameters include: the computation amount, the number of reads and writes of each cache level, the occupancy rate of each cache level, the throughput rate, the inference latency, and the like, as specifically shown in fig. 3.
Fig. 3 is a schematic structural diagram of the proxy model provided in the present specification.
As can be seen from fig. 3, the proxy model may comprise a feature extraction layer and decision layers, and the server may input the candidate model architecture into the feature extraction layer of the preset proxy model, so as to obtain a feature representation of the candidate model architecture through the feature extraction layer.
The feature representation of the candidate model architecture can then be input into each decision layer of the proxy model through the feature extraction layer, so that each sub-first performance parameter of the candidate model architecture in the specified hardware environment is obtained through each decision layer, and the first performance parameter of the candidate model architecture in the specified hardware environment is obtained from these sub-first performance parameters, where the sub-first performance parameters are first performance parameters of different types; for example, the computation amount mentioned above is one sub-first performance parameter.
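The shared-trunk-plus-heads structure described above can be sketched as follows. This is a minimal illustration with random placeholder weights; the head names, dimensions, and the tanh trunk are assumptions, and a real proxy would be an MLP/LSTM/GRU/GBDT trained on measured parameters:

```python
import math
import random

class MultiTaskProxy:
    """Sketch of the multi-task proxy model: one shared feature extraction
    layer followed by one decision head (decision layer) per type of
    performance parameter. All weights are random placeholders."""

    def __init__(self, in_dim, hidden, head_names, seed=0):
        rng = random.Random(seed)
        self.w_feat = [[rng.gauss(0, 1) for _ in range(in_dim)]
                       for _ in range(hidden)]
        # one linear decision head per metric, e.g. latency, throughput
        self.heads = {name: [rng.gauss(0, 1) for _ in range(hidden)]
                      for name in head_names}

    def predict(self, arch_encoding):
        # feature extraction layer -> shared feature representation
        feat = [math.tanh(sum(w * x for w, x in zip(row, arch_encoding)))
                for row in self.w_feat]
        # each decision layer outputs one sub-first performance parameter
        return {name: sum(w * f for w, f in zip(head, feat))
                for name, head in self.heads.items()}

proxy = MultiTaskProxy(in_dim=8, hidden=16,
                       head_names=["latency", "throughput", "cache_misses"])
first_params = proxy.predict([1.0] * 8)
```

The dictionary returned by `predict` corresponds to the set of sub-first performance parameters that together form the first performance parameter of the architecture.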
S104: and determining the weight of each candidate model framework according to the first performance parameter, and screening the model framework to be tested from the candidate model frameworks according to the weight.
Further, the server may determine a weight of each candidate model architecture according to the first performance parameter, and screen the model architecture to be tested from the candidate model architectures according to the weight, as shown in fig. 4.
Fig. 4 is a schematic diagram of a determination process of a model architecture to be tested provided in the present specification.
As can be seen from fig. 4, in an actual application scenario, the proxy model may be a classification model or a regression model. Based on this, the server may determine a target screening policy from preset screening policies according to the task type of the proxy model, where the task types include classification tasks and regression tasks, and may then determine the weight of each candidate model architecture according to the target screening policy and the first performance parameter.
If the task type of the proxy model is a classification task, the confidence level of the first performance parameter corresponding to each candidate model architecture may be determined according to the target screening policy, and the weight of each candidate model architecture is determined according to the confidence level, where the lower the confidence level of the first performance parameter corresponding to the candidate model architecture is, the higher the weight of the candidate model architecture is.
If the task type of the proxy model is a regression task, the contribution degree of the first performance parameter corresponding to each candidate model architecture can be determined according to the target screening strategy for the first performance parameter corresponding to each candidate model architecture, and the weight of each candidate model architecture is determined according to the contribution degree, wherein the higher the contribution degree of the first performance parameter corresponding to the candidate model architecture is, the higher the weight of the candidate model architecture is.
The contribution degree of the first performance parameter corresponding to a candidate model architecture may be determined using an expected model change maximization (Expected Model Change Maximization, EMCM) algorithm: for each candidate model architecture, the server predicts the degree to which the parameters of the proxy model would change if the candidate model architecture were taken as a model architecture to be tested and the proxy model were trained on it, and then determines the contribution degree according to the predicted degree of parameter change, where the larger the predicted parameter change, the larger the contribution degree of the first performance parameter corresponding to that candidate model architecture.
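The two weighting branches above (inverse confidence for classification, contribution for regression) can be sketched as follows. The helper names, score values, and top-k selection rule are illustrative assumptions, not taken from the specification:

```python
def architecture_weights(scores, task_type):
    """Weight each candidate architecture per the screening policy:
    classification -> lower confidence gives higher weight;
    regression -> higher contribution (e.g. EMCM score) gives higher weight."""
    if task_type == "classification":
        return {arch: 1.0 - conf for arch, conf in scores.items()}
    if task_type == "regression":
        return dict(scores)
    raise ValueError(f"unknown task type: {task_type}")

def select_to_test(scores, task_type, k):
    # keep the k highest-weighted architectures as architectures to be tested
    weights = architecture_weights(scores, task_type)
    return sorted(weights, key=weights.get, reverse=True)[:k]

# confidence scores predicted by a (hypothetical) classification proxy
picked = select_to_test({"arch_a": 0.95, "arch_b": 0.40, "arch_c": 0.70},
                        "classification", k=1)
```

Here `arch_b`, the least confidently predicted architecture, would be selected for real deployment and measurement.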
S105: and deploying a test model in the specified hardware environment according to the to-be-tested model framework to obtain a second performance parameter corresponding to the to-be-tested model framework.
According to the above, the server can screen out, as model architectures to be tested, the candidate model architectures for which the accuracy of the proxy model's prediction is lower or which are expected to bring a larger improvement to the proxy model, and can then deploy a test model in the specified hardware environment according to each model architecture to be tested, so as to obtain the second performance parameter corresponding to that model architecture to be tested.
Further, after obtaining the second performance parameters corresponding to the model architecture to be tested, the server can not only screen the target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architecture to be tested through the optimizer, but also train the proxy model according to the second performance parameters corresponding to the model architecture to be tested.
Specifically, the server may train the proxy model by minimizing the deviation between the first performance parameter of the to-be-tested model architecture output by the proxy model and the second performance parameter measured for the to-be-tested model architecture.
From the above, it can be seen that the training samples for training the proxy model by the server are selected from the candidate model architectures, and the candidate model architectures are determined by the optimizer based on a plurality of optimization targets in a similar manner to path optimization, so that a certain generalization error exists in the proxy model trained by using the model architectures to be tested as the training samples.
Therefore, the server may also sample from the search space by methods such as random sampling or grid-search sampling to obtain supplementary model architectures, obtain the first performance parameters of each supplementary model architecture through the proxy model, and deploy a supplementary model in the specified hardware environment according to the supplementary model architecture to obtain the second performance parameters corresponding to the supplementary model architecture.
Further, the server may train the proxy model by minimizing the deviation between the first performance parameter of the supplementary model architecture output by the proxy model and the second performance parameter measured for the supplementary model architecture.
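The deviation-minimization training described above can be sketched under the simplifying assumption of a linear proxy fitted by plain gradient descent on the squared error; everything below (the linear form, learning rate, encodings) is an illustrative assumption:

```python
def train_proxy(weights, encodings, measured, lr=0.1, steps=200):
    """Fit the proxy's predicted (first) performance parameters to the
    measured (second) ones by minimizing the squared deviation."""
    for _ in range(steps):
        for x, y in zip(encodings, measured):
            pred = sum(w * xi for w, xi in zip(weights, x))  # first parameter
            err = pred - y                                   # deviation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

# Two toy architecture encodings with measured second performance parameters
w = train_proxy([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0])
```

After training, the proxy's predictions on the two encodings converge to the measured values, which is the role the second performance parameters of tested (and supplementary) architectures play in the text.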
S106: and screening out a target model framework from the candidate model frameworks according to the second performance parameters corresponding to the model framework to be tested and the first performance parameters corresponding to the candidate model frameworks except the model framework to be tested.
S107: and constructing a target model according to the target model framework, and executing tasks through the target model.
Further, the server may screen out a target model architecture from the candidate model architectures according to the determined second performance parameters corresponding to the model architecture to be tested and the determined first performance parameters corresponding to the candidate model architectures except the model architecture to be tested, construct a target model according to the target model architecture, and execute the task through the target model.
Specifically, the server may determine, as the target model architecture, an optimal or Pareto-optimal candidate model architecture from among the candidate model architectures, according to the determined second performance parameters corresponding to the model architectures to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architectures to be tested.
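Pareto-optimal selection over the mixed first/second performance parameters can be sketched as follows; the metric tuples are illustrative, and the convention that lower is better for every metric (e.g. latency, error rate) is an assumption of this sketch:

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.
    candidates: {name: metric tuple}, lower is better for every metric."""
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return {name: m for name, m in candidates.items()
            if not any(dominates(other, m)
                       for o, other in candidates.items() if o != name)}

front = pareto_front({
    "arch_a": (1.0, 5.0),   # fast but inaccurate
    "arch_b": (3.0, 2.0),   # slower but accurate
    "arch_c": (3.5, 2.5),   # dominated by arch_b
})
```

Architectures on the front represent the best achievable trade-offs; the server would pick the target model architecture from among them.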
It should be noted that, the proxy model may be used to determine the first performance parameter of each candidate model architecture in a specific hardware environment, and in an actual application scenario, it may be necessary to screen out the target model architecture according to the first performance parameter of each candidate model architecture in multiple specific hardware environments.
Based on this, the server may pre-train the proxy model through methods such as transfer learning, few-shot learning, and zero-shot learning, so that the proxy model can be used to predict the first performance parameters of each candidate model architecture under a variety of specified hardware environments.
In addition, a plurality of proxy models may be deployed in the server, wherein each proxy model is configured to predict performance parameters of the candidate model architecture in at least one specified hardware environment.
Specifically, for each candidate model architecture, the server may input the candidate model architecture into each pre-trained proxy model to obtain the performance parameters of the candidate model architecture in each specified hardware environment as the first performance parameters. The server may then determine the weight of each candidate model architecture according to these first performance parameters, screen out the model architectures to be tested from the candidate model architectures according to the weights, and deploy test models in each specified hardware environment according to the model architectures to be tested, so as to obtain the second performance parameters corresponding to each model architecture to be tested in each hardware environment. Finally, the server may screen out the target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architectures to be tested and the first performance parameters corresponding to the other candidate model architectures.
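Under the assumption stated above that one proxy model is deployed per specified hardware environment, gathering the first performance parameters across environments can be sketched like this (the environment names, lambdas, and metrics are purely illustrative):

```python
def predict_all_environments(proxies, arch_encoding):
    """Query every per-environment proxy model for one candidate architecture.
    proxies: {environment_name: callable(encoding) -> metrics dict}."""
    return {env: proxy(arch_encoding) for env, proxy in proxies.items()}

# Two hypothetical proxies standing in for trained models on two hardware targets
proxies = {
    "gpu_a": lambda x: {"latency_ms": 2.0 * sum(x)},
    "cpu_b": lambda x: {"latency_ms": 9.0 * sum(x)},
}
first_params = predict_all_environments(proxies, [1, 0, 1])
```

The resulting per-environment first performance parameters would then feed the weighting and screening steps described above.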
It should be noted that the proxy model may be deployed to the server after pre-training, and when the server performs model construction in response to a received model building request, the first performance parameters of each candidate model architecture in the specified hardware environment can be predicted online. This avoids the problem that offline prediction yields inaccurate first performance parameters when the architecture type, the hardware environment, or the target space (i.e., the set of optimization objectives) changes.
Moreover, model architectures to be tested can be screened out in real time during prediction, and the proxy model can be trained based on their actual performance parameters (i.e., the second performance parameters), so that the accuracy of the first performance parameters predicted by the proxy model is gradually improved.
From the above, it can be seen that the candidate model architectures can be screened so that, for the portion whose proxy-predicted performance parameters are less accurate, real performance parameters are obtained by deploying test models, while for the remaining candidate model architectures the performance parameters are obtained directly through the proxy model. This improves the efficiency of automatically constructing a deep learning model and reduces the cost of obtaining the performance parameters of candidate model architectures.
The above is the model construction method provided for one or more embodiments of the present specification. Based on the same concept, the present specification further provides a corresponding model construction apparatus, as shown in fig. 5.
Fig. 5 is a schematic diagram of a model building apparatus provided in the present specification, including:
an obtaining module 501, configured to obtain a model building request;
a determining module 502, configured to determine each candidate model architecture according to the model building request;
a first evaluation module 503, configured to input, for each candidate model architecture, the candidate model architecture into a preset proxy model, and obtain, by using the proxy model, a performance parameter of the candidate model architecture in a specified hardware environment as a first performance parameter;
the screening module 504 is configured to determine a weight of each candidate model architecture according to the first performance parameter, and screen a model architecture to be tested from the candidate model architectures according to the weight;
a second evaluation module 505, configured to deploy a test model in the specified hardware environment according to the to-be-tested model architecture, so as to obtain a second performance parameter corresponding to the to-be-tested model architecture;
the decision module 506 is configured to screen a target model architecture from the candidate model architectures according to the second performance parameter corresponding to the model architecture to be tested and the first performance parameter corresponding to the candidate model architectures except for the model architecture to be tested;
And the execution module 507 is used for constructing a target model according to the target model architecture and executing tasks through the target model.
Optionally, the determining module 502 is specifically configured to screen, according to the model building request, each atomic operation for forming each candidate model architecture from preset atomic operations, as each target atomic operation corresponding to each candidate model architecture, where the atomic operation includes: at least one of a normal convolution operation, a split convolution operation, an average pooling operation, a maximum pooling operation; and determining each candidate model architecture according to the target atom operation corresponding to each candidate model architecture.
Optionally, the determining module 502 is specifically configured to determine each optimization objective according to a construction request of the model; and screening each atomic operation for forming each candidate model framework from preset atomic operations according to each optimization target through a preset optimizer.
Optionally, the screening module 504 is specifically configured to determine, according to the task type of the proxy model, a target screening policy from preset screening policies, where the task types include: classification tasks and regression tasks; and determine the weight of each candidate model architecture according to the target screening policy and the first performance parameter.
Optionally, the filtering module 504 is specifically configured to determine, for each first performance parameter corresponding to each candidate model architecture, a confidence level of the first performance parameter corresponding to the candidate model architecture according to the target filtering policy; and determining the weight of each candidate model framework according to the confidence coefficient, wherein the lower the confidence coefficient of the first performance parameter corresponding to the candidate model framework is, the higher the weight of the candidate model framework is.
Optionally, the filtering module 504 is specifically configured to determine, for each first performance parameter corresponding to each candidate model architecture, a contribution degree of the first performance parameter corresponding to the candidate model architecture according to the target filtering policy; and determining the weight of each candidate model framework according to the contribution degree, wherein the higher the contribution degree of the first performance parameter corresponding to the candidate model framework is, the higher the weight of the candidate model framework is.
Optionally, the first evaluation module 503 is specifically configured to, for each candidate model architecture, input the candidate model architecture into each pre-trained proxy model to obtain, as each first performance parameter, a performance parameter of the candidate model architecture in each specific hardware environment, where each proxy model is configured to predict the performance parameter of the candidate model architecture in at least one specific hardware environment;
The screening module 504 is specifically configured to determine a weight of each candidate model architecture according to the first performance parameters, and screen the model architecture to be tested from the candidate model architectures according to the weight;
the second evaluation module 505 is specifically configured to deploy a test model in each specific hardware environment according to the to-be-tested model architecture, so as to obtain a second performance parameter corresponding to the to-be-tested model architecture in each hardware environment;
the decision module 506 is specifically configured to screen a target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures except the model architecture to be tested.
Optionally, the proxy model includes: the feature extraction layer and each decision layer, wherein different decision layers are used for determining different types of performance parameters of the candidate model architecture in a specified hardware environment;
the first evaluation module 503 is specifically configured to input the candidate model architecture into the feature extraction layer of a preset proxy model, and obtain, through the feature extraction layer, a feature representation of the candidate model architecture; input the feature representation of the candidate model architecture into each decision layer of the proxy model through the feature extraction layer, so as to obtain each sub-first performance parameter of the candidate model architecture in the specified hardware environment through each decision layer; and obtain the first performance parameter of the candidate model architecture in the specified hardware environment according to each sub-first performance parameter.
Optionally, the apparatus further comprises: a training module 508;
the training module 508 is specifically configured to train the proxy model by minimizing the deviation between the first performance parameter of the to-be-tested model architecture output by the proxy model and the second performance parameter measured for the to-be-tested model architecture.
The present specification also provides a computer-readable storage medium storing a computer program, where the computer program is operable to perform the model construction method provided in fig. 1 above.
The present specification also provides a schematic structural diagram of an electronic device corresponding to fig. 1, shown in fig. 6. At the hardware level, as shown in fig. 6, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the method described above with respect to fig. 1.
Of course, the present specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows described above is not limited to logic units, but may also be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art also know that, in addition to implementing the controller in pure computer-readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a kind of hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.
Claims (14)
1. A model construction method, comprising:
obtaining a model construction request;
determining each candidate model architecture according to the model construction request;
inputting, for each candidate model architecture, the candidate model architecture into a preset proxy model to obtain a performance parameter of the candidate model architecture in a specified hardware environment as a first performance parameter, wherein the proxy model comprises a feature extraction layer and decision layers, and different decision layers are used for determining different types of performance parameters of the candidate model architecture in the specified hardware environment; the candidate model architecture is input into the feature extraction layer of the preset proxy model, a feature representation of the candidate model architecture is obtained through the feature extraction layer, the feature representation of the candidate model architecture is input into each decision layer of the proxy model through the feature extraction layer so as to obtain, through each decision layer, each sub first performance parameter of the candidate model architecture in the specified hardware environment, and the first performance parameter of the candidate model architecture in the specified hardware environment is obtained according to the sub first performance parameters;
determining a weight of each candidate model architecture according to the first performance parameter, and screening a model architecture to be tested from the candidate model architectures according to the weights;
deploying a test model in the specified hardware environment according to the model architecture to be tested, to obtain a second performance parameter corresponding to the model architecture to be tested;
screening a target model architecture from the candidate model architectures according to the second performance parameter corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architecture to be tested;
and constructing a target model according to the target model architecture, and executing tasks through the target model.
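The flow of claim 1 — a shared feature extraction layer feeding several decision layers whose sub first performance parameters are combined, followed by screening a candidate to benchmark — can be sketched as follows. Every concrete detail here (the op-count features, the linear decision layers, the cost weights, and the minimum-score screening rule) is an illustrative stand-in, not the patented implementation:

```python
# Minimal sketch of the claimed pipeline, with hypothetical stand-ins
# for the proxy model's feature extraction layer and decision layers.

def feature_extraction(arch):
    # Encode a candidate architecture (here: a list of op names)
    # as a fixed-length feature vector via simple op counts.
    ops = ["conv", "sep_conv", "avg_pool", "max_pool"]
    return [arch.count(op) for op in ops]

def decision_layer(features, weights):
    # One decision layer predicts one type of performance parameter
    # (e.g. latency or memory) from the shared feature representation.
    return sum(f * w for f, w in zip(features, weights))

def proxy_predict(arch):
    feats = feature_extraction(arch)
    latency = decision_layer(feats, [1.0, 0.6, 0.2, 0.2])  # sub-parameter 1
    memory = decision_layer(feats, [2.0, 1.0, 0.1, 0.1])   # sub-parameter 2
    # Combine the sub first performance parameters into one score.
    return latency + memory

candidates = [
    ["conv", "conv", "avg_pool"],
    ["sep_conv", "sep_conv", "max_pool"],
    ["conv", "sep_conv", "avg_pool"],
]
first_params = {i: proxy_predict(a) for i, a in enumerate(candidates)}
# Screen the candidate to deploy and benchmark in the hardware
# environment (claim 1 leaves the weighting scheme open; this sketch
# simply benchmarks the proxy's best-scoring candidate).
to_test = min(first_params, key=first_params.get)
```

In the full method the benchmarked second performance parameter then replaces the proxy's estimate for that candidate before the final target architecture is selected.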
2. The method of claim 1, wherein determining each candidate model architecture according to the model construction request comprises:
screening, according to the model construction request, atomic operations used to form each candidate model architecture from preset atomic operations, as target atomic operations corresponding to each candidate model architecture, wherein the atomic operations comprise at least one of: a normal convolution operation, a split convolution operation, an average pooling operation, and a maximum pooling operation;
and determining each candidate model architecture according to the target atomic operations corresponding to the candidate model architecture.
3. The method according to claim 2, wherein screening, according to the model construction request, the atomic operations used to form each candidate model architecture from the preset atomic operations comprises:
determining optimization targets according to the model construction request;
and screening, by a preset optimizer and according to the optimization targets, the atomic operations used to form each candidate model architecture from the preset atomic operations.
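The screening of claim 3 can be illustrated with a toy filter: a preset optimizer keeps only the atomic operations compatible with the optimization targets parsed from the construction request. The per-operation cost numbers and target thresholds below are invented for illustration:

```python
# Hypothetical per-operation costs; the patent does not specify these.
PRESET_OPS = {
    "normal_conv": {"latency": 5.0, "accuracy": 0.9},
    "split_conv":  {"latency": 2.0, "accuracy": 0.8},
    "avg_pool":    {"latency": 0.5, "accuracy": 0.5},
    "max_pool":    {"latency": 0.5, "accuracy": 0.6},
}

def screen_ops(targets):
    # Keep only operations that satisfy every optimization target.
    return [
        op for op, cost in PRESET_OPS.items()
        if cost["latency"] <= targets["max_latency"]
        and cost["accuracy"] >= targets["min_accuracy"]
    ]

target_ops = screen_ops({"max_latency": 3.0, "min_accuracy": 0.6})
```

Candidate architectures are then assembled only from the surviving target atomic operations, which shrinks the search space before any proxy evaluation happens.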
4. The method of claim 1, wherein determining the weight of each candidate model architecture according to the first performance parameter comprises:
determining a target screening strategy from preset screening strategies according to a task type of the proxy model, wherein the task type comprises: a classification task and a regression task;
and determining the weight of each candidate model architecture according to the target screening strategy and the first performance parameter.
5. The method of claim 4, wherein determining the weight of each candidate model architecture according to the target screening strategy and the first performance parameter comprises:
determining, for the first performance parameter corresponding to each candidate model architecture, a confidence of the first performance parameter according to the target screening strategy;
and determining the weight of each candidate model architecture according to the confidence, wherein the lower the confidence of the first performance parameter corresponding to a candidate model architecture, the higher the weight of that candidate model architecture.
6. The method of claim 4, wherein determining the weight of each candidate model architecture according to the target screening strategy and the first performance parameter comprises:
determining, according to the target screening strategy, a contribution degree of the first performance parameter corresponding to each candidate model architecture;
and determining the weight of each candidate model architecture according to the contribution degree, wherein the higher the contribution degree of the first performance parameter corresponding to a candidate model architecture, the higher the weight of that candidate model architecture.
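The two weighting rules of claims 5 and 6 point in opposite directions: low-confidence predictions are prioritized for real benchmarking (claim 5), while high-contribution predictions are weighted up (claim 6). A toy illustration with made-up confidence and contribution values:

```python
def weights_by_confidence(confidences):
    # Claim 5 (sketch): the less the proxy's prediction is trusted,
    # the more worthwhile it is to benchmark that architecture.
    return {arch: 1.0 - c for arch, c in confidences.items()}

def weights_by_contribution(contributions):
    # Claim 6 (sketch): normalize contribution degrees so that a
    # higher contribution yields a higher weight.
    total = sum(contributions.values())
    return {arch: c / total for arch, c in contributions.items()}

w_conf = weights_by_confidence({"A": 0.9, "B": 0.4})      # B outranks A
w_contrib = weights_by_contribution({"A": 3.0, "B": 1.0})  # A outranks B
```

Both mappings are hypothetical; the patent only fixes the monotonic relationship between confidence or contribution and weight, not the formula.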
7. The method of claim 1, wherein inputting, for each candidate model architecture, the candidate model architecture into a preset proxy model to obtain a performance parameter of the candidate model architecture in a specified hardware environment specifically comprises:
inputting each candidate model architecture into each pre-trained proxy model to obtain performance parameters of the candidate model architecture in each specified hardware environment as first performance parameters, wherein each proxy model is used for predicting the performance parameter of a candidate model architecture in at least one specified hardware environment;
determining the weight of each candidate model architecture according to the first performance parameter, and screening the model architecture to be tested from the candidate model architectures according to the weights, specifically comprises:
determining the weight of each candidate model architecture according to the first performance parameters, and screening the model architecture to be tested from the candidate model architectures according to the weights;
deploying a test model in the specified hardware environment according to the model architecture to be tested to obtain a second performance parameter corresponding to the model architecture to be tested specifically comprises:
deploying a test model in each specified hardware environment according to the model architecture to be tested, to obtain a second performance parameter corresponding to the model architecture to be tested in each hardware environment;
screening a target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architecture to be tested specifically comprises:
and screening the target model architecture from the candidate model architectures according to the second performance parameters corresponding to the model architectures to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architectures to be tested.
8. The method of claim 1, further comprising:
training the proxy model by minimizing the deviation between the first performance parameter, output by the proxy model, corresponding to the model architecture to be tested and the second performance parameter corresponding to the model architecture to be tested.
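The training signal of claim 8 — shrink the gap between the proxy's predicted first performance parameter and the measured second performance parameter — can be sketched with a one-parameter linear proxy and a squared-error deviation. The patent fixes neither the loss nor the optimizer, so both are illustrative:

```python
def train_proxy(pairs, lr=0.01, steps=200):
    # pairs: (architecture feature, measured second performance parameter)
    w = 0.0
    for _ in range(steps):
        for x, measured in pairs:
            predicted = w * x              # proxy's first performance parameter
            grad = 2 * (predicted - measured) * x
            w -= lr * grad                 # gradient step minimizing the deviation
    return w

# Toy data whose true relation is measured = 2 * feature.
w = train_proxy([(1.0, 2.0), (2.0, 4.0)])
```

In the full method this feedback loop means every benchmarked architecture also improves the proxy, so later screening rounds rely on better predictions.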
9. A model construction apparatus, comprising:
the acquisition module is used for acquiring a model construction request;
the determining module is used for determining each candidate model framework according to the model construction request;
the first evaluation module is configured to input, for each candidate model architecture, the candidate model architecture into a preset proxy model to obtain a performance parameter of the candidate model architecture in a specified hardware environment as a first performance parameter, wherein the proxy model comprises a feature extraction layer and decision layers, and different decision layers are used for determining different types of performance parameters of the candidate model architecture in the specified hardware environment; the candidate model architecture is input into the feature extraction layer of the preset proxy model, a feature representation of the candidate model architecture is obtained through the feature extraction layer, the feature representation of the candidate model architecture is input into each decision layer of the proxy model through the feature extraction layer so as to obtain, through each decision layer, each sub first performance parameter of the candidate model architecture in the specified hardware environment, and the first performance parameter of the candidate model architecture in the specified hardware environment is obtained according to the sub first performance parameters;
the screening module is configured to determine the weight of each candidate model architecture according to the first performance parameter, and to screen a model architecture to be tested from the candidate model architectures according to the weights;
the second evaluation module is configured to deploy a test model in the specified hardware environment according to the model architecture to be tested, to obtain a second performance parameter corresponding to the model architecture to be tested;
the decision module is configured to screen a target model architecture from the candidate model architectures according to the second performance parameter corresponding to the model architecture to be tested and the first performance parameters corresponding to the candidate model architectures other than the model architecture to be tested;
and the execution module is configured to construct a target model according to the target model architecture and to execute tasks through the target model.
10. The apparatus of claim 9, wherein the determining module is specifically configured to screen, according to the model construction request, atomic operations used to form each candidate model architecture from preset atomic operations, as target atomic operations corresponding to each candidate model architecture, wherein the atomic operations comprise at least one of: a normal convolution operation, a split convolution operation, an average pooling operation, and a maximum pooling operation; and to determine each candidate model architecture according to the target atomic operations corresponding to the candidate model architecture.
11. The apparatus of claim 10, wherein the determining module is specifically configured to determine optimization targets according to the model construction request, and to screen, by a preset optimizer and according to the optimization targets, the atomic operations used to form each candidate model architecture from the preset atomic operations.
12. The apparatus of claim 9, wherein the screening module is specifically configured to determine a target screening strategy from preset screening strategies according to a task type of the proxy model, wherein the task type comprises: a classification task and a regression task; and to determine the weight of each candidate model architecture according to the target screening strategy and the first performance parameter.
13. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
14. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310543696.1A CN116502679B (en) | 2023-05-15 | 2023-05-15 | Model construction method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116502679A CN116502679A (en) | 2023-07-28 |
CN116502679B true CN116502679B (en) | 2023-09-05 |
Family
ID=87324682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310543696.1A Active CN116502679B (en) | 2023-05-15 | 2023-05-15 | Model construction method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116502679B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116821193B (en) * | 2023-08-30 | 2024-01-09 | 之江实验室 | Reasoning query optimization method and device based on proxy model approximation processing |
CN117215728B (en) * | 2023-11-06 | 2024-03-15 | 之江实验室 | Agent model-based simulation method and device and electronic equipment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814966A (en) * | 2020-08-24 | 2020-10-23 | 国网浙江省电力有限公司 | Neural network architecture searching method, neural network application method, device and storage medium |
CN113657465A (en) * | 2021-07-29 | 2021-11-16 | 北京百度网讯科技有限公司 | Pre-training model generation method and device, electronic equipment and storage medium |
CN114118403A (en) * | 2021-10-19 | 2022-03-01 | 上海瑾盛通信科技有限公司 | Neural network architecture searching method, device, storage medium and electronic equipment |
CN114882311A (en) * | 2022-04-22 | 2022-08-09 | 北京三快在线科技有限公司 | Training set generation method and device |
WO2022271858A1 (en) * | 2021-06-25 | 2022-12-29 | Cognitiv Corp. | Multi-task attention based recurrent neural networks for efficient representation learning |
CN115563584A (en) * | 2022-11-29 | 2023-01-03 | 支付宝(杭州)信息技术有限公司 | Model training method and device, storage medium and electronic equipment |
CN115618964A (en) * | 2022-10-26 | 2023-01-17 | 支付宝(杭州)信息技术有限公司 | Model training method and device, storage medium and electronic equipment |
CN115620706A (en) * | 2022-11-07 | 2023-01-17 | 之江实验室 | Model training method, device, equipment and storage medium |
CN116011510A (en) * | 2021-10-19 | 2023-04-25 | 英特尔公司 | Framework for optimizing machine learning architecture |
CN116049761A (en) * | 2022-12-30 | 2023-05-02 | 支付宝(杭州)信息技术有限公司 | Data processing method, device and equipment |
CN116108912A (en) * | 2023-02-21 | 2023-05-12 | 北京航空航天大学 | Heuristic neural network architecture searching method |
CN116108384A (en) * | 2022-12-26 | 2023-05-12 | 南京信息工程大学 | Neural network architecture searching method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11556778B2 (en) * | 2018-12-07 | 2023-01-17 | Microsoft Technology Licensing, Llc | Automated generation of machine learning models |
2023-05-15: CN application CN202310543696.1A filed; patent CN116502679B (en), status Active
Non-Patent Citations (1)
Title |
---|
Mengnan Jian et al., "Reconfigurable intelligent surfaces for wireless communications: Overview of hardware designs, channel models, and estimation techniques," Intelligent and Converged Networks, pp. 1-32 * |
Also Published As
Publication number | Publication date |
---|---|
CN116502679A (en) | 2023-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116502679B (en) | Model construction method and device, storage medium and electronic equipment | |
CN116663618B (en) | Operator optimization method and device, storage medium and electronic equipment | |
CN116304720B (en) | Cost model training method and device, storage medium and electronic equipment | |
CN114936085A (en) | ETL scheduling method and device based on deep learning algorithm | |
CN116860259B (en) | Method, device and equipment for model training and automatic optimization of compiler | |
CN116402108A (en) | Model training and graph data processing method, device, medium and equipment | |
CN110516915B (en) | Service node training and evaluating method and device and electronic equipment | |
CN117194992B (en) | Model training and task execution method and device, storage medium and equipment | |
CN116578877B (en) | Method and device for model training and risk identification of secondary optimization marking | |
CN115827918B (en) | Method and device for executing service, storage medium and electronic equipment | |
CN117953258A (en) | Training method of object classification model, object classification method and device | |
CN116524998A (en) | Model training method and molecular property information prediction method and device | |
CN114912513A (en) | Model training method, information identification method and device | |
CN114120273A (en) | Model training method and device | |
CN117666971B (en) | Industrial data storage method, device and equipment | |
CN115862675B (en) | Emotion recognition method, device, equipment and storage medium | |
CN117075918B (en) | Model deployment method and device, storage medium and electronic equipment | |
CN116434787B (en) | Voice emotion recognition method and device, storage medium and electronic equipment | |
CN116991388B (en) | Graph optimization sequence generation method and device of deep learning compiler | |
CN117933707A (en) | Wind control model interpretation method and device, storage medium and electronic equipment | |
CN116431888A (en) | Model training method, information recommending method and device | |
CN116484002A (en) | Paper classification method and device, storage medium and electronic equipment | |
CN113344186A (en) | Neural network architecture searching method and image classification method and device | |
CN116109008A (en) | Method and device for executing service, storage medium and electronic equipment | |
CN117034926A (en) | Model training method and device for multi-field text classification model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||