CN113407820B - Method for processing data by using model, related system and storage medium - Google Patents

Method for processing data by using model, related system and storage medium

Info

Publication number
CN113407820B
Authority
CN
China
Prior art keywords
model
models
training
parameters
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110597246.1A
Other languages
Chinese (zh)
Other versions
CN113407820A (en)
Inventor
黎彧君
黄译旻
李震国
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202110597246.1A
Publication of CN113407820A
Application granted
Publication of CN113407820B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks


Abstract

The embodiments of the present application provide a model training method, a related system and a storage medium, applied in the field of artificial intelligence. The method comprises: determining a reference hyper-parameter according to the model performance score obtained by each training process of each of M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage; determining the hyper-parameters of each of M models P_{i-1}' in the i-th training stage according to the model performance scores of the M models P_{i-1} and the reference hyper-parameter; in the i-th training stage, performing T_i training processes on each model according to the hyper-parameters of each of the M models P_{i-1}' to obtain M models P_i; and when the i-th training stage is the last training stage, determining the target model from the M models P_i. The scheme can improve the computational efficiency of the model.

Description

Method for processing data by using model, related system and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a model training method, a related system and a storage medium.
Background
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-machine interaction, recommendation and search, basic AI theory, and the like.
Neural networks have been successfully applied in many fields, including computer vision, machine translation and speech recognition. Successfully training a neural network usually requires choosing appropriate hyper-parameters. A hyper-parameter is a parameter whose value is set before training of the deep neural network begins; it is not a network weight of the neural network, and it is used to control the training process. Hyper-parameters do not participate directly in training; they are configuration variables.
Appropriate hyper-parameters have a considerable influence on the performance of the trained neural network. Therefore, automating the hyper-parameter selection process is a commercially valuable technique.
At present, the Population Based Bandits algorithm (PB2) splits training into several training stages, each comprising a plurality of training processes; in each training process, the model traverses the sample set once. When selecting hyper-parameters, PB2 does so according to an estimate of the model performance obtained after the first training process of the next training stage. However, since it can only predict the model performance after the first training process of the next training stage, its prediction target is too limited: it cannot predict the model performance the neural network will have after the last training process of the next training stage. Because the performance obtained after the last training process of each training stage is what matters most when selecting appropriate hyper-parameters, this approach offers little guidance for obtaining a neural network with better performance.
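The stage/process structure described above can be sketched as follows. This is an illustrative toy, not the patented method: `run_stage`, the population layout and the "quality" update rule are all invented for the example, and each inner iteration stands in for one training process (one pass over the sample set).

```python
def run_stage(population, hyperparams, num_processes):
    """One training stage: num_processes training processes per model;
    a performance score is recorded after every training process."""
    scores = {}
    for model in population:
        history = []
        for _ in range(num_processes):
            # toy stand-in for one pass over the sample set: the score
            # improves proportionally to this model's hyper-parameter
            model["quality"] += hyperparams[model["id"]]
            history.append(model["quality"])
        scores[model["id"]] = history
    return scores

population = [{"id": k, "quality": 0.0} for k in range(3)]       # M = 3 models
hyperparams = {0: 0.01, 1: 0.02, 2: 0.03}                        # e.g. learning rates
stage_scores = run_stage(population, hyperparams, num_processes=4)  # T_i = 4
```

A method in the PB2 family would inspect `stage_scores` between stages to re-select each model's hyper-parameter before the next stage begins.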
Disclosure of Invention
The application discloses a model training method, a related system and a storage medium, which can help to improve model calculation efficiency.
In a first aspect, an embodiment of the present application provides a model training method, including: determining a reference hyper-parameter for the i-th training stage according to the model performance score obtained by each training process of each of M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage, where M, i and T_i are all integers not less than 2; determining the hyper-parameters of each of M models P_{i-1}' in the i-th training stage according to the model performance scores of M models P_{i-1} and the reference hyper-parameter for the i-th training stage, where the M models P_{i-1} are obtained by the last training process of the M models P_{i-2}' in the (i-1)-th training stage, the M models P_{i-1}' are obtained by processing the M models P_{i-1}, the M models P_{i-1}' correspond one-to-one to the M models P_{i-1}, and the M models P_{i-1} correspond one-to-one to the M models P_{i-2}'; in the i-th training stage, performing T_i training processes on each model according to the hyper-parameters of each of the M models P_{i-1}' to obtain M models P_i, and obtaining the model performance score of each training process of each of the M models P_{i-1}' in the i-th training stage; and when the i-th training stage is the last training stage, determining the target model from the M models P_i based on the M models P_i.
According to this embodiment of the application, the hyper-parameters of each of the M models in the i-th training stage are determined according to the model performance score obtained by each training process of each of the M models in the (i-1)-th training stage, the hyper-parameters of each model in the (i-1)-th training stage, and the number of training processes of the i-th training stage, and the M models are trained based on these hyper-parameters to obtain the target model. Taking the number of training processes of the i-th training stage into account when determining the hyper-parameters for the i-th training stage makes the hyper-parameter determination process more comprehensive and more accurate, which helps obtain a neural network with better performance and improves model computation efficiency.
As an optional implementation, determining the reference hyper-parameter for the i-th training stage according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage comprises:
obtaining a model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage; and processing the model performance estimation function according to the number of training processes T_i of the i-th training stage and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, to obtain the reference hyper-parameter for the i-th training stage.
By learning the corresponding information of each model in the (i-1)-th training stage to obtain a model performance estimation function, the model performance score Δt training processes ahead can be predicted. This improves model computation efficiency.
When processing the model performance estimation function according to the number of training processes T_i of the i-th training stage and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, the reference hyper-parameter may be obtained according to the model performance score obtained by the last training process of one of the M models P_{i-2}' in the (i-1)-th training stage.
The reference hyper-parameter may also be obtained according to the model performance scores obtained by the last training process of several of the M models P_{i-2}' in the (i-1)-th training stage.
The reference hyper-parameter may also be obtained according to the model performance scores obtained by any one training process of several of the M models P_{i-2}' in the (i-1)-th training stage.
There may be one or more reference hyper-parameters; this is not specifically limited in this scheme.
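As a rough illustration of this step, the sketch below fits a simple estimator to stage-(i-1) observations and uses it to pick a reference hyper-parameter. All names are hypothetical, and an inverse-distance-weighted average stands in for whatever performance estimation function the method actually learns (PB2-style methods typically fit a Gaussian process).

```python
def fit_estimator(history):
    """history: (hyperparam, delta_t, observed score gain) triples collected
    in the (i-1)-th training stage."""
    def predict(hp, dt):
        # inverse-distance-weighted average over the observed triples
        num, den = 0.0, 0.0
        for h, d, gain in history:
            w = 1.0 / (1e-6 + (h - hp) ** 2 + (d - dt) ** 2)
            num += w * gain
            den += w
        return num / den
    return predict

# toy stage-(i-1) history: the gain grows with the hyper-parameter
history = [(0.01, 1, 0.01), (0.02, 1, 0.02), (0.03, 1, 0.03)]
predict = fit_estimator(history)

# pick the candidate with the best predicted score after T_i = 4 more
# training processes (the key point: the prediction horizon is T_i,
# not just the single next training process)
candidates = [0.01, 0.02, 0.03]
reference_hp = max(candidates, key=lambda hp: predict(hp, 4))
```

The horizon argument `dt` is what distinguishes this from predicting only one training process ahead.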
As an optional implementation, when i is not less than 3, obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage comprises:
obtaining the model performance estimation function according to the model performance score obtained by the last training process of each of M models P_{i-3}' in the (i-2)-th training stage, the hyper-parameters of each of the M models P_{i-3}' in the (i-2)-th training stage, the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, where the M models P_{i-2}' are obtained by processing the M models P_{i-3}' and correspond one-to-one to the M models P_{i-3}'.
According to this embodiment, by learning the model performance score obtained by the last training process of the previous training stage, the hyper-parameters of the previous training stage, the model performance score obtained by each training process of the current training stage, and the hyper-parameters of the current training stage, cross-stage model performance estimation can be performed, which improves the accuracy of model performance estimation.
As another optional implementation, obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage comprises:
obtaining the model performance estimation function according to the model performance scores obtained by each training process of each of M models P_0 in the first i-1 training stages and the hyper-parameters of each of the M models P_0 in each of the first i-1 training stages, where the M models P_0 are the initial models.
According to this embodiment, the reference hyper-parameter for the i-th training stage is obtained by learning the performance scores and hyper-parameters of each model over the first i-1 training stages. Learning from a large amount of data improves the accuracy of model performance estimation, thereby improving the reliability of the selected hyper-parameters and the efficiency of obtaining a model with better performance.
This embodiment is described by taking the training data of the first i-1 training stages as an example; the reference hyper-parameter may also be obtained by learning the training data of any plurality of training stages, for example at least 3 training stages, or from the 3rd training stage to the (i-1)-th training stage, and so on, which is not specifically limited by this scheme.
As an alternative implementation, the method further includes: processing the model performance estimation functions according to N length ranges to obtain N processed model performance estimation functions, wherein N is an integer not less than 2; the training process times T according to the ith training stage i The M models P i-2 At least one model in' at least one model performance score obtained by training process in the ith training stage-1, processing the model performance estimation function to obtain a reference super-parameter in the ith training stage, including: according to the training process times T of the ith training stage i The M models P i-2 At least one model in the' model performance score obtained by at least one training process in the i-1 training stage is used for respectively processing the N processed model performance estimation functions to obtain N initial hyper-parameters; processing the N initial super parameters to obtainAnd (3) the reference super-parameters in the ith training stage.
By searching for suitable hyper-parameters over several length ranges in this way, the stability of the model can be improved.
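One way to picture the N-length-range idea, under heavy simplification: build N kernel estimators with different length scales over toy (hyper-parameter, score gain) data, take a candidate hyper-parameter from each, and then combine the N candidates. The estimator, the toy data and the median-based combination are all invented for this sketch and are not taken from the patent.

```python
import math

def make_estimator(history, length_scale):
    """Kernel-smoothed estimate of the score gain at a hyper-parameter value;
    the length scale controls how far each observation's influence reaches."""
    def predict(hp):
        num, den = 0.0, 0.0
        for h, gain in history:
            w = math.exp(-((h - hp) / length_scale) ** 2)
            num += w * gain
            den += w
        return num / den if den else 0.0
    return predict

# toy (hyperparam, observed gain) pairs, with a peak near 0.02
history = [(0.01, 0.010), (0.02, 0.030), (0.03, 0.022)]
candidates = [0.01, 0.015, 0.02, 0.025, 0.03]

length_scales = [0.005, 0.02, 0.08]          # N = 3 length ranges
initial_hps = [max(candidates, key=make_estimator(history, ls))
               for ls in length_scales]      # N initial hyper-parameters

# combine the N initial hyper-parameters, here simply by taking their median
reference_hp = sorted(initial_hps)[len(initial_hps) // 2]
```

Different length scales emphasise local versus global trends in the history, so combining their candidates is less sensitive to any single smoothing choice.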
As an optional implementation, determining the hyper-parameters of each of the M models P_{i-1}' in the i-th training stage according to the model performance scores of the M models P_{i-1} and the reference hyper-parameter for the i-th training stage includes: obtaining K first models and M-K second models according to the model performance scores of the M models P_{i-1}, where the K first models are the models among the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are the models whose model performance scores are not smaller than the first preset threshold, K is an integer not smaller than 1, and K is smaller than M; updating the parameters of each of the K first models to the parameters of a model whose model performance score is greater than a second preset threshold, to obtain K updated first models, where the second preset threshold is not smaller than the first preset threshold; and determining the reference hyper-parameter for the i-th training stage as the hyper-parameters of each of the K updated first models in the i-th training stage; where the M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as its hyper-parameters in the (i-1)-th training stage.
In this embodiment of the application, the models are updated based on the M models P_{i-1} to obtain the M models P_{i-1}' used for training in the next training stage. The models are updated periodically: the parameters of models with poor performance scores are replaced by the parameters of models with high performance scores, and training in the next stage proceeds from the better-performing models, which improves the efficiency of obtaining a model with good performance.
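A minimal sketch of this update step, with invented thresholds and an invented population layout: models scoring below the first threshold inherit the parameters of a model scoring above the second threshold and receive the reference hyper-parameter for the next stage, while the remaining models keep their parameters and their previous hyper-parameters.

```python
import copy

def update_population(population, reference_hp, low_thresh, high_thresh):
    """Exploit step: low scorers copy a high scorer's parameters and take the
    reference hyper-parameter; the rest are left unchanged."""
    donors = [m for m in population if m["score"] > high_thresh]
    best_donor = max(donors, key=lambda m: m["score"])
    for m in population:
        if m["score"] < low_thresh:                  # a "first model"
            m["params"] = copy.deepcopy(best_donor["params"])
            m["hyperparam"] = reference_hp
        # "second models" keep their parameters and previous hyper-parameter
    return population

population = [
    {"id": 0, "score": 0.91, "params": [1.0, 2.0], "hyperparam": 0.03},
    {"id": 1, "score": 0.40, "params": [0.1, 0.2], "hyperparam": 0.30},
    {"id": 2, "score": 0.75, "params": [0.8, 1.9], "hyperparam": 0.01},
]
updated = update_population(population, reference_hp=0.02,
                            low_thresh=0.5, high_thresh=0.9)
```

Here only model 1 falls below the first threshold, so it copies model 0's parameters and is assigned the reference hyper-parameter; models 0 and 2 carry on unchanged.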
This embodiment is described by taking as an example the case where the reference hyper-parameter for the i-th training stage serves as the hyper-parameter of each of the K updated first models in the i-th training stage; there may be a single reference hyper-parameter.
There may also be several reference hyper-parameters. For example, one reference hyper-parameter may be assigned to the model with the worst model performance score, another to the model with the second-worst model performance score, and so on. Other forms are also possible; this scheme does not specifically limit this.
As an alternative implementation, the method further includes: according to the training process times T of each training phase in the previous i-1 training phases j Acquiring the data volume of the previous i-1 training phases, wherein j is a positive integer, such as j is 1, 2, 3 and the like, T j Is a positive integer; and confirming that the data quantity of the previous i-1 training phases does not exceed a preset value.
Whether too much training data has been collected during model training is checked in real time, so that initialisation can be performed when the data amount becomes too large. This measure effectively improves computation efficiency.
As an optional implementation, when the data amount of the first i-1 training stages exceeds the preset value, the hyper-parameters are initialised.
As another optional implementation, when the data amount of the first i-1 training stages exceeds the preset value, part of the data may be cleared; for example, only the training data from the 5th training stage to the (i-1)-th training stage may be kept. Other processing is also possible.
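Under the assumption that the accumulated data amount is simply the per-stage training-process counts summed over the population (the patent does not spell out the exact bookkeeping), the check could look like the sketch below; all names are illustrative.

```python
def history_size(process_counts, population_size):
    """process_counts: [T_1, ..., T_{i-1}], the number of training
    processes in each of the first i-1 training stages."""
    return population_size * sum(process_counts)

def maybe_reset(history, process_counts, population_size, budget):
    """Clear the collected history once it exceeds the preset budget;
    a variant could instead keep only a recent suffix of the data."""
    if history_size(process_counts, population_size) > budget:
        return []
    return history

counts = [4, 4, 6]                                   # T_1 .. T_3
size = history_size(counts, population_size=5)       # 5 models * 14 processes
history = list(range(size))
kept = maybe_reset(history, counts, population_size=5, budget=100)
```

With a budget of 100 the 70 collected records are kept; lowering the budget below 70 would trigger the reset.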
The target model provided by the scheme is applied to an image processing system or a recommendation system.
The image processing system may be used for image recognition, instance segmentation, object detection, and so on. The recommendation system may be used for commodity recommendation, entertainment recommendation such as movies and music, and the like, for example recommendation based on click-through-rate prediction.
The hyper-parameters include at least one of: learning rate, batch size, dropout rate, weight decay factor, momentum factor.
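For illustration only, the hyper-parameters listed above can be collected into a search space; the ranges below are invented for the example and are not taken from this application.

```python
# Illustrative search space for the hyper-parameters named in the text
search_space = {
    "learning_rate":       (1e-4, 1e-1),
    "batch_size":          (16, 256),
    "dropout_rate":        (0.0, 0.5),
    "weight_decay_factor": (1e-6, 1e-2),
    "momentum_factor":     (0.8, 0.99),
}

def clip_to_space(name, value, space=search_space):
    """Clamp a proposed hyper-parameter value into its allowed range."""
    lo, hi = space[name]
    return min(max(value, lo), hi)
```

A hyper-parameter proposal produced by the estimation step could be passed through `clip_to_space` before being assigned to a model.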
In a second aspect, the present application provides a model training apparatus, comprising: a first determining module, configured to determine a reference hyper-parameter for the i-th training stage according to the model performance score obtained by each training process of each of M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage, where M, i and T_i are all integers not less than 2; a second determining module, configured to determine the hyper-parameters of each of M models P_{i-1}' in the i-th training stage according to the model performance scores of M models P_{i-1} and the reference hyper-parameter for the i-th training stage, where the M models P_{i-1} are obtained by the last training process of the M models P_{i-2}' in the (i-1)-th training stage, the M models P_{i-1}' are obtained by processing the M models P_{i-1}, the M models P_{i-1}' correspond one-to-one to the M models P_{i-1}, and the M models P_{i-1} correspond one-to-one to the M models P_{i-2}'; a model training module, configured to perform, in the i-th training stage, T_i training processes on each model according to the hyper-parameters of each of the M models P_{i-1}' to obtain M models P_i, and to obtain the model performance score of each training process of each of the M models P_{i-1}' in the i-th training stage; and a model determining module, configured to determine, when the i-th training stage is the last training stage, the target model from the M models P_i based on the M models P_i.
According to this embodiment of the application, the hyper-parameters of each of the M models in the i-th training stage are determined according to the model performance score obtained by each training process of each of the M models in the (i-1)-th training stage, the hyper-parameters of each model in the (i-1)-th training stage, and the number of training processes of the i-th training stage, and the M models are trained based on these hyper-parameters to obtain the target model. Taking the number of training processes of the i-th training stage into account when determining the hyper-parameters for the i-th training stage makes the hyper-parameter determination process more comprehensive and more accurate, which helps obtain a neural network with better performance and improves model computation efficiency.
As an optional implementation, the first determining module is configured to: obtain a model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage; and process the model performance estimation function according to the number of training processes T_i of the i-th training stage and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, to obtain the reference hyper-parameter for the i-th training stage.
By learning the corresponding information of each model to obtain a model performance estimation function, the model performance score Δt training processes ahead can be predicted. This improves model computation efficiency.
As an optional implementation, when i is not less than 3, the first determining module is further configured to: obtain the model performance estimation function according to the model performance score obtained by the last training process of each of M models P_{i-3}' in the (i-2)-th training stage, the hyper-parameters of each of the M models P_{i-3}' in the (i-2)-th training stage, the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, where the M models P_{i-2}' are obtained by processing the M models P_{i-3}' and correspond one-to-one to the M models P_{i-3}'.
According to this embodiment, by learning the model performance score obtained by the last training process of the previous training stage, the hyper-parameters of the previous training stage, the model performance score obtained by each training process of the current training stage, and the hyper-parameters of the current training stage, cross-stage model performance estimation can be performed, which improves the accuracy of model performance estimation.
As another optional implementation, the first determining module is further configured to: obtain the model performance estimation function according to the model performance scores obtained by each training process of each of M models P_0 in the first i-1 training stages and the hyper-parameters of each of the M models P_0 in each of the first i-1 training stages, where the M models P_0 are the initial models.
According to this embodiment, the reference hyper-parameter for the i-th training stage is obtained by learning the performance scores and hyper-parameters of each model over the first i-1 training stages. Learning from a large amount of data improves the accuracy of model performance estimation, thereby improving the reliability of the selected hyper-parameters and the efficiency of obtaining a model with better performance.
As an optional implementation, the apparatus further includes a processing module configured to: process the model performance estimation function according to N length ranges to obtain N processed model performance estimation functions, where N is an integer not less than 2. The first determining module is further configured to: process the N processed model performance estimation functions respectively according to the number of training processes T_i of the i-th training stage and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, to obtain N initial hyper-parameters; and process the N initial hyper-parameters to obtain the reference hyper-parameter for the i-th training stage.
By searching for suitable hyper-parameters over several length ranges in this way, the stability of the model can be improved.
As an optional implementation, the second determining module is configured to: obtain K first models and M-K second models according to the model performance scores of the M models P_{i-1}, where the K first models are the models among the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are the models whose model performance scores are not smaller than the first preset threshold, K is an integer not smaller than 1, and K is smaller than M; update the parameters of each of the K first models to the parameters of a model whose model performance score is greater than a second preset threshold, to obtain K updated first models, where the second preset threshold is not smaller than the first preset threshold; and determine the reference hyper-parameter for the i-th training stage as the hyper-parameters of each of the K updated first models in the i-th training stage; where the M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as its hyper-parameters in the (i-1)-th training stage.
In this embodiment of the application, the models are updated based on the M models P_{i-1} to obtain the M models P_{i-1}' used for training in the next training stage. The models are updated periodically: the parameters of models with poor performance scores are replaced by the parameters of models with high performance scores, and training in the next stage proceeds from the better-performing models, which improves the efficiency of obtaining a model with good performance.
As an optional implementation, the apparatus further includes a confirmation module configured to: acquire the data amount of the first i-1 training stages according to the number of training processes T_j of each of the first i-1 training stages; and confirm that the data amount of the first i-1 training stages does not exceed a preset value.
Whether too much training data has been collected during model training is checked in real time, so that initialisation can be performed when the data amount becomes too large. This measure effectively improves computation efficiency.
The target model is applied to an image processing system or a recommendation system.
The image processing system may be used for image recognition, instance segmentation, object detection, and so on. The recommendation system may be used for commodity recommendation, entertainment recommendation such as movies and music, and the like, for example recommendation based on click-through-rate prediction.
The hyper-parameters include at least one of: learning rate, batch size, dropout rate, weight decay factor, momentum factor.
In a third aspect, the present application provides a model training apparatus comprising a processor and a memory, where the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method described above.
In a fourth aspect, the application provides a computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform a method as provided by any one of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the application provide a computer program product for, when run on a computer, causing the computer to perform a method as provided by any one of the possible implementations of the first aspect.
In a sixth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and performs the method provided by any one of the possible implementations of the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform a method provided in any one of possible implementation manners of the first aspect.
It will be appreciated that the apparatus of the second aspect, the apparatus of the third aspect, the computer storage medium of the fourth aspect, the computer program product of the fifth aspect and the chip of the sixth aspect provided above are all configured to perform the method provided in any implementation of the first aspect. Therefore, for the advantageous effects they achieve, reference may be made to those of the corresponding method, which are not repeated here.
Drawings
The drawings to which embodiments of the present application are applied are described below.
FIG. 1 is a schematic diagram of an artificial intelligence main body framework according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application environment according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a neural network processor according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a model training architecture according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a learning method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another learning method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a model training method according to an embodiment of the present application;
FIG. 11 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a model training device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of another model training apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings of those embodiments. The described embodiments are clearly only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
FIG. 1 illustrates a schematic diagram of an artificial intelligence framework that describes the overall workflow of an artificial intelligence system, applicable to general artificial intelligence field requirements.
The artificial intelligence main-body framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from acquisition of data to its processing. For example, there may be general procedures of intelligent information awareness, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" condensation process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing information) of artificial intelligence up to the industrial ecology of the system.
(1) Infrastructure:
the infrastructure provides computing-capability support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. Communication with the outside is performed through sensors; computing power is provided by smart chips (a central processing unit (Central Processing Unit, CPU), an embedded neural network processor (NPU), a graphics processor (Graphics Processing Unit, GPU), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA), or other hardware accelerator chips); the base platform includes a distributed computing framework, networks and other related platform guarantees and support, and may include cloud storage and computing, interconnection and interworking networks, and the like. For example, data obtained by the sensors through external communication is provided to smart chips in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system: according to a reasoning control strategy, formalized information is used to carry out machine thinking and solve problems. Typical functions are search and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical deployment. The main application fields include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, automatic driving, safe city, intelligent terminals, and the like.
Referring to fig. 2, an embodiment of the present invention provides a system architecture 200. The data acquisition device 260 is used to acquire data such as image data and store it in the database 230; the training device 220 generates the target model/rule 201 based on the image data maintained in the database 230. How the training device 220 obtains the target model/rule 201 based on the image data will be described in more detail below; the target model/rule 201 can be applied to an image processing system, a recommendation system, or the like.
The training device 220 determines a reference hyper-parameter for the i-th training phase according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training phase, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training phase, and the number T_i of training processes of the i-th training phase, where M, i and T_i are all integers not less than 2. It then determines the hyper-parameters of each of the M models P_{i-1}' in the i-th training phase according to the model performance scores of the M models P_{i-1} and the reference hyper-parameter for the i-th training phase; here, the M models P_{i-1} are obtained by the last training process of the M models P_{i-2}' in the (i-1)-th training phase, the M models P_{i-1}' are obtained by processing the M models P_{i-1}, the M models P_{i-1}' correspond one-to-one to the M models P_{i-1}, and the M models P_{i-1} correspond one-to-one to the M models P_{i-2}'. In the i-th training phase, each of the M models P_{i-1}' is trained T_i times according to its hyper-parameters to obtain M models P_i, and the model performance score obtained by each training process of each of the M models P_{i-1}' in the i-th training phase is obtained. When the i-th training phase is the last training phase, the target model is determined from the M models P_i according to the model performance scores of the M models P_i.
The training device 220 is further configured to: obtain a model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training phase and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training phase; and process the model performance estimation function according to the number T_i of training processes of the i-th training phase and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training phase, to obtain the reference hyper-parameter for the i-th training phase.
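The idea of fitting a performance estimation function from (hyper-parameter, score) history and then choosing a reference hyper-parameter can be illustrated with a minimal sketch. This is a simple stand-in, not the patent's actual estimator: the least-squares fit, the learning-rate-only search space, and all numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical history: (learning_rate, completed training processes, score).
# The synthetic scores peak at lr = 1e-2 and improve slightly with training.
history = [(lr, t, -(np.log10(lr) + 2.0) ** 2 + 0.01 * t)
           for lr in (1e-4, 1e-3, 1e-2, 1e-1) for t in (1, 2, 3)]

# Stand-in performance estimation function: least-squares fit of
# score ~ a + b*x + c*x^2 + d*t, with x = log10(learning rate).
X = np.array([[1.0, np.log10(lr), np.log10(lr) ** 2, t] for lr, t, _ in history])
y = np.array([s for _, _, s in history])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predicted_score(lr, t):
    x = np.log10(lr)
    return coef @ np.array([1.0, x, x ** 2, t])

# Reference hyper-parameter: the candidate predicted to score best after T_i
# further training processes (T_i = 5 is purely illustrative).
T_i = 5
candidates = [10.0 ** e for e in np.linspace(-4.0, -1.0, 31)]
reference_lr = max(candidates, key=lambda lr: predicted_score(lr, T_i))
```

Here the fitted function recovers the synthetic score surface exactly, so the search returns a learning rate near 1e-2, the peak of the assumed history.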
As an alternative implementation, when i is not less than 3, obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training phase and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training phase includes: obtaining the model performance estimation function according to the model performance score obtained by the last training process of each of the M models P_{i-3}' in the (i-2)-th training phase, the hyper-parameters of each of the M models P_{i-3}' in the (i-2)-th training phase, the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training phase, and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training phase, where the M models P_{i-2}' are obtained by processing the M models P_{i-3}' and correspond one-to-one to the M models P_{i-3}'.
As another alternative implementation, obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training phase and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training phase includes: obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models P_0 in the previous i-1 training phases and the hyper-parameters of each of the M models P_0 in each of the previous i-1 training phases, where the M models P_0 are initial models.
As an alternative implementation, the method further includes: processing the model performance estimation function according to N length ranges to obtain N processed model performance estimation functions, where N is an integer not less than 2. In this case, processing the model performance estimation function according to the number T_i of training processes of the i-th training phase and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training phase to obtain the reference hyper-parameter for the i-th training phase includes: processing the N processed model performance estimation functions respectively, according to the number T_i of training processes of the i-th training phase and the model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training phase, to obtain N initial hyper-parameters; and processing the N initial hyper-parameters to obtain the reference hyper-parameter for the i-th training phase.
As an alternative implementation, determining the hyper-parameters of each of the M models P_{i-1}' in the i-th training phase according to the model performance scores of the M models P_{i-1} and the reference hyper-parameter for the i-th training phase includes: obtaining K first models and M-K second models according to the model performance scores of the M models P_{i-1}, where the K first models are those of the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are those whose model performance scores are not smaller than the first preset threshold, K is an integer not less than 1, and K is smaller than M; updating the parameters of each of the K first models according to the parameters of a model whose model performance score is larger than a second preset threshold, to obtain K updated first models, where the second preset threshold is not smaller than the first preset threshold; and determining the reference hyper-parameter for the i-th training phase as the hyper-parameters of each of the K updated first models in the i-th training phase. The M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training phase are the same as the hyper-parameters of that model in the (i-1)-th training phase.
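The selection and update of first and second models described above can be sketched as follows. The population, the threshold values and the dictionary representation of a model are illustrative assumptions, and copying the best-scoring model's parameters is one simple choice of "a model whose score exceeds the second preset threshold":

```python
import copy

# Hypothetical population of M = 4 models: parameters, one hyper-parameter
# (learning rate) and the latest model performance score of each.
population = [
    {"params": [0.1, 0.2], "lr": 1e-3, "score": 0.40},
    {"params": [0.3, 0.1], "lr": 1e-2, "score": 0.90},
    {"params": [0.5, 0.4], "lr": 1e-1, "score": 0.55},
    {"params": [0.2, 0.8], "lr": 1e-4, "score": 0.85},
]
FIRST_THRESHOLD = 0.5          # below this: a "first model" (lagging)
SECOND_THRESHOLD = 0.8         # not smaller than the first threshold
reference_lr = 3e-3            # reference hyper-parameter for this phase

strong = [m for m in population if m["score"] > SECOND_THRESHOLD]
for model in population:
    if model["score"] < FIRST_THRESHOLD:
        donor = max(strong, key=lambda m: m["score"])
        model["params"] = copy.deepcopy(donor["params"])  # inherit strong weights
        model["lr"] = reference_lr                        # adopt reference value
    # second models keep both their parameters and their hyper-parameters
```

After this step the lagging model continues training from a strong model's parameters but under the new reference hyper-parameter, while well-performing models are left untouched.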
The training device 220 is further configured to: acquire the data quantity of the previous i-1 training phases according to the number of training processes T_j of each training phase in the previous i-1 training phases; and confirm that the data quantity of the previous i-1 training phases does not exceed a preset value.
The operation of each layer in a deep neural network can be described by the mathematical expression y = a(W·x + b). Physically, the work of each layer can be understood as completing a transformation from input space to output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are completed by W·x, operation 4 by +b, and operation 5 by a(). The word "space" is used here because the objects being classified are not single things but a class of things; the space refers to the set of all individuals of that class. W is a weight matrix, each value of which represents the weight of one neuron in that layer of the network. W determines the spatial transformation from input space to output space described above, i.e., the weights W of each layer control how space is transformed. The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained network (the weight matrices formed by the vectors W of the many layers). Thus, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically learning the weight matrices.
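The per-layer operation y = a(W·x + b) can be written directly as a short sketch, with ReLU as the assumed activation a() and an illustrative weight matrix:

```python
import numpy as np

def dense_layer(x, W, b):
    """One layer: W @ x scales/rotates/changes dimension, + b translates,
    and the activation (ReLU here) supplies the "bending"."""
    return np.maximum(0.0, W @ x + b)

W = np.array([[1.0, -1.0],
              [0.5,  2.0],
              [0.0,  1.0]])      # maps a 2-D input space to a 3-D output space
b = np.array([0.0, -1.0, 0.5])
y = dense_layer(np.array([2.0, 1.0]), W, b)
```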
Because the output of a deep neural network should be as close as possible to the truly desired value, the weight vector of each layer can be updated by comparing the current network's predicted value with the truly desired target value and adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., pre-configuring parameters for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the neural network can predict the truly desired target value. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring that difference. Taking the loss function as an example, a higher output value (loss) of the loss function means a larger difference, so training the deep neural network becomes a process of reducing the loss as much as possible.
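This compare-and-adjust loop can be illustrated with one scalar weight and a squared-error loss (a toy example, not the document's actual model):

```python
# One scalar weight, squared-error loss, gradient descent: compare the
# prediction with the target, then adjust the weight against the gradient.
w = 0.0                      # initialization before the first update
x, target = 2.0, 4.0         # the truly desired value for input x
lr = 0.1                     # update step size
losses = []
for _ in range(50):
    pred = w * x
    loss = (pred - target) ** 2          # loss: gap between prediction and target
    grad = 2.0 * (pred - target) * x     # d(loss)/dw
    w -= lr * grad                       # prediction too high -> w adjusted lower
    losses.append(loss)
```

The loss shrinks toward zero as w approaches the value that reproduces the target.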
The target model/rules obtained by training device 220 may be applied in different systems or devices. In FIG. 2, the executing device 210 is configured with an I/O interface 212 for data interaction with external devices, and a "user" may input data to the I/O interface 212 through the client device 240.
The execution device 210 may call data, code, etc. in the data storage system 250, or may store data, instructions, etc. in the data storage system 250.
The calculation module 211 performs image recognition processing or makes recommendation on the input data using the target model/rule 201.
The association function module 213 is configured to extract features of the received data and perform normalization. The association function module 214 is configured to process the result output by the calculation module.
Finally, the I/O interface 212 returns the processing results to the client device 240 for presentation to the user.
Further, the training device 220 may generate corresponding target models/rules 201 for different targets based on different data to provide better results to the user.
In the case shown in FIG. 2, the user may manually specify the data input to the execution device 210, for example by operating in an interface provided by the I/O interface 212. In another case, the client device 240 may automatically input data to the I/O interface 212 and obtain the result; if automatic input by the client device 240 requires the user's authorization, the user may set corresponding permissions in the client device 240. The user may view the result output by the execution device 210 on the client device 240, presented for example as a display, sound or action. The client device 240 may also serve as a data acquisition terminal and store the acquired image data in the database 230.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present invention; the positional relationships among the devices, apparatuses and modules shown impose no limitation. For example, in FIG. 2 the data storage system 250 is external to the execution device 210, while in other cases it may be disposed in the execution device 210.
The following description takes a convolutional neural network as the training example:
a convolutional neural network (CNN, convolutional neural network) is a deep neural network with a convolutional structure and is a deep learning architecture, where deep learning refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which individual neurons respond to overlapping regions of the image input to it.
As shown in fig. 3, convolutional Neural Network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120, where the pooling layer is optional, and a neural network layer 130.
Convolution layer/pooling layer 120:
Convolution layer:
the convolutional layer/pooling layer 120 shown in fig. 3 may include, for example, layers 121-126. In one implementation, 121 is a convolutional layer, 122 a pooling layer, 123 a convolutional layer, 124 a pooling layer, 125 a convolutional layer and 126 a pooling layer; in another implementation, 121 and 122 are convolutional layers, 123 a pooling layer, 124 and 125 convolutional layers, and 126 a pooling layer. That is, the output of a convolutional layer may serve as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
Taking the convolutional layer 121 as an example, it may include many convolution operators, also called kernels, whose role in image processing is like a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, usually predefined, which during convolution is typically moved over the input image one pixel after another in the horizontal direction (or two pixels after two pixels, and so on, depending on the value of the stride), thereby completing the task of extracting specific features from the image.
The size of the weight matrix should be related to the size of the image. Note that the depth dimension (depth dimension) of the weight matrix is the same as that of the input image; during the convolution operation the weight matrix extends through the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases multiple weight matrices of the same dimensions are applied instead of a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image.
Different weight matrices can be used to extract different features of the image: for example, one weight matrix extracts image edge information, another extracts a specific color of the image, and yet another blurs unwanted noise in the image. The weight matrices have the same dimensions, so the feature maps they extract also have the same dimensions, and these same-dimension feature maps are combined to form the output of the convolution operation.
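This multi-kernel convolution can be sketched with a plain stride-1 "valid" convolution on a single-channel image; the two kernels below are illustrative stand-ins for an edge-sensitive and a blurring weight matrix:

```python
import numpy as np

def conv2d_single(image, kernel):
    """Stride-1 "valid" convolution of one 2-D image with one weight matrix."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0],
                 [1.0, -1.0]])            # responds to horizontal intensity changes
blur = np.full((2, 2), 0.25)              # averaging kernel: blurs/denoises
# One feature map per weight matrix; stacking them forms the depth
# dimension of the convolved output.
features = np.stack([conv2d_single(image, edge), conv2d_single(image, blur)])
```

Each weight matrix yields one 3x3 feature map here, and stacking the two maps gives an output of depth 2.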
In practical applications, the weight values in these weight matrices must be obtained through extensive training; each weight matrix formed by trained weight values can extract information from the input image, helping the convolutional neural network 100 make correct predictions.
When convolutional neural network 100 has multiple convolutional layers, the initial convolutional layer (e.g., 121) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 100 increases, features extracted by the later convolutional layers (e.g., 126) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
Pooling layer:
since it is often necessary to reduce the number of training parameters, a pooling layer is often periodically introduced after a convolutional layer. In the layers 121-126 illustrated at 120 in fig. 3, this may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator, used to sample the input image into a smaller image. The average pooling operator computes the average of the pixel values within a specific range; the max pooling operator takes the pixel with the largest value within a specific range as the result of max pooling.
In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The image output after processing by the pooling layer can be smaller than the image input to it; each pixel of the output image represents the average or maximum value of the corresponding sub-region of the input image.
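Non-overlapping average and max pooling can be sketched as follows; the 2x2 window and the input image are illustrative:

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Non-overlapping pooling: each output pixel is the max or average
    of the corresponding size x size sub-region of the input."""
    h, w = image.shape[0] // size, image.shape[1] // size
    blocks = image[:h * size, :w * size].reshape(h, size, w, size)
    reduce_fn = np.max if mode == "max" else np.mean
    return reduce_fn(reduce_fn(blocks, axis=3), axis=1)

image = np.array([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 4.0, 1.0, 1.0]])
pooled_max = pool2d(image, mode="max")   # 4x4 image -> smaller 2x2 image
pooled_avg = pool2d(image, mode="avg")
```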
Neural network layer 130:
after processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output, or a group of outputs, whose number equals the number of required classes.
Thus, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in fig. 3) and an output layer 140. The parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers of the neural network layer 130, the final layer of the whole convolutional neural network 100 is the output layer 140, which has a loss function similar to categorical cross-entropy and is specifically used to calculate the prediction error. Once the forward propagation of the whole convolutional neural network 100 (e.g., propagation from 110 to 140 in fig. 3) is completed, the backward propagation (e.g., propagation from 140 to 110 in fig. 3) starts to update the weights and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100, i.e., the error between the result output by the convolutional neural network 100 through the output layer and the desired result.
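The forward propagation, categorical cross-entropy loss and backward weight update can be sketched on a single linear layer; the synthetic two-class data and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # 20 samples, 3 features (synthetic)
labels = (X[:, 0] > 0).astype(int)         # hypothetical two-class task
W = np.zeros((3, 2))                       # weights of a single linear layer

def forward(X, W):
    logits = X @ W                                      # forward propagation
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)             # softmax probabilities

losses = []
for _ in range(100):
    probs = forward(X, W)
    # categorical cross-entropy: -log(probability of the true class)
    loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0          # backward propagation
    W -= 0.5 * (X.T @ grad) / len(labels)                # update the weights
    losses.append(loss)
```

Each iteration completes one forward pass, computes the prediction error at the output, and propagates it back to reduce the loss.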
It should be noted that the convolutional neural network 100 shown in fig. 3 is only one example of a convolutional neural network; in specific applications, the convolutional neural network may also take the form of other network models, for example with multiple convolutional layers/pooling layers in parallel as shown in fig. 4, whose separately extracted features are all input to the neural network layer 130 for processing.
Referring to fig. 5, a structural diagram of a neural network processor according to an embodiment of the present invention is shown. The neural network processor NPU 50 is mounted as a coprocessor on a main CPU (Host CPU), which allocates tasks. The core part of the NPU is the arithmetic circuit 503; the controller 504 controls the arithmetic circuit 503 to fetch data from memory (weight memory or input memory) and perform operations.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) inside. In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it on each PE of the arithmetic circuit. The arithmetic circuit then takes the matrix A data from the input memory 501, performs the matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator 508.
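The accumulation of partial matrix results can be illustrated by a tiled matrix multiplication in which partial products are summed into an accumulator array; this is a software sketch of the idea, not the NPU's actual dataflow:

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """C = A @ B computed slice by slice: each partial product is added into
    C, which plays the role of the accumulator holding partial results."""
    n, k = A.shape
    k_b, m = B.shape
    assert k == k_b
    C = np.zeros((n, m))                  # the "accumulator"
    for p in range(0, k, tile):           # fetch one slice of A and B at a time
        C += A[:, p:p + tile] @ B[p:p + tile, :]
    return C

A = np.arange(6.0).reshape(2, 3)          # input matrix A
B = np.arange(12.0).reshape(3, 4)         # weight matrix B
C = tiled_matmul(A, B)
```

Each loop iteration produces a partial result; only after the last slice has been accumulated does C hold the final matrix product.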
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as Pooling (Pooling), batch normalization (Batch Normalization), local response normalization (Local Response Normalization), and the like.
In some implementations, the vector computation unit 507 stores the vector of processed outputs to the unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to an output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 507 generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operational circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used for storing input data and output data.
The direct memory access controller 505 (Direct Memory Access Controller, DMAC) transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data from the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
A bus interface unit (Bus Interface Unit, BIU) 510 is used for interaction between the main CPU, the DMAC and the instruction fetch memory 509 via a bus.
An instruction fetch memory (instruction fetch buffer) 509 connected to the controller 504 for storing instructions used by the controller 504;
And a controller 504 for calling the instruction cached in the instruction memory 509 to control the operation of the operation accelerator.
Typically, the unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch memory 509 are on-chip (On-Chip) memories, while the external memory is memory outside the NPU, which may be double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM), high bandwidth memory (High Bandwidth Memory, HBM), or other readable and writable memory.
It should be noted that the hyper-parameters in this solution are used to adjust the overall training process of the network model, for example the number of hidden layers of the neural network and the size and number of kernels. Hyper-parameters are configuration variables and do not directly participate in the training process.
The hyper-parameters may be any of the following:
1) Optimizer algorithm (optimizer)
Refers to the machine learning algorithm used to update the network weights, such as the stochastic gradient descent algorithm (stochastic gradient descent, SGD).
2) Learning rate
Refers to the magnitude by which parameters are updated in each iteration of the optimization algorithm, also called the step size. When the step size is too large, the algorithm does not converge and the objective function oscillates; when the step size is too small, the model converges too slowly.
3) Activation function
Refers to the nonlinear function applied to each neuron; it is what gives the neural network its nonlinear properties. Common activation functions include sigmoid, ReLU, tanh, and the like.
Loss function: the objective function of the optimization process; the smaller the loss, the better, and the training process is the process of minimizing the loss function.
Common loss functions include the logarithmic loss, the square loss, the exponential loss, and the like.
The super parameters in the embodiment of the application can also be:
4) Batch size (batch size)
Refers to the amount of data used in each gradient descent update.
5) Discard rate (drop out rate)
Refers to the proportion of network weights randomly masked during the model training process.
6) Weight attenuation coefficient (weight decay coefficient)
Refers to the coefficient of the regularization term applied to the network weights.
7) Momentum coefficient (momentum coefficient)
Refers to the coefficient of the momentum term used when gradient descent updates the network weights.
The present solution is described by taking the above as examples only; other hyper-parameters may also be used, and the present solution is not specifically limited in this regard.
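The hyper-parameters enumerated above can be gathered into a single configuration object. The following is an illustrative sketch only; all names and values are hypothetical examples, not values fixed by this solution.

```python
# Hypothetical hyper-parameter configuration grouping the items listed above.
hyperparams = {
    "optimizer": "SGD",          # 1) optimizer algorithm
    "learning_rate": 0.1,        # 2) step size of each parameter update
    "activation": "relu",        # 3) activation function
    "batch_size": 128,           # 4) samples per gradient descent update
    "dropout_rate": 0.5,         # 5) fraction of weights randomly masked
    "weight_decay": 1e-4,        # 6) regularization term coefficient
    "momentum": 0.9,             # 7) momentum term coefficient
}

def validate(hp):
    """Basic sanity checks on a hyper-parameter configuration."""
    assert hp["learning_rate"] > 0.0, "learning rate must be positive"
    assert 0.0 <= hp["dropout_rate"] < 1.0, "drop-out rate is a fraction"
    assert hp["batch_size"] >= 1
    return hp
```

Such a configuration is what the training stages below select and adjust; it is not itself updated by gradient descent.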
Referring to fig. 6, a schematic diagram of a model training architecture according to an embodiment of the present application is shown. The model training is based on Z training stages of M models, from which a target model is finally obtained. Each of the M models is trained through all Z training stages.
Wherein each training phase comprises at least two training processes. A training process here can be understood as a model traversing a sample set once.
As an alternative implementation, the M models are all the same in scale and structure.
The model training method provided by the embodiment of the application is described in detail below.
The execution subject of the embodiment of the present application may be a server or the like.
Referring to fig. 7, a flow chart of a model training method according to an embodiment of the present application is shown. The method comprises steps 701-704, which are specifically as follows:
701. Determine the reference hyper-parameters in the i-th training stage according to the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage, where M, i, and T_i are all integers not less than 2;
wherein each of the M models P_{i-2}' performs T_{i-1} training processes in the (i-1)-th training stage. Each training process can be understood as one traversal of a sample set by a model. One model performance score is obtained by performing one training process on each model.
Specifically, in one training process, model parameters such as the weights are updated using a training data set to obtain a new model, and the performance score of the new model on a validation data set is calculated to obtain the model performance score, where the validation data set is a data set used to evaluate model performance.
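A single training process as just described can be sketched as follows. This is a minimal stand-in, not the implementation of this solution: the "model" is a one-weight linear model y = w·x trained by per-sample SGD, and the score is the negative mean squared error on a validation set (higher is better).

```python
# Hedged sketch of one "training process": update weights on the training
# set, then score the new model on a held-out validation set.
def train_one_process(w, train_set, lr):
    """One traversal of the training set with SGD on the 1-D model y = w*x."""
    for x, y in train_set:
        grad = 2 * (w * x - y) * x      # gradient of the squared error
        w -= lr * grad
    return w

def performance_score(w, val_set):
    """Model performance score: negative mean squared error (higher is better)."""
    mse = sum((w * x - y) ** 2 for x, y in val_set) / len(val_set)
    return -mse

train_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # true relation y = 2x
val_set = [(4.0, 8.0), (5.0, 10.0)]

w = train_one_process(0.0, train_set, lr=0.05)
score = performance_score(w, val_set)
```

One such (training, scoring) pair is what each model in each training process of a stage produces.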
As an alternative implementation, determining the reference hyper-parameters in the i-th training stage according to the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage includes:
obtaining a model performance estimation function according to the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage;
processing the model performance estimation function according to the number of training processes T_i of the i-th training stage and at least one model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, to obtain the reference hyper-parameters in the i-th training stage.
Machine learning is performed on the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage: by continuously learning, from the model performance score obtained by any training process in the (i-1)-th training stage and the corresponding hyper-parameters, the model performance score Δt training processes later, a model performance estimation function f(y_t, Δt, A) is obtained.
Here y_t is a model performance score, Δt is a positive integer representing the interval between the training process corresponding to y_t and the training process whose model performance score is predicted, and A is the hyper-parameters of the training stage corresponding to y_t.
Reference is made in particular to the learning process diagram shown in fig. 8. Fig. 8 takes the (i-1)-th training stage having 6 training processes as an example. Learning is performed based on the first training process at intervals of 1, 2, 3, 4, and 5 training processes; at the same time, learning is also performed based on the second training process at intervals of 1, 2, 3, and 4 training processes.
Accordingly, learning is also performed based on the third training process at intervals of 1, 2, and 3 training processes (not shown in the figure), based on the fourth training process at intervals of 1 and 2 training processes (not shown in the figure), and based on the fifth training process at an interval of 1 training process (not shown in the figure).
By learning the corresponding information of each model in the (i-1)-th training stage, the model performance score Δt training processes later can be predicted based on different performance scores under different hyper-parameters. This approach improves the model computation efficiency.
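The enumeration of learning pairs shown in fig. 8 can be sketched as follows. This is an assumption about how the described scheme could lay out its training examples: for a stage whose scores are [y_1, …, y_T] under hyper-parameters A, every pair (y_t, Δt) → y_{t+Δt} becomes one example for learning f(y_t, Δt, A).

```python
# Sketch: enumerate the (y_t, dt, A, y_{t+dt}) learning tuples of one
# training stage, as illustrated in fig. 8.
def learning_pairs(scores, hyperparams):
    """Return all forward-looking (score, interval, hyperparams, target) tuples."""
    pairs = []
    for t, y_t in enumerate(scores):
        for dt in range(1, len(scores) - t):
            pairs.append((y_t, dt, hyperparams, scores[t + dt]))
    return pairs

# A stage with 6 training processes, matching the fig. 8 example:
pairs = learning_pairs([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], {"lr": 0.1})
```

With 6 processes this yields 5 + 4 + 3 + 2 + 1 = 15 tuples, matching the intervals enumerated in the figure description.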
The above-mentioned processing of the model performance estimation function may be extremum solving of the model performance estimation function to obtain the reference hyper-parameters A*, which may be expressed as follows:
A* = argmax_A f(y_t, Δt, A);
that is, given y_t and Δt, the value of A that maximizes f is the reference hyper-parameter A*.
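When the candidate hyper-parameters form a finite set, the argmax above can be solved by direct search. The quadratic f below is a toy stand-in, since the solution does not fix a concrete form for the performance estimation function; only the argmax pattern is the point.

```python
# Hedged sketch of A* = argmax_A f(y_t, Δt, A) over a finite candidate set.
def f(y_t, dt, lr):
    """Toy estimation function: predicted score peaks at lr = 0.1."""
    return y_t + dt * (0.05 - (lr - 0.1) ** 2)

def solve_reference_hyperparam(y_t, dt, candidates):
    """Return the candidate hyper-parameter A maximizing f(y_t, dt, A)."""
    return max(candidates, key=lambda lr: f(y_t, dt, lr))

candidates = [0.01, 0.05, 0.1, 0.2, 0.5]
best_lr = solve_reference_hyperparam(y_t=0.6, dt=4, candidates=candidates)
```

For a continuous hyper-parameter space, the same extremum solving would instead use a numerical optimizer over A.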
After the model performance estimation function is obtained, extremum solving is performed on it using a model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage and the number of training processes T_i of the i-th training stage, to obtain the hyper-parameters under which the model performance score obtained by the last training process of the i-th training stage is highest.
Wherein the at least one of the M models P_{i-2}' may be the model among the M models P_{i-2}' with the highest model performance score obtained by the last training process in the (i-1)-th training stage;
alternatively, one model may be selected randomly from the models whose model performance scores obtained in the last training process exceed a preset threshold. It may also be determined by other selection methods, which are not specifically limited in this solution.
The hyper-parameters under which the model performance score obtained by the last training process of the i-th training stage is highest may be obtained based on the model performance score of a single training process, or based on the model performance scores of multiple training processes. This embodiment is not specifically limited in this regard.
Specifically, the model with the highest model performance score is obtained based on the model performance score obtained by the last training process of each of the M models P_{i-2}' in the (i-1)-th training stage; the above-mentioned reference hyper-parameters are then obtained based on that model and the number of training processes T_i of the i-th training stage.
This embodiment is described by taking the model with the highest model performance score as an example, but the model may be selected in any other configured manner; for example, a model may be selected randomly from the four models ranked highest by model performance score. This solution is not specifically limited in this regard.
The above description takes as an example only the model performance score obtained by the last training process of a model in the (i-1)-th training stage; the reference hyper-parameters may also be calculated from the model performance score obtained by any training process other than the last one of a model in the (i-1)-th training stage, which is not specifically limited in this solution.
The reference hyper-parameters may also be calculated based on the model performance scores obtained by any training processes of multiple models in the (i-1)-th training stage; for example, different weights may be assigned to the hyper-parameters obtained from the multiple models, and the reference hyper-parameters obtained therefrom.
The above description takes extremum solving of the model performance estimation function as an example; the reference hyper-parameters may also be obtained by processing the model performance estimation function in other manners, which is not specifically limited in this solution.
It should be noted that, the reference super parameter in the i-th training stage determined in this scheme may be one or more, and this scheme is not limited specifically.
As an alternative implementation, when i is not less than 3, obtaining the model performance estimation function according to the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage may include:
obtaining the model performance estimation function according to the model performance score obtained by the last training process of each of the M models P_{i-3}' in the (i-2)-th training stage, the hyper-parameters of each of the M models P_{i-3}' in the (i-2)-th training stage, the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, wherein the M models P_{i-2}' are obtained by processing the M models P_{i-3}', and the M models P_{i-2}' correspond one-to-one to the M models P_{i-3}'.
That is, the model performance estimation function is obtained by learning based on the model performance score obtained from the last training process in the previous training phase, the hyper-parameters in the previous training phase, the model performance score obtained from each training process in the current training phase, and the hyper-parameters in the current training phase.
Fig. 9 is a schematic diagram of a learning process according to an embodiment of the present application. Fig. 9 takes as an example the (i-2)-th training stage having 8 training processes and the (i-1)-th training stage having 6 training processes. Learning is performed based on the last training process of the (i-2)-th training stage at intervals of 1 training process (i.e., the 1st training process of the (i-1)-th training stage), 2 training processes (i.e., the 2nd training process of the (i-1)-th training stage), 3, 4, 5, and 6 training processes; at the same time, learning is also performed based on the 1st training process of the (i-1)-th training stage at intervals of 1, 2, 3, 4, and 5 training processes.
Accordingly, learning is also performed based on the 2nd training process of the (i-1)-th training stage at intervals of 1, 2, 3, and 4 training processes (not shown in the figure), based on the 3rd training process at intervals of 1, 2, and 3 training processes (not shown in the figure), and so on.
By learning the corresponding information of each model, the model performance score Δt training processes later can be predicted based on different performance scores under different hyper-parameters.
According to the embodiment of the application, by learning the model performance score obtained by the last training process of the previous training stage, the hyper-parameters of the previous training stage, the model performance scores obtained by each training process of the current training stage, and the hyper-parameters of the current training stage, cross-stage model performance estimation can further be performed, improving the accuracy of the model performance estimation.
This embodiment obtains the model performance estimation function based on the model performance score obtained by the last training process of each of the M models P_{i-3}' in the (i-2)-th training stage.
Alternatively, the model performance estimation function may be obtained using the model performance scores obtained by each training process of each of the M models P_{i-3}' in the (i-2)-th training stage. This embodiment is not specifically limited in this regard.
That is, the model performance estimation function may be obtained by learning over the previous two training stages.
As a further alternative implementation, obtaining the model performance estimation function according to the model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage includes:
obtaining the model performance estimation function according to the model performance scores obtained by each training process of each of the M models P_0 in the previous i-1 training stages and the hyper-parameters of each of the M models P_0 in each of the previous i-1 training stages, where the M models P_0 are the initial models, i.e., the models corresponding to the case i = 2.
Specifically, by learning each model performance score and each hyper-parameter of the previous i-1 training stages, the reference hyper-parameters in the i-th training stage can be obtained based on the model performance score obtained by the last training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the number of training processes of the i-th training stage.
According to the embodiment of the application, the reference hyper-parameters in the i-th training stage are obtained by learning the performance scores and hyper-parameters of each model over the previous i-1 training stages. By learning from a large amount of data, the accuracy of the model performance estimation is improved, which improves the reliability of the hyper-parameter selection and thus the efficiency of obtaining a model with better performance.
The above describes only several different implementations; the reference hyper-parameters in the i-th training stage may also be obtained based on the model performance scores and hyper-parameters of any number of training stages. This embodiment is not specifically limited in this regard.
It should be noted that, when determining the reference hyper-parameters, the embodiment of the present application may directly predict the model performance score Δt training processes later from one model performance score. Alternatively, an intermediate model performance score Δt' training processes later may first be predicted from one model performance score, and the final model performance score then predicted from that intermediate score.
For example, predicting the model performance score of the 10th training process of the next training stage from the 3rd model performance score of the current training stage may be done by first predicting the model performance score of the 2nd training process of the next training stage from the 3rd model performance score of the current training stage, and then predicting the model performance score of the 10th training process from the model performance score of the 2nd training process.
That is, the scheme can learn based on multiple intermediate predictions and thereby obtain the model performance estimation function. The foregoing is merely an example; other forms are also possible, and the present solution is not limited in this regard.
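The chained-prediction idea in the example above can be sketched as follows. This is an assumed form: `predict` is a toy single-step estimator (a linear score gain per training process), used only to show how a far-ahead prediction can pass through an intermediate one.

```python
# Sketch: predict dt_total processes ahead either directly or via an
# intermediate prediction dt_mid processes ahead (toy estimator assumed).
def predict(y_t, dt):
    """Toy estimator: the score improves by 0.01 per training process."""
    return y_t + 0.01 * dt

def predict_via_intermediate(y_t, dt_mid, dt_total):
    """Chain: first predict dt_mid ahead, then the remaining interval."""
    y_mid = predict(y_t, dt_mid)              # e.g. the 2nd process of the next stage
    return predict(y_mid, dt_total - dt_mid)  # then on to the 10th process

direct = predict(0.5, 10)
chained = predict_via_intermediate(0.5, 2, 10)
```

With this linear toy estimator the two routes coincide; with a learned estimation function they generally differ, which is why learning over intermediate predictions is a distinct option.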
As an alternative implementation, after obtaining the model performance estimation function, the method further includes:
processing the model performance estimation function according to N length scales to obtain N processed model performance estimation functions, where N is an integer not less than 2;
processing each of the N processed model performance estimation functions according to the number of training processes T_i of the i-th training stage and a model performance score obtained by at least one training process of at least one of the M models P_{i-2}' in the (i-1)-th training stage, to obtain N initial hyper-parameters;
and processing the N initial hyper-parameters to obtain the reference hyper-parameters in the i-th training stage.
Here, the length scale refers to the length-scale term in a Gaussian kernel, which is used to control the degree of correlation between two quantities. The N length scales may be obtained, for example, by dividing the range 0.5 to 1.5 into N values.
Processing the model performance estimation function according to the N length scales may be performing Gaussian process fitting on the model performance estimation function using Gaussian kernels with different length scales. Each length scale corresponds to one fitted model performance estimation function, so N processed model performance estimation functions are obtained after Gaussian process fitting with the different length scales.
Extremum solving is then performed on each of the N processed model performance estimation functions to obtain N initial hyper-parameters; multi-centroid clustering is performed on these hyper-parameters, and the hyper-parameter located at a cluster center is selected as the reference hyper-parameter, so as to adjust the network training of the next training stage.
Of course, the hyper-parameters corresponding to the plurality of centers formed by clustering can be used as the reference hyper-parameters in the ith training stage. That is, the above-mentioned reference superparameter may be one or more, and the present embodiment is not limited in particular.
By searching for suitable hyper-parameters over multiple length scales in this way, the stability of the model can be improved.
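The reduction from N initial hyper-parameters to one reference hyper-parameter can be sketched as follows. This is a simplified stand-in for the multi-centroid clustering described above (assumption: a single cluster, whose "center" is taken as the candidate closest to the mean); the candidate values are illustrative.

```python
# Sketch: pick the reference hyper-parameter as the candidate nearest the
# centroid of the N initial hyper-parameters (one-cluster simplification).
def pick_reference(candidates):
    """Return the candidate closest to the centroid of all candidates."""
    centroid = sum(candidates) / len(candidates)
    return min(candidates, key=lambda a: abs(a - centroid))

# N = 5 initial hyper-parameters, one per length scale (illustrative values):
initial = [0.08, 0.10, 0.11, 0.12, 0.30]
reference = pick_reference(initial)
```

Note how the one outlier (0.30) pulls the centroid up but is itself rejected; with true multi-centroid clustering, each cluster center could instead be kept as one of several reference hyper-parameters.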
As an alternative implementation, before step 701, the method further includes:
acquiring the data amount of the previous i-1 training stages according to the number of training processes T_j of each of the previous i-1 training stages;
and confirming that the data amount of the previous i-1 training stages does not exceed a preset value.
Wherein the data amount D may be expressed as:
D = Σ_{j=1}^{i-1} T_j (T_j + 1) / 2;
the data amount characterizes the sum, over the previous i-1 training stages, of the number of training data items used to learn the model performance score Δt training processes later from any one model performance score.
For example, if the 3rd training stage has 4 training processes, the number of training data items corresponding to that stage is 4×5/2 = 10. The data amount is obtained by summing the numbers of training data items of each training stage.
If the preset value is not exceeded, step 701 is performed.
Optionally, each time a model performance score, super parameter, etc. is obtained, it is stored.
If the preset value is exceeded, a memory buffer storing the training data is reset and the hyper-parameters are initialized; alternatively, only the latest training data are retained and training continues.
Whether excessive training data has been collected during model training is judged in real time, so that corresponding processing can be performed when the data amount becomes excessive. This measure effectively improves the computation efficiency.
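The data-amount check can be sketched as follows. One assumption is labeled explicitly: the per-stage count T(T+1)/2 is inferred from the 4×5/2 = 10 example above, since the specification does not print the formula itself.

```python
# Sketch of the data-amount budget check before step 701.
# Assumption: each stage with T training processes contributes T*(T+1)/2
# training data items (inferred from the 4*5/2 = 10 example).
def data_amount(process_counts):
    """Sum of per-stage training data counts over the completed stages."""
    return sum(t * (t + 1) // 2 for t in process_counts)

PRESET = 100                    # illustrative preset value
stages = [8, 6, 4]              # T_1, T_2, T_3 of the previous stages
amount = data_amount(stages)
within_budget = amount <= PRESET
```

When `within_budget` is false, the buffer would be reset (or trimmed to the latest data) as described above; otherwise step 701 proceeds.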
702. Determine the hyper-parameters of each of the M models P_{i-1}' in the i-th training stage according to the model performance scores of the M models P_{i-1} and the reference hyper-parameters in the i-th training stage;
wherein the M models P_{i-1} are obtained by the last training process of the M models P_{i-2}' in the (i-1)-th training stage, the M models P_{i-1}' are obtained by processing the M models P_{i-1}, the M models P_{i-1}' correspond one-to-one to the M models P_{i-1}, and the M models P_{i-1} correspond one-to-one to the M models P_{i-2}';
fig. 10 is a schematic diagram of a model training method according to an embodiment of the present application.
Fig. 10 takes a model s among the M models as an example. The model P_{i-3,s}' undergoes T_{i-2} training processes in the (i-2)-th training stage to obtain the model P_{i-2,s}, and the model P_{i-2,s} is processed to obtain the model P_{i-2,s}'. The model P_{i-2,s}' undergoes T_{i-1} training processes in the (i-1)-th training stage to obtain the model P_{i-1,s}, and the model P_{i-1,s} is processed to obtain the model P_{i-1,s}'. The model P_{i-1,s}' undergoes T_i training processes in the i-th training stage to obtain the model P_{i,s}.
That is, the model P_{i-2,s}' corresponds to the model P_{i-1,s}, and the model P_{i-1,s} corresponds to the model P_{i-1,s}'.
As an alternative implementation, the M models P_{i-1}' being obtained by processing the M models P_{i-1} can be understood as follows: the K models among the M models P_{i-1}' corresponding to the K first models are obtained by updating the parameters of the K first models with lower scores among the M models P_{i-1}; the M-K models corresponding to the M-K second models are obtained by keeping the parameters of the M-K second models with higher scores unchanged. By keeping the models with higher scores unchanged, part of the models for the i-th training stage are obtained.
Of course, the M models P_{i-1}' may also be obtained by processing in other manners, which is not specifically limited in this embodiment.
As an alternative implementation, determining the hyper-parameters of each of the M models P_{i-1}' in the i-th training stage according to the model performance scores of the M models P_{i-1} and the reference hyper-parameters includes:
obtaining K first models and M-K second models according to the model performance scores of the M models P_{i-1}, where the K first models are the models among the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are the models whose model performance scores are not smaller than the first preset threshold, K is an integer not less than 1, and K is smaller than M;
updating the parameters of each of the K first models according to the parameters of a model whose model performance score is larger than a second preset threshold, to obtain K updated first models, where the second preset threshold is not smaller than the first preset threshold;
determining the reference hyper-parameters in the i-th training stage as the hyper-parameters of each of the K updated first models in the i-th training stage; wherein the M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as the hyper-parameters of that model in the (i-1)-th training stage.
That is, the model performance scores of the M models P_{i-1} are ranked, and the reference hyper-parameters are then determined as the hyper-parameters, in the i-th training stage, of the models with the poorer performance scores (e.g., the K first models). Specifically, when there is only one reference hyper-parameter, it can be determined as the hyper-parameter of each model with a poor performance score. When there are multiple reference hyper-parameters, the hyper-parameter of each model with a poor performance score may be determined based on the ranking of the performance scores; for example, the allocation may be random or sequential, and this embodiment is not specifically limited in this regard.
The hyper-parameters of each of the models with the better performance scores (e.g., the M-K second models) in the i-th training stage remain consistent with the hyper-parameters of that model in the (i-1)-th training stage.
The updating of the parameters of each of the K first models according to the parameters of the model with the performance score greater than the second preset threshold may be determining the parameters of the model with the poor performance score based on the parameters of the model with the better performance score.
For example, the parameters of the model with the poor model performance score are updated to the parameters of the model with the highest model performance score. Alternatively, the parameters of the model with the worst model performance score are updated to the parameters of the model with the highest model performance score, and the parameters of the model with the poor model performance score are updated to the parameters of the model with the high model performance score. The foregoing is only a part of examples, but may be other forms, and the present solution is not limited to this.
Optionally, the second preset threshold may be the same as the first preset threshold. The second preset threshold may also be greater than the first preset threshold.
The embodiment of the application updates the models based on the M models P_{i-1} to obtain the M models P_{i-1}' used for training in the next training stage. The models are updated periodically: the parameters of the models with poor performance scores are updated to the parameters of models with high performance scores, and training in the next training stage proceeds from well-performing models, which improves the efficiency of obtaining a model with good performance.
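The update described in step 702 can be sketched as follows. This is a hedged sketch, not the specification's implementation: models scoring below the threshold (the "first models") receive the parameters of the best-scoring model and the reference hyper-parameter, while the rest keep both their parameters and their hyper-parameters. All field names are illustrative.

```python
# Sketch of the population update producing the M models P_{i-1}'.
def update_population(models, threshold, reference_hp):
    """models: list of dicts with 'score', 'params', 'hp'. Returns the updated list."""
    best = max(models, key=lambda m: m["score"])
    updated = []
    for m in models:
        if m["score"] < threshold:              # a "first model": copy best params, new hp
            updated.append({"score": m["score"],
                            "params": dict(best["params"]),
                            "hp": reference_hp})
        else:                                   # a "second model": unchanged
            updated.append(dict(m))
    return updated

population = [
    {"score": 0.9, "params": {"w": 1.0}, "hp": 0.10},
    {"score": 0.4, "params": {"w": 5.0}, "hp": 0.50},
]
new_pop = update_population(population, threshold=0.5, reference_hp=0.05)
```

Here the single reference hyper-parameter is assigned to every first model; with multiple reference hyper-parameters, the assignment could follow the score ranking as described above.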
It should be noted that, in the embodiment of the present application, the parameters of the model include the network weights of the model. It may also include other information, which the present scheme does not limit.
703. In the i-th training stage, perform T_i training processes on each of the M models P_{i-1}' according to the hyper-parameters of that model, to obtain M models P_i, and obtain the model performance score of each training process of each of the M models P_{i-1}' in the i-th training stage;
by determining the hyper-parameters of each of the M models P_{i-1}' in the i-th training stage, T_i training processes can be performed on each of the M models P_{i-1}'.
If the ith training phase is not the last training phase, let i=i+1, and repeatedly execute the steps 701-703, i.e. determine the super parameters of the next training phase and the model of the next training phase, so as to perform training.
Until the ith training phase is the last training phase, step 704 is entered.
704. When the i-th training stage is the last training stage, determine the target model from the M models P_i based on the model performance scores of the M models P_i.
Specifically, after the last training process of the last training stage is completed, the model performance scores of the finally obtained M models P_i are ranked, and the model with the highest model performance score is determined as the target model.
The embodiment of the application is described by taking the model with the highest model performance score as the target model; alternatively, the models whose model performance scores exceed a preset value may be determined as target models. That is, there may be one or more target models, and this embodiment is not specifically limited in this regard.
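The selection in step 704 amounts to a ranking and a maximum. A minimal sketch (model names and scores are illustrative):

```python
# Sketch of step 704: take the final model with the highest performance score.
def select_target(models):
    """models: list of (name, score) pairs; return the name with the top score."""
    return max(models, key=lambda m: m[1])[0]

final_models = [("P_i,1", 0.72), ("P_i,2", 0.91), ("P_i,3", 0.85)]
target = select_target(final_models)
```

For the multi-target variant, a filter such as `[n for n, s in final_models if s > preset]` would be used instead of the maximum.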
According to the embodiment of the application, the hyper-parameters of each of the M models in the i-th training stage are determined according to the model performance score obtained by each training process of each of the M models in the (i-1)-th training stage, the hyper-parameters of each model in the (i-1)-th training stage, and the number of training processes of the i-th training stage, and the M models are trained based on these hyper-parameters to obtain the target model. By taking the number of training processes of the i-th training stage into account when determining its hyper-parameters, the hyper-parameter determination process is more comprehensive and more accurate, which helps obtain a neural network with better performance and improves the model computation efficiency.
The following describes an example of the model training method of the present embodiment applied to an image processing scene:
referring to fig. 11, a model training method applied to image recognition is provided in an embodiment of the present application. The training method comprises steps 1101-1108, specifically as follows:
1101. acquiring an image classification sample set;
As an alternative implementation, training data are collected for a supervised learning task, preprocessing operations such as normalization are performed on the data, and the data are organized into samples, thereby obtaining an image classification sample set.
The image classification sample set may also be obtained in other ways.
Wherein the image classification sample set comprises different categories of data, such as table pictures, chair pictures, airplane pictures, etc.
1102. Initialize each of the M models to obtain M models P_0;
The initialization process may be to set an initial parameter and an initial super parameter. The initial hyper-parameters are hyper-parameters of the model during a first training phase.
The M models may be deep neural network models or the like.
1103. According to the initial hyper-parameters of the M models P_0, perform T_1 training processes of the first training stage on the M models P_0 to obtain M models P_1 and the model performance score of each training process of each of the M models P_0 in the first training stage;
that is, with i = 1, the training of the first training stage is performed.
The description takes the learning rate as an example hyper-parameter. The learning rate controls the speed at which the model parameters, i.e., the network weights, are updated. For example, when the learning rate of the i-th training stage is smaller than that of the (i-1)-th training stage, the network weights of the i-th training stage are updated more slowly.
In each training process of the ith training stage, the model extracts image data of a batch of samples for a plurality of times, calculates a loss value of the batch of image data according to a predefined image classification evaluation function (namely a loss function) each time, calculates a weight updating direction of an image classification model obtained on the batch of image data through an optimizer algorithm, and updates model weights in combination with a learning rate.
Each model traverses the image classification sample set once for each training process.
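A single "training process" as described above can be sketched as follows; the linear model, squared loss, and plain gradient descent are illustrative stand-ins (not the patent's concrete model or optimizer), chosen so the role of the learning-rate hyper-parameter is visible.

```python
# Hypothetical sketch of one training process (step 1103): traverse the
# sample set once in batches, compute a loss, derive the weight update
# direction, and step the weights scaled by the learning rate.
import numpy as np

def training_process(w, x, y, lr, batch_size=2):
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        pred = xb @ w                                # forward pass
        grad = 2 * xb.T @ (pred - yb) / len(xb)      # weight update direction
        w = w - lr * grad                            # learning rate scales the step
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w = np.zeros(3)
for _ in range(200):                                 # T1 training processes
    w = training_process(w, x, y, lr=0.1)
```

A smaller `lr` would make the weights converge toward `true_w` more slowly, matching the description of the learning rate's effect.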
1104. According to the model performance score obtained by each training process of each of the M models P0 in the first training stage, the initial hyper-parameters of the M models P0, and the number of training processes T2 of the second training stage, obtaining M models P1' of the second training stage and the hyper-parameters of each of the M models P1';
According to the model performance score obtained by each training process of each model in the first training stage and the hyper-parameters of each of the M models P0 in the first training stage, a model performance estimation function can be learned: given the current model performance score of any model and the hyper-parameters in effect for that model, the function predicts the model performance score obtained after a further Δt training processes.
Then, based on a specific model performance score (for example, the highest model performance score among those obtained by the last training process of each of the M models P0 in the first training stage), the hyper-parameters in the first training stage of the model corresponding to that score, and a specific Δt (for example, the number of training processes T2 of the second training stage), extremum solving is performed on the model performance estimation function to obtain the reference hyper-parameters for which the model performance score obtained by the last training process of the second training stage is highest.
Based on the above reference hyper-parameters and the model performance score obtained by the last training process of each of the M models P0 in the first training stage, the M models P1' of the second training stage and the hyper-parameters of each of the M models P1' are obtained.
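The estimate-then-maximize idea above can be sketched as follows. The quadratic fit, the learning-rate observations, and the grid search are all illustrative assumptions: the patent does not prescribe this concrete estimator, only that a performance estimation function is learned and its extremum solved to obtain reference hyper-parameters.

```python
# Hypothetical sketch of step 1104: fit a performance estimation function
# from observed (hyper-parameter, resulting score) pairs of the first
# stage, then pick the hyper-parameter that maximizes the predicted score
# after T2 further training processes.
import numpy as np

# Illustrative stage-1 observations: learning rate vs. score after Δt processes.
observed_lr = np.array([0.01, 0.05, 0.1, 0.2, 0.4])
observed_score = np.array([0.60, 0.72, 0.80, 0.74, 0.55])

# Model performance estimation function: here a quadratic fit in the
# hyper-parameter (an assumption for illustration).
coeffs = np.polyfit(observed_lr, observed_score, deg=2)
estimate = np.poly1d(coeffs)

# Extremum solving by grid search over candidate hyper-parameters.
grid = np.linspace(0.01, 0.4, 400)
reference_lr = float(grid[np.argmax(estimate(grid))])
```

Any maximizer (closed-form vertex, gradient ascent, Bayesian optimization) could replace the grid search; the output plays the role of the reference hyper-parameters of the second stage.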
The M models P1' of the second training stage are obtained based on the M models P1. Specifically, among the M models P1, the parameters of the models with lower performance scores are updated to the parameters of the models with higher performance scores.
Furthermore, both the parameters and the hyper-parameters of the models with higher performance scores remain unchanged, while the hyper-parameters of the models with lower performance scores are obtained according to the model performance estimation function.
The specific processing procedure may refer to the description in step 702, and will not be described herein.
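The derivation of P1' from P1 described above can be sketched as follows; the dict-based model representation and the `exploit` helper are illustrative assumptions, and the new hyper-parameters of the low-scoring models (which the patent derives from the performance estimation function) are left unchanged here for brevity.

```python
# Hypothetical sketch: low-scoring models inherit the parameters of the
# highest-scoring model; high-scoring models keep parameters and
# hyper-parameters unchanged.
import copy

def exploit(models):
    """models: list of dicts with 'params', 'hyper', 'score' keys."""
    best = max(models, key=lambda m: m["score"])
    out = []
    for m in models:
        m = copy.deepcopy(m)
        if m["score"] < best["score"]:
            m["params"] = copy.deepcopy(best["params"])  # inherit weights
            # The hyper-parameters of low scorers would come from the
            # model performance estimation function; kept as-is here.
        out.append(m)
    return out

population = [
    {"params": [0.1], "hyper": {"lr": 0.05}, "score": 0.6},
    {"params": [0.9], "hyper": {"lr": 0.10}, "score": 0.8},
]
updated = exploit(population)
```

After the call, both models share the best model's parameters, while each keeps its own hyper-parameter record.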
1105. Training each of the M models P1' in the second training stage according to the hyper-parameters of each of the M models P1' to obtain M models P2, and obtaining the model performance score of each of the M models P1' in each training process of the second training stage;
i.e., i=2, the training of the second training phase is performed.
1106. Determining whether the ith training phase is the last training phase;
determining whether the second training phase is the last training phase at this time, if not, executing step 1107; if yes, go to step 1108.
1107. Let i=i+1, repeatedly perform steps 1104-1106, and perform training of the next training phase until the last training phase is reached.
1108. Determining the target model according to the model performance score obtained by the last training process of each of the M models Pi-1' in the i-th training stage.
The target model may be the model with the highest model performance score among the models obtained by the last training process of the i-th training stage.
It should be noted that, for the M models P0 of the first training stage, P0 and P0' are set to be the same.
The above specific implementation process may refer to the description of the foregoing embodiment, and will not be repeated herein.
Based on the model training method, a target model for image recognition can be obtained.
The following describes the application of the scheme to a recommendation system:
The embodiment of the application provides a model training method applied to a recommendation system, comprising the following steps C1-C8:
C1, acquiring a recommended data sample set;
The recommended data sample set includes at least one of a user's gender, a user's age, and a user's historical purchase behavior.
C2, initializing each of the M models to obtain M models P0;
C3, according to the hyper-parameters of the M models P0, performing T1 training processes of the first training stage on the M models P0 to obtain M models P1 and the model performance score obtained by each training process of each of the M models P0 in the first training stage;
the description will be given taking the case where the super parameter is a weight attenuation coefficient. Wherein the weight decay coefficients control regularized term coefficients of the network weights. For example, when the weight attenuation coefficient of the ith training stage is larger than that of the ith training stage-1, the ith training stage has a stricter weight magnitude penalty, so that the weight of the ith training stage is smaller, and the generalization is better.
In each training process of the ith training stage, the model extracts user data of a batch of samples for a plurality of times, calculates a loss value of the batch of user data according to a predefined commodity recommendation evaluation function (namely a loss function) each time, and updates the weight of the recommendation model through an optimizer algorithm.
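The shrinking effect of the weight decay coefficient can be sketched as follows; plain SGD and the zero-gradient setup are illustrative assumptions chosen to isolate the regularization term's contribution.

```python
# Hypothetical sketch: the weight decay coefficient adds the gradient of
# an L2 penalty to the update, so a larger coefficient shrinks the
# weights harder per step.
import numpy as np

def sgd_step(w, grad, lr, weight_decay):
    # Effective gradient = loss gradient + gradient of the L2 penalty.
    return w - lr * (grad + weight_decay * w)

w_small_decay = np.array([1.0, -1.0])
w_large_decay = np.array([1.0, -1.0])
zero_grad = np.zeros(2)
for _ in range(50):
    w_small_decay = sgd_step(w_small_decay, zero_grad, lr=0.1, weight_decay=0.01)
    w_large_decay = sgd_step(w_large_decay, zero_grad, lr=0.1, weight_decay=0.1)
# The larger decay coefficient yields smaller weight magnitudes.
```

This matches the description above: a larger decay coefficient means a stricter weight-magnitude penalty and hence smaller weights.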
C4, according to the model performance score obtained by each training process of each of the M models P0 in the first training stage, the initial hyper-parameters of the M models P0, and the number of training processes T2 of the second training stage, obtaining M models P1' of the second training stage and the hyper-parameters of each of the M models P1';
C5, training each of the M models P1' in the second training stage according to the hyper-parameters of each of the M models P1' to obtain M models P2, and obtaining the model performance score of each of the M models P1' in each training process of the second training stage;
C6, determining whether the i-th training stage is the last training stage;
At this point it is determined whether the second training stage is the last training stage; if not, step C7 is executed; if yes, step C8 is executed.
C7, letting i = i + 1, and repeatedly executing steps C4-C6 to perform the training of the next training stage until the last training stage is reached.
C8, determining the target model according to the model performance score obtained by the last training process of each of the M models Pi-1' in the i-th training stage.
The above specific implementation process may refer to the description of the foregoing embodiment, and will not be repeated herein.
Based on the model training method, a target model for commodity recommendation and the like can be obtained.
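The staged procedure of steps C1-C8 (and, equivalently, 1101-1108) can be sketched at a high level as follows; the callables `train` and `retune` and the toy usage are hypothetical placeholders for the training and hyper-parameter determination procedures described above, not APIs defined by the patent.

```python
# Hypothetical sketch of the staged training loop: in each stage, derive
# per-model hyper-parameters from the previous stage's scores, run T_i
# training processes per model, and finally keep the best model.
def staged_training(models, stage_lengths, train, retune):
    """models: list of (params, hyper, score); stage_lengths: [T1, T2, ...]."""
    history = []
    for i, t_i in enumerate(stage_lengths):
        if i > 0:
            # Steps 1104 / C4: new hyper-parameters from scores, old
            # hyper-parameters, and the stage length T_i.
            models = retune(models, history[-1], t_i)
        # Steps 1103, 1105 / C3, C5: run T_i training processes per model.
        models, scores = train(models, t_i)
        history.append(scores)
    # Step 1108 / C8: target model is the best scorer of the last stage.
    best = max(range(len(models)), key=lambda k: history[-1][k])
    return models[best]

# Toy placeholders: "training" adds t to the parameter and scores it.
def toy_train(models, t):
    new = [(p + t, h, p + t) for p, h, s in models]
    return new, [m[2] for m in new]

def toy_retune(models, last_scores, t):
    return models

target = staged_training([(0, {}, 0), (5, {}, 0)], [2, 3], toy_train, toy_retune)
```

With real `train` and `retune` implementations, the returned model is the target model used for image classification or commodity recommendation.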
Referring to fig. 12, a model training apparatus provided in an embodiment of the present application includes a first determining module 1201, a second determining module 1202, a model training module 1203, and a model determining module 1204, which are specifically as follows:
a first determining module 1201, configured to determine a reference hyper-parameter in the i-th training stage according to the model performance score obtained by each training process of each of M models Pi-2' in the (i-1)-th training stage, the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, and the number of training processes Ti of the i-th training stage, wherein M, i and Ti are all integers not less than 2;
a second determining module 1202, configured to determine the hyper-parameters of each of M models Pi-1' in the i-th training stage according to the model performance scores of M models Pi-1 and the reference hyper-parameter in the i-th training stage; wherein the M models Pi-1 are obtained by the last training process of the M models Pi-2' in the (i-1)-th training stage, the M models Pi-1' are obtained by processing the M models Pi-1, the M models Pi-1' correspond one-to-one to the M models Pi-1, and the M models Pi-1 correspond one-to-one to the M models Pi-2';
a model training module 1203, configured to, in the i-th training stage, perform Ti training processes on each of the M models Pi-1' according to the hyper-parameters of each of the M models Pi-1' to obtain M models Pi, and obtain the model performance score obtained by each training process of each of the M models Pi-1' in the i-th training stage;
a model determining module 1204, configured to, when the i-th training stage is the last training stage, determine the target model from the M models Pi according to the model performance scores of the M models Pi.
As an optional implementation manner, the first determining module 1201 is configured to:
determine, according to the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, a model performance estimation function; and process, according to the number of training processes Ti of the i-th training stage and the model performance score obtained by at least one of the M models Pi-2' in at least one training process of the (i-1)-th training stage, the model performance estimation function to obtain the reference hyper-parameter in the i-th training stage.
As another alternative implementation manner, when i is not less than 3, the first determining module 1201 is further configured to:
obtain a model performance estimation function according to the model performance score obtained by the last training process of each of M models Pi-3' in the (i-2)-th training stage, the hyper-parameters of each of the M models Pi-3' in the (i-2)-th training stage, the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage, and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, wherein the M models Pi-2' are obtained by processing the M models Pi-3', and the M models Pi-2' correspond one-to-one to the M models Pi-3'.
As yet another alternative implementation manner, the first determining module 1201 is further configured to:
obtain a model performance estimation function according to the model performance score obtained by each training process of each of M models P0 in the first i-1 training stages and the hyper-parameters of each of the M models P0 in each of the first i-1 training stages, wherein the M models P0 are initial models.
The apparatus further comprises a processing module configured to:
process the model performance estimation function according to N length ranges to obtain N processed model performance estimation functions, wherein N is an integer not less than 2.
the first determining module 1201 is further configured to:
process, according to the number of training processes Ti of the i-th training stage and the model performance score obtained by at least one of the M models Pi-2' in at least one training process of the (i-1)-th training stage, the N processed model performance estimation functions respectively to obtain N initial hyper-parameters; and process the N initial hyper-parameters to obtain the reference hyper-parameter in the i-th training stage.
As another alternative implementation manner, the second determining module 1202 is configured to:
obtain K first models and M-K second models according to the M models Pi-1, wherein the K first models are models among the M models Pi-1 whose model performance scores are smaller than a first preset threshold, the M-K second models are models whose model performance scores are not smaller than the first preset threshold, K is an integer not smaller than 1, and K is smaller than M; update the parameters of each of the K first models according to the parameters of a model whose model performance score is larger than a second preset threshold to obtain K updated first models, wherein the second preset threshold is not smaller than the first preset threshold; and determine the reference hyper-parameter in the i-th training stage as the hyper-parameters of each of the K updated first models in the i-th training stage; wherein the M models Pi-1' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as the hyper-parameters of that model in the (i-1)-th training stage.
As an alternative implementation manner, the apparatus further includes a confirmation module, configured to:
acquire the data quantity of the first i-1 training stages according to the number of training processes Tj of each of the first i-1 training stages;
and confirming that the data quantity of the previous i-1 training phases does not exceed a preset value.
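The confirmation module's budget check amounts to bounding the total number of training processes already consumed. A minimal sketch, where the preset value and the `within_budget` helper are illustrative assumptions:

```python
# Hypothetical sketch: confirm that the data quantity of the first i-1
# training stages (here, the sum of their training-process counts) does
# not exceed a preset value before starting another stage.
def within_budget(stage_lengths, preset_total):
    """stage_lengths: [T_1, ..., T_{i-1}] training-process counts."""
    return sum(stage_lengths) <= preset_total

assert within_budget([10, 20, 30], preset_total=100)   # 60 <= 100
assert not within_budget([50, 60], preset_total=100)   # 110 > 100
```

In practice the "data quantity" could also be measured in samples or compute time; the comparison against a preset cap is the same.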
Wherein the hyper-parameters include at least one of: learning rate, batch size, dropout rate, weight decay coefficient, momentum coefficient.
Wherein the target model can be applied to an image processing system, a recommendation system, or the like.
It should be noted that, the first determining module 1201, the second determining module 1202, the model training module 1203 and the model determining module 1204 shown in fig. 12 are configured to perform the relevant steps of the model training method.
For example, the first determining module 1201 is used for executing the relevant content of step 701, the second determining module 1202 is used for executing the relevant content of step 702, the model training module 1203 is used for executing the relevant content of step 703, and the model determining module 1204 is used for executing the relevant content of step 704.
In this embodiment, the model training apparatus is presented in the form of modules. "Module" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the described functionality. Further, the above first determining module 1201, second determining module 1202, model training module 1203, and model determining module 1204 may be implemented by the processor 1302 of the model training apparatus shown in fig. 13.
Fig. 13 is a schematic hardware structure of another model training apparatus according to an embodiment of the present application. The model training apparatus 1300 shown in fig. 13 (the apparatus 1300 may be a computer device in particular) includes a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304. The memory 1301, the processor 1302, and the communication interface 1303 implement communication connection therebetween through the bus 1304.
The Memory 1301 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access Memory (Random Access Memory, RAM).
The memory 1301 may store a program; when the program is executed by the processor 1302, the processor 1302 and the communication interface 1303 are configured to perform the steps of the model training method of an embodiment of the present application.
The processor 1302 may employ a general-purpose central processing unit (Central Processing Unit, CPU), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to perform the functions required by the elements of the model training apparatus of an embodiment of the present application or to perform the model training method of an embodiment of the present application.
The processor 1302 may also be an integrated circuit chip with signal processing capabilities. In implementation, the various steps of the model training method of the present application may be accomplished by instructions in the form of integrated logic circuits or software in hardware in the processor 1302. The processor 1302 described above may also be a general purpose processor, a digital signal processor (Digital Signal Processing, DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 1301, and the processor 1302 reads the information in the memory 1301, and in combination with its hardware, performs the functions that the units included in the model training apparatus of the embodiment of the present application need to perform, or performs the model training method of the embodiment of the present application.
The communication interface 1303 enables communication between the apparatus 1300 and other devices or communication networks using a transceiver apparatus such as, but not limited to, a transceiver. For example, data may be acquired through the communication interface 1303.
Bus 1304 may include a path to transfer information between various components of device 1300 (e.g., memory 1301, processor 1302, communication interface 1303).
It should be noted that although the apparatus 1300 shown in fig. 13 only shows a memory, a processor, and a communication interface, those skilled in the art will appreciate that in a particular implementation, the apparatus 1300 also includes other devices necessary to achieve proper operation. Also, as will be appreciated by those of skill in the art, the apparatus 1300 may also include hardware devices that implement other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 1300 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in FIG. 13.
The embodiment of the application also provides a chip system, which is applied to the electronic equipment; the system-on-chip includes one or more interface circuits, and one or more processors; the interface circuit and the processor are interconnected through a circuit; the interface circuit is configured to receive a signal from a memory of the electronic device and to send the signal to the processor, the signal including computer instructions stored in the memory; the electronic device performs the method when the processor executes the computer instructions.
The embodiment of the application also provides a model training device, which comprises a processor and a memory; wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the model training method.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
It should be understood that in the description of the present application, "/" means that the associated objects are in a "or" relationship, unless otherwise specified, for example, a/B may represent a or B; wherein A, B may be singular or plural. Also, in the description of the present application, unless otherwise indicated, "a plurality" means two or more than two. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural. In addition, in order to facilitate the clear description of the technical solution of the embodiments of the present application, in the embodiments of the present application, the words "first", "second", etc. are used to distinguish the same item or similar items having substantially the same function and effect. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ. Meanwhile, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion that may be readily understood.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the division of the unit is merely a logic function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a read-only memory (ROM), or a random-access memory (random access memory, RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a digital versatile disk (digital versatile disc, DVD), or a semiconductor medium, such as a Solid State Disk (SSD), or the like.
The foregoing is merely a specific implementation of the embodiment of the present application, but the protection scope of the embodiment of the present application is not limited to this, and any changes or substitutions within the technical scope disclosed in the embodiment of the present application should be covered in the protection scope of the embodiment of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for data processing using a model, comprising:
according to a model performance score obtained by each training process of each of M models Pi-2' in an (i-1)-th training stage, hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, and a number of training processes Ti of an i-th training stage, determining a reference hyper-parameter in the i-th training stage, wherein M, i and Ti are all integers not less than 2;
according to model performance scores of M models Pi-1 and the reference hyper-parameter in the i-th training stage, determining hyper-parameters of each of M models Pi-1' in the i-th training stage; wherein the M models Pi-1 are obtained by a last training process of the M models Pi-2' in the (i-1)-th training stage, and the M models Pi-1' are obtained by processing the M models Pi-1;
in the i-th training stage, performing Ti training processes on each of the M models Pi-1' according to the hyper-parameters of each of the M models Pi-1' in the i-th training stage to obtain M models Pi, and obtaining a model performance score of each of the M models Pi-1' in each training process of the i-th training stage, wherein in each training process of the i-th training stage, each of the M models Pi-1' traverses once the sample data of an image classification sample set or a recommendation data sample set, a loss value of the sample data is calculated according to an evaluation function, and weights of each of the M models Pi-1' are updated by using the loss value and the hyper-parameters of each of the M models Pi-1' in the i-th training stage;
when the i-th training stage is the last training stage, determining a target model from the M models Pi according to the model performance scores of the M models Pi, wherein the target model is an image classification model or a recommendation model;
when the target model is an image classification model, inputting image classification data into the image classification model to obtain at least one of an image recognition result, an instance segmentation result and an object detection result; or when the target model is a recommendation model, inputting recommendation data into the recommendation model to obtain at least one of recommended commodities, recommended movies and recommended music.
2. The method according to claim 1, wherein determining the reference hyper-parameter in the i-th training stage according to the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage, the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, and the number of training processes Ti of the i-th training stage comprises:

according to the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, obtaining a model performance estimation function;

according to the number of training processes Ti of the i-th training stage and the model performance score obtained by at least one of the M models Pi-2' in at least one training process of the (i-1)-th training stage, processing the model performance estimation function to obtain the reference hyper-parameter in the i-th training stage.
3. The method according to claim 2, wherein when i is not less than 3, obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage comprises:

according to the model performance score obtained by the last training process of each of M models Pi-3' in the (i-2)-th training stage, the hyper-parameters of each of the M models Pi-3' in the (i-2)-th training stage, the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage, and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage, obtaining the model performance estimation function, wherein the M models Pi-2' are obtained by processing the M models Pi-3'.
4. The method according to claim 2, wherein obtaining the model performance estimation function according to the model performance score obtained by each training process of each of the M models Pi-2' in the (i-1)-th training stage and the hyper-parameters of each of the M models Pi-2' in the (i-1)-th training stage comprises:

according to the model performance score obtained by each training process of each of M models P0 in the first i-1 training stages and the hyper-parameters of each of the M models P0 in each of the first i-1 training stages, obtaining the model performance estimation function, wherein the M models P0 are initial models.
5. The method according to any one of claims 2 to 4, further comprising:
processing the model performance estimation function according to N length ranges to obtain N processed model performance estimation functions, wherein N is an integer not less than 2;

wherein processing the model performance estimation function according to the number of training processes Ti of the i-th training stage and the model performance score obtained by at least one of the M models Pi-2' in at least one training process of the (i-1)-th training stage to obtain the reference hyper-parameter in the i-th training stage comprises:

according to the number of training processes Ti of the i-th training stage and the model performance score obtained by at least one of the M models Pi-2' in at least one training process of the (i-1)-th training stage, processing the N processed model performance estimation functions respectively to obtain N initial hyper-parameters;

and processing the N initial hyper-parameters to obtain the reference hyper-parameter in the i-th training stage.
6. The method according to any one of claims 1 to 4, wherein determining the hyper-parameters of each of the M models P_{i-1}' in the i-th training stage according to the model performance scores of the M models P_{i-1} and the reference hyper-parameter in the i-th training stage comprises:
obtaining K first models and M-K second models according to the model performance scores of the M models P_{i-1}, wherein the K first models are models among the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are models whose model performance scores are not smaller than the first preset threshold, K is an integer not less than 1, and K is smaller than M;
updating the parameters of each of the K first models according to the parameters of a model whose model performance score is larger than a second preset threshold, to obtain K updated first models, wherein the second preset threshold is not smaller than the first preset threshold;
and determining the reference hyper-parameter in the i-th training stage as the hyper-parameter of each of the K updated first models in the i-th training stage; wherein the M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as the hyper-parameters of that model in the (i-1)-th training stage.
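The selection step of claim 6 resembles the exploit step of population-based training: low-scoring models inherit parameters from a high-scoring model and adopt the stage's reference hyper-parameter, while the rest keep their previous hyper-parameters. A hedged sketch, with the dict layout, thresholds, and donor choice invented for illustration:

```python
# Illustrative sketch of the claim-6 selection step (layout and names are
# assumptions): models scoring below `first_thresh` copy their parameters
# from a model scoring above `second_thresh` (second_thresh >= first_thresh)
# and adopt the stage's reference hyper-parameters; other models are unchanged.
import copy
import random

def exploit(models, first_thresh, second_thresh, ref_hparams, rng=random):
    """models: list of dicts {'score', 'params', 'hparams'} (illustrative layout)."""
    assert second_thresh >= first_thresh
    donors = [m for m in models if m['score'] > second_thresh]
    updated = []
    for m in models:
        if m['score'] < first_thresh and donors:
            donor = rng.choice(donors)
            m = {'score': m['score'],
                 'params': copy.deepcopy(donor['params']),  # inherit donor weights
                 'hparams': dict(ref_hparams)}              # adopt reference hyper-parameters
        updated.append(m)
    return updated

population = [
    {'score': 0.50, 'params': {'w': 1.0}, 'hparams': {'lr': 1e-2}},
    {'score': 0.80, 'params': {'w': 2.0}, 'hparams': {'lr': 1e-3}},
    {'score': 0.90, 'params': {'w': 3.0}, 'hparams': {'lr': 5e-4}},
]
new_pop = exploit(population, first_thresh=0.6, second_thresh=0.85,
                  ref_hparams={'lr': 3e-4})
```

Here only the 0.50-score model is replaced (K = 1): it takes the 0.90-score model's weights and the reference learning rate, while the two surviving models keep their own hyper-parameters, matching the claim's split into K first models and M-K second models.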
7. The method according to any one of claims 1 to 4, further comprising:
acquiring the data quantity of the first i-1 training stages according to the number of training processes T_j of each of the first i-1 training stages, wherein j is a positive integer;
and confirming that the data quantity of the first i-1 training stages does not exceed a preset value.
8. The method of claim 1, wherein the target model is applied to an image processing system or a recommendation system.
9. The method of claim 1, wherein the hyper-parameters comprise at least one of: a learning rate, a batch size, a dropout rate, a weight decay coefficient, and a momentum coefficient.
10. An apparatus for data processing using a model, comprising:
a first determining module, configured to determine a reference hyper-parameter in the i-th training stage according to model performance scores obtained by each training process of each of M models P_{i-2}' in the (i-1)-th training stage, the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, and the number of training processes T_i of the i-th training stage, wherein M, i and T_i are all integers not less than 2;
a second determining module, configured to determine hyper-parameters of each of M models P_{i-1}' in the i-th training stage according to model performance scores of M models P_{i-1} and the reference hyper-parameter in the i-th training stage; wherein the M models P_{i-1} are obtained through the last training process of the M models P_{i-2}' in the (i-1)-th training stage, and the M models P_{i-1}' are obtained by processing the M models P_{i-1};
a model training module, configured to, in the i-th training stage, perform T_i training processes on each of the M models P_{i-1}' according to the hyper-parameters of each model in the i-th training stage, to obtain M models P_i and model performance scores obtained by each of the M models P_{i-1}' in each training process in the i-th training stage; wherein in each training process in the i-th training stage, each of the M models P_{i-1}' traverses sample data of a sample set of image classification data or recommendation data, a loss value of the sample data is calculated according to an evaluation function, and the weights of each of the M models P_{i-1}' are updated according to the loss value and the hyper-parameters of that model in the i-th training stage;
a model determining module, configured to, when the i-th training stage is the last training stage, determine a target model from the M models P_i according to model performance scores of the M models P_i, wherein the target model is an image classification model or a recommendation model; when the target model is an image classification model, image classification data is input into the image classification model to obtain at least one of an image recognition result, an instance segmentation result and an object detection result; or, when the target model is a recommendation model, recommendation data is input into the recommendation model to obtain at least one of a recommended commodity, a recommended movie and recommended music.
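The training flow the modules above describe — T_i training processes per stage, each traversing the sample set, computing a loss, and updating weights with the model's stage hyper-parameters — can be sketched on a toy least-squares problem. The single-weight model and squared-error loss below are stand-ins for illustration, not the claimed models or evaluation function:

```python
# Illustrative sketch of one training stage: each model runs t_i training
# processes; each process traverses the sample set, updates the model's
# weight by gradient descent with that model's learning rate, and records
# a performance score (negative loss, so higher is better).
def run_stage(models, samples, t_i):
    """models: list of {'w': float, 'hparams': {'lr': float}} (toy layout)."""
    history = []
    for m in models:
        stage_scores = []
        for _ in range(t_i):                       # T_i training processes
            for x, y in samples:                   # traverse the sample set
                pred = m['w'] * x
                grad = 2 * (pred - y) * x          # d/dw of squared-error loss
                m['w'] -= m['hparams']['lr'] * grad
            loss = sum((m['w'] * x - y) ** 2 for x, y in samples)
            stage_scores.append(-loss)             # score recorded per process
        history.append(stage_scores)
    return history

models = [{'w': 0.0, 'hparams': {'lr': 0.1}},
          {'w': 0.0, 'hparams': {'lr': 0.01}}]
samples = [(1.0, 2.0), (2.0, 4.0)]                 # toy data with optimum w = 2
history = run_stage(models, samples, t_i=3)
```

After a stage, the per-process scores in `history` play the role of the model performance scores that the first and second determining modules consume when setting the next stage's hyper-parameters.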
11. The apparatus of claim 10, wherein the first determining module is configured to:
obtaining a model performance estimation function according to model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage;
processing the model performance estimation function according to the number of training processes T_i of the i-th training stage and at least one model performance score obtained by at least one of the M models P_{i-2}' through at least one training process in the (i-1)-th training stage, to obtain the reference hyper-parameter in the i-th training stage.
12. The apparatus of claim 11, wherein when i is not less than 3, the first determining module is further configured to:
according to model performance scores obtained by the last training process of each of the M models P_{i-3}' in the (i-2)-th training stage, the hyper-parameters of each of the M models P_{i-3}' in the (i-2)-th training stage, model performance scores obtained by each training process of each of the M models P_{i-2}' in the (i-1)-th training stage, and the hyper-parameters of each of the M models P_{i-2}' in the (i-1)-th training stage, obtaining the model performance estimation function, wherein the M models P_{i-2}' are obtained by processing the M models P_{i-3}'.
13. The apparatus of claim 11, wherein the first determining module is further configured to:
obtaining the model performance estimation function according to model performance scores obtained by each training process of each of M models P_0 in the first i-1 training stages and the hyper-parameters of each of the M models P_0 in each of the first i-1 training stages, wherein the M models P_0 are initial models.
14. The apparatus according to any one of claims 11 to 13, further comprising a processing module for:
processing the model performance estimation function according to N length ranges to obtain N processed model performance estimation functions, wherein N is an integer not less than 2;
the first determining module is further configured to:
processing the N processed model performance estimation functions respectively according to the number of training processes T_i of the i-th training stage and at least one model performance score obtained by at least one of the M models P_{i-2}' through at least one training process in the (i-1)-th training stage, to obtain N initial hyper-parameters;
and processing the N initial hyper-parameters to obtain the reference hyper-parameter in the i-th training stage.
15. The apparatus according to any one of claims 10 to 13, wherein the second determining module is configured to:
obtaining K first models and M-K second models according to the model performance scores of the M models P_{i-1}, wherein the K first models are models among the M models P_{i-1} whose model performance scores are smaller than a first preset threshold, the M-K second models are models whose model performance scores are not smaller than the first preset threshold, K is an integer not less than 1, and K is smaller than M;
updating the parameters of each of the K first models according to the parameters of a model whose model performance score is larger than a second preset threshold, to obtain K updated first models, wherein the second preset threshold is not smaller than the first preset threshold;
and determining the reference hyper-parameter in the i-th training stage as the hyper-parameter of each of the K updated first models in the i-th training stage; wherein the M models P_{i-1}' comprise the K updated first models and the M-K second models, and the hyper-parameters of each of the M-K second models in the i-th training stage are the same as the hyper-parameters of that model in the (i-1)-th training stage.
16. The apparatus according to any one of claims 10 to 13, further comprising a confirmation module for:
acquiring the data quantity of the first i-1 training stages according to the number of training processes T_j of each of the first i-1 training stages;
and confirming that the data quantity of the first i-1 training stages does not exceed a preset value.
17. The apparatus of claim 10, wherein the target model is applied to an image processing system or a recommendation system.
18. The apparatus of claim 10, wherein the hyper-parameters comprise at least one of:
a learning rate, a batch size, a dropout rate, a weight decay coefficient, and a momentum coefficient.
19. An apparatus for data processing using a model, comprising a processor and a memory; wherein the memory is for storing program code and the processor is for invoking the program code to perform the method of any of claims 1 to 9.
20. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any one of claims 1 to 9.
CN202110597246.1A 2021-05-29 2021-05-29 Method for processing data by using model, related system and storage medium Active CN113407820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597246.1A CN113407820B (en) 2021-05-29 2021-05-29 Method for processing data by using model, related system and storage medium

Publications (2)

Publication Number Publication Date
CN113407820A CN113407820A (en) 2021-09-17
CN113407820B true CN113407820B (en) 2023-09-15

Family

ID=77675402

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283307B (en) * 2021-12-24 2023-10-27 中国科学技术大学 Network training method based on resampling strategy
CN116418432A (en) * 2021-12-30 2023-07-11 维沃移动通信有限公司 Model updating method and communication equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111814965A (en) * 2020-08-14 2020-10-23 Oppo广东移动通信有限公司 Hyper-parameter adjusting method, device, equipment and storage medium
WO2021051987A1 (en) * 2019-09-18 2021-03-25 华为技术有限公司 Method and apparatus for training neural network model
CN112733967A (en) * 2021-03-30 2021-04-30 腾讯科技(深圳)有限公司 Model training method, device, equipment and storage medium for federal learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR100690590B1 (en) * 2004-12-28 2007-03-09 엔에이치엔(주) System and Method for Sharing Search Result Using Messenger

Non-Patent Citations (1)

Title
Research on cost function selection and performance evaluation for deep neural networks; Zhao Hong et al.; Software (No. 01); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant