WO2023174064A1 - Automatic search method, automatic-search performance prediction model training method and apparatus - Google Patents

Automatic search method, automatic-search performance prediction model training method and apparatus Download PDF

Info

Publication number
WO2023174064A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
data
prediction model
training
performance prediction
Prior art date
Application number
PCT/CN2023/079287
Other languages
French (fr)
Chinese (zh)
Inventor
辜弘炀
陈醒濠
张世枫
李建民
朱军
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
清华大学 (Tsinghua University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 清华大学 (Tsinghua University)
Publication of WO2023174064A1 publication Critical patent/WO2023174064A1/en

Links

Classifications

    • G06N3/02 Neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using regression, e.g. by projecting features on hyperplanes
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/776 Validation; Performance evaluation
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Definitions

  • The present application relates to the field of artificial intelligence, and more specifically, to an automatic search method, and a training method and apparatus for an automatic-search performance prediction model.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new class of intelligent machines that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
  • A deep neural network is one of the representative algorithms of deep learning. It is a feedforward neural network with a deep structure, and it has achieved remarkable results in computer vision fields such as face recognition and pedestrian re-identification.
  • At present, the performance of a model in computer vision is usually improved through a hand-designed deep neural network architecture, or based on a hand-designed loss function. Whether based on a hand-designed loss function or a hand-designed deep neural network architecture, this often requires considerable expert knowledge and takes a lot of time.
  • This application provides an automatic search method, and a training method and apparatus for an automatic-search performance prediction model, which can improve search efficiency, explore more data, and yield better-performing search results.
  • In a first aspect, an automatic search method includes: obtaining at least two candidate data, where the at least two candidate data are data to be evaluated on a proxy task; inputting the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and performing proxy task evaluation on some of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
  • Optionally, part of the data evaluated on the proxy task is added to the population data set.
  • the population data set includes sample data and evaluation scores corresponding to the sample data.
  • The proxy task evaluation can be a face recognition task, a pedestrian re-identification task, a classification task, metric learning, or the like; the embodiments of the present application are not limited in this respect.
  • The type of the candidate data can be a loss function, a neural network architecture, a hyperparameter, or the like; this is not limited in the embodiments of the present application.
  • Compared with a loss function that includes only a regression loss function, which requires the model to accurately predict the absolute performance indicator of each candidate, the loss function of the performance prediction model obtained by combining a differentiable ranking loss function with a regression loss function is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to automatic search can therefore improve the efficiency and accuracy of automatic search, and the amount of data explored.
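As an illustration of how such a combined loss might be implemented, the following PyTorch sketch pairs a regression term L_MSE with a pairwise ranking term whose sign(·) comparison on the predictions is smoothed by tanh(·) so that it stays differentiable (cf. the comparison in Figure 6). The exact form of L_K and the weighting between the two terms are not specified here, so the pairwise form and the alpha factor are assumptions.

```python
import torch

def predictor_loss(pred, target, alpha=1.0):
    """Hedged sketch of the predictor's combined loss: a differentiable
    ranking term plus a regression (MSE) term.

    pred, target: 1-D tensors of predicted and true evaluation scores.
    alpha is a hypothetical weighting factor between the two terms.
    """
    # Regression term L_MSE: predict absolute scores accurately.
    l_mse = torch.mean((pred - target) ** 2)

    # Ranking term: for every pair (i, j) the sign of the predicted
    # score difference should match the sign of the true difference;
    # tanh(.) replaces the non-differentiable sign(.) on the predictions.
    dp = pred.unsqueeze(0) - pred.unsqueeze(1)      # pred[j] - pred[i]
    dt = target.unsqueeze(0) - target.unsqueeze(1)  # target[j] - target[i]
    l_rank = torch.mean(torch.relu(-torch.tanh(dp) * torch.sign(dt)))

    return l_rank + alpha * l_mse
```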
  • Optionally, performing proxy task evaluation on part of the at least two candidate data based on the prediction indicators corresponding to the at least two candidate data includes: performing proxy task evaluation on the candidate data with the best prediction indicators among the at least two candidate data.
  • Optionally, part of the candidate data evaluated on the proxy task is added to the first training data set to obtain an updated first training data set; at least two updated candidate data are obtained, where the at least two updated candidate data are different from the at least two candidate data; the at least two updated candidate data are input into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, where the updated target performance prediction model is obtained based on the updated first training data set; and proxy task evaluation is performed on some of the at least two updated candidate data based on the prediction indicators corresponding to the at least two updated candidate data.
  • Optionally, part of the candidate data evaluated on the proxy task is added to the population data set.
  • the population data set includes sample data and evaluation scores corresponding to the sample data.
  • In this way, the first training data set is continuously updated using the selected candidate data, and the target performance prediction model is then continuously updated, which can improve the performance of search results and enhance the exploration capability over the search space.
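A minimal sketch of one iteration of this predictor-guided loop, assuming callables for the predictor and the proxy-task evaluation (all names here are illustrative, not the patent's API):

```python
def search_step(candidates, predictor, proxy_eval, train_set, population, top_k=4):
    """One iteration: score candidates cheaply with the prediction model,
    run the expensive proxy task only on the most promising ones, and
    feed the results back into both data sets."""
    scores = [predictor(c) for c in candidates]          # prediction indicators
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    for _, cand in ranked[:top_k]:                       # best prediction indicators
        score = proxy_eval(cand)                         # proxy task evaluation
        population.append((cand, score))                 # enrich the population
        train_set.append((cand, score))                  # enrich predictor training data
    return population, train_set
```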
  • Optionally, the regression loss function is the mean square error loss function L_MSE.
  • Optionally, the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
  • By adding the trained performance prediction model to the automatic search, not only can the search efficiency be improved, but the potential candidate data screened out by the performance prediction model also perform better, thereby improving the performance of the target search results.
  • adding the performance prediction model to the automatic search process can not only improve the exploration of the search space, but also improve the performance of the target loss function.
  • Optionally, obtaining the at least two candidate loss functions includes: obtaining a current population loss function set, where the current population loss function set includes M population loss functions, and the m-th population loss function is represented by a first computation graph, a second computation graph, and a constant s, where M is a positive integer and 1 ≤ m ≤ M; performing initial screening on the current population loss function set to obtain K first initial loss functions after screening, where K is a positive integer greater than or equal to 2; crossing the K first initial loss functions with a preset probability to obtain a second loss function; if the second loss function passes the loss function rejection criterion, performing equivalence verification on the second loss function; and if the second loss function is not equivalent to the m-th current population loss function in the current population loss function set, determining the second loss function as a candidate loss function. A rough code sketch of this generation loop is given after the notes below.
  • In the embodiment of the present application, the computation graphs corresponding to the loss functions and the constants are used to construct the search space of the loss function.
  • the method of constructing the search space in the embodiment of the present application is more detailed, which is more conducive to searching for a target loss function with good performance.
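A rough sketch of the candidate generation loop described above, assuming hypothetical crossover and mutation helpers over the computation graphs; passes_rejection_criterion and is_equivalent are sketched after the next notes:

```python
import random

def propose_candidates(population, num_parents=2, cross_prob=0.5, budget=16):
    """Sketch of candidate loss-function generation (all names are
    illustrative): initial screening keeps promising parents, then
    crossover/mutation with a preset probability produces children,
    which must survive the rejection criterion and the equivalence
    check before becoming candidates.

    population : list of (loss_fn, evaluation_score) pairs
    """
    # Initial screening: keep the K best-scoring population members.
    parents = sorted(population, key=lambda p: p[1], reverse=True)[:num_parents]

    candidates = []
    for _ in range(budget):
        a, b = random.sample(parents, 2)
        if random.random() < cross_prob:
            child = crossover(a[0], b[0])     # hypothetical graph crossover
        else:
            child = mutate(a[0])              # hypothetical graph mutation
        if not passes_rejection_criterion(child):
            continue                          # fails basic attributes / task index
        if any(is_equivalent(child, member) for member, _ in population):
            continue                          # equivalent to an existing member
        candidates.append(child)
    return candidates
```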
  • Optionally, the loss function rejection criterion includes a loss function basic attribute criterion and a target task indicator. If both the loss function basic attribute criterion and the target task indicator are satisfied, equivalence verification is performed on the second loss function. The second loss function satisfies the loss function basic attribute criterion when the first function t(x) corresponding to its first computation graph and the second function n(x) corresponding to its second computation graph satisfy a preset formula.
  • The second loss function satisfies the target task indicator when the output indicator obtained by training on the task data with the second loss function reaches a preset value.
  • A loss function rejection criterion that includes both the basic attribute criterion and the target task indicator can quickly screen the second loss function and filter out, at an early stage, second loss functions that do not meet the requirements. Compared with a traditional rejection criterion based only on the basic attribute criterion, or one based only on the target task indicator, the rejection criterion in the embodiment of the present application considers more comprehensive factors and can more thoroughly screen out second loss functions that do not meet the requirements, thereby improving the overall search efficiency of the loss function.
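A placeholder sketch of the two-part rejection criterion. The concrete basic-attribute formula is given only in the original disclosure, so it appears here as an assumed predicate; quick_train_metric stands for a short training run on task data:

```python
def passes_rejection_criterion(loss_fn, task_data=None, preset=0.5):
    """Sketch of the two-part rejection criterion (names illustrative).
    1. Basic attribute criterion: a fast structural check on t(x) and
       n(x); the concrete formula is in the original disclosure, so a
       placeholder predicate is used here.
    2. Target task indicator: a short training run on task data must
       reach a preset value before the candidate is kept.
    """
    if not basic_attributes_ok(loss_fn.t, loss_fn.n):        # placeholder check
        return False
    return quick_train_metric(loss_fn, task_data) >= preset  # cheap proxy run
```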
  • Optionally, determining the second loss function as the candidate loss function includes: obtaining a first feature vector based on the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to its second computation graph, and the constant s; and obtaining a second feature vector set according to the population loss functions in the current population loss function set, where the second feature vector set includes the feature vector corresponding to each population loss function.
  • Equivalence verification based on feature vectors effectively identifies equivalent loss functions and avoids repeated proxy task evaluations for loss functions already in the current population loss function set, thereby effectively improving the search efficiency of the loss function.
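A minimal sketch of the feature-vector equivalence check, assuming each loss function exposes vectorized callables t and n on [-1, 1] plus the constant s (hypothetical attributes mirroring the two computation graphs and the constant described above):

```python
import numpy as np

_PROBE = np.linspace(-1.0, 1.0, num=32)   # hypothetical sample points in [-1, 1]

def feature_vector(loss_fn):
    """Build a feature vector for a loss function by evaluating its two
    functions at fixed probe points and appending the constant s."""
    return np.concatenate([loss_fn.t(_PROBE), loss_fn.n(_PROBE), [loss_fn.s]])

def is_equivalent(f1, f2, tol=1e-6):
    """Two loss functions are treated as equivalent when their feature
    vectors match within a tolerance."""
    return np.allclose(feature_vector(f1), feature_vector(f2), atol=tol)
```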
  • In a second aspect, a training method for an automatic-search performance prediction model includes: obtaining a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data; and training a performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function.
  • Compared with a loss function that only requires accurately predicting the absolute performance indicators of the candidates, the loss function of the performance prediction model proposed in this application, which combines a differentiable ranking loss function and a regression loss function, is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to automatic search can improve the efficiency and accuracy of automatic search.
  • Optionally, the regression loss function is the mean square error loss function L_MSE.
  • Optionally, the first training data set is updated; when the increment of the first training data set reaches a first threshold, the target performance prediction model is trained according to the updated first training data set to obtain an updated target performance prediction model.
  • In this way, the first training data set is updated using the potential data obtained during the inference process of the target performance prediction model, and the target performance prediction model is then continuously trained and updated, which can improve the performance of search results and the exploration capability over the search space.
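One plausible way to implement the retrain-on-increment behavior, with an illustrative threshold and training callback (names are not from the disclosure):

```python
class PredictorUpdater:
    """Retrain the target performance prediction model whenever the
    first training data set has grown by `threshold` new samples
    (the "first threshold" above). All names are illustrative."""

    def __init__(self, predictor, train_fn, threshold=10):
        self.predictor = predictor
        self.train_fn = train_fn      # e.g. optimizes predictor_loss above
        self.threshold = threshold
        self.new_samples = 0

    def add(self, sample, score, train_set):
        train_set.append((sample, score))
        self.new_samples += 1
        if self.new_samples >= self.threshold:   # increment reached the threshold
            self.predictor = self.train_fn(self.predictor, train_set)
            self.new_samples = 0
        return self.predictor
```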
  • In a third aspect, an automatic search device includes an acquisition unit and a processing unit.
  • The acquisition unit is configured to acquire at least two candidate data, where the at least two candidate data are data to be evaluated on a proxy task.
  • The processing unit is configured to: input the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and perform proxy task evaluation on part of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
  • Optionally, part of the data evaluated on the proxy task is added to the population data set.
  • the population data set includes sample data and evaluation scores corresponding to the sample data.
  • The proxy task evaluation can be a face recognition task, a pedestrian re-identification task, a classification task, metric learning, or the like; the embodiments of the present application are not limited in this respect.
  • The type of the candidate data can be a loss function, a neural network architecture, a hyperparameter, or the like; this is not limited in the embodiments of the present application.
  • Compared with a loss function that includes only a regression loss function, which requires the model to accurately predict the absolute performance indicator of each candidate, the loss function of the performance prediction model obtained by combining a differentiable ranking loss function with a regression loss function is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to automatic search can therefore improve the efficiency and accuracy of automatic search, and the amount of data explored.
  • Optionally, the processing unit is configured to: perform proxy task evaluation on the candidate data with the best prediction indicators among the at least two candidate data.
  • Optionally, the device further includes an update unit. The update unit is configured to add part of the candidate data evaluated on the proxy task to the first training data set to obtain an updated first training data set; the acquisition unit is configured to obtain at least two updated candidate data, where the at least two updated candidate data are different from the at least two candidate data; and the processing unit is configured to: input the at least two updated candidate data into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, where the updated target performance prediction model is obtained based on the updated first training data set; and perform proxy task evaluation on a subset of the at least two updated candidate data based on the prediction indicators corresponding to the at least two updated candidate data.
  • Optionally, part of the candidate data evaluated on the proxy task is added to the population data set.
  • the population data set includes sample data and evaluation scores corresponding to the sample data.
  • In this way, the first training data set is continuously updated using the selected candidate data, and the target performance prediction model is then continuously updated, which can improve the performance of search results and enhance the exploration capability over the search space.
  • Optionally, the regression loss function is the mean square error loss function L_MSE.
  • Optionally, the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
  • By adding the trained performance prediction model to the automatic search, not only can the search efficiency be improved, but the potential candidate data screened out by the performance prediction model also perform better, thereby improving the performance of the target search results.
  • adding the performance prediction model to the automatic search process can not only improve the exploration of the search space, but also improve the performance of the target loss function.
  • Optionally, the acquisition unit is configured to obtain a current population loss function set, where the current population loss function set includes M population loss functions, and the m-th population loss function is represented by a first computation graph, a second computation graph, and a constant s, where M is a positive integer and 1 ≤ m ≤ M.
  • The processing unit is configured to: perform initial screening on the current population loss function set to obtain K first initial loss functions after screening, where K is a positive integer greater than or equal to 2; cross the K first initial loss functions with a preset probability to obtain a second loss function; if the second loss function passes the loss function rejection criterion, perform equivalence verification on the second loss function; and if the second loss function is not equivalent to the m-th current population loss function in the current population loss function set, determine the second loss function as a candidate loss function.
  • In the embodiment of the present application, the computation graphs corresponding to the loss functions and the constants are used to construct the search space of the loss function.
  • the method of constructing the search space in the embodiment of the present application is more detailed, which is more conducive to searching for a target loss function with good performance.
  • Optionally, the loss function rejection criterion includes a loss function basic attribute criterion and a target task indicator, and the processing unit is configured to: if both the loss function basic attribute criterion and the target task indicator are satisfied, perform equivalence verification on the second loss function; where the second loss function satisfies the loss function basic attribute criterion when the first function t(x) corresponding to its first computation graph and the second function n(x) corresponding to its second computation graph satisfy a preset formula.
  • The second loss function satisfies the target task indicator when the output indicator obtained by training on the task data with the second loss function reaches a preset value.
  • A loss function rejection criterion that includes both the basic attribute criterion and the target task indicator can quickly screen the second loss function and filter out, at an early stage, second loss functions that do not meet the requirements. Compared with a traditional rejection criterion based only on the basic attribute criterion, or one based only on the target task indicator, the rejection criterion in the embodiment of the present application considers more comprehensive factors and can more thoroughly screen out second loss functions that do not meet the requirements, thereby improving the overall search efficiency of the loss function.
  • Optionally, the processing unit is configured to: obtain a first feature vector based on the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to its second computation graph, and the constant s; and obtain a second feature vector set according to the population loss functions in the current population loss function set, where the second feature vector set includes the feature vector corresponding to each population loss function.
  • Equivalence verification based on feature vectors effectively identifies equivalent loss functions and avoids repeated proxy task evaluations for loss functions already in the current population loss function set, thereby effectively improving the search efficiency of the loss function.
  • In a fourth aspect, a training device for an automatic-search performance prediction model is provided. The device includes an acquisition unit and a processing unit. The acquisition unit is configured to acquire a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data; the processing unit is configured to train a performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function.
  • Compared with a loss function that only requires accurately predicting the absolute performance indicators of the candidates, the loss function of the performance prediction model proposed in this application, which combines a differentiable ranking loss function and a regression loss function, is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to automatic search can improve the efficiency and accuracy of automatic search.
  • Optionally, the regression loss function is the mean square error loss function L_MSE.
  • Optionally, the device further includes an update unit. The update unit is configured to update the first training data set; the processing unit is configured to, when the increment of the first training data set reaches a first threshold, train the target performance prediction model according to the updated first training data set to obtain an updated target performance prediction model.
  • In this way, the first training data set is updated using the potential data obtained during the inference process of the target performance prediction model, and the target performance prediction model is then continuously trained and updated, which can improve the performance of search results and the exploration capability over the search space.
  • In a fifth aspect, an automatic search device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any implementation of the first aspect.
  • The processor in the fifth aspect can be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like.
  • Among them, the TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
  • In a sixth aspect, a training device for an automatic-search performance prediction model includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any implementation of the second aspect.
  • The processor in the sixth aspect can be either a central processing unit, or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processing unit, a neural-network processing unit, a tensor processing unit, and the like. Among them, the TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
  • In a seventh aspect, a computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of the first aspect or the second aspect.
  • In an eighth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, the computer is caused to execute the method in any implementation of the first aspect or the second aspect.
  • In a ninth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in any implementation of the first aspect or the second aspect.
  • Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in any implementation of the first aspect or the second aspect.
  • the above-mentioned chip can specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • Figure 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present application.
  • Figure 2 is a system architecture 100 provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of the deployment of a training device provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the processing flow on the AutoML service platform provided by this embodiment of the application.
  • Figure 5 is a schematic flowchart of a training method for an automatic search performance prediction model provided by an embodiment of the present application
  • Figure 6 is a schematic diagram for visual comparison between the tanh( ⁇ ) function curve and the sign( ⁇ ) function curve provided by the embodiment of the present application;
  • Figure 7 is a schematic flowchart of an automatic search method provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of the overall flow of training and inference of a performance prediction model provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a first computation graph of a loss function provided by an embodiment of the present application.
  • Figure 10 is a schematic flowchart of obtaining a candidate loss function provided by an embodiment of the present application.
  • Figure 11 is a schematic flow chart of a GMS loss function search method provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of a variation method of a calculation graph provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram comparing the effects of whether the loss function includes a differentiable ranking loss function in the process of training a performance prediction model provided by an embodiment of the present application;
  • Figure 14 is a schematic diagram comparing the effects of whether to add a potential loss function selection module in an automatic loss function search provided by an embodiment of the present application;
  • Figure 15 is a schematic block diagram of an automatic search performance prediction model training device provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of an automatic search device provided by an embodiment of the present application.
  • Figure 17 is a schematic block diagram of an automatic search performance prediction model training device provided by an embodiment of the present application.
  • Figure 18 is a schematic block diagram of an automatic search device provided by an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework.
  • the main framework describes the overall workflow of the artificial intelligence system and is suitable for general needs in the field of artificial intelligence.
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain” reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementation) to the systematic industrial ecological process.
  • The infrastructure provides computing power support for the artificial intelligence system, enables communication with the external world, and is supported by the basic platform.
  • the infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.
  • The smart chip here can be a central processing unit (CPU), a neural network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another hardware acceleration chip.
  • the basic platform of infrastructure can include distributed computing framework and network related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • data can be obtained through sensors and external communication, and then the data can be provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • This data involves graphics, images, voice, and text, as well as IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • the above data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • After the above data processing, some general capabilities can be formed based on the results, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include: intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, smart terminals, and so on.
  • The automatic loss function search method in the embodiment of this application can be applied to many fields in artificial intelligence, such as smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, safe cities, and other fields.
  • the embodiments of the present application can be specifically applied in fields that require the use of (deep) neural networks, such as face recognition, pedestrian re-identification, and metric learning.
  • the neural network can be composed of neural units.
  • The neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs. The output of the arithmetic unit can be: h_{W,b}(x) = f(∑_{s=1}^{n} W_s · x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next layer.
  • the activation function can be ReLU, tanh or sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • According to the positions of different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are hidden layers.
  • The layers are fully connected; that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • Although a DNN looks very complicated, the work of each layer is actually not complicated. Simply put, each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (also called coefficients), and α(·) is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficient matrices W and bias vectors b is also large.
  • The definitions of these parameters in a DNN are as follows. Taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as W_24^3, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer. In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as W_jk^L.
  • the input layer has no W parameter.
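The per-layer computation above can be summarized in a few lines of NumPy; this is a generic illustration of y = α(W·x + b), not code from the disclosure:

```python
import numpy as np

def dnn_forward(x, weights, biases, act=np.tanh):
    """Forward pass of a fully connected DNN: each layer computes
    y = act(W @ x + b). weights[l][j, k] plays the role of the
    coefficient W_jk from neuron k of one layer to neuron j of the
    next; the input layer itself has no W parameter."""
    for W, b in zip(weights, biases):
        x = act(W @ x + b)
    return x
```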
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
  • A loss function, also called an objective function, is used to measure the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference.
  • the training of the deep neural network becomes a process of reducing this loss as much as possible.
  • Generally, the smaller the loss, the higher the training quality of the deep neural network, and the larger the loss, the lower the training quality of the deep neural network. Similarly, the smaller the loss fluctuation, the more stable the training; the larger the loss fluctuation, the more unstable the training.
  • There are currently many types of loss functions, which can be roughly divided according to the type of task to which the loss function is applied.
  • Regression loss functions applied to regression problems include the mean square error (MSE) loss function, the mean absolute error (MAE) loss function, the mean squared logarithmic error (MSLE) loss function, the mean absolute percentage error (MAPE) loss function, and so on.
  • Classification loss functions applied to classification problems include the logistic loss function, the negative log likelihood loss function, the cross entropy loss function, the hinge loss function, and the exponential loss function; the triplet loss function is applied to metric learning tasks.
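For concreteness, the two most common regression losses listed above can be written as:

```python
import numpy as np

def mse(pred, target):
    """Mean square error (MSE) loss."""
    return np.mean((pred - target) ** 2)

def mae(pred, target):
    """Mean absolute error (MAE) loss."""
    return np.mean(np.abs(pred - target))
```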
  • the cross-entropy loss function may be a margin-based softmax (MS) loss function or a generalized margin-based softmax (GMS) loss function.
  • In the GMS loss function, t(x) is a function whose domain is [-1, 1]; its argument is derived from the predicted output value of the neural network model, and y is the target value of the neural network model. n(x) is also a function whose domain is [-1, 1].
  • the MS loss function is a special case of the GMS loss function.
  • Commonly used specific forms of t(x) and n(x) in the GMS loss function are shown in Table 1.
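Since Table 1 is not reproduced here, the following PyTorch sketch shows one generic GMS-style form consistent with the description: the scale constant s multiplies logits in which t(x) is applied to the target-class cosine and n(x) to the others. This generic form is an assumption based on the description above, not a formula quoted from the disclosure.

```python
import torch
import torch.nn.functional as F

def gms_loss(cos_theta, labels, t, n, s=32.0):
    """Sketch of a generalized margin-based softmax (GMS) loss.

    cos_theta : (batch, classes) cosine similarities in [-1, 1]
    labels    : (batch,) target class indices
    t, n      : elementwise functions on [-1, 1]
    s         : scale constant
    """
    tgt = t(cos_theta.gather(1, labels.unsqueeze(1)))     # t(x) on target logit
    logits = n(cos_theta)                                 # n(x) on all logits
    logits = logits.scatter(1, labels.unsqueeze(1), tgt)  # put target logit back
    return F.cross_entropy(s * logits, labels)
```

With t(x) = x - m and n(x) = x, this reduces to an additive-margin softmax, consistent with the statement that the MS loss function is a special case of the GMS loss function.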
  • A computational graph, also known as a data flow graph, is defined as a directed acyclic graph (DAG). Tensors and operation units are both objects in the graph: operation units are the nodes of the graph, and tensors are the data flowing along the edges of the graph. Acyclic means that the graph cannot contain cycles; for example, a tensor x cannot be the input of a layer that generates x. The only processing loops (i.e., loop connections) allowed are the internal loops of recurrent layers.
  • In a neural network, each node represents a neuron; if the output of one node serves as the input of another node, the two nodes are connected by an edge. That is, the nodes in the computation graph represent operators, and the edges between nodes represent data dependencies between the two nodes.
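As a toy illustration of this operator/edge view (purely illustrative, not the disclosure's representation), a function such as t(x) = x - 0.35 can be stored and evaluated as a small DAG:

```python
# Nodes are operators; the "inputs" lists are the edges carrying data.
graph_t = {
    "x":   {"op": "input"},
    "m":   {"op": "const", "value": 0.35},
    "out": {"op": "sub", "inputs": ["x", "m"]},
}

OPS = {"sub": lambda a, b: a - b,
       "add": lambda a, b: a + b,
       "mul": lambda a, b: a * b}

def eval_node(graph, name, x):
    """Recursively evaluate a node of the expression DAG at input x."""
    node = graph[name]
    if node["op"] == "input":
        return x
    if node["op"] == "const":
        return node["value"]
    args = [eval_node(graph, child, x) for child in node["inputs"]]
    return OPS[node["op"]](*args)

# eval_node(graph_t, "out", 0.9) evaluates t(0.9) = 0.55
```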
  • Edge devices refer to any device with computing resources and network resources located between the data generation source and the cloud center. For example, a mobile phone is an edge device between a person and the cloud center, and a gateway is an edge device between a smart home and the cloud center. Ideally, an edge device analyzes or processes data close to the source of the data. Since data does not need to flow to the cloud, network traffic and response time are reduced.
  • Specifically, the edge device in the embodiment of the present application may be a mobile phone, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device, a self-driving vehicle, and so on. It can be understood that the embodiments of the present application do not limit the specific form of the edge device.
  • Figure 2 shows a system architecture 100 provided by an embodiment of the present application.
  • data collection device 160 is used to collect training data.
  • the training data may include training images and classification results corresponding to the training images, where the classification results of the training images may be manually pre-annotated results.
  • After collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 obtains the target model/rules 101 by training based on the training data maintained in the database 130.
  • The training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • the above-mentioned target model/rule 101 can be used to implement data processing in the embodiment of the present application.
  • The target model/rule 101 in the embodiment of this application may specifically be a neural network model, for example, a deep neural network.
  • the training data maintained in the database 130 may not necessarily be collected by the data collection device 160, but may also be received from other devices.
  • It is also worth noting that the training device 120 may not necessarily train the target model/rules 101 based entirely on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.
  • The target model/rules 101 trained by the training device 120 can be applied to different systems or devices, for example, to the execution device 110 shown in Figure 2. The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or it can be a server, a cloud, or the like.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • The user can input data to the I/O interface 112 through the client device 140. In the embodiment of this application, the input data may include: data to be processed input by the client device.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, and the like in the data storage system 150 for the corresponding processing, and the data, instructions, and the like obtained by the corresponding processing can also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the processing result of the data obtained above, to the client device 140, thereby providing it to the user.
  • It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • the user can manually enter the input data, and the manual input can be operated through the interface provided by the I/O interface 112 .
  • In another case, the client device 140 may automatically send input data to the I/O interface 112. If automatically sending input data requires the user's authorization, the user can set corresponding permissions in the client device 140.
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, etc.
  • It is worth noting that the client device 140 can also serve as a data collection end, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data and storing them in the database 130. Alternatively, the collection may bypass the client device 140, and the I/O interface 112 directly stores the input data of the I/O interface 112 and the output results of the I/O interface 112, as shown in the figure, into the database 130 as new sample data.
  • It is worth noting that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 2, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 can also be placed inside the execution device 110.
  • the loss function used in the process of obtaining the target model/rule 101 through training by the training device 120 may be a loss function obtained by automatically searching for a loss function according to the embodiment of the present application.
  • FIG 3 is a schematic diagram of the deployment of a training device provided by an embodiment of the present application.
  • the training device 310 can be deployed in a cloud environment.
  • The cloud environment provides users with cloud services using basic resources in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large number of basic resources (including computing resources, storage resources and network resources) owned by the cloud service provider.
  • The computing resources included in the cloud data center can be a large number of computing devices (e.g., servers).
  • the training device 310 may be a server in a cloud data center that trains a neural network model, or it may be a virtual machine that trains a neural network model.
  • The training device 310 can also be a software device deployed on a server or a virtual machine in the cloud data center, where the software device is used to train a neural network model. The software device can be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on both virtual machines and servers.
  • For example, the training device 310 can be abstracted by the cloud service provider into a cloud service for training neural network models and provided to users on the cloud service platform. After a user purchases this cloud service on the cloud service platform, the cloud environment uses it to provide the user with a service for training neural networks.
  • For example, the training device 310 receives the neural network to be trained and the original training set, performs automatic search (for example, automatic search for a loss function) through the automatic search module 311, inputs the search results (for example, a loss function) obtained from the search into the model training module 312 to train the neural network model to be trained, and finally the trained target neural network is returned by the training device 310 to the edge device where the user is located.
  • the edge device is described in detail in the previous article and will not be described in detail here.
  • the automatic search module 311 includes a trained automatic search performance prediction model.
  • For example, the user can upload the target task type to the cloud environment through the application program interface or the web interface provided by the cloud service platform, and can optionally also upload the original training set. The training device receives the target task type and the original training set, performs automatic search (for example, automatically searches for a loss function) through the automatic search module 311, inputs the search results (for example, a loss function) into the model training module 312 to train a neural network model corresponding to the target task type, and finally the trained target neural network is returned by the training device 310 to the edge device where the user is located.
  • the user can upload the target task type as image processing (such as face recognition or object detection, etc.) to the cloud environment through the application program interface or the web interface provided by the cloud service platform.
  • The training device 310 receives the target task type and the original training set, performs automatic search (for example, automatic search for a loss function) through the automatic search module 311, inputs the search results (for example, a loss function) into the model training module 312 to train the neural network model corresponding to the target task type, and finally the trained image processing model is returned from the training device to the edge device where the user is located.
  • The above-mentioned training device 310 can be deployed in a cloud environment as shown in (a) of Figure 3; alternatively, the above-mentioned training device 310 can also be a terminal device, in which case the training device 310 can be deployed on the user terminal side. The embodiments of the present application place no limit on this.
  • the performance of a neural network model is affected by many factors, such as the architecture of the neural network model, the training process, regularization methods, hyperparameters, and loss functions. At present, most methods to improve the performance of neural network models are often by manually designing the architecture of the neural network model or manually designing the loss function. With the rise of AutoML, it has become possible to automatically search for loss functions, neural network model architecture or hyperparameters. AutoML can provide corresponding services based on user input training data and target tasks.
  • FIG 4 is a schematic diagram of the processing flow on the AutoML service platform provided by this embodiment of the present application.
  • the AutoML service platform provides corresponding services based on the training data and target tasks provided by users.
  • the AutoML service platform obtains solutions that meet user needs by performing one or more search operations.
  • the search operations that the AutoML service platform can perform include data enhancement strategy search, model structure search, loss function search, and hyperparameter search.
  • data enhancement strategy search, model structure search, loss function search and hyperparameter search are all optional operations. For example, if the user provides a model structure, there is no need to perform a model structure search.
  • Among them, the automatic search operations can be performed using the method in the embodiments of the present application to obtain search results that meet the requirements.
  • For the specific automatic loss function search method, see the description of Figure 11 below.
  • the output of the AutoML service platform is determined based on the user's needs.
  • the output of the AutoML service platform may include the target neural network model and/or loss function.
  • the AutoML service platform can output a target neural network model that can be used to perform the face recognition task.
  • for another example, if the training data provided by the user is a sample image, the target task is a face recognition task, and the user also requires the output of a loss function for training the target neural network model, the AutoML service platform can output both the target neural network model that can be used to perform the face recognition task and the loss function.
  • for another example, if the training data provided by the user is a sample image, the target task is face recognition, and the user also provides the structure of the neural network model and requires the output of the loss function, the AutoML service platform can output the loss function used during the training of the target neural network model for the face recognition task.
  • the search cost of the current automatic search method is relatively high. Therefore, how to improve the efficiency of automatic search has become an urgent problem to be solved.
  • the embodiment of the present application proposes a training method for an automatic search performance prediction model, which can improve the efficiency of automatic search while improving the performance of search results, thereby improving the performance of the target neural network model.
  • the training method of the performance prediction model provided by the embodiments of this application can be applied to automatic search methods for loss functions, neural network architectures, hyperparameters, and the like. It performs symbolized and formalized intelligent information modeling, extraction, preprocessing, and training on the training data (such as the loss function training data set in this application) to finally obtain a trained performance prediction model. In addition, the automatic search method provided by the embodiments of the present application can use the above trained performance prediction model: input data (such as the candidate loss functions in this application) are input into the trained performance prediction model to obtain output data (such as the prediction indices in this application).
  • the performance prediction model training method and the automatic search method provided in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of a system, or two stages of an overall process, such as a model training stage and a model application stage.
  • Figure 5 is a schematic flowchart of a training method for an automatic search performance prediction model provided by an embodiment of the present application. It should be understood that the method 500 shown in Figure 5 can be executed by a training device in a cloud environment or by a training device of a terminal device. The embodiment of the present application does not limit the specific form of the training device.
  • the method 500 includes steps S510 to S520, which will be described in detail below.
  • the first training data is related to the task of the automatic search. For example, if the task of the automatic search is to automatically search for a loss function, the first training data is loss functions that have been performance-evaluated; if the task is to automatically search for a neural network model structure, the first training data is neural network model structures that have been performance-evaluated; and if the task is to automatically search for hyperparameters, the first training data is hyperparameters that have been performance-evaluated.
  • the embodiments of the present application do not limit the type of the first training data. The embodiments of this application will be described in detail later using automatic loss function search as an example.
  • the loss function of the automatic search performance prediction model includes a differentiable ranking loss function and a regression loss function.
  • the automatic-search performance prediction model is used to predict the performance index of candidate loss functions, candidate neural network structures, or candidate hyperparameters; the embodiments of this application do not limit this.
  • the embodiments of this application will be described in detail later using the prediction model to predict the performance of the loss function as an example.
  • the loss function of the performance prediction model balances its two parts through a balancing factor.
  • the ranking index used in the ranking loss function can be the Kendall's Tau similarity ranking index, and its specific expression is as shown in formula (3):
  • P(x n ) represents the output of the performance prediction model
  • y n represents the performance accuracy on the agent task, that is, the real performance accuracy
  • B represents the size of the batch data (batch size)
  • the sign(·) function is the piecewise function shown in formula (4).
  • the similarity ranking loss function shown in formula (5) is used here; a loss function based on other similarity ranking indices, such as the Spearman ranking index or the Pearson ranking index, can also be used, and the embodiments of the present application do not limit this. It should be understood that both the Spearman ranking index and the Pearson ranking index are non-differentiable as ranking indices, so if a loss function based on these two ranking indices is to be used, a differentiable ranking loss function can be obtained in a similar way as above, which will not be described in detail here.
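  • formulas (3) to (5) are not reproduced in this text; a plausible reconstruction consistent with the surrounding definitions (batch size B, predictor output P(x n ), real accuracy y n ) is sketched below in LaTeX, where the smoothing coefficient α of the differentiable relaxation is an assumed detail.

```latex
% Hedged reconstruction; the exact forms in the original filing may differ.
\begin{align}
\mathrm{KTau} &= \frac{2}{B(B-1)} \sum_{1 \le i < j \le B}
  \operatorname{sign}\big(P(x_i)-P(x_j)\big)\,
  \operatorname{sign}\big(y_i-y_j\big) \tag{3} \\
\operatorname{sign}(z) &=
  \begin{cases}
    1,  & z > 0 \\
    0,  & z = 0 \\
    -1, & z < 0
  \end{cases} \tag{4} \\
L_K &= 1 - \frac{2}{B(B-1)} \sum_{1 \le i < j \le B}
  \tanh\big(\alpha\,(P(x_i)-P(x_j))\big)\,
  \operatorname{sign}\big(y_i-y_j\big) \tag{5}
\end{align}
```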
  • the regression loss function can be a mean square error loss function, a mean absolute error loss function, etc.
  • the embodiments of the present application are not limited to this.
  • the regression loss function is a mean square error loss function, as shown in formula (6).
  • x n is the feature representation of the input data of the prediction model. For example, if the input data of the prediction model is a candidate loss function, x n is the feature vector of the candidate loss function, where n∈[1,N] is a positive integer and N represents the number of candidate loss functions.
  • the loss function of the performance prediction model is shown in formula (7):
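  • formula (7) is not reproduced in this text; a plausible form, consistent with the balancing-factor description above, is L = L K + μ·L MSE . The following PyTorch sketch implements that assumed combination; the weight mu and the smoothing coefficient alpha are assumptions, since the balancing factor's symbol and value are not reproduced here.

```python
import torch

def predictor_loss(pred, target, mu=0.5, alpha=5.0):
    """Sketch of the combined predictor loss L = L_K + mu * L_MSE.

    pred, target: 1-D tensors of length B (requires B >= 2)."""
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # pairwise prediction differences
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)  # pairwise ground-truth differences
    b = pred.shape[0]
    # Differentiable relaxation of Kendall's Tau: sign -> tanh on the predictions.
    ktau = (torch.tanh(alpha * diff_pred) * torch.sign(diff_true)).triu(1).sum()
    ktau = 2.0 * ktau / (b * (b - 1))
    ranking_loss = 1.0 - ktau
    mse_loss = torch.mean((pred - target) ** 2)
    return ranking_loss + mu * mse_loss
```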
  • if only a regression loss function (such as an MSE loss function) is used, the performance prediction model must have the ability to accurately predict the absolute performance indices of the candidate data. However, during the training of the performance prediction model, only a small amount of data is often available because the search space is large. Therefore, using only the regression loss function as the loss function of the performance prediction model can easily lead to overfitting and weak generalization ability. In the embodiments of the present application, the loss function of the performance prediction model is obtained by combining a differentiable ranking loss function and a regression loss function. Compared with a loss function that only includes a regression loss function (such as an MSE loss function), which requires the ability to accurately predict the absolute performance indices of the candidates, the loss function of the performance prediction model proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to the automatic search can therefore improve the efficiency and accuracy of the automatic search. For example, in the automatic search for a loss function, adding the performance prediction model of the embodiments of the present application can improve the search efficiency, and the performance of the searched loss function is also better.
  • Figure 7 is a schematic flowchart of an automatic search method provided by an embodiment of the present application. Figure 7 will be described in detail below through steps S701 to S704.
  • S701: obtain at least two candidate data, where the at least two candidate data are data to be evaluated on the agent task.
  • agent task evaluation can be a face recognition task, a pedestrian re-identification task, a classification task, or metric learning, etc., and the embodiments of the present application are not limited to this.
  • the target performance prediction model is obtained by training the performance prediction model based on the first training data set.
  • the loss function of the performance prediction model includes a differentiable ranking loss function L K and a regression loss function.
  • the first training data set includes sample data and evaluation scores corresponding to the sample data.
  • the target performance prediction model here is one trained in the manner described in Figure 6.
  • the prediction index output by the performance prediction model and the evaluation scores corresponding to the sample data in the first training data set use the same metric.
  • the difference is that the prediction index is the prediction result of the candidate data, and the evaluation score is the actual result corresponding to the sample data.
  • the prediction index is related to the actual agent task; for example, in pedestrian re-identification the prediction index is mAP, while in a classification task the prediction index may be accuracy. This is not limited in the embodiments of the present application.
  • S703: according to the prediction indices corresponding to the at least two candidate data, select part of the candidate data among the at least two candidate data for agent task evaluation.
  • "part of the candidate data among the at least two candidate data" indicates that the number of data used for the subsequent agent task evaluation is smaller than the number of the at least two candidate data; this part of the data can be called potential data.
  • the agent task evaluation is mainly used to obtain the actual evaluation score of the potential data, such as the actual evaluation score of a potential loss function.
  • for example, the candidate data with the best prediction index among the at least two candidate data is evaluated on the agent task.
  • S704: add the part of the data evaluated by the agent task to the population data set.
  • the population data set is used to determine the target search results.
  • the population data set includes sample data and evaluation scores corresponding to the sample data. That is, the data in the population data set are data that have been evaluated by the agent task to obtain actual evaluation scores.
  • the amount of data in a population dataset is fixed.
  • by adding the trained performance prediction model to the automatic search, not only can the search efficiency be improved, but the potential candidate data screened out by the performance prediction model also perform better, thereby improving the performance of the target search results.
  • adding the performance prediction model to the automatic search process can not only improve the exploration of the search space, but also improve the performance of the target loss function.
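  • a minimal Python sketch of steps S701 to S704 follows; all names (predictor, agent_task_eval, max_pop) are illustrative assumptions rather than the apparatus's actual interfaces.

```python
# Illustrative sketch: predict an index for each candidate, evaluate only the
# most promising one on the agent task, and keep the population size fixed.
def select_and_evaluate(candidates, predictor, agent_task_eval, population, max_pop):
    scored = [(predictor(c), c) for c in candidates]  # S702: prediction indices
    _, best = max(scored, key=lambda t: t[0])         # S703: pick the best candidate
    evaluation = agent_task_eval(best)                # S703: real agent-task score
    population.append((best, evaluation))             # S704: grow the population set
    if len(population) > max_pop:                     # keep the population size fixed
        population.pop(0)
    return best, evaluation
```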
  • Figure 8 is a schematic diagram of the overall flow of training and inference of a performance prediction model provided by an embodiment of the present application.
  • the training process 8100 and the inference process 8200 of the performance prediction model can be interleaved with each other.
  • the process shown in Figure 8 can also be called a potential data selection process.
  • the training process 8100 of the performance prediction model may obtain the target performance prediction model after one round of training as shown in Figure 6, or may obtain the target performance prediction model by continuously augmenting and updating the first training data set and retraining, as shown in Figure 8. The following describes the continuous updating process of the target performance prediction model in conjunction with Figure 8.
  • the first training data set is input into the performance prediction model to be trained to train the parameters of the performance prediction model; the parameter-trained performance prediction model is the target performance prediction model.
  • for example, if the first training data set is a loss function training set, it can be represented as a set of pairs (ω i , p i ), where ω i is the parameter of each loss function in the loss function data set and p i is the performance corresponding to each loss function.
  • the performance prediction model may be a one-dimensional ResNet50 or other neural network model, which is not limited in the embodiment of the present application.
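  • as a stand-in for the one-dimensional ResNet50 mentioned above, the following much smaller 1-D convolutional regressor sketches the predictor's input/output contract (a feature vector in, a scalar performance prediction out); the layer sizes are assumptions.

```python
import torch.nn as nn

# Toy 1-D convolutional performance predictor (sketch only; sizes assumed).
class TinyPredictor(nn.Module):
    def __init__(self, in_channels=1, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(hidden, 1)   # scalar performance prediction

    def forward(self, x):                  # x: (batch, 1, feature_length)
        return self.head(self.body(x).squeeze(-1)).squeeze(-1)
```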
  • at least two candidate data are input into the target performance prediction model, prediction indices corresponding to the at least two candidate data are obtained, and the potential data is determined based on the obtained prediction indices.
  • at least two candidate data are data to be evaluated by the agent task.
  • the potential data is evaluated on the agent task to obtain its evaluation score, and the potential data and its corresponding evaluation score are added to the first training data set.
  • the at least two candidate data are then cleared, at least two updated candidate data are obtained, and new potential data is obtained using the same method as the previous performance prediction of the at least two candidate data; the new potential data, including its evaluation score, is added to the first training data set.
  • the potential loss function selection process can be as shown in Algorithm 1.
  • when the number of loss functions evaluated on the agent task reaches E 0 , the performance prediction model to be trained is trained on the currently evaluated set (the parameters ω i of each loss function and its corresponding performance p i ). Then, whenever the number of entries in the evaluated set increases by a preset amount, the parameters of the performance prediction model are retrained. Each newly generated loss function that passes the equivalence verification strategy is added to the candidate set of the selector until the number of candidates reaches the preset N.
  • once the candidate set reaches N, the parameter-trained performance prediction model predicts the performance of each loss function in the candidate set, and the loss function with the highest predicted performance is selected as the most potential loss function for evaluation on the subsequent agent task. At this time, all loss functions in the candidate set are cleared. After the potential loss function is evaluated, its parameters and indices are added to the evaluated set Eva.
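  • the selector logic of Algorithm 1 can be sketched as follows; the retraining trigger and the interfaces are assumptions consistent with the description above (train once E 0 evaluations exist, buffer verified candidates up to N, pick the best-predicted one, then clear the buffer).

```python
# Hedged sketch of the potential-loss-function selector of Algorithm 1.
class PotentialLossSelector:
    def __init__(self, predictor_trainer, e0, n):
        self.predictor_trainer, self.e0, self.n = predictor_trainer, e0, n
        self.evaluated, self.candidates, self.predictor = [], [], None

    def add_candidate(self, loss_fn):
        self.candidates.append(loss_fn)            # passed equivalence verification
        if len(self.candidates) < self.n or self.predictor is None:
            return None
        scored = [(self.predictor(f), f) for f in self.candidates]
        self.candidates.clear()                    # all buffered candidates are cleared
        return max(scored, key=lambda t: t[0])[1]  # most potential loss function

    def record_evaluation(self, loss_fn, score):
        self.evaluated.append((loss_fn, score))
        if len(self.evaluated) >= self.e0:         # (re)train on the evaluated set
            self.predictor = self.predictor_trainer(self.evaluated)
```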
  • the current loss function search can be divided into two categories, namely dynamic loss function search and fixed loss function search.
  • dynamic loss function search embeds the search process of the loss function in the model training; each training iteration generates a new loss function for the update, and the dynamic loss function search ends when the training based on the fixed model and data set is completed.
  • the loss function obtained at the end of the search is only applicable to the model and training data set used during the training process; for a different model or data set, the loss function search needs to be performed again to obtain the target loss function. Therefore, the target loss function obtained through dynamic loss function search has poor transferability across data sets and models: for different data sets and neural network models, it takes a lot of computing power to search for the target loss function during the training process, and the searched target loss function has weak generalization ability. Examples of dynamic loss function search include the automatic loss function search method (AutoML for loss function search, AM-LFS) and softmax function search (searching for softmax, Search-Softmax).
  • fixed loss function search is a general search method that searches for a loss function from scratch to find a universal loss function; it models the loss function through computation graphs and uses evolutionary algorithms to search for the best loss function form. Examples include the convergence-simulation-driven evolutionary search algorithm (CSE-Autoloss) and searching loss functions from scratch (AutoLoss-Zero). The target loss function obtained through these two fixed loss function search methods can be transferred to other data sets and neural network models for model training.
  • the cost of searching the target loss function through evolutionary algorithms is often relatively high.
  • this search cost is reflected not only in the large amount of time needed to evaluate the searched candidate loss functions to obtain the optimal target loss function, but also in the fact that if a candidate loss function to be evaluated performs poorly, a lot of time is still spent in the evaluation process on that poorly performing loss function.
  • the specific expression form of the GMS loss function is as shown in the above-mentioned formula (2).
  • the search space of the GMS loss function in the embodiments of this application is constructed through the first function t(x), the second function n(x), and the constant s; specifically, it is represented by computation graphs, where the first function t(x) corresponds to the first computation graph and the second function n(x) corresponds to the second computation graph.
  • the search space corresponding to other loss functions can also be customized based on the number of functions and the number of constants included in the other loss functions, and the embodiments of the present application do not limit this.
  • Figure 9 is a schematic diagram of the first computation graph of a loss function provided by an embodiment of the present application.
  • the computation graph has two types of input nodes: one is a constant node, and the other represents the output of the neural network. The constant node c can take a value from the constant set shown in formula (8).
  • ⁇ c and N c are preset values, ⁇ c is a real number, and N c is a positive integer.
  • Operator nodes represent primitive mathematical operations, as shown in Table 2.
  • the corresponding expressions for the operator operations shown in Figure 9 can be queried from Table 2.
  • output nodes are used to aggregate the results of operator nodes that have no subsequent operator nodes.
  • each second computational graph can also be represented in the same way.
  • the constant s adopts the same discretization method as the constant node c, and the constant s can be a value in the constant set shown in formula (9).
  • ⁇ s and N s are preset values, ⁇ s is a real number, and N s is a positive integer.
  • the computation graphs corresponding to the functions and the constants are used to construct the search space of the loss function.
  • the method of constructing the search space in the embodiments of the present application is more fine-grained, which is more conducive to searching for a target loss function with good performance.
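  • a toy Python encoding of this computation-graph search space is sketched below; the node encoding and the operator subset are assumptions loosely based on Table 2.

```python
import math

# Toy encoding: ('input',) | ('const', c) | (op, child) | (op, left, right).
UNARY_OPS = {"neg": lambda a: -a, "exp": math.exp,
             "sig": lambda a: 1 / (1 + math.exp(-a))}
BINARY_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def eval_graph(node, x):
    kind = node[0]
    if kind == "input":
        return x              # the output of the neural network
    if kind == "const":
        return node[1]        # a value from the discretized constant set
    if kind in UNARY_OPS:
        return UNARY_OPS[kind](eval_graph(node[1], x))
    return BINARY_OPS[kind](eval_graph(node[1], x), eval_graph(node[2], x))

# Example: t(x) = sig(x) + c, with c drawn from a discretized constant set.
t_graph = ("add", ("sig", ("input",)), ("const", 0.5))
print(eval_graph(t_graph, 0.3))
```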
  • Figure 10 is a schematic process diagram for obtaining candidate loss functions provided by an embodiment of the present application.
  • Figure 11 is a schematic flowchart of a GMS loss function search method provided by an embodiment of the present application.
  • if the current population loss function set is the initial population loss function set, the initial population loss function set is determined based on the search space, and each initial population loss function is obtained based on prior experience.
  • evaluation score of each current population loss function is obtained after each current population loss function is evaluated by the agent task.
  • each potential GMS loss function is represented by a first calculation graph, a second calculation graph and a constant s.
  • Each potential GMS loss function corresponds to an evaluation score.
  • S1002: perform initial screening on the current population loss function set to obtain K first loss functions, where K is a positive integer.
  • the specific method of initial screening may be a tournament selection algorithm or a roulette selection algorithm.
  • the embodiment of the present application does not limit the specific method of initial screening.
  • if K is equal to 1, the first loss function is directly mutated, copied, or randomly re-initialized to obtain the second loss function. Random re-initialization can be understood as randomly selecting a population loss function from the current population loss function set as the second loss function. There is a certain probability of re-initializing the first loss function; a certain probability of copying it, that is, the second loss function keeps the form of the first loss function unchanged; and a certain probability of mutating it, that is, mutating the computation graph representing the first loss function. The embodiments of this application do not limit the values of these probabilities.
  • Figure 12 is a schematic diagram of a mutation method of a calculation graph provided by an embodiment of the present application.
  • the main ways to mutate the computation graph representing the first loss function are to insert a new operator node into the computation graph, delete an original operator node from the computation graph, or replace an original operator node in the computation graph with a different operator node.
  • in Figure 12, (a) is the computation graph to be mutated, (b) inserts a new Div operator node, (c) deletes the original Sig operator node, and (d) replaces the original Exp operator node with a Gd operator node.
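  • the three mutation modes of Figure 12 can be sketched on the tuple-encoded graphs above as follows; the operator list and probabilities are illustrative assumptions.

```python
import random

UNARY = ["neg", "exp", "sig", "gd"]  # operator names assumed from Table 2

def mutate(node):
    if node[0] in ("input", "const"):
        return (random.choice(UNARY), node)               # insert an operator above a leaf
    mode = random.choice(["insert", "delete", "replace", "recurse"])
    if mode == "insert":
        return (random.choice(UNARY), node)               # e.g. Figure 12(b): insert a node
    if mode == "delete" and len(node) == 2:               # unary node: splice it out
        return node[1]                                    # e.g. Figure 12(c): drop Sig
    if mode == "replace" and len(node) == 2:              # unary node: swap the operator
        return (random.choice(UNARY), node[1])            # e.g. Figure 12(d): Exp -> Gd
    return node[:1] + tuple(mutate(c) for c in node[1:])  # otherwise mutate a subgraph
```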
  • if K is a positive integer greater than or equal to 2, cross screening is performed. Cross screening can be understood as crossing the K first loss functions with a probability D and selecting an intermediate loss function from the K first loss functions to generate the second loss function. The embodiments of the present application do not limit the value of D; for example, it can be 60%, 80%, and so on.
  • taking the loss function being the GMS loss function as an example, the two first loss functions are crossed with a probability of 60%, that is, with a probability of 60% part of the first loss function a is replaced by the first loss function b, and the crossed loss function a serves as the intermediate loss function. The intermediate loss function is then re-initialized, copied, or mutated to obtain the second loss function.
  • the specific implementation method is similar to the above-mentioned method of obtaining the second loss function based on the first loss function, and will not be described in detail here to avoid repetition.
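  • a hedged sketch of the crossover step follows: with probability D a random subgraph of one parent replaces a subgraph of the other, and the result serves as the intermediate loss function; the subgraph-selection details are assumptions.

```python
import random

def crossover(a, b, d=0.6):
    """Cross tuple-encoded graph a with graph b with probability d."""
    if random.random() >= d:
        return a                          # no crossover: keep the first parent
    def subgraphs(n):
        yield n
        if n[0] not in ("input", "const"):
            for c in n[1:]:
                yield from subgraphs(c)
    donor = random.choice(list(subgraphs(b)))
    def graft(n):
        if n[0] not in ("input", "const") and random.random() < 0.3:
            return donor                  # replace this subgraph with the donor
        if n[0] in ("input", "const"):
            return n
        return n[:1] + tuple(graft(c) for c in n[1:])
    return graft(a)
```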
  • the loss function rejection criterion can be understood as a judgment criterion for whether the basic properties of the loss function meet the requirements. The GMS loss function is taken as an example for detailed explanation.
  • the rejection criteria of the GMS loss function include basic attribute criteria and target task indicators.
  • the basic attribute criterion refers to whether the functions t(x) and n(x) corresponding to the calculation graph of the second loss function generated by S1003 satisfy formula (10) on the interval x ⁇ [-1, 1]:
  • the target task indicator means that the output index obtained by training on the task data with the second loss function reaches a preset value, where the type of the target task indicator is related to the task-related output metric, for example, the mean average precision (mAP) index over all classes.
  • the types of target task indicators and their preset values are not limited in the embodiments of the application.
  • the loss function rejection criterion including both basic attribute criteria and target task indicators can quickly screen the second loss function and screen out, at an early stage, second loss functions that do not meet the requirements. Compared with a traditional rejection criterion based only on basic attribute criteria, or one based only on target task indicators, the rejection criterion in the embodiments of the present application considers more comprehensive factors and can more comprehensively screen out second loss functions that do not meet the requirements, thereby improving the search efficiency of the overall loss function search.
  • if the second loss function does not pass the loss function rejection criterion, the first loss function or the intermediate loss function is mutated, copied, or randomly re-initialized again to obtain a new second loss function, until the updated second loss function passes the loss function rejection criterion, as shown in Figure 11.
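  • the rejection check can be sketched numerically as follows; because formula (10) is not reproduced here, the basic-attribute test (t and n strictly positive on [-1, 1]) is an assumed placeholder, and quick_task_metric stands in for the target task indicator.

```python
import numpy as np

def passes_rejection(t, n, quick_task_metric, metric_threshold):
    xs = np.linspace(-1.0, 1.0, 201)
    # Assumed basic-attribute test standing in for formula (10).
    basic_ok = np.all(t(xs) > 0) and np.all(n(xs) > 0)
    return bool(basic_ok) and quick_task_metric >= metric_threshold

# Example with simple stand-in functions:
print(passes_rejection(lambda x: np.exp(x), lambda x: 1 + x**2, 0.42, 0.30))
```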
  • if the second loss function is not equivalent to any population loss function in the current population loss function set, the second loss function is determined as a candidate loss function.
  • a first feature vector is obtained from the first function t(x) corresponding to the first computation graph and the second function n(x) corresponding to the second computation graph, where the first feature vector satisfies formulas (11) to (13).
  • TN min represents the minimum function value of t(x) and n(x) on the interval [-1,1]; TN max represents the maximum function value of t(x) and n(x) on the interval [-1,1]; k represents the normalization scale factor; and the threshold is the preset search space constraint, where the t(x), n(x), and s of the second loss function satisfy the constraints shown in formula (13).
  • a second feature vector set is obtained based on the population loss functions in the current population loss function set, where the second feature vector set includes the second feature vector corresponding to each population loss function; if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function, the second loss function is determined as a candidate loss function.
  • if the second loss function is equivalent to the m-th population loss function, the evaluation score corresponding to the m-th population loss function is assigned to the second loss function, and the population loss function set is updated.
  • the population loss function set can be updated by adding the second loss function, as a potential loss function, together with its corresponding evaluation score into the population loss function set and eliminating one population loss function, where the eliminated population loss function may be the earliest one, for example, the population loss function ranked first in the population loss function set. The embodiments of the present application do not limit the method of eliminating a population loss function.
  • in other words, the evaluation score corresponding to the m-th population GMS loss function in the current population GMS loss function set is assigned to the second loss function, the second loss function and its corresponding evaluation score are directly added to the current population GMS loss function set, and one population loss function is eliminated to obtain the updated population GMS loss function set.
  • equivalence verification based on feature vectors effectively identifies equivalent loss functions and avoids repeated agent task evaluations of loss functions already in the current population loss function set, thereby effectively improving the search efficiency of the loss function search.
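  • the feature-vector equivalence verification can be sketched as follows; the sampling grid, the normalization with TN min and TN max , and the tolerance are assumptions standing in for formulas (11) to (13).

```python
import numpy as np

def feature_vector(t, n, s, num_points=64):
    """Sample t and n on [-1, 1] and normalize (assumed forms of (11)-(13))."""
    xs = np.linspace(-1.0, 1.0, num_points)
    vals = np.concatenate([t(xs), n(xs)])
    tn_min, tn_max = vals.min(), vals.max()
    k = 1.0 / (tn_max - tn_min + 1e-12)   # assumed normalization scale factor
    return np.concatenate([k * (vals - tn_min), [s]])

def is_equivalent(vec_a, vec_b, tol=1e-3):
    return np.max(np.abs(vec_a - vec_b)) < tol
```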
  • the candidate loss function can be selected through the potential loss function selection module to conduct agent task evaluation, thereby updating the current population loss function set.
  • the working principle of the potential loss function selection module is similar to that of the potential data selection module; the difference is that here the potential data are potential loss functions.
  • the method of updating the current population loss function set has also been described in detail above and will not be repeated here.
  • a potential loss function with the best evaluation score is selected from the updated population loss function set as the target loss function.
  • the automatic search method of this application is described in detail through Figures 5 to 12.
  • the model training module 312 uses the target loss function searched by the automatic search module 311 through the above method, together with the neural network model to be trained and the original training data obtained from the user, to train and obtain the target neural network model.
  • the corresponding target neural network model is obtained and applied to the corresponding specific tasks.
  • the training tasks can be face recognition, pedestrian re-identification, metric learning, etc., and the embodiments of the present application do not limit this.
  • Figure 13 is a schematic diagram comparing the effects of whether the loss function includes a differentiable ranking loss function in the process of training a performance prediction model provided by an embodiment of the present application.
  • the abscissa is the amount of training data of the performance prediction model, and the ordinate is the prediction effect index (KTau) of the performance prediction model.
  • the higher the prediction effect index, the higher the prediction accuracy of the performance prediction model.
  • the prediction effect index of the performance prediction model whose loss function is only the MSE loss function is lower than that of the performance prediction model whose loss function combines the MSE loss function and the differentiable ranking loss function. Therefore, the performance prediction model trained with the differentiable ranking loss function and the MSE loss function proposed in the embodiments of this application has better prediction accuracy, which in turn helps the potential loss function selector choose potential loss functions with better performance.
  • Figure 14 is a schematic diagram comparing the effects of whether to add a potential loss function selection module in an automatic loss function search provided by an embodiment of the present application.
  • the abscissa is the number of loss functions searched, and the ordinate is the task-related output metric, such as the mAP shown in Figure 14.
  • the number of loss functions explored by the loss function search method that includes PLC (the potential loss function selection module) is significantly higher than the number explored by the loss function search method without PLC. This is because the performance prediction model in the potential loss function selection module can eliminate candidate loss functions with poor prediction results in advance, so that only candidate loss functions with good prediction results undergo agent task evaluation; for example, one potential loss function is selected from two candidate loss functions, or from five candidate loss functions, or from even more candidate loss functions, and only the selected potential loss function is evaluated on the agent task.
  • without the potential loss function selection module, each candidate loss function needs to be evaluated by the agent task; under the same number of iterations, the number of loss functions explored by the loss function search method that includes PLC is therefore higher.
  • Table 4 shows the results of training different models (e.g., the residual network (ResNet50), the omni-scale network (OSNet), and the multiple granularity network (MGN)) on the same data set (e.g., the Market1501 data set) with the searched loss function.
  • the device according to the embodiment of the present application will be described below with reference to FIGS. 15 to 18 . It should be understood that the devices described below can perform the foregoing methods of the embodiments of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the devices of the embodiments of the present application.
  • FIG. 15 is a schematic block diagram of the automatic search performance prediction model training device 3000 according to the embodiment of the present application.
  • the training device 3000 shown in FIG. 15 includes an acquisition unit 3010 and a processing unit 3020.
  • the acquisition unit 3010 is configured to acquire a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data.
  • the processing unit 3020 is configured to train the performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L K and a regression loss function.
  • FIG. 16 is a schematic block diagram of the automatic search device 4000 provided by the embodiment of the present application.
  • the automatic search device 4000 shown in FIG. 16 includes an acquisition unit 4010 and a processing unit 4020.
  • the acquisition unit 4010 is configured to acquire at least two candidate data, where the at least two candidate data are data to be evaluated on the agent task.
  • the processing unit 4020 is configured to input the at least two candidate data into the target performance prediction model to obtain prediction indices corresponding to the at least two candidate data, where the target performance prediction model is obtained by training the performance prediction model based on the first training data set, the loss function of the performance prediction model includes the differentiable ranking loss function L K and a regression loss function, and the first training data set includes sample data and the evaluation scores corresponding to the sample data; and to perform agent task evaluation on part of the at least two candidate data according to the prediction indices corresponding to the at least two candidate data.
  • training device 3000 and device 4000 are embodied in the form of functional units.
  • unit here can be implemented in the form of software and/or hardware, and is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of both that implements the above functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) for executing one or more software or firmware programs, memory, merged logic circuitry, and/or other suitable components that support the described functionality.
  • the units of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG 17 is a schematic diagram of the hardware structure of an automatic search performance prediction model training device provided by an embodiment of the present application.
  • the automatic search performance prediction model training device 5000 shown in Figure 17 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004.
  • the memory 5001, the processor 5002, and the communication interface 5003 implement communication connections between each other through the bus 5004.
  • the memory 5001 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM).
  • the memory 5001 can store programs. When the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is used to execute various steps of the training method of the performance prediction model in the embodiment of the present application.
  • the processor 5002 may be a general central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute relevant programs to implement the training method of the performance prediction model of the method embodiments of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the training method of the performance prediction model of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 5002.
  • the above-mentioned processor 5002 can also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Each method, step and logical block diagram disclosed in the embodiment of this application can be implemented or executed.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 5001.
  • the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the training device shown in Figure 15, or executes the performance prediction model training method shown in Figure 5 of the method embodiments of the present application.
  • the communication interface 5003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 5000 and other devices or communication networks. For example, training data can be obtained through the communication interface 5003.
  • Bus 5004 may include a path that carries information between various components of device 5000 (eg, memory 5001, processor 5002, communication interface 5003).
  • FIG. 18 is a schematic diagram of the hardware structure of the automatic search device according to the embodiment of the present application.
  • the automatic search device 6000 shown in FIG. 18 includes a memory 6001, a processor 6002, a communication interface 6003 and a bus 6004. Among them, the memory 6001, the processor 6002, and the communication interface 6003 implement communication connections between each other through the bus 6004.
  • the memory 6001 may be a ROM, a static storage device, or a RAM.
  • the memory 6001 can store programs. When the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 and the communication interface 6003 are used to execute various steps of the automatic search method in the embodiment of the present application. Specifically, the processor 6002 can perform the method shown in Figure 7 above.
  • the processor 6002 can be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is used to execute related programs to realize the functions required to be performed by the units in the automatic search device of the embodiments of the present application, or to execute the automatic search method of the method embodiments of the present application.
  • the processor 6002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the automatic search method in the embodiment of the present application can be completed through the integrated logic circuit of hardware in the processor 6002 or instructions in the form of software.
  • the above-mentioned processor 6002 can also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 6001.
  • the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions required to be performed by the units included in the automatic search device of the embodiments of the present application, or performs the automatic search method of the method embodiments of the present application.
  • the communication interface 6003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 6000 and other devices or communication networks.
  • the data to be processed can be obtained through the communication interface 6003.
  • Bus 6004 may include a path that carries information between various components of device 6000 (eg, memory 6001, processor 6002, communication interface 6003).
  • although only a memory, a processor, and a communication interface are shown for the above-mentioned device 5000 and device 6000, in a specific implementation process the device 5000 and the device 6000 may also include other components necessary for normal operation.
  • the device 5000 and the device 6000 may also include hardware devices that implement other additional functions.
  • the device 5000 and the device 6000 may only include components necessary to implement the embodiments of the present application, and do not necessarily include all the components shown in FIGS. 17 and 18 .
  • the processor in the embodiments of the present application can be a central processing unit (CPU), or can be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • by way of example but not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "at least one" refers to one or more, and "multiple" refers to two or more.
  • At least one item (items) or similar expressions refer to any combination of these items, including any combination of single items (items) or plural items (items).
  • for example, at least one of a, b, or c can represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c can be single or multiple.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.


Abstract

Provided in the present application are an automatic search method, an automatic-search performance prediction model training method and an apparatus, relating to the field of artificial intelligence, in particular to the field of computer vision. The training method comprises: on the basis of a potential data selection module of a performance prediction model, training and updating the performance prediction model in an automatic data search process and performing inference using the trained performance prediction model, so as to assist the selection of potential data, wherein the loss functions of the performance prediction model comprise a differentiable ranking loss function LK and a regression loss function. The present application can improve the prediction accuracy of the performance prediction model, and by applying the trained performance prediction model to an automatic search, improve the efficiency and accuracy of the automatic search and the amount of data explored.

Description

Automatic search method, automatic-search performance prediction model training method and device
This application claims priority to the Chinese patent application filed with the China Patent Office on March 14, 2022, with application number 202210249999.8 and entitled "Automatic search method, automatic-search performance prediction model training method and device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of artificial intelligence, and more specifically, to an automatic search method, an automatic-search performance prediction model training method, and an apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
With the development of deep learning (DL), the deep neural network (DNN), one of the representative algorithms of deep learning, has emerged as a feedforward neural network with a deep structure and has achieved remarkable results in computer vision fields such as face recognition and pedestrian re-identification. The performance of a model in computer vision is usually improved through a hand-designed deep neural network architecture, or based on a hand-designed loss function. Whether based on a hand-designed loss function or a hand-designed deep neural network architecture, such approaches often require considerable expert knowledge and take a lot of time.
Therefore, with the rise of automated machine learning (AutoML), loss function search (LFS), network architecture search, and hyperparameter search have become possible. However, the search cost of these automatic search methods is currently relatively high. Therefore, how to improve the efficiency of automatic search has become an urgent problem to be solved.
Summary of the invention
This application provides an automatic search method, an automatic-search performance prediction model training method and an apparatus, which can improve search efficiency, explore more data, and obtain search results with better performance.
In a first aspect, an automatic search method is provided, the method including: obtaining at least two candidate data, where the at least two candidate data are data to be evaluated on an agent task; inputting the at least two candidate data into a target performance prediction model to obtain prediction indices corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function LK and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and performing agent task evaluation on part of the at least two candidate data according to the prediction indices corresponding to the at least two candidate data.
可选地,将经过代理任务评估后的部分数据,加入种群数据集中。Optionally, part of the data evaluated by the agent task is added to the population data set.
应理解,种群数据集中包括样本数据和样本数据对应的评估分数。It should be understood that the population data set includes sample data and evaluation scores corresponding to the sample data.
应理解,代理任务评估可以是人脸识别任务、行人再识别任务、分类任务或者是度量学习等,本申请实施例对此不作限制。It should be understood that the agent task evaluation can be a face recognition task, a pedestrian re-identification task, a classification task, or metric learning, etc., and the embodiments of the present application are not limited to this.
还应理解,候选数据的类型可以为损失函数、神经网络架构、超参数等,本申请实施例对此不作限制。It should also be understood that the type of candidate data can be a loss function, a neural network architecture, a hyperparameter, etc., and this is not limited in the embodiments of the present application.
在本申请实施例中,通过结合可微分排序损失函数和回归损失函数得到的性能预测模型的损失函数,相比于仅包括需要具有精准预测候选的绝对性能指标的能力的回归损失函数而言,本申请提出的性能预测模型的损失函数更加灵活,并且训练得到的性能预测模型的预测准确性也得到了提高,进而将训练好的性能预测模型加入自动搜索中可以提高自动搜索的效率、准确性和探索的数据量。In the embodiment of the present application, the loss function of the performance prediction model obtained by combining the differentiable ranking loss function and the regression loss function is compared to the regression loss function that only includes the ability to accurately predict the absolute performance index of the candidate. The loss function of the performance prediction model proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model has also been improved. Then adding the trained performance prediction model to automatic search can improve the efficiency and accuracy of automatic search. and the amount of data explored.
在某些可能的实现方式中,根据至少两个候选数据对应的预测指标,对至少两个候选数据中的部分候选数据进行代理任务评估包括:对至少两个候选数据中预测指标最好的候选数据进行代理任务评估。In some possible implementations, performing a proxy task evaluation on part of the at least two candidate data based on predictive indicators corresponding to the at least two candidate data includes: evaluating the candidate with the best predictive indicator among the at least two candidate data. Data for agent task evaluation.
In some possible implementations, the candidate data that has undergone proxy task evaluation is added to the first training data set to obtain an updated first training data set; at least two updated candidate data are obtained, where the at least two updated candidate data are different from the at least two candidate data; the at least two updated candidate data are input into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, where the updated target performance prediction model is obtained based on the updated first training data set; and proxy task evaluation is performed on some of the at least two updated candidate data according to the prediction indicators corresponding to the at least two updated candidate data.
Optionally, the partial candidate data that has undergone proxy task evaluation is added to the population data set.
It should be understood that the population data set includes sample data and evaluation scores corresponding to the sample data.
In the embodiments of this application, during the inference process of the target performance prediction model, the first training data set is continuously updated with the selected candidate data, and the target performance prediction model is continuously updated in turn, which can improve the performance of the search results and enhance the ability to explore the search space.
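As an illustration of the search procedure described above, the following Python sketch shows one round of predictor-guided search: score all candidates with the target performance prediction model, send only the most promising ones to the expensive proxy task, and grow the first training data set with the results. This is a minimal sketch rather than the claimed implementation; the helpers predictor and proxy_eval, the tensor encoding of candidates, and the top-k selection size are assumptions made for the example.

import torch

def search_round(predictor, candidates, train_set, proxy_eval, top_k=1):
    """One round of predictor-guided automatic search (illustrative sketch).

    predictor:  target performance prediction model; maps an encoded
                candidate (a tensor) to a predicted indicator
    candidates: list of encoded candidate data (e.g. candidate loss functions)
    train_set:  list of (candidate, evaluation_score) pairs, i.e. the
                first training data set
    proxy_eval: callable that runs the expensive proxy task on a candidate
                and returns an evaluation score
    """
    with torch.no_grad():
        scores = torch.stack([predictor(c).squeeze() for c in candidates])
    # Only the best-predicted candidates are sent to the expensive proxy task.
    best = torch.topk(scores, k=min(top_k, len(candidates))).indices
    for i in best.tolist():
        score = proxy_eval(candidates[i])          # proxy task evaluation
        train_set.append((candidates[i], score))   # update first training data set
    # Once the increment of train_set reaches a threshold, the predictor is
    # retrained on the updated set, yielding the updated target prediction model.
    return train_set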
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
In the embodiments of this application, adding the trained performance prediction model to automatic search not only improves search efficiency; the potential candidate data screened out by the performance prediction model also perform better, thereby improving the performance of the target search results. For example, in automatic loss function search, adding the performance prediction model to the search process not only improves the exploration of the search space, but also improves the performance of the target loss function.
In some possible implementations, when the loss function type of the candidate loss function is the generalized margin-based softmax (GMS) loss function, obtaining the at least two candidate loss functions includes: obtaining a current population loss function set, where the current population loss function set includes M population loss functions, the m-th population loss function is represented by a first computation graph, a second computation graph, and a constant s, M is a positive integer, and 1≤m≤M; performing initial screening on the current population loss function set to obtain K screened first initial loss functions, where K is a positive integer greater than or equal to 2; performing crossover screening on the K first initial loss functions with a preset probability to obtain a second loss function; if the second loss function passes a loss function rejection criterion, performing equivalence verification on the second loss function; and if the second loss function is not equivalent to the m-th current population loss function in the current population loss function set, determining the second loss function as a candidate loss function.
In the embodiments of this application, according to the number of functions and the number of constants included in a loss function, the search space of the loss function is constructed using the computation graphs corresponding to those functions together with the constants. Compared with the traditional approach of constructing the search space from a single computation graph corresponding to the entire loss function, the way the search space is constructed in the embodiments of this application is more fine-grained, which is more conducive to searching for a target loss function with good performance.
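As an illustration of this representation, the following sketch encodes a GMS-style candidate as two small computation graphs (one for t(x) and one for n(x)) plus the constant s, and recombines two candidates component-wise with a preset probability. The postfix-sequence encoding of the graphs and the crossover routine are assumptions made for the example; the embodiments do not fix these details.

import random
from dataclasses import dataclass

@dataclass
class GMSCandidate:
    """A candidate GMS loss: computation graphs for t(x) and n(x), plus s."""
    t_graph: list   # e.g. a postfix operator sequence for t(x), such as ["x", "m", "sub"]
    n_graph: list   # e.g. ["x"] for n(x) = x
    s: float        # scale constant

def crossover(a: GMSCandidate, b: GMSCandidate, p: float = 0.5) -> GMSCandidate:
    """Recombine two candidates component-wise with preset probability p."""
    return GMSCandidate(
        t_graph=a.t_graph if random.random() < p else b.t_graph,
        n_graph=a.n_graph if random.random() < p else b.n_graph,
        s=a.s if random.random() < p else b.s,
    )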
In some possible implementations, the loss function rejection criterion includes a basic-property criterion of the loss function and a target task indicator; if both the basic-property criterion of the loss function and the target task indicator are satisfied, equivalence verification is performed on the second loss function. The second loss function satisfies the basic-property criterion of the loss function when the first function t(x) corresponding to the first computation graph of the second loss function and the second function n(x) corresponding to the second computation graph satisfy the following formula:
The second loss function satisfies the target task indicator when the output indicator obtained by training on the task data with the second loss function reaches a preset value.
In the embodiments of this application, the loss function rejection criterion, which includes both the basic-property criterion and the target task indicator, can quickly screen second loss functions and filter out those that do not meet the requirements at an early stage. Compared with a traditional rejection criterion based only on basic-property rules, or one based only on target task indicators, the loss function rejection criterion in the embodiments of this application considers more comprehensive factors and can more thoroughly filter out second loss functions that do not meet the requirements, thereby improving the overall search efficiency for loss functions.
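A minimal sketch of this two-stage rejection filter is given below. The predicate basic_property_ok stands in for the basic-property formula and is treated as a black box here, and quick_proxy_score stands in for a cheap, short training run that produces the output indicator; both names and the interface are assumptions made for illustration.

def passes_rejection(candidate, basic_property_ok, quick_proxy_score, threshold):
    """Two-stage loss function rejection filter (illustrative sketch)."""
    # Stage 1: basic-property criterion on the functions t(x) and n(x).
    if not basic_property_ok(candidate.t_graph, candidate.n_graph):
        return False
    # Stage 2: the output indicator from a cheap training run on the task
    # data must reach the preset value (the target task indicator).
    if quick_proxy_score(candidate) < threshold:
        return False
    return True   # candidate proceeds to equivalence verification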
In some possible implementations, determining the second loss function as a candidate loss function includes: obtaining a first feature vector according to the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to the second computation graph, and the constant s; obtaining a second feature vector set according to the population loss functions in the current population loss function set, where the second feature vector set includes a second feature vector corresponding to each population loss function; and if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function, determining the second loss function as a candidate loss function.
In the embodiments of this application, the feature-vector-based equivalence verification effectively picks out equivalent loss functions and avoids repeated proxy task evaluation of loss functions already in the current population loss function set, thereby effectively improving the search efficiency for loss functions.
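The feature-vector construction can be sketched as follows: sample t(x) and n(x) on a grid over their domain [-1, 1], append the constant s, and treat two candidates as equivalent when their vectors match within a tolerance. The grid size, the tolerance, and the sampling-based fingerprint are assumptions made for the example; t and n are assumed to be vectorized functions.

import numpy as np

def feature_vector(t, n, s, grid=np.linspace(-1.0, 1.0, 65)):
    """Fingerprint a candidate by sampling t(x) and n(x) on [-1, 1] and
    appending the constant s. Functionally equivalent losses produce
    (numerically) identical vectors even if their graphs differ."""
    return np.concatenate([t(grid), n(grid), [s]])

def is_duplicate(candidate_vec, population_vecs, tol=1e-6):
    """Equivalence check against every population feature vector."""
    return any(np.allclose(candidate_vec, v, atol=tol) for v in population_vecs)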
According to a second aspect, a training method for a performance prediction model for automatic search is provided. The method includes: obtaining a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data; and training a performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function.
In the embodiments of this application, the loss function of the performance prediction model is obtained by combining a differentiable ranking loss function with a regression loss function. Compared with a loss function that requires the ability to precisely predict the absolute performance indicators of candidates, the loss function proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model is also improved; adding the trained performance prediction model to automatic search can then improve the efficiency and accuracy of automatic search.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the first training data set is updated; when the increment of the first training data set reaches a first threshold, the target performance prediction model is trained according to the updated first training data set to obtain an updated target performance prediction model.
In the embodiments of this application, during the training process of the target performance prediction model, the first training data set is updated with the potential data obtained during the inference process of the target performance prediction model, and the target performance prediction model is then continuously trained and updated, which can improve the performance of the search results and enhance the ability to explore the search space.
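A minimal sketch of the combined training objective is given below. The embodiments specify only that the loss combines a differentiable ranking loss L_K with a regression loss such as L_MSE; the pairwise form of the ranking term, the weight lam, and the temperature tau are assumptions made for the example, with tanh used as the smooth surrogate for the non-differentiable sign comparison of score differences (compare the tanh versus sign curves in Figure 6).

import torch

def prediction_loss(pred, target, lam=1.0, tau=1.0):
    """Combined objective: MSE regression term plus differentiable ranking term.

    pred, target: 1-D tensors of predicted and ground-truth evaluation scores.
    """
    l_mse = torch.mean((pred - target) ** 2)
    dp = pred.unsqueeze(0) - pred.unsqueeze(1)      # predicted score differences
    dt = target.unsqueeze(0) - target.unsqueeze(1)  # ground-truth differences
    # Penalize pairs whose predicted ordering disagrees with the true ordering;
    # tanh keeps the comparison differentiable, unlike sign.
    l_rank = torch.mean(torch.relu(-torch.tanh(dp / tau) * torch.sign(dt)))
    return l_mse + lam * l_rank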
According to a third aspect, an automatic search apparatus is provided. The apparatus includes an obtaining unit and a processing unit. The obtaining unit is configured to obtain at least two candidate data, where the at least two candidate data are data to be evaluated by a proxy task. The processing unit is configured to: input the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and perform proxy task evaluation on some of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
Optionally, the partial data that has undergone proxy task evaluation is added to the population data set.
It should be understood that the population data set includes sample data and evaluation scores corresponding to the sample data.
It should be understood that the proxy task evaluation may be a face recognition task, a pedestrian re-identification task, a classification task, metric learning, or the like, which is not limited in the embodiments of this application.
It should also be understood that the type of the candidate data may be a loss function, a neural network architecture, a hyperparameter, or the like, which is not limited in the embodiments of this application.
In the embodiments of this application, the loss function of the performance prediction model is obtained by combining a differentiable ranking loss function with a regression loss function. Compared with a loss function that includes only a regression loss function, which requires the ability to precisely predict the absolute performance indicators of candidates, the loss function of the performance prediction model proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Adding the trained performance prediction model to automatic search can therefore improve the efficiency and accuracy of automatic search as well as the amount of data explored.
In some possible implementations, the processing unit is configured to perform proxy task evaluation on the candidate data with the best prediction indicator among the at least two candidate data.
In some possible implementations, the apparatus further includes an updating unit. The updating unit is configured to add the candidate data that has undergone proxy task evaluation to the first training data set to obtain an updated first training data set. The obtaining unit is configured to obtain at least two updated candidate data, where the at least two updated candidate data are different from the at least two candidate data. The processing unit is configured to: input the at least two updated candidate data into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, where the updated target performance prediction model is obtained based on the updated first training data set; and perform proxy task evaluation on some of the at least two updated candidate data according to the prediction indicators corresponding to the at least two updated candidate data.
Optionally, the partial candidate data that has undergone proxy task evaluation is added to the population data set.
It should be understood that the population data set includes sample data and evaluation scores corresponding to the sample data.
In the embodiments of this application, during the inference process of the target performance prediction model, the first training data set is continuously updated with the selected candidate data, and the target performance prediction model is continuously updated in turn, which can improve the performance of the search results and enhance the ability to explore the search space.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
In the embodiments of this application, adding the trained performance prediction model to automatic search not only improves search efficiency; the potential candidate data screened out by the performance prediction model also perform better, thereby improving the performance of the target search results. For example, in automatic loss function search, adding the performance prediction model to the search process not only improves the exploration of the search space, but also improves the performance of the target loss function.
In some possible implementations, when the loss function type of the candidate loss function is the generalized margin-based softmax (GMS) loss function, the obtaining unit is configured to obtain a current population loss function set, where the current population loss function set includes M population loss functions, the m-th population loss function is represented by a first computation graph, a second computation graph, and a constant s, M is a positive integer, and 1≤m≤M. The processing unit is configured to: perform initial screening on the current population loss function set to obtain K screened first initial loss functions, where K is a positive integer greater than or equal to 2; perform crossover screening on the K first initial loss functions with a preset probability to obtain a second loss function; if the second loss function passes a loss function rejection criterion, perform equivalence verification on the second loss function; and if the second loss function is not equivalent to the m-th current population loss function in the current population loss function set, determine the second loss function as a candidate loss function.
In the embodiments of this application, according to the number of functions and the number of constants included in a loss function, the search space of the loss function is constructed using the computation graphs corresponding to those functions together with the constants. Compared with the traditional approach of constructing the search space from a single computation graph corresponding to the entire loss function, the way the search space is constructed in the embodiments of this application is more fine-grained, which is more conducive to searching for a target loss function with good performance.
In some possible implementations, the loss function rejection criterion includes a basic-property criterion of the loss function and a target task indicator, and the processing unit is configured to: if both the basic-property criterion of the loss function and the target task indicator are satisfied, perform equivalence verification on the second loss function, where the second loss function satisfies the basic-property criterion of the loss function when the first function t(x) corresponding to the first computation graph of the second loss function and the second function n(x) corresponding to the second computation graph satisfy the following formula:
The second loss function satisfies the target task indicator when the output indicator obtained by training on the task data with the second loss function reaches a preset value.
In the embodiments of this application, the loss function rejection criterion, which includes both the basic-property criterion and the target task indicator, can quickly screen second loss functions and filter out those that do not meet the requirements at an early stage. Compared with a traditional rejection criterion based only on basic-property rules, or one based only on target task indicators, the loss function rejection criterion in the embodiments of this application considers more comprehensive factors and can more thoroughly filter out second loss functions that do not meet the requirements, thereby improving the overall search efficiency for loss functions.
In some possible implementations, the processing unit is configured to: obtain a first feature vector according to the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to the second computation graph, and the constant s; obtain a second feature vector set according to the population loss functions in the current population loss function set, where the second feature vector set includes a second feature vector corresponding to each population loss function; and if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function, determine the second loss function as a candidate loss function.
In the embodiments of this application, the feature-vector-based equivalence verification effectively picks out equivalent loss functions and avoids repeated proxy task evaluation of loss functions already in the current population loss function set, thereby effectively improving the search efficiency for loss functions.
According to a fourth aspect, a training apparatus for a performance prediction model for automatic search is provided. The apparatus includes an obtaining unit and a processing unit. The obtaining unit is configured to obtain a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data. The processing unit is configured to train a performance prediction model according to the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function.
In the embodiments of this application, the loss function of the performance prediction model is obtained by combining a differentiable ranking loss function with a regression loss function. Compared with a loss function that requires the ability to precisely predict the absolute performance indicators of candidates, the loss function proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model is also improved; adding the trained performance prediction model to automatic search can then improve the efficiency and accuracy of automatic search.
In some possible implementations, the regression loss function is the mean square error loss function L_MSE.
In some possible implementations, the apparatus further includes an updating unit. The updating unit is configured to update the first training data set. The processing unit is configured to: when the increment of the first training data set reaches a first threshold, train the target performance prediction model according to the updated first training data set to obtain an updated target performance prediction model.
In the embodiments of this application, during the training process of the target performance prediction model, the first training data set is updated with the potential data obtained during the inference process of the target performance prediction model, and the target performance prediction model is then continuously trained and updated, which can improve the performance of the search results and enhance the ability to explore the search space.
According to a fifth aspect, an automatic search apparatus is provided. The apparatus includes: a memory configured to store a program; and a processor configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method in the first aspect or any implementation of the first aspect.
The processor in the fifth aspect may be a central processing unit (CPU), or may be a combination of a CPU and a neural network computing processor. The neural network computing processor here may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
According to a sixth aspect, a training apparatus for a performance prediction model for automatic search is provided. The apparatus includes: a memory configured to store a program; and a processor configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method in the second aspect or any implementation of the second aspect.
The processor in the sixth aspect may also be a central processing unit, or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit, a neural-network processing unit, a tensor processing unit, and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
According to a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in any implementation of the first aspect or the second aspect.
According to an eighth aspect, a computer program product including instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method in any implementation of the first aspect or the second aspect.
According to a ninth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method in any implementation of the first aspect or the second aspect.
Optionally, as an implementation, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform the method in any implementation of the first aspect or the second aspect.
The chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Description of the drawings
Figure 1 is a schematic diagram of an artificial intelligence framework according to an embodiment of this application;
Figure 2 is a schematic diagram of a system architecture 100 according to an embodiment of this application;
Figure 3 is a schematic diagram of the deployment of a training apparatus according to an embodiment of this application;
Figure 4 is a schematic diagram of a processing flow on an AutoML service platform according to an embodiment of this application;
Figure 5 is a schematic flowchart of a training method for a performance prediction model for automatic search according to an embodiment of this application;
Figure 6 is a schematic diagram visually comparing the tanh(·) function curve with the sign(·) function curve according to an embodiment of this application;
Figure 7 is a schematic flowchart of an automatic search method according to an embodiment of this application;
Figure 8 is a schematic diagram of the overall training and inference process of a performance prediction model according to an embodiment of this application;
Figure 9 is a schematic diagram of the first computation graph of a loss function according to an embodiment of this application;
Figure 10 is a schematic flowchart of obtaining candidate loss functions according to an embodiment of this application;
Figure 11 is a schematic flowchart of a GMS loss function search method according to an embodiment of this application;
Figure 12 is a schematic diagram of a mutation manner of a computation graph according to an embodiment of this application;
Figure 13 is a schematic diagram comparing the effects of including versus not including a differentiable ranking loss function in the loss function during training of a performance prediction model according to an embodiment of this application;
Figure 14 is a schematic diagram comparing the effects of adding versus not adding a potential loss function selection module in automatic loss function search according to an embodiment of this application;
Figure 15 is a schematic block diagram of a training apparatus for a performance prediction model for automatic search according to an embodiment of this application;
Figure 16 is a schematic block diagram of an automatic search apparatus according to an embodiment of this application;
Figure 17 is a schematic block diagram of a training apparatus for a performance prediction model for automatic search according to an embodiment of this application;
Figure 18 is a schematic block diagram of an automatic search apparatus according to an embodiment of this application.
Detailed description of embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
Figure 1 shows a schematic diagram of an artificial intelligence framework. The framework describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.
The artificial intelligence framework is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data-information-knowledge-wisdom".
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecological process of the system.
(1) Infrastructure:
The infrastructure provides computing capability support for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform.
The infrastructure can communicate with the outside through sensors, and the computing capability of the infrastructure can be provided by smart chips.
The smart chip here may be a hardware acceleration chip such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
The basic platform of the infrastructure may include related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, an interconnection network, and the like.
For example, for the infrastructure, data may be obtained through sensors and external communication and then provided to smart chips in a distributed computing system provided by the basic platform for computation.
(2) Data:
Data at the layer above the infrastructure indicates data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing:
The data processing usually includes manners such as data training, machine learning, deep learning, search, reasoning, and decision-making.
Machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or an intelligent system, and performing machine thinking and problem solving with formal information based on reasoning control strategies. Typical functions are search and matching.
Decision-making refers to the process of making decisions after reasoning about intelligent information, and usually provides functions such as classification, sorting, and prediction.
(4) General capabilities:
After the data is processed as mentioned above, some general capabilities can further be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications:
Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields; they are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, smart terminals, and the like.
The automatic loss function search method in the embodiments of this application can be applied to many fields of artificial intelligence, such as smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, and safe city.
Specifically, the embodiments of this application can be applied to fields that require the use of (deep) neural networks, such as face recognition, pedestrian re-identification, and metric learning.
Since the embodiments of this application involve extensive application of neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments of this application are first introduced below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:
h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s x_s + b )
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert an input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next layer. For example, the activation function may be a ReLU, tanh, or sigmoid function.
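As a minimal numeric illustration of the neural unit formula above (the input, weight, and bias values are made up for the example):

import numpy as np

def neural_unit(x, w, b, f=np.tanh):
    """Output of a single neural unit: f(sum_s W_s * x_s + b)."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # n = 3 inputs
w = np.array([0.1, 0.4, -0.2])   # weights W_s
b = 0.05                         # bias
print(neural_unit(x, w, b))      # tanh(0.05 - 0.4 - 0.4 + 0.05) = tanh(-0.7)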
A neural network is a network formed by connecting multiple such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. The DNN is divided according to the positions of different layers, and the neural network layers inside the DNN can be divided into three categories: an input layer, hidden layers, and an output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is necessarily connected to any neuron in the (i+1)-th layer.
Although the DNN looks complicated, the work of each layer is actually not complicated and is simply the following linear relationship expression: y = α(W x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also referred to as coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Since the DNN has many layers, there are also large quantities of coefficients W and offset vectors b. These parameters are defined in the DNN as follows. Taking the coefficient W as an example, assume that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W_{24}^3, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W_{jk}^L.
It should be noted that the input layer has no W parameter. In a deep neural network, more hidden layers enable the network to better portray complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
(3) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value actually expected to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network is then updated based on the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is high, the weight vector is adjusted to make the prediction lower, and adjustments continue until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be predefined. This is the loss function or objective function, which is an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. Generally, a smaller loss indicates higher training quality of the deep neural network, and a larger loss indicates lower training quality. Similarly, smaller loss fluctuation indicates more stable training, and larger loss fluctuation indicates more unstable training.
There are currently many types of loss functions, which can be roughly divided according to the type of task to which the loss function is applied. For example, regression loss functions applied to regression problems include the mean square error (MSE) loss function, the mean absolute error (MAE) loss function, the mean squared logarithmic error (MSLE) loss function, and the mean absolute percentage error (MAPE) loss function; classification loss functions applied to classification problems include the logistic loss function, the negative log likelihood loss function, the cross entropy loss function, the Hinge loss function, and the exponential loss function; and the triplet loss function is applied to metric learning tasks. It should be understood that the automatic loss function search method in the embodiments of this application can be applied to any loss function, and the type of the loss function is not limited in the embodiments of this application. The following uses the commonly used cross entropy loss function as an example to describe the automatic loss function search method of this application.
For example, the cross entropy loss function may be a margin-based softmax (MS) loss function, or may be a generalized margin-based softmax (GMS) loss function. The specific form of the MS loss function is shown in formula (1), and the specific form of the GMS loss function is shown in formula (2):
L_MS = -log( e^{s·t(x_y)} / ( e^{s·t(x_y)} + Σ_{i≠y} e^{s·x_i} ) )    (1)
where t(x) in the MS loss function is a function whose domain is [-1, 1], x is the predicted output value of the neural network model, y is the target value of the neural network model, and s is a constant.
L_GMS = -log( e^{s·t(x_y)} / ( e^{s·t(x_y)} + Σ_{i≠y} e^{s·n(x_i)} ) )    (2)
where n(x) is also a function whose domain is [-1, 1]. When n(x) = x, the GMS loss function reduces to the MS loss function, that is, the MS loss function is a special case of the GMS loss function. Commonly used specific forms of t(x) and n(x) in the GMS loss function are shown in Table 1.
Table 1. Commonly used forms of t(x) and n(x) in the GMS loss function
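To make formulas (1) and (2) concrete, the sketch below computes the GMS loss for a batch given arbitrary element-wise functions t(x) and n(x). The CosFace-style choice t(x) = x - m in the usage example is one well-known instance from the literature and is used here only for illustration.

import torch

def gms_loss(cos_logits, y, t, n, s=30.0):
    """GMS loss of formula (2) for a batch (illustrative sketch).

    cos_logits: (batch, num_classes) predicted outputs in [-1, 1]
    y:          (batch,) target class indices
    t, n:       element-wise functions with domain [-1, 1]
    s:          scale constant
    """
    target = t(cos_logits.gather(1, y.unsqueeze(1)))     # t(x_y)
    others = n(cos_logits)                               # n(x_i) for all classes
    others = others.scatter(1, y.unsqueeze(1), target)   # put t(x_y) back at column y
    return torch.nn.functional.cross_entropy(s * others, y)

# With n(x) = x the GMS loss reduces to the MS loss of formula (1).
m = 0.35
loss = gms_loss(torch.rand(8, 10) * 2 - 1, torch.randint(0, 10, (8,)),
                t=lambda x: x - m, n=lambda x: x)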
(4) Computation graph
A computation graph, also known as a data flow graph, is defined as a directed acyclic graph (DAG). Tensors and operation units are both objects in the graph: operation units are the nodes of the graph, and tensors are the data flowing along the edges of the graph. Acyclic means the graph cannot contain cycles; for example, a tensor x cannot become the input of a layer that generates x. The only processing loops allowed (that is, recurrent connections) are the internal loops of recurrent layers.
Most deep learning frameworks can be described using a directed acyclic graph, in which each node represents a neuron; if the output of one node serves as the input of another node, the two nodes share an edge. That is, the nodes in the computation graph represent operators, and an edge between two nodes indicates a data dependency between them.
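As a small illustration of how a computation graph can encode a function (the representation later used for loss function search), the sketch below evaluates a linearized graph, written as a postfix operator sequence, at a point. The operator set and the encoding are assumptions made for the example and match the candidate encoding sketched earlier in the first aspect.

import math

OPS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "cos": (1, lambda a: math.cos(a)),
}

def eval_postfix(seq, x, consts):
    """Evaluate a linearized computation graph (postfix sequence) at x.

    Tokens are "x", a named constant, or an operator from OPS."""
    stack = []
    for tok in seq:
        if tok == "x":
            stack.append(x)
        elif tok in consts:
            stack.append(consts[tok])
        else:
            arity, fn = OPS[tok]
            args = [stack.pop() for _ in range(arity)][::-1]
            stack.append(fn(*args))
    return stack[0]

# t(x) = x - m represented as the postfix sequence ["x", "m", "sub"].
print(eval_postfix(["x", "m", "sub"], x=0.8, consts={"m": 0.35}))   # 0.45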
(5) Edge devices
An edge device refers to any device with computing resources and network resources located between the data generation source and a cloud center. For example, a mobile phone is an edge device between a person and the cloud center, and a gateway is an edge device between a smart home and the cloud center. Ideally, an edge device refers to a device that analyzes or processes data near the source where the data is generated. Since no data needs to be transferred, network traffic and response time are reduced.
The edge device in the embodiments of this application may be a mobile phone with computing capability, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, or the like. It can be understood that the specific form of the edge device is not limited in the embodiments of this application.
图2是本申请实施例提供了一种***架构100。在图2中,数据采集设备160用于采集训练数据。例如,针对本申请实施例的数据处理来说,若数据为图像数据,则训练数据可以包括训练图像以及训练图像对应的分类结果,其中,训练图像的分类结果可以是人工预先标注的结果。Figure 2 shows a system architecture 100 provided by an embodiment of the present application. In Figure 2, data collection device 160 is used to collect training data. For example, for the data processing in this embodiment of the present application, if the data is image data, the training data may include training images and classification results corresponding to the training images, where the classification results of the training images may be manually pre-annotated results.
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。After collecting the training data, the data collection device 160 stores the training data into the database 130, and the training device 120 trains to obtain the target model/rules 101 based on the training data maintained in the database 130.
下面对训练设备120基于训练数据得到目标模型/规则101进行描述,训练设备120对输入的原始数据进行处理,将输出值与目标值进行对比,直到训练设备120输出的值与目标值的差值小于一定的阈值,从而完成目标模型/规则101的训练。The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is The value is less than a certain threshold, thereby completing the training of the target model/rule 101.
上述目标模型/规则101能够用于实现本申请实施例的数据处理。本申请实施例中的目标模型/规则101具体可以为神经网络模型。例如,深度神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。The above-mentioned target model/rule 101 can be used to implement data processing in the embodiment of the present application. The target model/rule 101 in the embodiment of this application may specifically be a neural network model. For example, deep neural networks. It should be noted that in actual applications, the training data maintained in the database 130 may not necessarily be collected by the data collection device 160, but may also be received from other devices. In addition, it should be noted that the training device 120 may not necessarily train the target model/rules 101 based entirely on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training. The above description should not be used as a guide for this application. Limitations of Examples.
The target model/rule 101 obtained by training with the training device 120 can be applied to different systems or devices, for example, to the execution device 110 shown in Figure 2. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like. In Figure 2, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data to the I/O interface 112 through a client device 140; in the embodiments of the present application, the input data may include data to be processed that is input by the client device.

When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in a data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.

Finally, the I/O interface 112 returns the processing result, such as the processing result of the data obtained above, to the client device 140, thereby providing it to the user.

It is worth mentioning that the training device 120 can generate, for different goals or different tasks, corresponding target models/rules 101 based on different training data; the corresponding target models/rules 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.

In the case shown in Figure 2, the user can manually specify the input data, and this manual specification can be operated through an interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112; if the user's authorization is required for the client device 140 to automatically send input data, the user can set the corresponding permission in the client device 140. The user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 can also serve as a data collection end, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 as shown in the figure as new sample data, and storing them in the database 130. Of course, collection may also bypass the client device 140: the I/O interface 112 may directly store the input data of the I/O interface 112 and the output results of the I/O interface 112 as shown in the figure into the database 130 as new sample data.

It is worth noting that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed inside the execution device 110.

As shown in Figure 2, the loss function used in the process of training the target model/rule 101 by the training device 120 may be a loss function obtained by the method for automatically searching for a loss function of the embodiments of the present application.
Figure 3 is a schematic diagram of the deployment of a training apparatus provided by an embodiment of the present application. As shown in (a) of Figure 3, the training apparatus 310 may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by the cloud service provider; the computing resources included in the cloud data center may be a large number of computing devices (for example, servers).

The training apparatus 310 may be a server in the cloud data center that trains neural network models, or may be a virtual machine that trains neural network models.

The training apparatus 310 may also be a software apparatus deployed on servers or virtual machines in the cloud data center. The software apparatus is used to train neural network models, and may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on both virtual machines and servers.

As shown in Figure 3, the training apparatus 310 can be abstracted by the cloud service provider into a cloud service for training neural network models on the cloud service platform and provided to users. After a user purchases this cloud service on the cloud service platform, the cloud environment uses this cloud service to provide the user with a cloud service for training neural networks.

For example, as shown in (b) of Figure 3, a user can upload a neural network model to be trained (and, further, the original training set) to the cloud environment through an application program interface (API) or through a web interface provided by the cloud service platform. The training apparatus 310 receives the neural network to be trained and the original training set, performs an automatic search (for example, an automatic search for a loss function) through an automatic search module 311, and inputs the search result (for example, a loss function) into a model training module 312 to train the neural network model to be trained. The finally trained target neural network is returned by the training apparatus 310 to the edge device where the user is located; the edge device has been described in detail above and is not described again here. The automatic search module 311 includes a trained performance prediction model for automatic search.

For example, a user can upload the type of the target task (and, further, the original training set) to the cloud environment through the application program interface or through the web interface provided by the cloud service platform. The training apparatus receives the target task type and the original training set, performs an automatic search (for example, an automatic search for a loss function) through the automatic search module 311, and inputs the search result (for example, a loss function) into the model training module 312 to train a neural network model corresponding to the type of the target task. The finally trained target neural network is returned by the training apparatus 310 to the edge device where the user is located.

Taking an image processing model as the model to be trained as an example, the user can upload image processing (for example, face recognition or object detection) as the target task type to the cloud environment through the application program interface or through the web interface provided by the cloud service platform. The training apparatus 310 receives the target task type and the original training set, performs an automatic search (for example, an automatic search for a loss function) through the automatic search module 311, and inputs the search result (for example, a loss function) into the model training module 312 to train the neural network model corresponding to the type of the target task. The finally trained image processing model is returned by the training apparatus to the edge device where the user is located.

The above training apparatus 310 may be deployed in a cloud environment as shown in (a) of Figure 3; alternatively, the training apparatus 310 may also be a terminal device, in which case the apparatus 310 may be deployed on the user terminal side. The embodiments of the present application do not limit this.
The performance of a neural network model is affected by many factors, such as the architecture of the neural network model, the training process, regularization methods, hyperparameters, and the loss function. At present, most methods for improving the performance of neural network models rely on manually designing the architecture of the neural network model or manually designing the loss function. With the rise of AutoML, automatically searching for loss functions, neural network architectures, or hyperparameters has also become possible. AutoML can provide corresponding services based on the training data and target task input by the user.

Figure 4 is a schematic diagram of a processing flow on an AutoML service platform provided by an embodiment of the present application. The AutoML service platform provides corresponding services based on the training data and target task provided by the user. As shown in Figure 4, the AutoML service platform obtains a solution that meets the user's needs by performing one or more search operations. The search operations that the AutoML service platform can perform include data augmentation strategy search, model structure search, loss function search, and hyperparameter search, all of which are optional. For example, if the user provides a model structure, there is no need to perform a model structure search.

Specifically, the automatic search can be performed using the method in the embodiments of the present application to obtain search results that meet the requirements. The specific automatic loss function search method is described in detail with reference to Figure 11 below.

The output of the AutoML service platform is determined according to the user's needs. In the embodiments of the present application, the output of the AutoML service platform may include the target neural network model and/or the loss function. For example, if the training data provided by the user are sample images and the target task is a face recognition task, the AutoML service platform can output a target neural network model that can be used to perform the face recognition task. As another example, if the training data provided by the user are sample images, the target task is a face recognition task, and the user requires the output of the loss function used for training the target neural network model, the AutoML service platform can output both the target neural network model that can be used to perform the face recognition task and the loss function. As yet another example, if the training data provided by the user are sample images, the target task is face recognition, and the user also provides the structure of the neural network model and requires the output of the loss function of the target neural network model, the AutoML service platform can output the loss function used in the training process of the target neural network model for the face recognition task.

The search cost of current automatic search methods is relatively high; therefore, how to improve the efficiency of automatic search has become an urgent problem to be solved. The embodiments of the present application propose a training method for a performance prediction model for automatic search, which can improve the efficiency of automatic search while improving the quality of the search results, thereby improving the performance of the target neural network model.

The training method of the performance prediction model for automatic search in the embodiments of the present application is described in detail below with reference to Figures 5 to 7.

The training method for the performance prediction model provided by the embodiments of the present application can be specifically applied to automatic search methods for loss functions, neural network architectures, hyperparameters, and the like. It performs symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on training data (such as the loss function training data set in this application), and finally obtains a trained performance prediction model. Furthermore, the automatic search method provided by the embodiments of the present application can use the above trained performance prediction model: input data (such as the candidate loss functions in this application) are input into the trained performance prediction model to obtain output data (such as the prediction indicators in this application). It should be noted that the training method of the performance prediction model and the automatic search method provided by the embodiments of the present application are inventions based on the same concept, and can also be understood as two parts of one system, or two stages of one overall process, such as a model training stage and a model application stage.
Figure 5 is a schematic flowchart of a training method for a performance prediction model for automatic search provided by an embodiment of the present application. It should be understood that the method 500 shown in Figure 5 can be executed by a training device in a cloud environment or by a training device of a terminal device; the embodiments of the present application do not limit the specific form of the training device.

The method 500 includes steps S510 to S520, which are described in detail below.

S510: Obtain a first training data set, where the first training data set includes sample data and evaluation scores corresponding to the sample data.

It should be understood that the first training data are related to the task of the automatic search. For example, if the task of the automatic search is to automatically search for a loss function, the first training data are loss functions that have undergone performance evaluation; if the task is to automatically search for a neural network architecture, the first training data are neural network architectures that have undergone performance evaluation; if the task is to automatically search for hyperparameters, the first training data are hyperparameters that have undergone performance evaluation. The embodiments of the present application do not limit the type of the first training data. The following description of the embodiments uses the automatic search for a loss function as an example.

S520: Train the performance prediction model for automatic search based on the first training data, where the loss function of the performance prediction model includes a differentiable ranking loss function and a regression loss function.

It should be understood that the performance prediction model for automatic search is used to predict the performance indicators of candidate loss functions, of candidate neural network architectures, or of candidate hyperparameters; the embodiments of the present application do not limit this. The following description of the embodiments uses a prediction model for predicting the performance of loss functions as an example.

As a possible implementation, the loss function of the performance prediction model balances its two component loss terms through a balance factor λ.
It should be noted that the ranking metric used in the ranking loss function may be the Kendall's Tau ranking metric, whose specific expression is shown in formula (3):

$$\mathrm{KT}=\frac{2}{B(B-1)}\sum_{1\le m<n\le B}\operatorname{sign}\left(y_m-y_n\right)\cdot\operatorname{sign}\left(P(x_m)-P(x_n)\right)\qquad(3)$$
Here, P(x_n) denotes the output of the performance prediction model, y_n denotes the performance accuracy on the proxy task, that is, the true performance accuracy, B denotes the batch size, and sign(·) is the piecewise function given by formula (4):

$$\operatorname{sign}(x)=\begin{cases}1,&x>0\\0,&x=0\\-1,&x<0\end{cases}\qquad(4)$$

Since sign(·) is a piecewise function, the Kendall's Tau ranking metric is not differentiable, and therefore formula (3) cannot be used directly as a loss function. The curve of the tanh(·) function is very close to that of the sign(·) function, as shown in Figure 6, which is a schematic visual comparison of the tanh(·) and sign(·) curves provided by an embodiment of the present application. Therefore, the second sign(x) function in formula (3) is replaced with the tanh(x/τ) function to obtain a differentiable ranking loss function of the specific form shown in formula (5), where τ controls the strength with which tanh(x/τ) replaces sign(x); the specific variation is shown in Figure 6.

$$L_K=-\frac{2}{B(B-1)}\sum_{1\le m<n\le B}\operatorname{sign}\left(y_m-y_n\right)\cdot\tanh\!\left(\frac{P(x_m)-P(x_n)}{\tau}\right)\qquad(5)$$
It should be noted that although the embodiments of the present application use the similarity ranking loss function of formula (5), loss functions based on other similarity ranking metrics, such as the Spearman ranking metric or the Pearson ranking metric, may also be used; the embodiments of the present application do not limit this. It should be understood that the Spearman and Pearson ranking metrics are both non-differentiable, so if a loss function based on these two ranking metrics is to be used, a differentiable ranking loss function can be obtained in a manner similar to the above, which is not described again here.
It should be understood that the regression loss function may be a mean squared error loss function, a mean absolute error loss function, or the like; the embodiments of the present application do not limit this. As an example, the regression loss function is the mean squared error loss function shown in formula (6):

$$L_{MSE}=\frac{1}{N}\sum_{n=1}^{N}\left(P(x_n)-y_n\right)^2\qquad(6)$$

Here, x_n is the feature representation of the input data of the prediction model. For example, when the input data of the prediction model are candidate loss functions, x_n is the feature vector of a candidate loss function, where n is a positive integer in [1, N] and N denotes the number of candidate loss functions.

As a possible implementation, the loss function of the performance prediction model is as shown in formula (7):

$$L=L_{MSE}+\lambda L_K\qquad(7)$$
If only a regression loss function (for example, the MSE loss function) is used, the performance prediction model must be able to accurately predict the absolute performance indicators of the candidate data. However, because the search space is large, the performance prediction model is often trained with only a small amount of data; therefore, using only a regression loss function as the loss function of the performance prediction model easily leads to overfitting and weak generalization of the performance prediction model. In the embodiments of the present application, the loss function of the performance prediction model is obtained by combining a differentiable ranking loss function with a regression loss function. Compared with a loss function that only requires the ability to accurately predict the absolute performance indicators of candidates, the loss function of the performance prediction model proposed in this application is more flexible, and the prediction accuracy of the trained performance prediction model is also improved. Incorporating the trained performance prediction model into automatic search can then improve the efficiency and accuracy of the automatic search. For example, in the automatic search for a loss function, adding the performance prediction model of the embodiments of the present application can improve the search efficiency of the loss function, and the searched loss function also performs better.
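As an illustration of formulas (5) to (7), the following is a minimal PyTorch-style sketch of the predictor loss, assuming pred and target are one-dimensional tensors of batch scores; the function names, the temperature tau, and the balance factor lam are assumptions of this sketch rather than values fixed by this embodiment:

```python
import torch

def ranking_loss(pred, target, tau=0.1):
    # Differentiable ranking loss in the spirit of formula (5): the sign on
    # the ground-truth side is kept, while the sign on the prediction side
    # is replaced by tanh(x / tau). Summing over all ordered pairs and
    # dividing by B(B-1) equals the 2/(B(B-1)) sum over unordered pairs.
    dp = pred.unsqueeze(0) - pred.unsqueeze(1)      # pairwise prediction gaps
    dy = target.unsqueeze(0) - target.unsqueeze(1)  # pairwise true-score gaps
    b = pred.shape[0]
    return -(torch.sign(dy) * torch.tanh(dp / tau)).sum() / (b * (b - 1))

def predictor_loss(pred, target, lam=1.0, tau=0.1):
    # Combined loss in the spirit of formula (7): the MSE regression term of
    # formula (6) plus a lambda-weighted differentiable ranking term.
    mse = torch.mean((pred - target) ** 2)
    return mse + lam * ranking_loss(pred, target, tau)
```

For example, predictor_loss(model(x), y) can then be minimized with any standard optimizer during predictor training.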
Figure 7 is a schematic flowchart of an automatic search method provided by an embodiment of the present application. Figure 7 is described in detail below through steps S701 to S704.

S701: Obtain at least two candidate data, where the at least two candidate data are data to be evaluated on a proxy task.

It should be noted that S701 is described in detail later with reference to Figures 9 to 12, taking loss functions as the candidate data as an example.

It should be understood that the proxy task evaluation may be a face recognition task, a pedestrian re-identification task, a classification task, metric learning, or the like; the embodiments of the present application do not limit this.

S702: Input the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model based on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function L_K and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data.
It should be understood that the target performance prediction model is the performance prediction model trained in the manner described in Figure 5.

It should be understood that the prediction indicators output by the performance prediction model use the same metric as the evaluation scores corresponding to the sample data in the first training data set; the difference is that a prediction indicator is the predicted result for a candidate datum, while an evaluation score is the actual result for a sample datum. The prediction indicator is related to the actual proxy task; for example, in pedestrian re-identification the prediction indicator is mAP, and in a classification task the prediction indicator may be accuracy. The embodiments of the present application do not limit this.
S703: According to the prediction indicators corresponding to the at least two candidate data, perform proxy task evaluation on some of the at least two candidate data.

It should be understood that "some of the at least two candidate data" means that the data used for the subsequent proxy task evaluation are fewer in number than the at least two candidate data; this subset may be called potential data. The proxy task evaluation is mainly used to obtain the actual evaluation scores of the potential data, for example, the actual evaluation score of a potential loss function.

As a possible implementation, proxy task evaluation is performed on the candidate datum with the best prediction indicator among the at least two candidate data.

As a possible implementation, proxy task evaluation is performed on the candidate data whose prediction indicators rank in the top P percent of the at least two candidate data, where 1 ≤ P < 100 and P is a real number.
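A minimal sketch of the selection in S702 and S703, assuming the predictor is a callable that returns a scalar prediction indicator for one candidate (higher is better); the function and parameter names are illustrative:

```python
def select_for_proxy_evaluation(candidates, predictor, top_percent=None):
    # Rank candidates by predicted indicator, best first.
    ranked = sorted(candidates, key=predictor, reverse=True)
    if top_percent is None:
        return ranked[:1]                 # only the best-predicted candidate
    keep = max(1, int(len(ranked) * top_percent / 100.0))
    return ranked[:keep]                  # top P percent, 1 <= P < 100
```

For example, select_for_proxy_evaluation(pool, predictor, top_percent=5) keeps the top 5 percent of the pool for the expensive proxy task evaluation.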
Optionally, S704: Add the partial data set that has undergone proxy task evaluation to a population data set.

It should be understood that the population data set is used to determine the target search result. The population data set includes sample data and the evaluation scores corresponding to the sample data; that is, the data in the population data set have already been evaluated on the proxy task to obtain actual evaluation scores. The amount of data in the population data set is fixed. Optionally, the first-ranked datum in the current population data set is eliminated.

In the embodiments of the present application, adding the trained performance prediction model to the automatic search not only improves search efficiency; the potential candidate data screened out by the performance prediction model also perform better, which in turn improves the performance of the target search result. For example, in automatic loss function search, adding the performance prediction model to the search process not only improves the exploration of the search space but also improves the quality of the target loss function.
The overall flow of training and inference of the performance prediction model is described in detail below with reference to Figure 8, which is a schematic diagram of the overall flow of training and inference of a performance prediction model provided by an embodiment of the present application.

As shown in Figure 8, the training process 8100 and the inference process 8200 of the performance prediction model can be interleaved; the process shown in Figure 8 may also be called a potential data selection process.

It should be noted that the training process 8100 of the performance prediction model may obtain the target performance prediction model after a single round of training as shown in Figure 5, or may, as shown in Figure 8, obtain the target performance prediction model by training on a first training data set that is continually expanded and updated. The continual updating process of the target performance prediction model is described below with reference to Figure 8.

First, when the number of training data with evaluation scores in the first training data set reaches E_0, the first training data set is input into the performance prediction model to be trained to train its parameters, yielding a parameter-trained performance prediction model, that is, the target performance prediction model. When the first training data set is a loss function training set, the first training data set may be denoted as {(Θ_i, p_i)}, where Θ_i are the parameters of each loss function in the loss function data set and p_i is the performance corresponding to that loss function. As an example, the performance prediction model may be a one-dimensional ResNet50 or another neural network model; the embodiments of the present application do not limit this.

Subsequently, at least two candidate data are input into the target performance prediction model to obtain the prediction indicators corresponding to the at least two candidate data, and the potential data are determined according to the obtained predicted performance, where the at least two candidate data are data to be evaluated on the proxy task. The potential data are evaluated on the proxy task to obtain their evaluation scores, and the potential data together with their corresponding evaluation scores are added to the first training data set. After all of the at least two candidate data have undergone performance prediction, they are cleared, at least two updated candidate data are obtained, and new potential data are obtained in the same manner as the performance prediction of the previous at least two candidate data; the new potential data, including their evaluation scores, are added to the first training data set.

Finally, through multiple iterations of prediction, as shown in the performance prediction model inference process 8200 of Figure 8, multiple potential data are obtained through multiple rounds of performance prediction by the target performance prediction model, until the increment ΔE of the first training data set reaches a first threshold, at which point the target performance prediction model is updated based on the updated first training data set.
As an example, when the first training data set is a loss function training set and the data in the candidate set are loss functions, the potential loss function selection process may be as shown in Algorithm 1.

When the number of loss functions evaluated on the proxy task reaches E_0, the performance prediction model to be trained is trained on the currently evaluated set Eva (the parameters Θ_i of each loss function and their corresponding performances p_i). Thereafter, whenever the size |Eva| of the evaluated set increases by ΔE, the parameters of the trained performance prediction model are updated again based on the current evaluated set Eva. From the time the performance prediction model is first trained, each newly generated loss function that passes the equivalence verification strategy is added to the selector's candidate set, until the size of the candidate set reaches the preset N_p. At that point, the trained performance prediction model predicts the performance of every loss function in the candidate set, the loss function with the highest predicted performance is selected as the most potential one and is evaluated on the subsequent proxy task, and all loss functions in the candidate set are cleared. When the evaluation of that loss function is completed, its corresponding parameters and indicator are added to the evaluated set Eva.
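This loop may be sketched as follows; generate_candidate, proxy_evaluate, and train_predictor stand in for routines of this embodiment (train_predictor is assumed to return a callable that maps a candidate to its predicted indicator), and the default threshold values are placeholders:

```python
def potential_selection_loop(generate_candidate, proxy_evaluate, train_predictor,
                             budget=500, E0=100, delta_E=20, Np=1000):
    evaluated = []        # Eva: pairs of (loss-function parameters, performance)
    candidates = []       # selector's candidate pool
    predictor = None
    while len(evaluated) < budget:
        theta = generate_candidate()          # already passed equivalence check
        if predictor is None:
            # Cold start: evaluate directly until E0 scored examples exist.
            evaluated.append((theta, proxy_evaluate(theta)))
            if len(evaluated) >= E0:
                predictor = train_predictor(evaluated)
            continue
        candidates.append(theta)
        if len(candidates) < Np:
            continue
        # Pool is full: keep only the candidate with the highest prediction.
        best = max(candidates, key=predictor)
        candidates.clear()
        evaluated.append((best, proxy_evaluate(best)))
        if len(evaluated) % delta_E == 0:     # refresh on the grown set Eva
            predictor = train_predictor(evaluated)
    return evaluated
```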
The above training method and search process can be applied to AutoML network architecture search, loss function search, hyperparameter search, and so on. The specific process of applying the above training method to loss function search is described below with reference to Figures 9 to 12. First, the problems of current loss function search are explained.

Current loss function search can be divided into two categories: dynamic loss function search and fixed loss function search.

In dynamic loss function search, the search process for the loss function is embedded in model training, and a new loss function is produced at each training iteration. When training based on a fixed model and data set ends, the dynamic loss function search also ends. The loss function obtained at the end of the search is applicable only to the model and training data set used during that training process; whenever either the training data set or the neural network model to be trained changes, the loss function search must be performed again to obtain the target loss function. Therefore, the target loss function obtained by dynamic loss function search transfers poorly across data sets and models: for different data sets and neural network models, computing power must be spent during training to search for the target loss function, and the resulting target loss function has weak general generalization ability. Examples include the two dynamic loss function search methods AutoML for loss function search (AM-LFS) and searching for softmax (Search-Softmax).

Therefore, to overcome the weak generalization of the target loss function obtained by dynamic loss function search, fixed loss function search methods emerged. Fixed loss function search is a general search method that searches for a loss function from scratch to find a universal loss function; it models the loss function as a computation graph and uses an evolutionary algorithm to search for the best loss function form. Examples include the convergence simulation driven evolutionary search algorithm (CSE-Autoloss) and the method of searching loss functions from scratch (AutoLoss-Zero). The target loss functions obtained by these two fixed loss function search methods can be transferred to other data sets and neural network models for model training; however, the cost of searching for the target loss function by an evolutionary algorithm is often high. This search cost is reflected not only in the large amount of time required to evaluate the searched candidate loss functions in order to obtain the optimal target loss function, but also in the fact that if a candidate loss function to be evaluated performs poorly, a large amount of time is still spent evaluating that underperforming loss function. Although many methods exist for improving the search efficiency of these two approaches, the improvement remains limited. Therefore, how to improve the search efficiency of fixed loss function search has become an urgent problem to be solved.

Applying the potential data selection module based on the performance prediction model (shown in Figure 7) to select potential loss functions in automatic loss function search can not only improve the loss function search algorithm's ability to explore the loss function search space, but also improve the quality of the target loss function. The improvements in exploration ability and in the performance of the target loss function are explained in detail later with reference to Figures 13 to 14 and Tables 3 to 5.

To further improve the efficiency of searching for a loss function, further improvements can be made to obtaining the first candidate data set in S701. Taking the GMS loss function as an example, how to further improve the search efficiency of the loss function is described in detail below with reference to Figures 9 to 12.
First, the specific expression of the GMS loss function is as shown in formula (2) above. The search space of the GMS loss function in the embodiments of the present application is represented by a first function t(x), a second function n(x), and a constant s; specifically, it is represented by computation graphs, where the first function t(x) corresponds to a first computation graph and the second function n(x) corresponds to a second computation graph.

It should be understood that, for other loss functions, a corresponding search space can likewise be customized according to the number of functions and the number of constants those loss functions contain; the embodiments of the present application do not limit this.
As an example, the first computation graph corresponding to the first function t(x) of the circle loss function shown in Table 1 is described with reference to Figure 9 and Table 2. Figure 9 is a schematic diagram of the first computation graph of a loss function provided by an embodiment of the present application.
As shown in Figure 9, the computation graph has two types of input nodes: constant nodes and input nodes that represent the output of the neural network. A constant node c can take a value from the constant set shown in formula (8):

$$c\in\left\{\,i\cdot\Delta_c\ \middle|\ i\in\mathbb{Z},\ -N_c\le i\le N_c\,\right\}\qquad(8)$$

where Δ_c and N_c are preset values, Δ_c is a real number, and N_c is a positive integer.
Operator nodes represent primitive mathematical operations, as listed in Table 2; the expression corresponding to each operator operation shown in Figure 9 can be looked up in Table 2.

Table 2 Primitive mathematical operations
Output nodes are used to aggregate the results of nodes that have no subsequent operator node.
Similar to the first computation graph shown in Figure 9, each second computation graph can be represented in the same way. The constant s is discretized in the same manner as the constant node c, and can take a value from the constant set shown in formula (9):

$$s\in\left\{\,i\cdot\Delta_s\ \middle|\ i\in\mathbb{Z},\ -N_s\le i\le N_s\,\right\}\qquad(9)$$

where Δ_s and N_s are preset values, Δ_s is a real number, and N_s is a positive integer.
In the embodiments of the present application, the search space of a loss function is constructed from computation graphs corresponding to the number of functions the loss function contains, together with its constants. Compared with the traditional approach of constructing the search space from a single computation graph for the entire loss function, the construction in the embodiments of the present application is finer-grained and more conducive to finding a target loss function with good performance.
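For illustration, one way to hold such a search-space element in code is sketched below; the node encoding, class names, and the uniform constant grid are assumptions of this sketch, not the representation mandated by this embodiment:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                  # operator name, or "input"/"const"
    children: list = field(default_factory=list)
    value: float = 0.0                       # used only when op == "const"

@dataclass
class GMSCandidate:
    graph_t: Node                            # computation graph of t(x)
    graph_n: Node                            # computation graph of n(x)
    s: float                                 # scale constant from the set (9)

def constant_grid(delta, n):
    # Uniform discretization with step delta and index range n, in the
    # spirit of formulas (8)/(9); the exact set may differ per deployment.
    return [i * delta for i in range(-n, n + 1)]
```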
The overall flow of searching for the target loss function is fully described below with reference to Figures 10 and 11. Figure 10 is a schematic flowchart of obtaining candidate loss functions provided by an embodiment of the present application; Figure 11 is a schematic flowchart of a GMS loss function search method provided by an embodiment of the present application.

S1001: Determine a current population loss function set, where the current population loss function set includes M current population loss functions and the evaluation score corresponding to each current population loss function, M being a positive integer.

If the current population loss function set is the initial population loss function set, the initial population loss function set is determined according to the search space, and each initial population loss function is obtained from prior experience.

It should be understood that the evaluation score of each current population loss function is obtained after that population loss function has been evaluated on the proxy task.

As an example, as shown in Figure 11, the current potential GMS loss function set is obtained according to the search space; each potential GMS loss function is represented by a first computation graph, a second computation graph, and a constant s, and each potential GMS loss function corresponds to an evaluation score.
S1002: Perform initial screening on the current population loss function set to obtain K first loss functions, where K is a positive integer.

As an example, the initial screening may use a tournament selection algorithm or a roulette wheel selection algorithm; the embodiments of the present application do not limit the specific screening method. Taking tournament selection as an example: randomly sample a proportion T (for example, T = 5%) of the loss functions in the current population loss function set, and select the best-performing one among the sampled loss functions as a first loss function a; repeat this K times to obtain K first loss functions, as in the sketch below.
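A minimal sketch of the tournament selection step, assuming the population is a list of (loss_function, score) pairs; the names and the default sampling ratio are illustrative:

```python
import random

def tournament_select(population, k, sample_ratio=0.05):
    # Each pick samples a fraction T of the population and keeps the
    # member with the best evaluation score; repeated k times.
    sample_size = max(1, int(len(population) * sample_ratio))
    picks = []
    for _ in range(k):
        contenders = random.sample(population, sample_size)
        picks.append(max(contenders, key=lambda pair: pair[1]))
    return picks
```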
Taking the GMS loss function as an example, as shown in Figure 11, a proportion T of the GMS loss functions in the current potential GMS loss function set is randomly sampled, and K first loss functions are selected from the randomly sampled loss functions, where K is a positive integer greater than or equal to 2; for example, in Figure 11, two first loss functions are selected. It should be understood that, because the GMS loss function in the embodiments of the present application is represented by two computation graphs and one constant, selecting two or more first loss functions from the randomly sampled loss functions helps improve the randomness of loss function selection.
S1003: Obtain a second loss function based on the K first loss functions.

As a possible implementation, if K equals 1, the first loss function is directly mutated, copied, or randomly re-initialized to obtain the second loss function.

It should be understood that, with probability A, the first loss function is randomly re-initialized, where re-initialization can be understood as randomly selecting a population loss function from the current population loss function set as the second loss function; with probability B, the first loss function is copied, that is, the second loss function keeps the form of the first loss function unchanged; and with probability C, the first loss function is mutated, that is, the computation graph representing the first loss function is mutated. The present application does not limit the specific values of A, B, and C; for example, A = 40%, B = 10%, and C = 50%.

Next, a specific implementation of mutating the first loss function is described with reference to Figure 12, which is a schematic diagram of a mutation method for a computation graph provided by an embodiment of the present application.
The main ways to mutate the computation graph representing the first loss function are inserting a new operator node into the computation graph, deleting an original operator node from the computation graph, or replacing an original operator node in the computation graph. As shown in Figure 12, (a) of Figure 12 is the computation graph to be mutated, (b) of Figure 12 inserts a new Div operator node, (c) of Figure 12 deletes the original Sig operator node, and (d) of Figure 12 replaces the original Exp operator node with a Gd operator node.
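For illustration, the sketch below mutates a toy expression tree encoded as nested lists ([op, child] for a unary operator node, a plain string for a leaf); it covers only unary operators, and its encoding is an assumption of this sketch, not the graph representation of this embodiment:

```python
import copy
import random

UNARY_OPS = ["neg", "exp", "log", "sig", "tanh", "gd"]  # illustrative subset

def mutate_graph(node, p=0.3):
    node = copy.deepcopy(node)
    if isinstance(node, list):                   # operator node: recurse first
        node[1] = mutate_graph(node[1], p)
        r = random.random()
        if r < p / 3:
            return node[1]                       # delete this operator node
        if r < 2 * p / 3:
            node[0] = random.choice(UNARY_OPS)   # replace the operator
    if random.random() < p / 3:
        return [random.choice(UNARY_OPS), node]  # insert a new operator above
    return node
```

For example, mutate_graph(["exp", ["neg", "x"]]) returns a randomly edited copy of the tree while leaving the original intact.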
As a possible implementation, if K is a positive integer greater than or equal to 2, cross-screening is performed on the K first loss functions to obtain the second loss function.

Cross-screening can be understood as crossing the K first loss functions with probability D and selecting from them an intermediate loss function used to generate the second loss function; the embodiments of the present application do not limit the value of D, which may be, for example, 60% or 80%.

As an example, taking the GMS loss function, the two first loss functions are crossed with a probability of 60%; that is, with probability 60% the first loss function a is replaced by the first loss function b, and the result is taken as the intermediate loss function. The intermediate loss function is then re-initialized, copied, or mutated to obtain the second loss function. The specific implementation is similar to the above manner of obtaining the second loss function from the first loss function and, to avoid repetition, is not described again here.
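Putting the crossover and the re-initialization/copy/mutation step together, the child-generation step might be sketched as follows; the probability defaults mirror the example values D = 60%, A = 40%, B = 10%, and C = 50% given above, and mutate is an assumed callable such as a graph-mutation routine:

```python
import random

def produce_child(parent_a, parent_b, population, mutate,
                  p_cross=0.6, p_reinit=0.4, p_copy=0.1):
    # Crossover: with probability D, parent a is replaced by parent b.
    inter = parent_b if random.random() < p_cross else parent_a
    r = random.random()
    if r < p_reinit:
        return random.choice(population)   # re-random initialization
    if r < p_reinit + p_copy:
        return inter                       # copy: keep the form unchanged
    return mutate(inter)                   # mutate the computation graph
```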
S1004: If the second loss function passes the loss function rejection criteria, perform equivalence verification on the second loss function.

The loss function rejection criteria can be understood as criteria for judging whether the basic properties of a loss function meet the requirements. The GMS loss function is taken as an example for specific description.

The rejection criteria of the GMS loss function include a basic property criterion and a target task indicator. The basic property criterion refers to whether the functions t(x) and n(x) corresponding to the computation graphs of the second loss function generated in S1003 satisfy formula (10) on the interval x ∈ [−1, 1].

The target task indicator requires that the output indicator obtained by training on task data with the second loss function reaches a preset value, where the type of the target task indicator is related to the task-specific output metric, for example, the mean average precision (mAP) indicator. The preset value of the mAP indicator may be τ_toy = 0.9. The embodiments of the present application do not limit the type of the target task indicator or its preset value.

In the embodiments of the present application, rejection criteria that include both the basic property criterion and the target task indicator allow rapid screening of second loss functions and early elimination of those that do not meet the requirements. Compared with traditional rejection criteria based only on basic properties, or only on target task indicators, the rejection criteria of the embodiments of the present application consider a more comprehensive set of factors and can more thoroughly screen out second loss functions that do not meet the requirements, thereby improving the overall search efficiency for loss functions.
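A sketch of the two-stage rejection check; basic_property_holds (standing in for the test of formula (10), which is not reproduced here) and toy_task_metric are assumed callables supplied by the surrounding search code:

```python
def passes_rejection(candidate, basic_property_holds, toy_task_metric,
                     tau_toy=0.9):
    # Stage 1: cheap basic-property check on x in [-1, 1] (formula (10)).
    if not basic_property_holds(candidate):
        return False
    # Stage 2: short toy-task training; its metric (e.g. mAP) must reach
    # the preset threshold tau_toy.
    return toy_task_metric(candidate) >= tau_toy
```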
As a possible implementation, if the second loss function does not pass the loss function rejection criteria, the process of mutating, copying, or randomly re-initializing the first loss function or the intermediate loss function is performed again to obtain a new second loss function, until the updated second loss function passes the rejection criteria, as shown in Figure 11.

S1005: If the second loss function is not equivalent to the m-th population loss function in the current population loss function set, determine the second loss function as a candidate loss function.
As a possible implementation, a first feature vector is obtained from the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to its second computation graph, and the constant s, where the first feature vector satisfies formulas (11) to (13).

Here, TN_min denotes the minimum value of t(x) and n(x) on the interval [−1, 1]; TN_max denotes the maximum value of t(x) and n(x) on the interval [−1, 1]; k denotes the normalization scale factor; and b denotes the normalization shift factor. Because of shift-scale transformation, Θ_0 = {t(x), n(x), s} and Θ_{k,b} = {t(x)/k + b, n(x)/k + b, ks} are equivalent, where Θ_0 is the feature representation given by the first function t(x), the second function n(x), and the constant s. Therefore, the equivalence caused by shift-scale transformation is eliminated through formula (11):

$$k=\frac{TN_{max}-TN_{min}}{2},\qquad b=-\frac{TN_{max}+TN_{min}}{TN_{max}-TN_{min}}\qquad(11)$$

The first feature vector is then denoted by $\tilde{\Theta}$ and can be expressed as shown in formula (12):

$$\tilde{\Theta}=\left[\tilde{t}(x_1),\ldots,\tilde{t}(x_D),\ \tilde{n}(x_1),\ldots,\tilde{n}(x_D),\ \log_2(k\cdot s)/\Gamma\right],\quad \tilde{t}(x)=\frac{t(x)}{k}+b,\ \ \tilde{n}(x)=\frac{n(x)}{k}+b\qquad(12)$$

where $\tilde{t}$ and $\tilde{n}$ are uniform discrete interpolations on x ∈ [−1, 1]. Γ is a preset threshold constraining the search space, and the t(x), n(x), and s of the second loss function satisfy the constraint shown in formula (13):

$$\log_2\left(\left(TN_{max}-TN_{min}\right)\cdot s/2\right)\le\Gamma\qquad(13)$$
It should be understood that by imposing this search space constraint, the last element of the feature vector is guaranteed to be normalizable to [−1, 1], which gives the performance prediction model better results.

As a possible implementation, a set of second feature vectors is obtained from the population loss functions in the current population loss function set, the set including the second feature vector corresponding to each population loss function; if the first feature vector is not equivalent to the second feature vector corresponding to any population loss function, the second loss function is determined as a candidate loss function.
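A sketch of the equivalence verification via the canonical feature vector of formulas (11) to (13), assuming t and n are vectorized callables and k·s > 0; the grid size, the Γ value, and the tolerance are illustrative assumptions:

```python
import numpy as np

def feature_vector(t, n, s, num_points=64, gamma=6.0):
    xs = np.linspace(-1.0, 1.0, num_points)      # uniform grid on [-1, 1]
    tv, nv = t(xs), n(xs)
    tn_min = min(tv.min(), nv.min())
    tn_max = max(tv.max(), nv.max())
    k = (tn_max - tn_min) / 2.0                  # normalization scale factor
    b = -(tn_max + tn_min) / (tn_max - tn_min)   # normalization shift factor
    s_feat = np.log2(k * s) / gamma              # bounded via constraint (13)
    return np.concatenate([tv / k + b, nv / k + b, [s_feat]])

def is_equivalent(vec_a, vec_b, tol=1e-3):
    # Two losses are treated as equivalent when their canonical feature
    # vectors coincide up to a small tolerance.
    return bool(np.max(np.abs(vec_a - vec_b)) < tol)
```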
As a possible implementation, if the second loss function is equivalent to the mth population loss function in the current population loss function set, the evaluation score corresponding to the mth population loss function is assigned to the second loss function, and the population loss function set is updated.
As a possible implementation, the population loss function set may be updated by adding the second loss function, as a potential loss function, together with its corresponding evaluation score to the set, and eliminating one population loss function. The eliminated population loss function may be the earliest one, for example, the one ranked first in the population loss function set; this embodiment of the present application does not limit the way in which a population loss function is eliminated.
It should be understood that taking the first (earliest) population loss function as the elimination target avoids eliminating the population loss function with the lowest evaluation score, which would otherwise cause insufficient data diversity; this in turn helps guarantee the performance of the search result, i.e., the performance of the target loss function.
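A minimal sketch of this first-in-first-out update, assuming the population is kept in insertion order; the use of a deque is an implementation assumption.

```python
from collections import deque

population = deque()  # (loss_function, evaluation_score) pairs, oldest first

def update_population(population, new_loss, new_score, capacity):
    """Add the new pair and, once the set exceeds its capacity, evict the
    earliest member rather than the lowest-scoring one."""
    population.append((new_loss, new_score))
    if len(population) > capacity:
        population.popleft()  # drop the first (oldest) population loss function
```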
For example, as shown in Figure 11, if the second loss function is equivalent to the mth population GMS loss function in the current population GMS loss function set, the evaluation score of that mth population GMS loss function is assigned to the second loss function; the second loss function and its corresponding evaluation score are added directly to the current population GMS loss function set, and one population loss function is eliminated, yielding the updated population GMS loss function set.
In this embodiment of the present application, the feature-vector-based equivalence verification effectively identifies equivalent loss functions and avoids repeated proxy-task evaluation of loss functions already in the current population loss function set, thereby effectively improving the search efficiency for the loss function.
Subsequently, following the search procedure shown in Figure 7, the candidate loss functions can be passed through the potential loss function selection module, which selects potential loss functions for proxy-task evaluation, and the current population loss function set is updated accordingly. The potential loss function selection module works in the same way as the potential data selection module, except that the potential data are potential loss functions; to avoid repetition, details are not repeated here. The way in which the current population loss function set is updated has also been described in detail above and is likewise not repeated.
Finally, after the current population loss function set has been updated over multiple iterations, the potential loss function with the best evaluation score is selected from the final population loss function set as the target loss function.
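The final selection step amounts to taking the best-scoring member of the population; a sketch under the same (loss, score) pair representation assumed above:

```python
def select_target_loss(population):
    """Pick the population loss function with the best evaluation score
    as the target loss function after the last iteration."""
    return max(population, key=lambda pair: pair[1])[0]
```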
The automatic search method of the present application has been described in detail with reference to Figures 5 to 12. When the content being searched is a loss function, the model training module 312 uses the target loss function obtained in the automatic search module 311 by the above method to train the neural network model to be trained, together with the original training data obtained from the user, yielding the target neural network model. It should be understood that, depending on the training task of the neural network model to be trained and the original training data, a corresponding target neural network model is obtained and applied to the corresponding specific task. The training task may be face recognition, pedestrian re-identification, metric learning, and so on, which is not limited in the embodiments of the present application.
The effect of automatically searching the GMS loss function in the above manner is described in detail below with reference to Figures 13 and 14 and Tables 3 to 5.
First, the effect of including the differentiable ranking loss function in the performance prediction model of the potential loss function selection module is described in detail with reference to Figure 13. Figure 13 is a schematic comparison, provided by an embodiment of the present application, of the effect of whether the loss function used to train the performance prediction model includes a differentiable ranking loss function.
As shown in Figure 13, the abscissa is the amount of training data for the performance prediction model, and the ordinate is the prediction quality metric KTau (Kendall rank correlation); the higher the metric, the more accurate the performance prediction model. Figure 13 shows that a performance prediction model trained with the MSE loss function alone achieves a lower prediction quality metric than one trained with the MSE loss function together with the differentiable ranking loss function. Therefore, the performance prediction model trained with the differentiable ranking loss function and the MSE loss function proposed in the embodiments of this application is more accurate, which in turn helps the potential loss function selector pick potential loss functions with better performance.
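The combined objective can be sketched as follows. The exact differentiable ranking loss LK of this application is not reproduced in this text, so the pairwise logistic surrogate below is only an illustrative assumption, as is the weighting factor alpha.

```python
import torch

def prediction_loss(pred, target, alpha=1.0):
    """Illustrative training objective for the performance prediction model:
    MSE regression loss plus a differentiable pairwise ranking surrogate."""
    mse = torch.mean((pred - target) ** 2)
    # Penalize candidate pairs whose predicted ordering disagrees with the
    # ordering of their evaluation scores (a soft, differentiable rank term).
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)
    order_true = torch.sign(target.unsqueeze(0) - target.unsqueeze(1))
    rank = torch.nn.functional.softplus(-diff_pred * order_true).mean()
    return mse + alpha * rank
```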
Next, the effect of the promising-loss chooser (PLC), i.e., the potential loss function selection module, in the loss function search process is described in detail with reference to Figure 14 and Table 3. Figure 14 is a schematic comparison, provided by an embodiment of the present application, of the effect of including or omitting the potential loss function selection module in automatic loss function search.
As shown in Figure 14, the abscissa is the number of loss functions explored during the search, and the ordinate is a task-related output metric, for example the mAP shown in Figure 14; the higher the mAP, the better the searched loss function performs. Figure 14 shows that the mAP of the loss functions obtained with the PLC-based search method (AutoLoss-MS) is mostly higher than that of the method without PLC (AutoLoss-MS w/o PLC); therefore, the potential loss functions obtained with the PLC-based search method perform better overall.
Table 3 Effect of the potential loss function selector
In addition, Table 3 shows that the number of loss functions explored by the PLC-based search method is significantly higher than that explored by the method without PLC. This is because the performance prediction model in the potential loss function selection module can eliminate candidate loss functions with poor predicted results in advance and submit only the candidates with good predicted results to proxy-task evaluation; for example, one potential loss function may be selected from two candidate loss functions, from five, or from more, and only the selected potential loss function undergoes proxy-task evaluation. In the existing search methods without PLC, every candidate loss function must undergo proxy-task evaluation, so under the same number of iterations the PLC-based search method explores more loss functions.
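A minimal sketch of this select-then-evaluate step, assuming the predictor returns a scalar predicted indicator per candidate; the names predictor and proxy_evaluate are hypothetical placeholders for the trained performance prediction model and the expensive proxy-task evaluation.

```python
def plc_step(candidates, predictor, proxy_evaluate, top_k=1):
    """Rank candidates with the performance prediction model and run the
    proxy-task evaluation only on the most promising ones."""
    ranked = sorted(candidates, key=predictor, reverse=True)
    return [(c, proxy_evaluate(c)) for c in ranked[:top_k]]
```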
The performance of the target loss function obtained by the loss function search method of the embodiments of the present application under different models is discussed below with reference to Tables 4 and 5.
Table 4 lists the loss functions searched by different models (for example, the residual network ResNet50, the omni-scale network OSNet, and the multiple granularity network MGN) on the same data set (for example, the Market-1501 data set).
Table 4 Loss functions searched by the three models on the Market-1501 data set
The loss functions obtained in Table 4 were then trained on different data sets, and the resulting experimental results were compared with those of advanced algorithms in traditional fixed-loss-function methods; the comparison results are shown in Table 5.
Table 5 Comparison between the present invention and other methods on four data sets
Table 5 shows that the target loss function obtained by the embodiments of the present application can be transferred to other data sets for training, while the training effect is better than that of the loss functions obtained by traditional advanced algorithms.
The apparatus of the embodiments of the present application is described below with reference to Figures 15 to 18. It should be understood that the apparatus described below can perform the foregoing methods of the embodiments of the present application; to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the apparatus.
Figure 15 is a schematic block diagram of an automatic-search performance prediction model training apparatus 3000 according to an embodiment of the present application. The training apparatus 3000 shown in Figure 15 includes an acquisition unit 3010 and a processing unit 3020.
The acquisition unit 3010 is configured to acquire a first training data set, the first training data including sample data and evaluation scores corresponding to the sample data.
The processing unit 3020 is configured to train the performance prediction model on the first training data set to obtain a target performance prediction model, where the loss function of the performance prediction model includes a differentiable ranking loss function LK and a regression loss function.
It should be understood that the foregoing is only an exemplary description; the training apparatus 3000 is used to perform the methods or steps mentioned in the foregoing method embodiments and therefore corresponds to those embodiments. For specific details, refer to the description of the foregoing method embodiments, which is not repeated here.
Figure 16 is a schematic block diagram of an automatic search apparatus 4000 provided by an embodiment of the present application. The automatic search apparatus 4000 shown in Figure 16 includes an acquisition unit 4010 and a processing unit 4020.
The acquisition unit 4010 is configured to acquire at least two candidate data, the at least two candidate data being data to be evaluated on a proxy task.
The processing unit 4020 is configured to input the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, where the target performance prediction model is obtained by training a performance prediction model on a first training data set, the loss function of the performance prediction model includes a differentiable ranking loss function LK and a regression loss function, and the first training data set includes sample data and evaluation scores corresponding to the sample data; and to perform proxy-task evaluation on part of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
It should be understood that the foregoing is only an exemplary description; the automatic search apparatus 4000 is used to perform the methods or steps mentioned in the foregoing method embodiments and therefore corresponds to those embodiments. For specific details, refer to the description of the foregoing method embodiments, which is not repeated here.
It should be noted that the above training apparatus 3000 and apparatus 4000 are embodied in the form of functional units. The term "unit" here may be implemented in software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components supporting the described functions.
Therefore, the units of the examples described in the embodiments of the present application can be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
Figure 17 is a schematic diagram of the hardware structure of an automatic-search performance prediction model training apparatus provided by an embodiment of the present application. The training apparatus 5000 shown in Figure 17 (which may specifically be a computer device) includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004; the memory 5001, the processor 5002, and the communication interface 5003 are communicatively connected to one another through the bus 5004.
The memory 5001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 5001 may store a program; when the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is configured to perform the steps of the performance prediction model training method of the embodiments of the present application.
The processor 5002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the performance prediction model training method of the method embodiments of the present application.
The processor 5002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the performance prediction model training method of the present application may be completed by integrated logic circuits of hardware in the processor 5002 or by instructions in the form of software.
The processor 5002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions to be performed by the units included in the training apparatus shown in Figure 15, or performs the performance prediction model training method shown in Figure 5 of the method embodiments of the present application.
The communication interface 5003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 5000 and other devices or communication networks. For example, the training data may be obtained through the communication interface 5003.
The bus 5004 may include a path for transferring information between the components of the apparatus 5000 (for example, the memory 5001, the processor 5002, and the communication interface 5003).
Figure 18 is a schematic diagram of the hardware structure of an automatic search apparatus according to an embodiment of the present application. The automatic search apparatus 6000 shown in Figure 18 includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004; the memory 6001, the processor 6002, and the communication interface 6003 are communicatively connected to one another through the bus 6004.
The memory 6001 may be a ROM, a static storage device, or a RAM. The memory 6001 may store a program; when the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 and the communication interface 6003 are configured to perform the steps of the automatic search method of the embodiments of the present application. Specifically, the processor 6002 may perform the method shown in Figure 7 above.
The processor 6002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs to implement the functions to be performed by the units in the automatic search apparatus of the embodiments of the present application, or to perform the automatic search method of the method embodiments of the present application.
The processor 6002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the automatic search method of the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 6002 or by instructions in the form of software.
The processor 6002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 6001; the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions to be performed by the units included in the automatic search apparatus of the embodiments of the present application, or performs the automatic search method of the method embodiments of the present application.
The communication interface 6003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 6000 and other devices or communication networks. For example, the data to be processed may be obtained through the communication interface 6003.
The bus 6004 may include a path for transferring information between the components of the apparatus 6000 (for example, the memory 6001, the processor 6002, and the communication interface 6003).
It should be noted that although the above apparatus 5000 and apparatus 6000 show only a memory, a processor, and a communication interface, a person skilled in the art should understand that, in specific implementation, the apparatus 5000 and the apparatus 6000 may also include other components necessary for normal operation. Meanwhile, according to specific needs, a person skilled in the art should understand that the apparatus 5000 and the apparatus 6000 may also include hardware components implementing other additional functions. In addition, a person skilled in the art should understand that the apparatus 5000 and the apparatus 6000 may include only the components necessary for implementing the embodiments of the present application, and need not include all the components shown in Figures 17 and 18.
It should be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or a data center containing one or more sets of usable media. The usable media may be magnetic media (for example, a floppy disk, a hard disk, or a magnetic tape), optical media (for example, a DVD), or semiconductor media. The semiconductor media may be a solid-state drive.
It should be understood that the term "and/or" in this document merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it, but may also indicate an "and/or" relationship; refer to the context for details.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following" or similar expressions refers to any combination of the listed items, including any combination of a single item or plural items. For example, "at least one of a, b, or c" may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be singular or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatus, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatus, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatus or units may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

1. An automatic search method, comprising:
    acquiring at least two candidate data, the at least two candidate data being data to be evaluated on a proxy task;
    inputting the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training a performance prediction model on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function LK and a regression loss function, and the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
    performing proxy-task evaluation on part of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
2. The method according to claim 1, wherein the performing proxy-task evaluation on part of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data comprises:
    performing proxy-task evaluation on the candidate data with the best prediction indicator among the at least two candidate data.
3. The method according to claim 1 or 2, further comprising:
    adding the part of the candidate data that has undergone proxy-task evaluation to the first training data set to obtain an updated first training data set;
    acquiring at least two updated candidate data, the at least two updated candidate data being different from the at least two candidate data;
    inputting the at least two updated candidate data into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained from the updated first training data set; and
    performing proxy-task evaluation on part of the at least two updated candidate data according to the prediction indicators corresponding to the at least two updated candidate data.
4. The method according to claim 3, wherein the regression loss function is a mean square error loss function LMSE.
5. The method according to any one of claims 1 to 4, wherein the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
6. The method according to claim 5, wherein when the loss function type of the candidate loss functions is a generalized margin softmax (GMS) loss function, the acquiring at least two candidate loss functions comprises:
    acquiring a current population loss function set, the current population loss function set comprising M population loss functions, wherein the mth population loss function is represented by a first computation graph, a second computation graph, and a constant s, M is a positive integer, and 1≤m≤M;
    performing initial screening on the current population loss function set to obtain K screened first initial loss functions, K being a positive integer greater than or equal to 2;
    performing cross-screening on the K first initial loss functions with a preset probability to obtain a second loss function;
    if the second loss function passes a loss function rejection criterion, performing equivalence verification on the second loss function; and
    if the second loss function is not equivalent to the mth current population loss function in the current population loss function set, determining the second loss function as the candidate loss function.
7. The method according to claim 6, wherein the loss function rejection criterion comprises basic loss function attribute criteria and a target task indicator, and the performing equivalence verification on the second loss function if the second loss function passes the loss function rejection criterion comprises:
    performing equivalence verification on the second loss function if the basic loss function attribute criteria and the target task indicator are satisfied;
    wherein the second loss function satisfies the basic loss function attribute criteria when the first function t(x) corresponding to the first computation graph of the second loss function and the second function n(x) corresponding to the second computation graph satisfy the following formula:
    Wherein, the second loss function satisfying the basic attribute criterion of the loss function is the first calculation graph of the second loss function The corresponding first function t(x) and the second calculation graph The corresponding second function n(x) satisfies the following formula:
    the second loss function satisfies the target task indicator when an output indicator obtained by training task data with the second loss function reaches a preset value.
8. The method according to claim 6 or 7, wherein the determining the second loss function as the candidate loss function if the second loss function is not equivalent to the mth current population loss function in the current population loss function set comprises:
    obtaining a first feature vector from the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to the second computation graph, and the constant s;
    obtaining a second feature vector set from the population loss functions in the current population loss function set, the second feature vector set comprising a second feature vector corresponding to each of the population loss functions; and
    if the first feature vector is not equivalent to the second feature vector corresponding to each of the population loss functions, determining the second loss function as the candidate loss function.
9. A training method for an automatic-search performance prediction model, comprising:
    acquiring a first training data set, the first training data comprising sample data and evaluation scores corresponding to the sample data; and
    training the performance prediction model on the first training data set to obtain a target performance prediction model, wherein the loss function of the performance prediction model comprises a differentiable ranking loss function LK and a regression loss function.
10. The training method according to claim 9, wherein the regression loss function is a mean square error loss function LMSE.
11. The training method according to claim 9 or 10, further comprising:
    updating the first training data set; and
    when the increment of the first training data set reaches a first threshold, training the target performance prediction model on the updated first training data set to obtain an updated target performance prediction model.
12. An automatic search apparatus, comprising an acquisition unit and a processing unit, wherein:
    the acquisition unit is configured to acquire at least two candidate data, the at least two candidate data being data to be evaluated on a proxy task; and
    the processing unit is configured to:
    input the at least two candidate data into a target performance prediction model to obtain prediction indicators corresponding to the at least two candidate data, wherein the target performance prediction model is obtained by training a performance prediction model on a first training data set, the loss function of the performance prediction model comprises a differentiable ranking loss function LK and a regression loss function, and the first training data set comprises sample data and evaluation scores corresponding to the sample data; and
    perform proxy-task evaluation on part of the at least two candidate data according to the prediction indicators corresponding to the at least two candidate data.
13. The apparatus according to claim 12, wherein the processing unit is configured to:
    perform proxy-task evaluation on the candidate data with the best prediction indicator among the at least two candidate data.
14. The apparatus according to claim 12 or 13, further comprising an update unit, wherein:
    the update unit is configured to add the part of the candidate data that has undergone proxy-task evaluation to the first training data set to obtain an updated first training data set;
    the acquisition unit is configured to acquire at least two updated candidate data, the at least two updated candidate data being different from the at least two candidate data; and
    the processing unit is configured to:
    input the at least two updated candidate data into an updated target performance prediction model to obtain prediction indicators corresponding to the at least two updated candidate data, wherein the updated target performance prediction model is obtained from the updated first training data set; and
    perform proxy-task evaluation on part of the at least two updated candidate data according to the prediction indicators corresponding to the at least two updated candidate data.
15. The apparatus according to claim 14, wherein the regression loss function is a mean square error loss function LMSE.
16. The apparatus according to any one of claims 12 to 15, wherein the at least two candidate data are at least two candidate loss functions, and the population data set is a population loss function set.
17. The apparatus according to claim 16, wherein when the loss function type of the candidate loss functions is a generalized margin softmax (GMS) loss function:
    the acquisition unit is configured to acquire a current population loss function set, the current population loss function set comprising M population loss functions, wherein the mth population loss function is represented by a first computation graph, a second computation graph, and a constant s, M is a positive integer, and 1≤m≤M; and
    the processing unit is configured to:
    perform initial screening on the current population loss function set to obtain K screened first initial loss functions, K being a positive integer greater than or equal to 2;
    perform cross-screening on the K first initial loss functions with a preset probability to obtain a second loss function;
    if the second loss function passes a loss function rejection criterion, perform equivalence verification on the second loss function; and
    if the second loss function is not equivalent to the mth current population loss function in the current population loss function set, determine the second loss function as the candidate loss function.
18. The apparatus according to claim 17, wherein the loss function rejection criterion comprises basic loss function attribute criteria and a target task indicator, and
    the processing unit is configured to: perform equivalence verification on the second loss function if the basic loss function attribute criteria and the target task indicator are satisfied;
    wherein the second loss function satisfies the basic loss function attribute criteria when the first function t(x) corresponding to the first computation graph of the second loss function and the second function n(x) corresponding to the second computation graph satisfy the following formula:
    Wherein, the second loss function satisfying the basic attribute criterion of the loss function is the first calculation graph of the second loss function The corresponding first function t(x) and the second calculation graph The corresponding second function n(x) satisfies the following formula:
    the second loss function satisfies the target task indicator when an output indicator obtained by training task data with the second loss function reaches a preset value.
19. The apparatus according to claim 17 or 18, wherein the processing unit is configured to:
    obtain a first feature vector from the first function t(x) corresponding to the first computation graph of the second loss function, the second function n(x) corresponding to the second computation graph, and the constant s;
    obtain a second feature vector set from the population loss functions in the current population loss function set, the second feature vector set comprising a second feature vector corresponding to each of the population loss functions; and
    if the first feature vector is not equivalent to the second feature vector corresponding to each of the population loss functions, determine the second loss function as the candidate loss function.
20. A training apparatus for an automatic-search performance prediction model, comprising an acquisition unit and a processing unit, wherein:
    the acquisition unit is configured to acquire a first training data set, the first training data comprising sample data and evaluation scores corresponding to the sample data; and
    the processing unit is configured to train the performance prediction model on the first training data set to obtain a target performance prediction model, wherein the loss function of the performance prediction model comprises a differentiable ranking loss function LK and a regression loss function.
21. The training apparatus according to claim 20, wherein the regression loss function is a mean square error loss function LMSE.
22. The training apparatus according to claim 20 or 21, further comprising an update unit, wherein:
    the update unit is configured to update the first training data set; and
    the processing unit is configured to: when the increment of the first training data set reaches a first threshold, train the target performance prediction model on the updated first training data set to obtain an updated target performance prediction model.
23. An automatic search apparatus, comprising a processor and a memory, the memory being configured to store program instructions, and the processor being configured to invoke the program instructions to perform the method according to any one of claims 1 to 8.
24. A training apparatus for an automatic-search performance prediction model, comprising a processor and a memory, the memory being configured to store program instructions, and the processor being configured to invoke the program instructions to perform the method according to any one of claims 9 to 11.
25. A computer-readable storage medium, storing program code, the program code comprising instructions for performing the method according to any one of claims 1 to 8, or the method according to any one of claims 9 to 11.
  26. 一种芯片,其特征在于,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,以执行如权利要求1至8或9至11中任一项所述的方法。 A chip, characterized in that the chip includes a processor and a data interface, and the processor reads instructions stored in the memory through the data interface to execute any one of claims 1 to 8 or 9 to 11 method described in the item.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210249999.8 2022-03-14
CN202210249999.8A CN116805384A (en) 2022-03-14 2022-03-14 Automatic searching method, automatic searching performance prediction model training method and device

Publications (1)

Publication Number Publication Date
WO2023174064A1 (en)

Family

ID=88022344

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079287 WO2023174064A1 (en) 2022-03-14 2023-03-02 Automatic search method, automatic-search performance prediction model training method and apparatus

Country Status (2)

Country Link
CN (1) CN116805384A (en)
WO (1) WO2023174064A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011575A (en) * 2019-12-19 2021-06-22 华为技术有限公司 Neural network model updating method, image processing method and device
CN111488971A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Neural network model searching method and device, and image processing method and device
CN112488292A (en) * 2020-11-19 2021-03-12 杭州电子科技大学 Neural framework searching method for general multi-modal learning
US20210365517A1 (en) * 2020-12-18 2021-11-25 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for Training Fusion Ordering Model, Search Ordering Method, Electronic Device and Storage Medium
CN113094822A (en) * 2021-03-12 2021-07-09 华中科技大学 Method and system for predicting residual life of mechanical equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272839A (en) * 2023-11-20 2023-12-22 北京阿迈特医疗器械有限公司 Support press-holding performance prediction method and device based on neural network
CN117272839B (en) * 2023-11-20 2024-02-06 北京阿迈特医疗器械有限公司 Support press-holding performance prediction method and device based on neural network

Also Published As

Publication number Publication date
CN116805384A (en) 2023-09-26

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23769570

Country of ref document: EP

Kind code of ref document: A1