WO2023179609A1 - A data processing method and device - Google Patents

A data processing method and device

Info

Publication number
WO2023179609A1
Authority
WO
WIPO (PCT)
Prior art keywords
hyperparameter
samples
neural
predictor
prediction
Application number
PCT/CN2023/082786
Other languages
English (en)
French (fr)
Inventor
周彧聪
钟钊
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023179609A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 - Operations research, analysis or management
    • G06Q10/0639 - Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 - Score-carding, benchmarking or key performance indicator [KPI] analysis

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a data processing method and device.
  • Black-box optimization, also known as hyperparameter optimization (Hyper Parameter Optimization), is an important technology in scientific research and industrial production.
  • A better parameter combination can be found by trying different parameter combinations and observing the system output. Such attempts are expensive, requiring a long time or a large amount of resources to obtain the output results.
  • Black-box optimization can be used to solve this kind of problem.
  • Neural predictors can be used for prediction: several sets of hyperparameters and their corresponding indicators are obtained in advance, before the hyperparameter search, to train the predictor.
  • However, neural predictors require a large amount of training data in order to obtain a predictor with good generalization.
  • In black-box optimization scenarios, a single evaluation generally costs a lot, so little training data is available; the trained neural predictor therefore has poor generalization, resulting in poor search results.
  • Embodiments of the present application provide a data processing method and device that use fewer training samples to obtain a neural predictor with better generalization performance.
  • Embodiments of the present application provide a data processing method, including: receiving hyperparameter information sent by a user device, where the hyperparameter information is used to indicate a hyperparameter search space corresponding to a user task; sampling multiple hyperparameter combinations from the hyperparameter search space; using the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as the input of the neural predictor, and determining, through the neural predictor, the prediction indicator corresponding to the first hyperparameter combination, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations; and sending K hyperparameter combinations to the user device, where K is a positive integer and the K prediction indicators corresponding to the K hyperparameter combinations are the highest K among the multiple prediction indicators.
  • user tasks can be molecular design tasks, materials science tasks, factory debugging tasks, chip design tasks, neural network structure design tasks, neural network training and tuning tasks, etc.
  • Taking a neural network structure design task as an example, the user task needs to optimize the design parameters of the neural network structure, such as the number of convolution layers, convolution kernel size, dilation size, etc.
  • Users can perform specific user tasks based on the received hyperparameter combinations, such as video classification, text recognition, image beautification, speech recognition and other tasks.
  • Hyper-parameters can be understood as operating parameters of a system, product or process. Hyperparameter information can be understood as containing the value range or value conditions of some hyperparameters.
  • hyperparameters are parameters whose initialization values are set by the user before starting the learning process. They are parameters that cannot be learned through the training process of the neural network itself.
  • These hyperparameters include: convolution kernel size, number of neural network layers, activation function, loss function, type of optimizer used, learning rate, batch size (batch_size), number of training rounds (epoch), etc.
  • the hyperparameter search space includes some hyperparameters required by the user's task.
  • the value of each hyperparameter can be a continuously distributed value or a discrete distributed value. For example:
  • wd: numerical type (0.02, 0.4, 0.01), indicating weight decay;
  • dropout: numerical type (0.0, 0.3, 0.025), indicating the dropout probability;
  • drop_conn_rate: numerical type (0.0, 0.4, 0.025), indicating the drop-connection probability;
  • mixup: numerical type (0.0, 1.0, 0.05), indicating the distribution parameter of mixup;
  • color: numerical type (0.0, 0.5, 0.025), indicating the intensity of color data augmentation;
  • re_prob: numerical type (0.0, 0.4, 0.025), indicating the probability of random erasing.
  • The above hyperparameter search space is just an example; in actual applications, any hyperparameters that need to be optimized can be defined.
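  • For illustration only, a search space like the one above could be written as the following configuration sketch; the dictionary layout and the assumption that each numerical entry means (lower bound, upper bound, step) are not taken from the application.

```python
# Hypothetical configuration sketch of the example hyperparameter search space.
# Each numerical entry is assumed to mean (lower bound, upper bound, step).
search_space = {
    "wd":             {"type": "numerical", "range": (0.02, 0.4, 0.01)},   # weight decay
    "dropout":        {"type": "numerical", "range": (0.0, 0.3, 0.025)},   # dropout probability
    "drop_conn_rate": {"type": "numerical", "range": (0.0, 0.4, 0.025)},   # drop-connection probability
    "mixup":          {"type": "numerical", "range": (0.0, 1.0, 0.05)},    # mixup distribution parameter
    "color":          {"type": "numerical", "range": (0.0, 0.5, 0.025)},   # color-augmentation intensity
    "re_prob":        {"type": "numerical", "range": (0.0, 0.4, 0.025)},   # random-erasing probability
}
```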
  • The input to the neural predictor provided in this application includes more than just the hyperparameter combination to be predicted. It also includes samples in the training set (also called hyperparameter samples) and their corresponding evaluation indicators. The hyperparameter samples and their evaluation indicators are used to assist in predicting the hyperparameter combination sampled from the hyperparameter search space. Since the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, those samples and their evaluation indicators can be referred to when predicting a hyperparameter combination, which improves the accuracy of prediction.
  • the input of existing neural predictors only includes target evaluation samples, and there are no other reference samples and evaluation indicators. It is necessary to obtain evaluation indicators of many real samples in advance to train the neural predictor.
  • In this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
  • Because the evaluation indicators of the hyperparameter samples are referred to, the accuracy of predicting the prediction indicator of the target sample is improved. This in turn improves the accuracy with which the weights of the neural predictor are adjusted based on the prediction indicator, thereby reducing the number of training rounds and thus the number of training samples required. With fewer training samples, a neural predictor with better generalization can be obtained.
  • K evaluation indicators corresponding to the K hyperparameter combinations and sent by the user device are received; the K hyperparameter combinations are used as K samples, and the K samples and the corresponding K evaluation indicators are added to the training set.
  • In this way, the training set is continuously updated, and the updated training set is used to assist in predicting the prediction indicators of subsequent hyperparameter combinations. That is, the samples participating in the auxiliary prediction have increasingly better evaluation indicators, so the accuracy of prediction can be improved.
  • The neural predictor is trained in the following manner: multiple samples and the evaluation indicators corresponding to the multiple samples are selected from the training set, and a target sample is selected from the training set; the multiple samples, the evaluation indicators corresponding to the multiple samples, and the target sample are used as inputs to the neural predictor, and the prediction indicator corresponding to the target sample is determined through the neural predictor; the network parameters of the neural predictor are adjusted according to the comparison result between the prediction indicator of the target sample and the evaluation indicator corresponding to the target sample.
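  • For illustration, the following is a minimal sketch of that training procedure; the tiny predictor module, the data layout, and the loss choice are assumptions made for the sketch and are not the application's actual implementation.

```python
import random
import torch
import torch.nn as nn

class TinyPredictor(nn.Module):
    """Stand-in predictor for the sketch: encodes hyperparameter vectors with a shared MLP,
    scores each auxiliary sample against the target by inner product, and returns a
    similarity-weighted average of the auxiliary evaluation indicators."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, support_x, support_y, target_x):
        sims = self.encoder(support_x) @ self.encoder(target_x)   # [T] similarities
        weights = torch.softmax(sims, dim=0)
        return (weights * support_y).sum()

# training_set: list of dicts {"hyperparams": tensor[dim], "evaluation": float} (assumed layout).
def train_predictor(training_set, dim, T=8, num_steps=200):
    predictor = TinyPredictor(dim)
    optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(num_steps):
        batch = random.sample(training_set, k=T + 1)       # T auxiliary samples + 1 target sample
        *support, target = batch
        support_x = torch.stack([s["hyperparams"] for s in support])
        support_y = torch.tensor([s["evaluation"] for s in support])
        pred = predictor(support_x, support_y, target["hyperparams"])
        loss = loss_fn(pred, torch.tensor(target["evaluation"]))  # compare prediction with evaluation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                    # adjust the network parameters
    return predictor
```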
  • the input of existing neural predictors only includes target evaluation samples, and there are no other reference samples and evaluation indicators. It is necessary to obtain evaluation indicators of many real samples in advance to train the neural predictor.
  • In this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
  • Because the evaluation indicators of the hyperparameter samples are referred to, the accuracy of predicting the prediction indicator of the target sample is improved. This in turn improves the accuracy with which the weights of the neural predictor are adjusted based on the prediction indicator, thereby reducing the number of training rounds and thus the number of training samples required. With fewer training samples, a neural predictor with better generalization can be obtained.
  • The training set can be used to train the neural predictor before each round in which the neural predictor is used to determine the prediction indicators corresponding to the multiple hyperparameter combinations.
  • the training set can be updated, and the generalization of the trained neural predictor becomes increasingly better.
  • Using the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: inputting the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples into the neural predictor; and determining, by the neural predictor, the prediction indicator corresponding to the first hyperparameter combination based on the first hyperparameter combination, the multiple samples, the evaluation indicators of the multiple samples, and two anchor point features; wherein the two anchor point features are used to calibrate the coding feature of the lowest prediction indicator and the coding feature of the highest prediction indicator of the user task.
  • two anchor point features are used to participate in the prediction of hyperparameter combinations.
  • The two anchor point features are used to calibrate the coding feature of the lowest prediction indicator and the coding feature of the highest prediction indicator of the user task, thereby preventing the prediction result from deviating from the prediction range and further improving the accuracy of prediction.
  • The neural predictor determines T+2 weights based on the similarities between the target feature and the T auxiliary features and the two anchor point features; the T+2 weights include the weights of the T samples and the weights of the two anchor point features. The neural predictor then weights T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination; the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor point features.
  • the similarity is calculated to allow samples to participate in the prediction of the hyperparameter combination, and the prediction indicators of the hyperparameter combination are obtained through weighted evaluation indicators, which can improve the accuracy of the prediction results of the hyperparameter combination.
  • the input of existing neural predictors only includes target evaluation samples, and there are no other reference samples and evaluation indicators. It is necessary to obtain evaluation indicators of many real samples in advance to train the neural predictor.
  • In this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
  • Because the evaluation indicators of the hyperparameter samples are referred to, the accuracy of predicting the prediction indicator of the target sample is improved. This in turn improves the accuracy with which the weights of the neural predictor are adjusted based on the prediction indicator, thereby reducing the number of training rounds and thus the number of training samples required. With fewer training samples, a neural predictor with better generalization can be obtained.
  • the two anchor point features belong to network parameters of the neural predictor.
  • the two anchor point features are learnable as network parameters. During the process of training the neural predictor, the update of the two anchor point features is supported.
  • The number of input samples supported by the neural predictor is T. Using the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: the neural predictor encodes the input T samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain the target feature; the neural predictor determines the similarities between the target feature and the T auxiliary features; the neural predictor determines the weights corresponding to the T samples based on the similarities between the target feature and the T auxiliary features; and the neural predictor weights the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples to obtain the prediction indicator of the first hyperparameter combination.
  • The neural predictor determining the similarity between the target feature and the T auxiliary features and the similarity between the target feature and the two anchor point features includes: the neural predictor performs inner-product processing on the target feature and the T auxiliary features to obtain the similarities between the target feature and the T auxiliary features, and performs inner-product processing on the target feature and the two anchor point features respectively to obtain the similarities between the target feature and the two anchor point features.
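  • The following is a minimal sketch of the encode / inner-product / weighted-sum procedure described above, with the two anchor features held as learnable network parameters; the encoder structure, the evaluation values assigned to the anchors (0.0 and 1.0), and all names are assumptions made for illustration rather than the application's actual implementation.

```python
import torch
import torch.nn as nn

class AnchorPredictor(nn.Module):
    """Sketch: weights T auxiliary evaluation indicators plus two learnable anchor
    features by inner-product similarity to the target feature."""

    def __init__(self, hyperparam_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(hyperparam_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # Two anchor features, learned as network parameters; assumed to calibrate the
        # lowest and highest prediction indicators of the user task.
        self.anchors = nn.Parameter(torch.randn(2, hidden))
        # Evaluation indicators assigned to the anchors (assumed to be 0.0 and 1.0 here).
        self.register_buffer("anchor_values", torch.tensor([0.0, 1.0]))

    def forward(self, support_x, support_y, target_x):
        aux = self.encoder(support_x)                        # [T, hidden] auxiliary features
        tgt = self.encoder(target_x)                         # [hidden] target feature
        feats = torch.cat([aux, self.anchors], dim=0)        # [T+2, hidden]
        sims = feats @ tgt                                   # inner-product similarities, [T+2]
        weights = torch.softmax(sims, dim=0)                 # T+2 weights
        values = torch.cat([support_y, self.anchor_values])  # T+2 evaluation indicators
        return (weights * values).sum()                      # predicted indicator

# Example call (shapes only): T = 8 auxiliary samples with 6 hyperparameters each.
predictor = AnchorPredictor(hyperparam_dim=6)
pred = predictor(torch.randn(8, 6), torch.rand(8), torch.randn(6))
```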
  • The number of hyperparameter samples supported by the neural predictor input is T. Using the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor, and determining the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, includes: inputting T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces of connection parameter information include T pieces of connection parameter information obtained by connecting each of the T samples with its corresponding evaluation indicator, and one piece of connection parameter information obtained by connecting the first hyperparameter combination with a target prediction indicator mask; the target prediction indicator mask is used to represent the unknown prediction indicator corresponding to the first hyperparameter combination; the neural predictor performs similarity matching on every two pieces of connection parameter information among the input T+1 pieces to obtain the similarity between every two pieces of connection parameter information; and the neural predictor determines the prediction indicator of the first hyperparameter combination based on the similarities between every two pieces of connection parameter information among the T+1 pieces.
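  • A minimal sketch of this variant is shown below: each sample is connected (concatenated) with its evaluation indicator, the first hyperparameter combination is connected with a mask standing in for its unknown indicator, and pairwise similarity between the T+1 pieces of connection parameter information is computed with self-attention; the use of a standard transformer encoder layer and the readout head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConcatAttentionPredictor(nn.Module):
    """Sketch: T (sample, indicator) concatenations plus one (target, mask) entry
    are processed jointly by self-attention over the T+1 connection vectors."""

    def __init__(self, hyperparam_dim, hidden=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(hyperparam_dim + 1, hidden)   # hyperparams + 1 indicator slot
        self.encoder = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=num_heads, batch_first=True
        )
        self.head = nn.Linear(hidden, 1)                     # read the prediction off the target position
        # Learnable value standing in for the unknown target indicator (the "mask").
        self.mask_token = nn.Parameter(torch.zeros(1))

    def forward(self, support_x, support_y, target_x):
        # T pieces of connection parameter information: [hyperparams ; evaluation indicator].
        support = torch.cat([support_x, support_y.unsqueeze(-1)], dim=-1)     # [T, D+1]
        # One piece for the target: [hyperparams ; mask].
        target = torch.cat([target_x, self.mask_token], dim=-1).unsqueeze(0)  # [1, D+1]
        tokens = torch.cat([support, target], dim=0).unsqueeze(0)             # [1, T+1, D+1]
        encoded = self.encoder(self.embed(tokens))            # pairwise similarity via self-attention
        return self.head(encoded[0, -1]).squeeze(-1)          # prediction at the target position

# Example call (shapes only): T = 8 samples with 6 hyperparameters each.
predictor = ConcatAttentionPredictor(hyperparam_dim=6)
pred = predictor(torch.randn(8, 6), torch.rand(8), torch.randn(6))
```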
  • the similarity is calculated to allow samples to participate in the prediction of the hyperparameter combination, and the prediction indicators of the hyperparameter combination are obtained through weighted evaluation indicators, which can improve the accuracy of the prediction results of the hyperparameter combination.
  • the input of existing neural predictors only includes target evaluation samples, and there are no other reference samples and evaluation indicators. It is necessary to obtain evaluation indicators of many real samples in advance to train the neural predictor.
  • In this application, the input of the neural predictor includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
  • Because the evaluation indicators of the hyperparameter samples are referred to, the accuracy of predicting the prediction indicator of the target sample is improved. This in turn improves the accuracy with which the weights of the neural predictor are adjusted based on the prediction indicator, thereby reducing the number of training rounds and thus the number of training samples required. With fewer training samples, a neural predictor with better generalization can be obtained.
  • Embodiments of the present application also provide a data processing device, including: a receiving unit, configured to receive hyperparameter information sent by the user device, where the hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task; a processing unit, configured to sample multiple hyperparameter combinations from the hyperparameter search space, and to use the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor and determine the prediction indicator corresponding to the first hyperparameter combination through the neural predictor, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations; and a sending unit, configured to send K hyperparameter combinations to the user device, where K is a positive integer and the K prediction indicators corresponding to the K hyperparameter combinations are the highest K among the multiple prediction indicators.
  • The receiving unit is also configured to receive K evaluation indicators corresponding to the K hyperparameter combinations and sent by the user device; the processing unit is also configured to use the K hyperparameter combinations as K samples, and add the K samples and the corresponding K evaluation indicators to the training set.
  • The processing unit is further configured to train the neural predictor in the following manner: select multiple samples and the evaluation indicators corresponding to the multiple samples from the training set, and select a target sample from the training set; use the multiple samples, the evaluation indicators corresponding to the multiple samples, and the target sample as inputs to the neural predictor, and determine the prediction indicator corresponding to the target sample through the neural predictor; and adjust the network parameters of the neural predictor according to the comparison result between the prediction indicator of the target sample and the evaluation indicator corresponding to the target sample.
  • The processing unit is specifically configured to: input the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples into the neural predictor; the neural predictor determines the prediction indicator corresponding to the first hyperparameter combination based on the first hyperparameter combination, the multiple samples, the evaluation indicators of the multiple samples, and the two anchor point features; wherein the two anchor point features are used to calibrate the coding feature of the lowest prediction indicator and the coding feature of the highest prediction indicator of the user task.
  • the number of input samples supported by the neural predictor is T, and T is a positive integer;
  • The processing unit is specifically configured to: encode the input T samples through the neural predictor to obtain T auxiliary features, and encode the first hyperparameter combination to obtain a target feature; determine, through the neural predictor, the similarities between the target feature and the T auxiliary features and the similarities between the target feature and the two anchor point features; determine, through the neural predictor, T+2 weights based on the similarities between the target feature and the T auxiliary features and the two anchor point features, where the T+2 weights include the weights of the T samples and the weights of the two anchor point features; and weight, through the neural predictor, T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination, where the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor point features.
  • the two anchor point features belong to network parameters of the neural predictor.
  • The number of input samples supported by the neural predictor is T, and T is a positive integer; the processing unit is specifically configured to: encode the input T samples through the neural predictor to obtain T auxiliary features, and encode the first hyperparameter combination to obtain the target feature; determine, through the neural predictor, the similarities between the target feature and the T auxiliary features; determine, through the neural predictor, the weights corresponding to the T samples according to the similarities between the target feature and the T auxiliary features; and weight, through the neural predictor, the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples to obtain the prediction indicator of the first hyperparameter combination.
  • The number of hyperparameter samples supported by the neural predictor is T, and T is a positive integer; the processing unit is specifically configured to: input T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces of connection parameter information include T pieces of connection parameter information obtained by connecting each of the T samples with its corresponding evaluation indicator, and one piece of connection parameter information obtained by connecting the first hyperparameter combination with the target prediction indicator mask, and the target prediction indicator mask is used to represent the unknown prediction indicator corresponding to the first hyperparameter combination; perform, through the neural predictor, similarity matching on every two pieces of connection parameter information among the input T+1 pieces to obtain the similarity between every two pieces of connection parameter information; and determine, through the neural predictor, the prediction indicator of the first hyperparameter combination based on the similarities between every two pieces of connection parameter information among the T+1 pieces.
  • embodiments of the present application also provide a data processing system, including user equipment and execution equipment.
  • User device used to send hyperparameter information to the execution device.
  • Hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task.
  • the execution device is used to receive the hyperparameter information sent by the user device and sample multiple hyperparameter combinations from the hyperparameter search space.
  • the execution device uses the first hyperparameter combination, the multiple samples included in the training set, and the evaluation indicators of the multiple samples as the input of the neural predictor, and determines the prediction indicator corresponding to the first hyperparameter combination through the neural predictor.
  • The first hyperparameter combination is any one of the multiple hyperparameter combinations, so that multiple prediction indicators corresponding to the multiple hyperparameter combinations are obtained.
  • the execution device sends K hyperparameter combinations to the user device, where K is a positive integer; among them, the K prediction indicators corresponding to the K hyperparameter combinations are the K highest among the multiple prediction indicators.
  • the user device is also used to receive K hyperparameter combinations sent by the execution device.
  • the user device may perform evaluation of K hyperparameter combinations.
  • the user device sends K evaluation indicators corresponding to K hyperparameter combinations to the execution device.
  • the execution device is also configured to receive K evaluation indicators corresponding to the K hyperparameter combinations sent by the user equipment; use the K hyperparameter combinations as K samples, and combine the K samples and the corresponding The K evaluation indicators are added to the training set.
  • Embodiments of the present application provide a data processing device, including a processor and a memory; the memory is used to store instructions, and when the device is running, the processor executes the instructions stored in the memory, so that the device executes the method provided by the first aspect or any design of the first aspect. It should be noted that the memory can be integrated into the processor or be independent of the processor.
  • Embodiments of the present application also provide a readable storage medium, which stores programs or instructions that, when run on a computer, cause any of the methods described in the above aspects to be executed.
  • embodiments of the present application also provide a computer program product containing a computer program or instructions, which when run on a computer, causes the computer to perform any of the methods described in the above aspects.
  • the present application provides a chip system.
  • the chip is connected to a memory and is used to read and execute the software program stored in the memory to implement the method of any design in any aspect.
  • Figure 1 is a schematic diagram of an artificial intelligence main framework applied in this application.
  • Figure 2 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a system architecture 300 provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a neural predictor provided by an embodiment of the present application.
  • Figure 5A is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 5B is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of the training process of a neural predictor provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
  • Figure 8A is a schematic diagram of the processing flow of a neural predictor provided by an embodiment of the present application.
  • Figure 8B is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
  • Figure 9A is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
  • Figure 9B is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
  • Figure 10A is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
  • Figure 10B is a schematic diagram of the processing flow of yet another neural predictor provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework.
  • the main framework describes the overall workflow of the artificial intelligence system and is suitable for general needs in the field of artificial intelligence.
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain” reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementation) to the systematic industrial ecological process.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA, etc.);
  • The basic platform includes distributed computing frameworks, networks, and other related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Application fields mainly include: intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, smart city, smart terminal, etc.
  • the neural network used in the neural predictor involved in this application serves as an important node and is used to implement machine learning, deep learning, search, reasoning, decision-making, etc.
  • the neural networks mentioned in this application can include various types, such as deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), residual network, neural network using transformer model or other neural networks, etc.
  • The work of each layer in a deep neural network can be described mathematically as y = a(W·x + b). From the physical level, the work of each layer in the deep neural network can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to the column space of the matrix) through five operations on the input space (a collection of input vectors). These five operations include: 1. dimension raising/reducing; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are completed by W·x, operation 4 is completed by +b, and operation 5 is implemented by a(). The word "space" is used here because the object to be classified is not a single thing but a class of things.
  • W is a weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • This vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how to transform the space.
  • the purpose of training a neural network is to finally obtain the weight matrix of all layers of the trained neural network (a weight matrix formed by the vector W of many layers). Therefore, the training process of neural network is essentially to learn how to control spatial transformation, and more specifically, to learn the weight matrix.
  • the neural network using the transformer model in this application can include several encoders.
  • Each encoder can include a self-attention layer and a feed-forward layer.
  • The attention layer can use the multi-head self-attention mechanism.
  • The feed-forward layer can use a feed-forward neural network (FNN).
  • In a feed-forward neural network, the neurons are arranged in layers, and each neuron is connected only to neurons in the previous layer: it receives the output of the previous layer and passes its own output to the next layer, with no feedback between layers.
  • the encoder is used to convert the input corpus into feature vectors.
  • the multi-head self-attention layer uses calculations between three matrices to calculate the data input to the encoder.
  • the three matrices include query matrix Q (query), key matrix K (key) and value matrix V (value).
  • The multi-head self-attention layer takes into account the various interdependencies between the word at the current position and words at other positions in the sequence.
  • the feedforward layer is a linear transformation layer that linearly transforms the representation of each word.
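  • The following is a minimal sketch of such an encoder block (multi-head self-attention followed by a feed-forward layer); the layer sizes and the residual-connection and normalization placement are assumptions made for illustration, not details taken from the application.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Sketch of one encoder: multi-head self-attention + feed-forward layer."""

    def __init__(self, d_model=64, num_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                         # x: [batch, sequence length, d_model]
        # Q, K and V are all derived from the same input (self-attention).
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)              # residual connection + normalization
        x = self.norm2(x + self.ffn(x))           # position-wise feed-forward layer
        return x

# Example: encode a batch of 2 sequences, each with 9 input feature vectors.
block = EncoderBlock()
features = block(torch.randn(2, 9, 64))
```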
  • A convolutional neural network (CNN) is a deep learning architecture; the deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms.
  • A CNN is a feed-forward artificial neural network in which each neuron processes the data input to it.
  • a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
  • The convolutional layer/pooling layer 120 may include, as examples, layers 121-126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, and layer 124 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer.
  • the output of the convolutional layer can be used as the input of the subsequent pooling layer, or can be used as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators, and the convolution operators are also called convolution kernels.
  • The convolution operator can essentially be a weight matrix, which is usually predefined. Taking image processing as an example, different weight matrices extract different features from the image: one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image.
  • weight values in these weight matrices require a lot of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can extract information from the input data, thereby helping the convolutional neural network 100 to make correct predictions.
  • The initial convolutional layer (for example, layer 121) often extracts more general features; as the depth of the convolutional neural network increases, the features extracted by later convolutional layers become more complex, such as features with high-level semantics, and features with higher-level semantics are more suitable for the problem to be solved.
  • A convolutional layer can be followed by one pooling layer, or multiple convolutional layers can be followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the average value of pixel values in an image within a specific range.
  • the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
  • the operators in the pooling layer should also be related to the size of the image.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
  • Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
  • After being processed by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in Figure 2) and an output layer 140. The parameters included in the multiple hidden layers may be pre-trained based on the related training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • After the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140.
  • the output layer 140 has a loss function similar to classification cross entropy, specifically used to calculate the prediction error.
  • the convolutional neural network 100 shown in Figure 2 is only an example of a convolutional neural network.
  • The convolutional neural network can also exist in the form of other network models, for example, with multiple convolutional layers/pooling layers in parallel and the features extracted by each input to the neural network layer 130 for processing.
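  • As a purely illustrative sketch of the structure described above (an input layer, alternating convolutional and pooling layers 121-126, hidden layers, and an output layer 140), the channel counts, kernel sizes, class count, and input resolution below are assumptions and are not taken from the application.

```python
import torch
import torch.nn as nn

# Sketch of the CNN 100 layout described above; all sizes are assumed for illustration.
cnn = nn.Sequential(
    # Convolutional layer / pooling layer 120 (layers 121-126: conv, pool, conv, pool, conv, pool).
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    # Neural network layer 130: hidden layer followed by the output layer 140.
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 128), nn.ReLU(),    # hidden layer (assumes 32x32 input images)
    nn.Linear(128, 10),                       # output layer, e.g. 10 classes
)

logits = cnn(torch.randn(1, 3, 32, 32))       # one 32x32 RGB image -> class scores
```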
  • black-box optimization is performed through neural predictors.
  • Black-box optimization can be used to find optimal operating parameters for a system, product, or process whose performance can be measured or evaluated as a function of these parameters.
  • Black box optimization can also be understood as hyperparameter optimization, which is used to optimize hyperparameters.
  • Hyper-parameters are parameters whose values are set before starting the learning process and are parameters obtained without training. Hyperparameters can be understood as operating parameters of a system, product or process.
  • hyperparameter tuning of neural networks can be understood as a black-box optimization problem. The various neural networks currently used are trained through data and a certain learning algorithm to obtain a model that can be used for prediction and estimation.
  • Parameters such as the learning rate in the algorithm or the number of samples in each batch, which are not obtained through training, are generally called hyperparameters.
  • A hyperparameter combination can contain the values of all or part of the hyperparameters of the neural network.
  • the weight of each neuron will be optimized with the value of the loss function to reduce the value of the loss function. In this way, the parameters can be optimized through algorithms to obtain the model.
  • the hyperparameters are used to adjust the entire network training process, such as the number of hidden layers of the aforementioned convolutional neural network, the size or number of kernel functions, etc. Hyperparameters are not directly involved in the training process, but only serve as configuration variables.
  • Bayesian optimization can be employed for black-box optimization.
  • Bayesian optimization is based on a Gaussian model: the objective function is modeled based on known samples to obtain a mean function and the confidence of the mean function. For a given point, the larger the confidence range, the higher the uncertainty of the modeling at that point, that is, the greater the probability that the true value deviates from the predicted mean at that point. Bayesian optimization decides which point to try next based on the mean and the confidence. Bayesian optimization methods generally make continuity assumptions about the target problem, for example, that a larger hyperparameter value leads to a larger prediction result. If the target problem does not satisfy this continuity assumption, the modeling effect of Bayesian optimization will be poor and the sampling efficiency will be reduced.
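  • For illustration only, the following sketch shows the Gaussian-model idea in numpy: a posterior mean and a confidence (standard deviation) are computed from known samples, and the next point to try is chosen from them; the RBF kernel, the noise level, and the upper-confidence-bound selection rule are assumptions, not details from the application.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_known, y_known, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a Gaussian-process model."""
    K = rbf_kernel(x_known, x_known) + noise * np.eye(len(x_known))
    Ks = rbf_kernel(x_known, x_query)
    Kss = rbf_kernel(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mean = Ks.T @ K_inv @ y_known
    cov = Kss - Ks.T @ K_inv @ Ks
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Known evaluations of the black-box objective (toy data).
x_known = np.array([0.1, 0.4, 0.7])
y_known = np.array([0.3, 0.9, 0.5])

# Candidate points: pick the next trial where mean + confidence is largest (UCB rule).
x_query = np.linspace(0.0, 1.0, 101)
mean, std = gp_posterior(x_known, y_known, x_query)
next_x = x_query[np.argmax(mean + 1.96 * std)]
```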
  • The neural predictor uses a neural network; compared with Bayesian optimization, the neural network is used instead of the Gaussian model to model the target problem. However, the neural network approach requires more training data to train a neural predictor with good generalization. In a black-box optimization scenario, the cost of a single evaluation is high, so the number of training samples obtained is small, resulting in poor generalization of the trained neural predictor, so that the hyperparameters found by the search are not the hyperparameters with the optimal evaluation results.
  • embodiments of the present application provide a data processing method that combines training samples to assist in the prediction of target hyperparameter combinations. Since the evaluation results corresponding to the hyperparameter combinations in the training samples have been verified by users, the accuracy is high.
  • The prediction of the target hyperparameter combination uses the assistance of user-verified training samples. Compared with solutions without the assistance of user-verified training samples, the solution adopted in the embodiments of the present application predicts the target hyperparameter combination with higher accuracy. Furthermore, compared with a solution that does not use user-verified training samples for assistance, the solution adopted in the embodiments of the present application uses fewer training samples to obtain a neural predictor with better generalization.
  • the embodiments of this application can be used for hyperparameter optimization of various complex systems. Scenarios where the embodiments of this application can be applied may include molecular design, materials science, factory debugging, chip design, neural network structure design, neural network training and tuning, etc.
  • For example, the embodiments can be used to optimize tunable parameters (e.g., the compositions, types or quantities of ingredients, the production sequence, or the production timing) of a physical product or of a process for producing a physical product, such as an alloy, a metamaterial, a concrete mixture, a process of pouring concrete, a pharmaceutical mixture, or a process of performing a treatment.
  • As another example, the embodiments can be used to optimize the design parameters of a neural network structure, such as the number of convolution layers, convolution kernel size, dilation size, and the position of the rectified linear unit (ReLU).
  • the data processing method provided by the embodiment of the present application can be executed by an execution device.
  • An execution device may be implemented by one or more computing devices.
  • Figure 3 shows a system architecture 300 provided by an embodiment of the present application. Included in the system architecture 300 is an execution device 210 .
  • Execution device 210 may be implemented by one or more computing devices. Execution device 210 may be arranged on one physical site, or distributed across multiple physical sites.
  • System architecture 300 also includes data storage system 250 .
  • the execution device 210 cooperates with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 210 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the data processing method provided by this application.
  • One or more computing devices can be deployed in a cloud network.
  • the data processing method provided by the embodiment of the present application is deployed in one or more computing devices of the cloud network in the form of a service, and the user device accesses the cloud service through the network.
  • the data processing method provided by the embodiment of the present application can be deployed on one or more local computing devices in the form of a software tool.
  • Each local device can represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, game console, etc.
  • Each user's local device can interact with the execution device 210 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • execution device 210 may be implemented by each local device, for example, local device 301 may provide local data or feedback evaluation results to execution device 210 .
  • the execution device 210 can also be implemented by local devices.
  • the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for users of the local device 302 .
  • FIG 4 is a schematic structural diagram of a neural predictor provided by an embodiment of the present application.
  • the inputs of the neural predictor are multiple samples in the training set (which can also be called hyperparameter samples or auxiliary hyperparameter samples), the evaluation indicators corresponding to the multiple samples, and the target hyperparameter combination that needs to be predicted.
  • the output of the neural predictor is a predictor of the target hyperparameter combination that needs to be predicted.
  • Multiple auxiliary hyperparameter samples are used to assist the neural predictor in predicting the predictive index of the target hyperparameter combination that needs to be predicted.
  • FIG. 5A is a schematic flow chart of a data processing method provided by this application.
  • the method may be executed by an execution device, such as the execution device 210 in FIG. 3 .
  • the hyperparameter search space includes the hyperparameters required for the user's task. Hyperparameters can be sampled from the hyperparameter search space to obtain multiple hyperparameter values as a hyperparameter combination. It should be understood that a hyperparameter combination may include one or more hyperparameter values.
  • To obtain multiple hyperparameter combinations, hyperparameter information may be received from a user device.
  • Hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task.
  • multiple hyperparameter combinations can be obtained by sampling from the hyperparameter search space.
  • the user device may send the hyperparameter information to the execution device 210 by calling a service.
  • the hyperparameter search space may include a variety of hyperparameters required for user tasks, and the value of each hyperparameter may be a continuously distributed value or a discrete distributed value.
  • For example, the hyperparameter search space may specify the value range of hyperparameter A as [1, 20], and the value range of hyperparameter B may include 2, 3, 6, 7, etc. Therefore, when sampling in the hyperparameter search space, a value can be taken from the continuous distribution or from the discrete distribution to obtain a hyperparameter combination.
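  • As an illustration of this sampling step (not taken from the application), one hyperparameter combination could be drawn from such a space as follows; the space definition and the function name are assumptions.

```python
import random

# Example hyperparameter search space: A is continuous on [1, 20], B is discrete.
search_space = {
    "A": {"type": "continuous", "range": (1.0, 20.0)},
    "B": {"type": "discrete", "values": [2, 3, 6, 7]},
}

def sample_combination(space):
    """Draw one hyperparameter combination from the search space."""
    combo = {}
    for name, spec in space.items():
        if spec["type"] == "continuous":
            low, high = spec["range"]
            combo[name] = random.uniform(low, high)
        else:
            combo[name] = random.choice(spec["values"])
    return combo

# Sample multiple hyperparameter combinations from the search space.
combinations = [sample_combination(search_space) for _ in range(32)]
```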
  • The following steps take the prediction for one hyperparameter combination as an example. Taking the prediction of the first hyperparameter combination as an example, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, the prediction of each of the multiple hyperparameter combinations can refer to the prediction of the first hyperparameter combination.
  • Use the first hyperparameter combination, multiple hyperparameter samples in the training set, and the evaluation indicators of the multiple hyperparameter samples as inputs to the neural predictor, and determine the prediction indicator corresponding to the first hyperparameter combination through the neural predictor.
  • multiple hyperparameter samples are called auxiliary hyperparameter samples.
  • auxiliary hyperparameter samples are used to assist the neural predictor in predicting the prediction index of the first hyperparameter combination.
  • the auxiliary hyperparameter samples used to assist the neural predictor in predicting the first hyperparameter combination can also be called "support samples" or other names, which are not specifically limited in the embodiments of this application.
  • auxiliary hyperparameter sample can be understood as a hyperparameter combination.
  • For ease of distinction, the hyperparameter combinations corresponding to the auxiliary hyperparameter samples of the training set are called auxiliary hyperparameter combinations.
  • This auxiliary hyperparameter combination is also sampled from the hyperparameter search space.
  • Multiple hyperparameter combinations are not the same as multiple auxiliary hyperparameter combinations.
  • the evaluation index corresponding to the auxiliary hyperparameter combination can be obtained by evaluating the auxiliary hyperparameter combination through user tasks.
  • the evaluation indicators corresponding to each hyperparameter sample included in the training set can also be evaluated in other ways.
  • the index results of the hyperparameter combination predicted by the neural predictor are called prediction indicators, and the index results of the hyperparameter combination obtained by user task evaluation are called evaluation indicators.
  • the execution device may perform steps 203 and 204 .
  • The execution device can determine K hyperparameter combinations from the multiple hyperparameter combinations. Specifically, the K hyperparameter combinations with the optimal (or highest) prediction indicators are selected from the multiple hyperparameter combinations. It can be understood that the prediction indicators corresponding to the K hyperparameter combinations are all higher than the prediction indicator of any other predicted hyperparameter combination among the multiple hyperparameter combinations except the K hyperparameter combinations; K is a positive integer.
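  • A minimal sketch of this top-K selection (illustrative only; the function and variable names are assumptions):

```python
import numpy as np

def select_top_k(combinations, predicted_indicators, k):
    """Return the k combinations whose predicted indicators are highest."""
    order = np.argsort(np.asarray(predicted_indicators))[::-1]  # best first
    return [combinations[i] for i in order[:k]]

# Toy example: 5 candidate combinations with their predicted indicators, keep the best 2.
best = select_top_k(["c0", "c1", "c2", "c3", "c4"], [0.2, 0.9, 0.4, 0.7, 0.1], k=2)
# best == ["c1", "c3"]
```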
  • the execution device can send K hyperparameter combinations to the user device.
  • the embodiment of the present application can also update the training set. Determining which hyperparameter combinations to update into the training set can be based on the results of multiple iterative evaluations for multiple hyperparameter combinations.
  • the user device can trigger the user task.
  • Users can perform specific user tasks based on the received hyperparameter combinations, such as video classification, text recognition, image beautification, speech recognition and other tasks.
  • User tasks can be evaluated separately for the K hyperparameter combinations, and the evaluation indicators of the K hyperparameter combinations are sent to the execution device.
  • the execution device adds K hyperparameter combinations and corresponding evaluation indicators as auxiliary hyperparameter samples to the training set.
  • The above description takes as an example the case where the function of the execution device is implemented by one or more computing devices deployed on the cloud network; that is, the above actions of sending K hyperparameter combinations and receiving evaluation results take place between one or more computing devices in the cloud network and the user device. In some scenarios, when the data processing method is deployed on one or more local computing devices, the above actions of sending K hyperparameter combinations and receiving evaluation results can take place between different components of a computing device or between different computing devices, or the evaluation results of the K hyperparameter combinations can be obtained from the storage space of the computing device when the software program of the computing device is executed.
  • a component in a local computing device for executing an iterative sampling process sends K hyperparameter combinations to a component for evaluating hyperparameter combinations.
  • the component that evaluates the hyperparameter combination then performs the evaluation operation and sends the evaluated evaluation metrics to the component that performs the iterative sampling process.
  • For ease of description, steps 201 to 204 are called the iterative sampling process.
  • the execution device may execute multiple rounds of the iterative sampling process including steps 201-204.
  • After the execution device adds the K hyperparameter combinations and the corresponding evaluation indicators to the training set as auxiliary hyperparameter samples, the updated training set can be used for the next round of the iterative sampling process.
  • the iterative sampling stop condition may include at least one of the following:
  • (1) The number of iterative sampling rounds reaches N, where N is an integer greater than 1. After N rounds of iterative sampling have been performed, the iterative sampling process is stopped.
  • (2) The optimal evaluation indicator in the training set has not changed during M consecutive rounds of iterative sampling.
  • Each time the training set is updated, the optimal evaluation indicator included in the training set can be recorded. For example, after the i-th round of iterative sampling, the optimal evaluation indicator included in the training set is A. After the (i+1)-th round of iterative sampling, the optimal evaluation indicator included in the training set is still A, and so on; after the (i+M-1)-th round of iterative sampling, the optimal evaluation indicator included in the training set is still A. From the i-th round of iterative sampling to the (i+M-1)-th round, the optimal evaluation indicator included in the training set has not changed, so the next round of the iterative sampling process is not executed. Optionally, when the number of iterative sampling rounds reaches the set maximum number of sampling rounds and condition (2) is not met, the next round of the iterative sampling process is not executed either.
  • (3) The optimal evaluation indicator in the training set reaches a set target. Specifically, after multiple rounds of the iterative sampling process, the optimal evaluation indicator in the training set reaches the set target, so the next round of the iterative sampling process is no longer performed. Optionally, when the number of iterative sampling rounds reaches the set maximum number of sampling rounds and the optimal evaluation indicator in the training set has not yet reached the set target, the next round of the iterative sampling process is not executed either.
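  • As an illustration only, the stop-condition check described above could be sketched as follows; the function and parameter names (should_stop, max_rounds, patience, target_metric, best_history) are assumptions of this example rather than identifiers from the application.

```python
def should_stop(round_idx, best_history, max_rounds, patience, target_metric):
    """best_history holds the best evaluation indicator in the training set after each round."""
    # Condition (1): N rounds of iterative sampling have been performed.
    if round_idx >= max_rounds:
        return True
    # Condition (3): the best evaluation indicator in the training set reaches the set target.
    if best_history and best_history[-1] >= target_metric:
        return True
    # Condition (2): the best evaluation indicator has not changed for M consecutive rounds.
    if len(best_history) >= patience and len(set(best_history[-patience:])) == 1:
        return True
    return False
```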
  • the neural predictor in the embodiment of this application is trained using the training set. As shown in Figure 6, in each iteration of the training process, a sample is selected from the training set as the target sample.
  • the target sample can be understood as a target hyperparameter combination in the training set.
  • the training set also includes evaluation metrics corresponding to the target hyperparameter combination. In order to facilitate the distinction, the target hyperparameter combination and the evaluation indicator corresponding to the target hyperparameter combination are called the target sample combination {target hyperparameter combination, evaluation indicator}.
  • In addition to the target hyperparameter combination, the input of the neural predictor also requires multiple hyperparameter samples selected from the training set as auxiliary hyperparameter samples.
  • the training set also includes evaluation indicators corresponding to the auxiliary hyperparameter samples.
  • For ease of description, an auxiliary hyperparameter sample (i.e., an auxiliary hyperparameter combination) and its corresponding evaluation indicator are called an auxiliary sample combination {auxiliary hyperparameter combination, evaluation indicator}.
  • It should be noted that the multiple auxiliary hyperparameter samples selected from the training set are different from the target sample.
  • In Figure 6, the number of input auxiliary hyperparameter samples is T as an example.
  • the T auxiliary sample combinations and the target hyperparameter combination are input to the neural predictor, and the output of the neural predictor is the prediction indicator of the target hyperparameter combination.
  • the prediction index of the target hyperparameter combination is compared with the corresponding evaluation index of the target hyperparameter combination, the loss value is calculated, and then the weight of the neural predictor is updated based on the loss value.
  • a loss function can be used when calculating the loss value.
  • the loss function is also the objective function in the weight optimization process. Generally, the smaller the value of the loss function, the more accurate the output of the neural predictor.
  • the training process of neural predictors can be understood as the process of minimizing the loss function. Commonly used loss functions can include logarithmic loss function, square loss function, exponential loss function, etc.
  • optimization algorithms such as gradient descent, stochastic gradient descent, momentum gradient descent, or adaptive moment estimation (Adam) can be used to optimize the weights.
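  • Purely as a hedged sketch of one such training iteration, and not as the application's reference implementation, the following PyTorch-style code draws a target sample and T auxiliary samples from the training set, uses a squared loss between the predicted and evaluated indicators, and updates the predictor's weights with an optimizer such as Adam; the names predictor, optimizer and training_set are assumptions of this example.

```python
import random
import torch
import torch.nn.functional as F

def train_step(predictor, optimizer, training_set, T):
    # training_set: list of (hyperparameter_vector, evaluation_indicator) pairs; len > T assumed.
    target_x, target_y = random.choice(training_set)
    candidates = [s for s in training_set if s[0] is not target_x]   # auxiliary samples differ from the target
    aux = random.sample(candidates, T)
    aux_x = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x, _ in aux])
    aux_y = torch.tensor([y for _, y in aux], dtype=torch.float32)
    pred = predictor(aux_x, aux_y, torch.as_tensor(target_x, dtype=torch.float32))
    # Squared loss as one example; log or exponential losses are possible alternatives.
    loss = F.mse_loss(pred, torch.tensor(float(target_y)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()        # e.g. torch.optim.Adam(predictor.parameters(), lr=1e-3)
    return loss.item()
```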
  • the training of the neural predictor can be performed in advance, or can be performed in the iterative sampling process of each round.
  • For example, before the iterative evaluation process is executed, the neural predictor is first trained through multiple training iterations; in other words, each time the training set is updated, the neural predictor is trained, and the multiple hyperparameter combinations are then predicted based on the neural predictor trained in the current round of the sampling process.
  • the iterative evaluation process is interleaved with the iterative training process.
  • FIG. 7 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
  • Figure 7 takes as an example the interleaving of the iterative evaluation process and the iterative training process in each round of the iterative sampling process, and takes as an example updating K hyperparameter samples in the training set in each round of the sampling process. The maximum number of sampling rounds is N.
  • the initial training set is empty.
  • The user device can trigger the user task to perform the evaluation and obtain the evaluation indicators.
  • the evaluation process and evaluation results of user tasks are not specifically limited in the embodiments of this application.
  • The user task evaluation can be performed manually or by the user device.
  • the data processing method is deployed on one or more computing devices of the cloud network as an example. That is, the above-mentioned actions of sending K hyperparameter combinations and receiving evaluation results are between one or more computing devices and user devices in the cloud network. In some scenarios, when the data processing method is deployed on one or more local computing devices, the above actions of sending K hyperparameter combinations and receiving evaluation results can be between different components of the computing device, or between different computing devices. It may also be that the evaluation indicators of K hyperparameter combinations are obtained from the storage space of the computing device when the software program is executed.
  • step 403 can be replaced by: the component for executing the iterative sampling process in the local computing device sends K hyperparameter combinations to the component for evaluating hyperparameter combinations; step 404 can then be replaced by: the component for evaluating hyperparameter combinations performs the evaluation operation and sends the obtained evaluation indicators to the component for executing the iterative sampling process.
  • L is greater than K.
  • the iterative sampling stop condition is that the number of rounds of iterative sampling reaches the maximum number of sampling rounds as an example.
  • the above-mentioned training of the neural predictor and the prediction of L hyperparameter combinations are performed in each round of iterative sampling process.
  • In some embodiments, the number of rounds in which the neural predictor is trained may be smaller than the number of iterative sampling rounds.
  • For example, the training of the neural predictor and the prediction of the L hyperparameter combinations are performed only in the first a rounds of the N-round iterative sampling process.
  • In the subsequent N-a rounds of the iterative sampling process, the training of the neural predictor is no longer performed; only the prediction of the L hyperparameter combinations is performed.
  • FIG. 8A is a schematic diagram of the processing flow of a neural predictor provided by an embodiment of the present application.
  • Select multiple hyperparameter samples from the training set as auxiliary hyperparameter samples.
  • the training set also includes evaluation indicators corresponding to auxiliary hyperparameter samples.
  • the input of the neural predictor includes the T auxiliary samples, the evaluation indicators corresponding to the T auxiliary samples, and the target hyperparameter combination.
  • Figure 8A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the evaluation indicators corresponding to auxiliary hyperparameter samples 1 to T are called evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation indicator 1}, ..., {auxiliary hyperparameter sample T, evaluation indicator T}.
  • the neural predictor jointly encodes the T auxiliary hyperparameter samples and the target hyperparameter combination to obtain T+1 features.
  • For ease of distinction, the encoded features corresponding to the T auxiliary hyperparameter samples are called auxiliary features, and the encoded feature corresponding to the target hyperparameter combination is called the target feature.
  • the neural predictor includes at least one encoder layer.
  • In Figure 8B, two encoding layers are taken as an example.
  • the encoding layer can use the encoding module in the transformer structure.
  • the encoding layer is composed of an attention layer (Attention Layer) and a feed-forward layer (Feed-forward layer).
  • The attention layer is used to compute pairwise similarities among the T+1 hyperparameter combinations (including the hyperparameter combinations corresponding to the auxiliary hyperparameter samples and the target hyperparameter combination) obtained by combining the T auxiliary hyperparameter samples with the target hyperparameter combination, yielding a similarity matrix; the T+1 hyperparameter combinations are then weighted according to the similarity matrix to obtain T+1 features.
  • The T+1 features are sent to the feed-forward layer for feature transformation, and the encoding layer finally outputs T+1 encoded features. Fusion encoding is performed on the T+1 hyperparameter combinations through at least one encoding layer.
  • the neural predictor determines the similarity between the target feature and the T auxiliary features, and then determines the weights corresponding to the T auxiliary hyperparameter samples based on the similarities between the target feature and the T auxiliary features.
  • the neural predictor weights the evaluation indicators included in the T auxiliary hyperparameter samples according to the corresponding weights of the T auxiliary hyperparameter samples to obtain the prediction index of the target hyperparameter combination.
  • the neural predictor performs inner product processing on the target feature and the T auxiliary features to obtain the similarities corresponding to the target feature and the T auxiliary features. Then, the neural predictor uses the softmax function to convert the similarities corresponding to the target feature and the T auxiliary features into weights corresponding to the T auxiliary hyperparameter samples.
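  • Under the assumption that each hyperparameter combination is already represented as a fixed-length vector, the similarity-weighting flow of Figures 8A-8B could look roughly like the following PyTorch-style sketch, where a standard transformer encoder layer stands in for the encoding layer and all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class SimilarityWeightedPredictor(nn.Module):
    def __init__(self, dim, num_layers=2, num_heads=4):   # dim must be divisible by num_heads
        super().__init__()
        # Each encoding layer = attention layer + feed-forward layer (standard transformer encoder layer).
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, aux_x, aux_y, target_x):
        # aux_x: (T, dim) auxiliary samples; aux_y: (T,) their evaluation indicators; target_x: (dim,)
        tokens = torch.cat([aux_x, target_x.unsqueeze(0)], dim=0)        # T+1 combinations
        encoded = self.encoder(tokens.unsqueeze(0)).squeeze(0)           # fusion encoding -> (T+1, dim)
        aux_feat, target_feat = encoded[:-1], encoded[-1]
        # Inner-product similarity between target feature and auxiliary features, softmax -> weights.
        weights = torch.softmax(aux_feat @ target_feat, dim=0)
        # Weighted sum of the known evaluation indicators gives the prediction indicator.
        return (weights * aux_y).sum()
```

  • In this sketch the output is a softmax-weighted average of the known evaluation indicators, so the prediction stays within the range spanned by the auxiliary samples' indicators; the anchor-feature variant described next is one way to calibrate the extremes of that range explicitly.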
  • the processing flow of the neural predictor shown in Figures 8A-8B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set. Furthermore, the comparison result between the prediction indicator output by the neural predictor for a target hyperparameter combination and the evaluation indicator corresponding to that target hyperparameter combination in the training set is used to adjust the weights of the neural predictor.
  • FIG. 9A is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
  • the input of the neural predictor includes T auxiliary hyperparameter samples, the evaluation indicators corresponding to the T auxiliary hyperparameter samples, and the target hyperparameter combination.
  • Figure 9A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the evaluation indicators corresponding to auxiliary hyperparameter samples 1 to T are called evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation indicator 1}, ..., {auxiliary hyperparameter sample T, evaluation indicator T}.
  • the neural predictor determines the prediction indicator corresponding to the target hyperparameter combination based on the target hyperparameter combination, auxiliary hyperparameter samples 1 to T, the evaluation indicators 1 to T corresponding to auxiliary hyperparameter samples 1 to T, and two anchor point features.
  • the two anchor features are used to calibrate the coding features of the lowest predictive index and the coding features of the highest predictive index of the target task.
  • When determining the prediction indicator corresponding to the target hyperparameter combination, the neural predictor can jointly encode the T auxiliary hyperparameter samples and the target hyperparameter combination to obtain T+1 features.
  • For ease of distinction, the encoded features corresponding to the T auxiliary hyperparameter samples are called auxiliary features, and the encoded feature corresponding to the target hyperparameter combination is called the target feature.
  • the neural predictor includes at least one encoding layer (encoder layer). The neural predictor jointly encodes, through the encoding layer, the T+1 hyperparameter combinations (obtained by combining the T auxiliary hyperparameter samples and the target hyperparameter combination). The specific method can be seen in Figure 8B and is not described again here.
  • the neural predictor determines the similarity between the target feature and the T auxiliary features and the similarity between the target feature and the two anchor features respectively.
  • In Figure 9A, the anchor feature for the encoded feature of the lowest prediction indicator is called anchor feature 1, and the anchor feature for the encoded feature of the highest prediction indicator is called anchor feature 2.
  • the inner product method can be used when determining the similarity.
  • the neural predictor determines the weights corresponding to the T auxiliary hyperparameter samples and the two anchor point features based on the similarities between the target feature and the T auxiliary features and the two anchor point features.
  • The evaluation indicators included in the T auxiliary hyperparameter samples and the prediction indicators corresponding to the two anchor points are weighted according to the weights corresponding to the T auxiliary hyperparameter samples and the two anchor points, to obtain the prediction indicator of the target hyperparameter combination output by the neural predictor. For example, as shown in Figure 9B, the prediction indicator corresponding to anchor feature 1 can be configured as 0, and the prediction indicator corresponding to anchor feature 2 can be configured as 1.
  • As an example, the softmax function can be used when converting the similarities into weights.
  • The two anchor point features are learnable, and the two anchor point features can be understood as learnable parameters of the neural predictor.
  • the two anchor features can be updated simultaneously every time the weights of the neural predictor are updated.
  • the processing flow of the neural predictor shown in Figures 9A-9B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set. The comparison result between the prediction indicator output by the neural predictor for a target hyperparameter combination and the evaluation indicator corresponding to that target hyperparameter combination in the hyperparameter sample is then used to adjust the weights of the neural predictor and the two anchor point features.
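  • A corresponding sketch of the anchor-feature variant in Figures 9A-9B is given below, again only as an illustration under the same vector-encoding assumption: the two anchor features are learnable parameters, their associated prediction indicators are fixed to 0 and 1 as in the example above, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class AnchorPredictor(nn.Module):
    def __init__(self, dim, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Anchor feature 1 (lowest indicator) and anchor feature 2 (highest indicator) are learnable.
        self.anchors = nn.Parameter(torch.randn(2, dim))

    def forward(self, aux_x, aux_y, target_x):
        tokens = torch.cat([aux_x, target_x.unsqueeze(0)], dim=0)
        encoded = self.encoder(tokens.unsqueeze(0)).squeeze(0)
        aux_feat, target_feat = encoded[:-1], encoded[-1]
        # Similarities to the T auxiliary features and to the two anchor features (inner product).
        sims = torch.cat([aux_feat @ target_feat, self.anchors @ target_feat])   # (T+2,)
        weights = torch.softmax(sims, dim=0)                                     # T+2 weights
        # T+2 indicators: the T evaluation indicators plus 0 and 1 for the two anchors.
        indicators = torch.cat([aux_y, aux_y.new_tensor([0.0, 1.0])])
        return (weights * indicators).sum()
```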
  • FIG. 10A is a schematic diagram of the processing flow of another neural predictor provided by an embodiment of the present application.
  • the input of the neural predictor includes the T auxiliary hyperparameter samples, the evaluation indicators corresponding to the T auxiliary hyperparameter samples, the target hyperparameter combination, and the target prediction indicator mask corresponding to the target hyperparameter combination.
  • Figure 10A refers to the T auxiliary hyperparameter samples as auxiliary hyperparameter samples 1 to T, and the evaluation indicators corresponding to auxiliary hyperparameter samples 1 to T are called evaluation indicators 1 to T; that is, {auxiliary hyperparameter sample 1, evaluation indicator 1}, ..., {auxiliary hyperparameter sample T, evaluation indicator T}.
  • Each auxiliary hyperparameter sample and its corresponding evaluation indicator can be connected (concatenated) to obtain connection parameter information, and the connection parameter information is then input into the neural predictor.
  • For example, connection parameter information 1 is obtained by connecting auxiliary hyperparameter sample 1 and evaluation indicator 1.
  • Target connection parameter information is obtained by connecting the target hyperparameter combination and the target prediction indicator mask.
  • the target prediction index mask is used to characterize the unknown prediction index corresponding to the target hyperparameter combination.
  • the target prediction indicator mask is learnable. After initial configuration, when training the neural predictor, this target prediction indicator mask can be updated each time the neural predictor's weights are updated.
  • the neural predictor performs similarity matching on each two connection parameter information among the input T+1 connection parameter information to obtain the similarity between each two connection parameter information. Further, the neural predictor determines the prediction index corresponding to the target hyperparameter combination based on the similarity between each two of the T+1 connection parameter information.
  • the neural predictor includes multiple coding layers.
  • In Figure 10B, two encoding layers are taken as an example.
  • The neural predictor also includes an FC/sigmoid layer.
  • the coding layer can be the standard coding layer in the Transformer structure, which is composed of an attention layer (Attention Layer) and a feed-forward layer (Feed-forward layer).
  • the attention layer is used to calculate the similarity of the input T+1 connection parameter information in pairs to obtain a similarity matrix, and then weight the T+1 connection information according to the similarity matrix to obtain T+1 features.
  • the T+1 features are sent to the feed-forward layer, and the feed-forward layer performs feature transformation on the T+1 features.
  • T+1 connection parameter information can be fused to comprehensively predict target prediction indicators.
  • the features corresponding to the target connection parameter information in the T+1 features output by the coding layer are input to the FC/sigmoid layer.
  • the neural predictor reduces the dimensionality of the features corresponding to the target connection parameter information through the FC/sigmoid layer to obtain 1-dimensional features. This feature is normalized by the Sigmoid function to a value between 0 and 1, which is the target prediction index corresponding to the predicted target hyperparameter combination.
  • the target predictor mask is learnable, and the target predictor mask can be understood as a learnable parameter of the neural predictor.
  • the target predictor mask can be updated simultaneously with each update of the neural predictor's weights.
  • the processing flow of the neural predictor shown in Figures 10A-10B is applicable to both the training flow and the evaluation flow. In the training flow, the target hyperparameter combination also comes from the training set. Furthermore, the comparison result between the prediction indicator output by the neural predictor for a target hyperparameter combination and the evaluation indicator corresponding to that target hyperparameter combination in the hyperparameter sample is used to adjust the weights of the neural predictor and the target prediction indicator mask.
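  • The masked variant of Figures 10A-10B could be sketched as follows, assuming again that samples are fixed-length vectors; each auxiliary sample is concatenated with its evaluation indicator, the target combination is concatenated with a learnable mask, and the target token after fusion encoding is mapped to (0, 1) by an FC/sigmoid head. All names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedPredictor(nn.Module):
    def __init__(self, dim, num_layers=2, num_heads=1):   # dim + 1 must be divisible by num_heads
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim + 1, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.mask = nn.Parameter(torch.zeros(1))    # learnable target prediction indicator mask
        self.head = nn.Linear(dim + 1, 1)           # FC layer before the sigmoid

    def forward(self, aux_x, aux_y, target_x):
        # Connection parameter information: [sample, indicator] for auxiliary samples,
        # [target combination, mask] for the target.
        aux_tokens = torch.cat([aux_x, aux_y.unsqueeze(1)], dim=1)            # (T, dim+1)
        target_token = torch.cat([target_x, self.mask]).unsqueeze(0)          # (1, dim+1)
        tokens = torch.cat([aux_tokens, target_token], dim=0).unsqueeze(0)    # (1, T+1, dim+1)
        encoded = self.encoder(tokens).squeeze(0)
        # Only the feature of the target connection information goes to the FC/sigmoid head.
        return torch.sigmoid(self.head(encoded[-1])).squeeze()
```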
  • the hyperparameter search space is defined as follows, where the three numerical values represent the minimum value, maximum value and step size of the hyperparameter respectively.
  • wd: numerical type (0.02, 0.4, 0.01), indicating weight decay;
  • dropout: numerical type (0.0, 0.3, 0.025), indicating the dropout probability;
  • drop_conn_rate: numerical type (0.0, 0.4, 0.025), indicating the drop connection probability;
  • mixup: numerical type (0.0, 1.0, 0.05), indicating the distribution parameter of mixup;
  • color: numerical type (0.0, 0.5, 0.025), indicating the intensity of color data augmentation;
  • re_prob: numerical type (0.0, 0.4, 0.025), indicating the probability of random erase.
  • The above hyperparameter space definition is just an example; in actual applications, any hyperparameters that need to be optimized can be defined, as the sketch below illustrates.
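  • For illustration, such a search space could be represented and randomly sampled as in the following sketch, which mirrors the (minimum, maximum, step) definition above; the identifiers are assumptions of this example.

```python
import random

SEARCH_SPACE = {
    "wd":             (0.02, 0.4, 0.01),
    "dropout":        (0.0, 0.3, 0.025),
    "drop_conn_rate": (0.0, 0.4, 0.025),
    "mixup":          (0.0, 1.0, 0.05),
    "color":          (0.0, 0.5, 0.025),
    "re_prob":        (0.0, 0.4, 0.025),
}

def sample_combination(space=SEARCH_SPACE):
    """Randomly pick one grid value (min + k * step) for every hyperparameter."""
    combo = {}
    for name, (lo, hi, step) in space.items():
        n_steps = int(round((hi - lo) / step))
        combo[name] = lo + step * random.randint(0, n_steps)
    return combo
```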
  • The optimizer refers to the algorithm used to optimize the parameters of the machine learning model, such as the network weights. Optimization algorithms such as gradient descent, stochastic gradient descent, momentum gradient descent, or adaptive moment estimation (Adam) can be used for parameter optimization.
  • the learning rate refers to the magnitude of the parameter update in each iteration of the optimization algorithm, also called the step size. When the step size is too large, the algorithm does not converge and the objective function of the model oscillates; when the step size is too small, the convergence of the model is too slow.
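  • As a toy illustration of the step size (not taken from the application), a single plain gradient-descent update can be written as:

```python
def gradient_descent_step(params, grads, lr):
    # lr is the step size: too large and the objective oscillates, too small and convergence is slow.
    return [p - lr * g for p, g in zip(params, grads)]
```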
  • A1: Initialize the neural predictor and execute A2. It can be understood that the initial training set is empty.
  • A2: Sample 16 hyperparameter combinations from the hyperparameter search space, and execute A3.
  • A3: Obtain, through evaluation by the user task, the evaluation indicators corresponding to the 16 hyperparameter combinations, and execute A4.
  • A4: Update the 16 hyperparameter combinations and the evaluation indicators corresponding to the 16 hyperparameter combinations into the training set as hyperparameter samples.
  • A5 perform multiple iterations of training on the neural predictor based on the training set to obtain the neural predictor obtained in the i-th round of sampling process.
  • the number of rounds of iterative training is not specifically limited in the embodiment of this application.
  • For the training process please refer to the description of the corresponding embodiment in Figure 3 and will not be described again here.
  • A6: Sample 1000 hyperparameter combinations from the hyperparameter search space. It can be understood that these 1000 hyperparameter combinations are different from the previously sampled hyperparameter combinations.
  • A7: Predict the 1000 hyperparameter combinations with the neural predictor obtained in the i-th round of the sampling process to obtain the prediction indicators corresponding to the 1000 hyperparameter combinations.
  • A8: Set i = i + 1 and determine whether the iterative sampling stop condition is met; if so, end the iterative sampling process, otherwise execute A9. Here, the iterative sampling stop condition is taken to be that the number of iterative sampling rounds reaches the maximum number of sampling rounds, as an example.
  • A9: Select the 16 hyperparameter combinations with the best prediction indicators from the 1000 hyperparameter combinations, and continue to execute A3.
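  • The steps above can be condensed into the following loop sketch, where sample_fn, evaluate_fn, train_fn and predict_fn are assumed callbacks standing for search-space sampling, user-task evaluation, predictor training and predictor inference respectively; K=16 and L=1000 follow the example above.

```python
def run_search(sample_fn, evaluate_fn, train_fn, predict_fn, N, K=16, L=1000):
    training_set = []                                                    # (combination, evaluation indicator)
    combos = [sample_fn() for _ in range(K)]                             # A2
    for _ in range(N):                                                   # A8: at most N sampling rounds
        metrics = [evaluate_fn(c) for c in combos]                       # A3: user-task evaluation
        training_set += list(zip(combos, metrics))                       # A4: update the training set
        predictor = train_fn(training_set)                               # A5: train the neural predictor
        candidates = [sample_fn() for _ in range(L)]                     # A6: sample L new combinations
        preds = [predict_fn(predictor, training_set, c) for c in candidates]   # A7: predict their indicators
        ranked = sorted(zip(candidates, preds), key=lambda cp: cp[1], reverse=True)
        combos = [c for c, _ in ranked[:K]]                              # A9: keep the best K for the next round
    return max(training_set, key=lambda cm: cm[1])                       # best evaluated combination found
```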
  • While the solution provided by the embodiment of the present application reaches the level of Bayesian optimization, the number of samples it uses (that is, the number of manually confirmed prediction indicators) is lower than the number of samples used by Bayesian optimization.
  • the solution provided by the embodiments of this application combines training samples to assist in the prediction of the target hyperparameter combination. Since the evaluation results corresponding to the hyperparameter combinations in the training samples have been verified by users, their accuracy is high.
  • The prediction of the target hyperparameter combination is assisted by user-verified training samples. Compared with existing ordinary predictors whose input only includes the target hyperparameter combination, the scheme adopted in the embodiment of the present application produces more accurate prediction results for the target hyperparameter combination.
  • the input of existing neural predictors only includes the target evaluation sample, with no other reference samples or evaluation indicators, so the evaluation indicators of many real samples must be obtained in advance to train the neural predictor.
  • In contrast, the input of the neural predictor in this application includes hyperparameter samples that already have evaluation indicators, together with those evaluation indicators.
  • When predicting the prediction indicator of the target sample, the evaluation indicators of the hyperparameter samples have already been referred to, so the accuracy of the predicted indicator of the target sample is improved; this in turn makes the adjustment of the neural predictor's weights based on the prediction indicator more accurate, thereby reducing the number of training rounds and thus the number of training samples used.
  • the solution adopted in the embodiment of this application uses fewer training samples to obtain a neural predictor with better generalization.
  • the embodiment of the present application also provides a data processing device.
  • the device may be a processor, a chip or a chip system in the execution device, or a module in the execution device.
  • the device may include a receiving unit 1101, a processing unit 1102 and a sending unit 1103.
  • the receiving unit 1101, the processing unit 1102, and the sending unit 1103 are configured to perform the method steps shown in the embodiments corresponding to FIG. 5A and FIG. 7 .
  • the receiving unit 1101 is configured to receive hyperparameter information sent by the user equipment, where the hyperparameter information is used to indicate the hyperparameter search space corresponding to the user task.
  • the processing unit 1102 is configured to sample multiple hyperparameter combinations from the hyperparameter search space; use the first hyperparameter combination, multiple samples included in the training set, and the evaluation indicators of the multiple samples as inputs to the neural predictor; and determine, through the neural predictor, the prediction indicator corresponding to the first hyperparameter combination, where the first hyperparameter combination is any one of the multiple hyperparameter combinations, so as to obtain multiple prediction indicators corresponding to the multiple hyperparameter combinations.
  • the sending unit 1103 is configured to send K hyperparameter combinations to the user equipment, where K is a positive integer; where the K prediction indicators corresponding to the K hyperparameter combinations are the K highest ones among the multiple prediction indicators.
  • the receiving unit 1101 is also configured to receive K evaluation indicators corresponding to the K hyperparameter combinations sent by the user equipment.
  • the processing unit 1102 is also used to take the K hyperparameter combinations as K samples, and add the K samples and the corresponding K evaluation indicators to the training set.
  • the processing unit 1102 is also used to train the neural predictor by selecting multiple samples from the training set, evaluation indicators corresponding to the multiple samples, and selecting a target sample from the training set. Multiple samples, evaluation indicators corresponding to the multiple samples, and target samples are used as inputs to the neural predictor, and the prediction indicators corresponding to the target samples are determined through the neural predictor. Adjust the network parameters of the neural predictor according to the comparison results between the prediction indicators of the target sample and the evaluation indicators corresponding to the target sample.
  • the processing unit 1102 is specifically configured to input the first hyperparameter combination, multiple samples included in the training set, and evaluation indicators of the multiple samples into the neural predictor.
  • the neural predictor determines the prediction index corresponding to the first hyperparameter combination based on the first hyperparameter combination, multiple samples, evaluation indicators of the multiple samples, and two anchor point features.
  • two anchor point features are used to calibrate the coding features of the lowest predictive index and the coding features of the highest predictive index of the user task.
  • the number of input samples supported by the neural predictor is T, and T is a positive integer. The processing unit 1102 is specifically used for: the neural predictor encodes the input T samples to obtain T auxiliary features, and encodes the first hyperparameter combination to obtain the target feature; the neural predictor determines the similarities between the target feature and the T auxiliary features and between the target feature and the two anchor point features.
  • the neural predictor determines T+2 weights based on the similarities between the target feature and the T auxiliary features and the two anchor point features, where the T+2 weights include the weights of the T samples and the weights of the two anchor point features; the neural predictor weights the T+2 evaluation indicators according to the T+2 weights to obtain the prediction indicator of the first hyperparameter combination; the T+2 evaluation indicators include the evaluation indicators of the T samples and the evaluation indicators corresponding to the two anchor point features.
  • the two anchor features belong to the network parameters of the neural predictor.
  • the number of input samples supported by the neural predictor is T; the processing unit 1102 is specifically configured to: encode the input T samples through the neural predictor to obtain T auxiliary features, and encode the first hyperparameter combination to obtain the target feature.
  • the similarity between the target feature and T auxiliary features is determined through the neural predictor.
  • the neural predictor is used to determine the weights corresponding to the T samples based on the similarities between the target feature and the T auxiliary features.
  • the neural predictor weights the evaluation indicators corresponding to the T samples according to the weights corresponding to the T samples to obtain the prediction indicators of the first hyperparameter combination.
  • the number of hyperparameter samples supported by the neural predictor is T; the processing unit 1102 is specifically used to: input T+1 pieces of connection parameter information into the neural predictor, where the T+1 pieces of connection parameter information include the connection parameter information obtained by connecting each of the T samples with its corresponding evaluation indicator, and the connection parameter information obtained by connecting the first hyperparameter combination with the target prediction indicator mask; the target prediction indicator mask is used to characterize the unknown prediction indicator corresponding to the first hyperparameter combination.
  • the neural predictor is used to perform similarity matching on each two connection parameter information among the input T+1 connection parameter information to obtain the similarity between each two connection parameter information.
  • the neural predictor determines the prediction index of the first hyperparameter combination based on the similarity between each two connection parameter information in the T+1 connection parameter information.
  • the device 1200 may include a communication interface 1210 and a processor 1220.
  • the device 1200 may also include a memory 1230.
  • the memory 1230 may be provided inside the device or outside the device.
  • the receiving unit 1101, the processing unit 1102 and the sending unit 1103 shown in FIG. 11 can all be implemented by the processor 1220.
  • the functions of the receiving unit 1101 and the sending unit 1103 are implemented by the communication interface 1210.
  • the functions of the processing unit 1102 are implemented by the processor 1220.
  • the processor 1220 receives the hyperparameter information through the communication interface 1210 and sends the hyperparameter combination, and is used to implement the methods described in FIG. 5A and FIG. 7 .
  • Each step of the processing flow can complete the methods described in Figures 5A and 7 through integrated logic circuits of hardware in the processor 1220 or through instructions in the form of software.
  • the communication interface 1210 may be a circuit, a bus, a transceiver, or any other device that can be used for information exchange.
  • the other device may be a device connected to the device 1200.
  • the processor 1220 can be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware processor for execution, or can be executed by a combination of hardware and software units in the processor.
  • the program code executed by the processor 1220 to implement the above method may be stored in the memory 1230 . Memory 1230 and processor 1220 are coupled.
  • the coupling in the embodiment of this application is an indirect coupling or communication connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
  • the processor 1220 may cooperate with the memory 1230.
  • the memory 1230 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • The memory 1230 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the embodiment of the present application does not limit the specific connection medium between the communication interface 1210, the processor 1220, and the memory 1230.
  • the memory 1230, the processor 1220 and the communication interface 1210 are connected through a bus in Figure 12.
  • the bus is represented by a thick line in Figure 12.
  • The connection modes between the other components are only schematically illustrated and are not limited thereto.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 12, but it does not mean that there is only one bus or one type of bus.
  • embodiments of the present application also provide a computer storage medium, which stores a software program.
  • The software program can implement the method provided by any one or more of the above embodiments.
  • the computer storage medium may include: U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other various media that can store program codes.
  • embodiments of the present application also provide a chip, which includes a processor and is used to implement the functions involved in any one or more of the above embodiments, such as obtaining or processing the information involved in the above methods.
  • the chip further includes a memory, and the memory is used for necessary program instructions and data executed by the processor.
  • The chip system may be composed of a chip, or may include a chip and other discrete devices.
  • One embodiment of the present application provides a computer-readable medium for storing a computer program.
  • the computer program includes instructions for executing the method steps in the method embodiment corresponding to FIG. 4 .
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, etc.) having computer-usable program code embodied therein.
  • This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

This application discloses a data processing method and device in the field of artificial intelligence, which improve the prediction accuracy of a neural predictor while the neural predictor uses a relatively small number of training samples. In the data processing method, a hyperparameter combination sampled from the hyperparameter search space corresponding to a user task, multiple samples included in a training set, and the evaluation indicators of the multiple samples are used as inputs to the neural predictor, and the prediction indicator corresponding to the hyperparameter combination is determined through the neural predictor. Hyperparameter samples and their evaluation indicators are used to assist in predicting the hyperparameter combinations sampled from the hyperparameter search space; since hyperparameter samples that already have evaluation indicators, together with those evaluation indicators, are taken into account when predicting a hyperparameter combination, the accuracy can be improved. This application uses a relatively small number of training samples, and a neural predictor with good generalization can be obtained from fewer training samples.

Description

一种数据处理方法及装置
相关申请的交叉引用
本申请要求在2022年03月24日提交中国专利局、申请号为202210303118.6、申请名称为“一种数据处理方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,特别涉及一种数据处理方法及装置。
背景技术
黑盒优化(Black box optimization)也可以称为超参数优化(Hyper Parameter Optimization),是科学研究和工业生产中的重要技术。实际问题中存在很多复杂的机器学习***,其中的一些参数会对机器学习***的结果产生影响,但具体的机制无法被完全解析,只能观察到***对给定输入的输出结果(也就是所谓的黑盒),因此这些参数很难通过梯度优化等比较高效的方法所优化。可以通过尝试不同的参数组合,观察***输出的结果,来寻找较优的参数组合。这种尝试的方式代价高昂,需要耗费较长的时间或者较多的某种资源,才能够得到输出结果。为了减少尽可能少的尝试,就得到较优的输入参数,可以采用黑盒优化的方式来解决。
由于神经网络具有较强的拟合能力,因此针对用于做预测的神经网络预测器(简称神经预测器)的黑盒优化,在超参数搜索之前,预先获取若干组超参数对应的预测指标,用于训练神经预测器。神经预测器训练完成之后,使用训练完成的神经预测器来寻找预测指标较优的超参数。但是神经预测器需要比较多的训练数据才能训练出具有泛化性的神经预测器。在黑盒优化场景中,一般单次评估的开销都很大,因此可以获得的训练数据较少,训练得到的神经预测器泛化性较差,导致搜索效果不佳。
发明内容
本申请实施例提供一种数据处理方法及装置,使用较少训练样本得到泛化性较优的神经预测器。
第一方面,本申请实施例提供一种数据处理方法,包括:接收用户设备发送的超参数信息,超参数信息用于指示用户任务对应的超参数搜索空间;从超参数搜索空间采样多个超参数组合;将第一超参数组合、训练集包括的多个样本和多个样本的评估指标作为神经预测器的输入,通过神经预测器确定第一超参数组合对应的预测指标,第一超参数组合为多个超参数组合的任一个,以得到多个超参数组合对应的多个预测指标;向用户设备发送K个超参数组合,K为正整数;其中,K个超参数组合对应的K个预测指标为多个预测指标中最高的K个。
作为一种举例,用户任务可以是分子设计任务、材料科学任务、工厂调试任务、芯片设计任务、神经网络结构设计任务、神经网络训练调优任务等等。以神经网络结构设计任 务为例,用户任务需要优化神经网络结构的设计参数,例如卷积层数、卷积核大小、扩张大小等。用户可以根据接收到的超参数组合执行具体的用户任务,例如,做视频分类、文字识别、图像美化、语音识别等任务。
超参数(hyper-parameter)可以理解为***、产品或过程的操作参数。超参数信息可以理解为包含一些超参数的取值范围或取值条件。在神经网络模型中,超参数是在开始学习过程之前由用户设置初始化值的参数,是不能通过神经网络本身的训练过程学习得到的参数。在卷积神经网络中,这些超参数包括:卷积核大小、神经网络层数、激活函数、损失函数、所用的优化器类型、学习率、批大小batch_size、训练的轮数epoch等等。超参数搜索空间包括用户任务所需的一些超参数。每种超参数的值可以是连续分布的值,也可以是离散分布的值。例如:
lr:数值型(0.0005,0.02,0.00025),表示学习率;
wd:数值型(0.02,0.4,0.01),表示权重衰减;
optim:选择型(“AdamW”,“LAMB”),表示优化器类型;
dropout:数值型(0.0,0.3,0.025),表示dropout概率;
drop_conn_rate:数值型(0.0,0.4,0.025),表示drop connection概率;
mixup:数值型(0.0,1.0,0.05),表示mixup的分布参数;
color:数值型(0.0,0.5,0.025),表示color数据增强的强度;
re_prob:数值型(0.0,0.4,0.025),表示random erase的概率。
上述超参数空间的定义仅是一个示例,实际应用中可以定义任意需要优化的超参数。
本申请提供的神经预测器的输入不仅仅包括超参数。还包括训练集中的样本(也可以称为超参样本)以及对应的评估指标。通过超参样本以及超参样本的评估指标来辅助预测超参数搜索空间采样到的超参数组合,由于神经预测器的输入中包括已经具有评估指标的超参样本以及评估指标,即在预测超参数组合时结合了已经具有评估指标的超参样本以及评估指标,可以提高预测的准确度。现有中神经预测器的输入仅包括目标评估样本,没有其他的可参考的样本以及评估指标,需要预先获得很多真实样本的评估指标来训练神经预测器。而本申请中神经预测器的输入包括已经具有评估指标的超参样本以及评估指标,预测目标样本的预测指标时已经参考了超参样本的评估指标,使得预测目标样本的预测指标的准确度有所提高,从而基于预测指标的准确度来调整神经预测器的权重时准确较高,从而较少训练的轮数,进而减少使用的训练样本数量,通过较少的训练样本可以得到泛化性较好的神经预测器。
在一种可能的设计中,接收所述用户设备发送的所述K个超参数组合对应的K个评估指标;将所述K个超参数组合作为K个样本,并将所述K个样本以及对应的所述K个评估指标加入所述训练集。
上述设计中,通过不断更新训练集,从而再结合更新的训练集来预测超参数组合对应的预测结果,也就是参与辅助预测的样本的评估指标更优,因此可以提高预测的准确度。
在一种可能的设计中,所述神经预测器是通过以下方式训练得到的:从所述训练集中选择多个样本、所述多个样本对应的评估指标,以及从所述训练集中选择一个目标样本;将所述多个样本、所述多个样本对应的评估指标以及所述目标样本作为所述神经预测器的输入,通过所述神经预测器确定所述目标样本对应的预测指标;根据所述目标样本的预测指标与所述目标样本对应的评估指标的比较结果,调整所述神经预测器的网络参数。
现有中神经预测器的输入仅包括目标评估样本,没有其他的可参考的样本以及评估指标,需要预先获得很多真实样本的评估指标来训练神经预测器。而本申请中神经预测器的输入包括已经具有评估指标的超参样本以及评估指标,预测目标样本的预测指标时已经参考了超参样本的评估指标,使得预测目标样本的预测指标的准确度有所提高,从而基于预测指标的准确度来调整神经预测器的权重时准确较高,从而较少训练的轮数,进而减少使用的训练样本数量,通过较少的训练样本可以得到泛化性较好的神经预测器。
一些实施例中,可以每轮通过神经预测器确定多个超参数组合对应的预测指标之前,都可以采用训练集来训练神经预测器。训练集可以是经过更新的,训练得到的神经预测器的泛化性越来越优。
在一种可能的设计中,将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:将所述第一超参数组合、所述训练集包括的多个样本和所述多个样本的评估指标输入所述神经预测器;所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标;其中,所述两个锚点特征用于标定所述用户任务的最低预测指标的编码特征以及最高预测指标的编码特征。
上述设计中,通过两个锚点特征来参与超参数组合的预测,两个锚点特征用于标定用户任务的最低预测指标的编码特征以及最高预测指标的编码特征,从而防止预测结果偏离预测范围,进一步提高预测的准确度。
在一种可能的设计中,所述神经预测器支持输入的样本数量为T,T为正整数;所述通过所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标,包括:所述神经预测器对输入的T个样本进行编码得到T个辅助特征,以及对所述第一超参数组合进行编码得到目标特征;所述神经预测器确定所述目标特征与所述T个辅助特征的相似度以及所述目标特征与所述两个锚点特征的相似度;所述神经预测器根据所述目标特征与所述T个辅助特征以及两个锚点特征分别对应的相似度确定T+2个权重,所述T+2个权重包括T个样本的权重以及两个锚点特征的权重;所述神经预测器根据T+2个权重对T+2个评估指标进行加权得到所述第一超参数组合的预测指标;其中,所述T+2个评估指标包括所述T个样本的评估指标以及所述两个锚点特征对应的评估指标。
上述设计中,通过计算相似度的方式来使得样本来参与超参数组合的预测,从而通过加权评估指标的方式获取超参数组合的预测指标,可以提高超参数组合的预测结果的准确度。现有中神经预测器的输入仅包括目标评估样本,没有其他的可参考的样本以及评估指标,需要预先获得很多真实样本的评估指标来训练神经预测器。而本申请中神经预测器的输入包括已经具有评估指标的超参样本以及评估指标,预测目标样本的预测指标时已经参考了超参样本的评估指标,使得预测目标样本的预测指标的准确度有所提高,从而基于预测指标的准确度来调整神经预测器的权重时准确较高,从而较少训练的轮数,进而减少使用的训练样本数量,可以使用较少的训练样本获得泛化性较优的神经预测器。
在一种可能的设计中,所述两个锚点特征属于所述神经预测器的网络参数。两个锚点特征作为网络参数是可学习的,在训练神经预测器的流程中,支持更新两个锚点特征。
在一种可能的设计中,所述神经预测器支持输入的样本数量为T;将第一超参数组合、 训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:所述神经预测器对输入的T个样本进行编码得到T个辅助特征,对所述第一超参数组合进行编码得到目标特征;所述神经预测器分别确定所述目标特征与所述T个辅助特征的相似度;所述神经预测器根据所述目标特征与所述T个辅助特征分别对应的相似度确定T个样本分别对应的权重;所述神经预测器根据T个样本分别对应的权重对所述T个样本对应的评估指标进行加权得到所述第一超参数组合的预测指标。
在一种可能的设计中,所述神经预测器分别确定所述目标特征与所述T个辅助特征的相似度以及所述目标特征与两个锚点特征的相似度,包括:所述神经预测器对所述目标特征与所述T个辅助特征分别进行内积处理得到所述目标特征与所述T个辅助特征分别对应的相似度,以及对所述目标特征与两个锚点特征分别进行内积处理得到所述目标特征与两个锚点特征分别对应的相似度。
在一种可能的设计中,所述神经预测器支持输入的超参样本的数量为T;将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:将T+1个连接参数信息输入所述神经预测器;所述T+1个连接参数信息包括T个样本中每个样本和对应的评估指标连接后得到的T个连接参数信息,以及所述第一超参数组合和目标预测指标掩码连接后得到的连接参数信息,所述目标预测指标掩码用于表征所述第一超参数组合对应的未知预测指标;通过所述神经预测器对输入的所述T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度;所述神经预测器根据所述T+1个连接参数信息中每两个连接参数信息之间的相似度确定所述第一超参数组合的预测指标。
上述设计中,通过计算相似度的方式来使得样本来参与超参数组合的预测,从而通过加权评估指标的方式获取超参数组合的预测指标,可以提高超参数组合的预测结果的准确度。现有中神经预测器的输入仅包括目标评估样本,没有其他的可参考的样本以及评估指标,需要预先获得很多真实样本的评估指标来训练神经预测器。而本申请中神经预测器的输入包括已经具有评估指标的超参样本以及评估指标,预测目标样本的预测指标时已经参考了超参样本的评估指标,使得预测目标样本的预测指标的准确度有所提高,从而基于预测指标的准确度来调整神经预测器的权重时准确较高,从而较少训练的轮数,进而减少使用的训练样本数量,可以使用较少的训练样本获得泛化性较优的神经预测器。
第二方面,本申请实施例还提供一种数据处理装置,包括:接收单元,用于接收用户设备发送的超参数信息,所述超参数信息用于指示用户任务对应的超参数搜索空间;处理单元,用于从所述超参数搜索空间采样多个超参数组合;将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为神经预测器的输入,通过所述神经预测器确定所述第一超参数组合对应的预测指标,所述第一超参数组合为所述多个超参数组合的任一个,以得到所述多个超参数组合对应的多个预测指标;发送单元,用于向所述用户设备发送K个超参数组合,K为正整数;其中,所述K个超参数组合对应的K个预测指标为所述多个预测指标中最高的K个。
在一种可能的设计中,所述接收单元,还用于接收所述用户设备发送的所述K个超参数组合对应的K个评估指标;处理单元,还用于将所述K个超参数组合作为K个样本,并将所述K个样本以及对应的所述K个评估指标加入所述训练集。
在一种可能的设计中,所述处理器,还用于通过以下方式训练得到所述神经预测器:从所述训练集中选择多个样本、所述多个样本对应的评估指标,以及从所述训练集中选择一个目标样本;将所述多个样本、所述多个样本对应的评估指标以及所述目标样本作为所述神经预测器的输入,通过所述神经预测器确定所述目标样本对应的预测指标;根据所述目标样本的预测指标与所述目标样本对应的评估指标的比较结果,调整所述神经预测器的网络参数。
在一种可能的设计中,所述处理单元,具体用于:将所述第一超参数组合、所述训练集包括的多个样本和所述多个样本的评估指标输入所述神经预测器;所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标;其中,所述两个锚点特征用于标定所述用户任务的最低预测指标的编码特征以及最高预测指标的编码特征。
在一种可能的设计中,所述神经预测器支持输入的样本数量为T,T为正整数;所述处理单元,具体用于:所述神经预测器对输入的T个样本进行编码得到T个辅助特征,以及对所述第一超参数组合进行编码得到目标特征;所述神经预测器确定所述目标特征与所述T个辅助特征的相似度以及所述目标特征与所述两个锚点特征的相似度;所述神经预测器根据所述目标特征与所述T个辅助特征以及两个锚点特征分别对应的相似度确定T+2个权重,所述T+2个权重包括T个样本的权重以及两个锚点特征的权重;所述神经预测器根据T+2权重对T+2个评估指标进行加权得到所述第一超参数组合的预测指标;其中,所述T+2个评估指标包括所述T个样本的评估指标以及所述两个锚点特征对应的评估指标。
在一种可能的设计中,所述两个锚点特征属于所述神经预测器的网络参数。
在一种可能的设计中,所述神经预测器支持输入的样本数量为T,T为正整数;所述处理单元,具体用于:通过所述神经预测器对输入的T个样本进行编码得到T个辅助特征,对所述第一超参数组合进行编码得到目标特征;通过所述神经预测器分别确定所述目标特征与所述T个辅助特征的相似度;通过所述神经预测器根据所述目标特征与所述T个辅助特征分别对应的相似度确定T个样本分别对应的权重;通过所述神经预测器根据T个样本分别对应的权重对所述T个样本对应的评估指标进行加权得到所述第一超参数组合的预测指标。
在一种可能的设计中,所述神经预测器支持输入的超参样本的数量为T,T为正整数;所述处理单元,具体用于:将T+1个连接参数信息输入所述神经预测器;所述T+1个连接参数信息包括T个样本中每个样本和对应的评估指标连接后得到的T个连接参数信息,以及所述第一超参数组合和目标预测指标掩码连接后得到的连接参数信息,所述目标预测指标掩码用于表征所述第一超参数组合对应的未知预测指标;通过所述神经预测器对输入的所述T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度;通过所述神经预测器根据所述T+1个连接参数信息中每两个连接参数信息之间的相似度确定所述第一超参数组合的预测指标。
第三方面,本申请实施例还提供一种数据处理***,包括用户设备和执行设备。用户设备,用于向执行设备发送超参数信息。超参数信息用于指示用户任务对应的超参数搜索空间。执行设备,用于接收用户设备发送的超参数信息,从超参数搜索空间采样多个超参数组合。执行设备将第一超参数组合、训练集包括的多个样本和多个样本的评估指标作为神经预测器的输入,通过神经预测器确定第一超参数组合对应的预测指标,第一超参数组 合为多个超参数组合的任一个,以得到多个超参数组合对应的多个预测指标。执行设备向用户设备发送K个超参数组合,K为正整数;其中,K个超参数组合对应的K个预测指标为多个预测指标中最高的K个。用户设备还用于接收执行设备发送的K个超参数组合。
一些实施例中,用户设备可以执行对K个超参数组合的评估。
在一种可能的设计中,用户设备将K个超参数组合对应的K个评估指标发送给执行设备。执行设备,还用于接收所述用户设备发送的所述K个超参数组合对应的K个评估指标;将所述K个超参数组合作为K个样本,并将所述K个样本以及对应的所述K个评估指标加入所述训练集。
第四方面,本申请实施例提供一种数据处理装置,包括:处理器和存储器;该存储器用于存储指令,当该装置运行时,该处理器执行该存储器存储的该指令,以使该装置执行上述第一方面或第一方面的任一设计提供的方法。需要说明的是,该存储器可以集成于处理器中,也可以是独立于处理器之外。
第五方面,本申请实施例还提供一种可读存储介质,所述可读存储介质中存储有程序或指令,当其在计算机上运行时,使得上述各方面所述的任一方法被执行。
第六方面,本申请实施例还提供一种包含计算机程序或指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述各方面所述的任一方法。
第七方面,本申请提供了一种芯片***,所述芯片与存储器相连,用于读取并执行所述存储器中存储的软件程序,以实现任一方面的任一设计所述的方法。
另外,第二方面至第七方面中任一种设计方式所带来的技术效果可参见第一方面至第二方面中不同实现方式所带来的技术效果,此处不再赘述。
本申请在上述各方面提供的实现的基础上,还可以进行进一步组合以提供更多实现。
附图说明
图1为本申请应用的一种人工智能主体框架示意图;
图2为本申请实施例提供的一种卷积神经网络结构示意图;
图3为本申请实施例提供的一种***架构300示意图;
图4为本申请实施例提供的一种神经预测器的结构示意图;
图5A为本申请实施例提供的一种数据处理方法流程示意图;
图5B为本申请实施例提供的另一种数据处理方法流程示意图;
图6为本申请实施例提供的一种神经预测器的训练流程示意图;
图7为本申请实施例提供的又一种数据处理方法流程示意图;
图8A为本申请实施例提供的一种神经预测器的处理流程示意图;
图8B为本申请实施例提供的另一种神经预测器的处理流程示意图;
图9A为本申请实施例提供的又一种神经预测器的处理流程示意图;
图9B为本申请实施例提供的又一种神经预测器的处理流程示意图;
图10A为本申请实施例提供的又一种神经预测器的处理流程示意图;
图10B为本申请实施例提供的又一种神经预测器的处理流程示意图;
图11为本申请实施例提供的一种数据处理装置结构示意图;
图12为本申请实施例提供的另一种数据处理装置结构示意图。
具体实施方式
图1示出一种人工智能主体框架示意图,该主体框架描述了人工智能***总体工作流程,适用于通用的人工智能领域需求。
下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。
“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。
“IT价值链”从人工智能的底层基础设施、信息(提供和处理技术实现)到***的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施:
基础设施为人工智能***提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算***中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有***的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能***中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用***,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能***在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶,智慧城市,智能终端等。
本申请涉及的神经预测器(neural predictor)所采用的神经网络作为重要的节点,用于实现机器学习,深度学习,搜索,推理,决策等。本申请提及的神经网络可以包括多种类型,如深度神经网络(deep neural networks,DNN)、卷积神经网络(convolutional neural networks,CNN)、循环神经网络(recurrent neural networks,RNN)、残差网络、采用transformer模型的神经网络或其他神经网络等。下面对一些神经网络进行示例性介绍。
深度神经网络中的每一层的工作可以用数学表达式来描述:从物理层面深度神经网络中的每一层的工作可以理解为通过五种对输入空间(输入向量的集合)的操作,完成输入空间到输出空间的变换(即矩阵的行空间到列空间),这五种操作包括:1、升维/降维;2、放大/缩小;3、旋转;4、平移;5、“弯曲”。其中1、2、3的操作由完成,4的操作由+b完成,5的操作则由a()来实现。这里之所以用“空间”二字来表述是因为被分类的对象并不是单个事物,而是一类事物,空间是指这类事物所有个体的集合。其中,W是权重向量,该向量中的每一个值表示该层神经网络中的一个神经元的权重值。该向量W决定着上文所述的输入空间到输出空间的空间变换,即每一层的权重W控制着如何变换空间。
训练神经网络的目的,也就是最终得到训练好的神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。因此,神经网络的训练过程本质上就是学习控制空间变换的方式,更具体的就是学习权重矩阵。
本申请中采用transformer模型的神经网络可以包括若干个编码器。每个编码器可以包括注意力(self attention)层和前馈层(feed forward layer)。注意力层可以采用多头自注意力(Multi-Head self Attention)机制。前馈层可以采用前馈神经网络(feedforward neural network,FNN)。前馈神经网络中各神经元分层排列,每个神经元只与前一层的神经元相连。接收前一层的输出,并输出给下一层,各层之间没有反馈。编码器用于将输入的语料转换为特征向量。多头自注意力层利用三种矩阵间的计算对输入编码器的数据进行计算。三种矩阵包括查询矩阵Q(query)、键矩阵K(key)和值矩阵V(value)。多头自注意力层在编码序列中当前位置的单词时,参考当前位置的单词与序列中其他位置的单词间的多种相互依赖关系。前馈层是一种线性变换层,其用于对每个词的表示进行线性变换。
卷积神经网络(convolutional neural networks,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元对输入其中的数据进行处理。
如图2所示,卷积神经网络(CNN)100可以包括输入层110,卷积层/池化层120,其中池化层为可选的,以及神经网络层130。如图2所示卷积层/池化层120可以包括如示例121-126层,在一种实现中,121层为卷积层,122层为池化层,123层为卷积层,124层为池化层,125为卷积层,126为池化层;在另一种实现方式中,121、122为卷积层,123为池化层,124、125为卷积层,126为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。以卷积层121为例,卷积层121可以包括很多个卷积算子,卷积算子也称为卷积核。卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义。以图像处理为例,不同的权重矩阵所提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以从输入的数据中提取信息,从而帮助卷积神经网络100进行正确的预测。
当卷积神经网络100有多个卷积层的时候,初始的卷积层(例如121)往往提取较多 的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络100深度的加深,越往后的卷积层(例如126)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,即如图2中120所示例的121-126各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像大小相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
在经过卷积层/池化层120的处理后,卷积神经网络100还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层120只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或别的相关信息),卷积神经网络100需要利用神经网络层130来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层130中可以包括多层隐含层(如图2所示的131、132至13n)以及输出层140,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
在神经网络层130中的多层隐含层之后,也就是整个卷积神经网络100的最后层为输出层140,该输出层140具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络100的前向传播(如图2由110至140的传播为前向传播)完成,反向传播(如图2由140至110的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络100的损失及卷积神经网络100通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图2所示的卷积神经网络100仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,多个卷积层/池化层并行,将分别提取的特征均输入给神经网络层130进行处理。
本申请实施例中通过神经预测器来进行黑盒优化。黑盒优化可以用于找到***、产品或过程的最佳操作参数,这些***、产品或过程的性能可以作为这些参数的函数进行测量或评估。黑盒优化也可以理解为超参数优化,用于优化超参数。超参数(hyper-parameter)是在开始学习过程之前设置值的参数,是不通过训练得到的参数。超参数可以理解为***、产品或过程的操作参数。比如,神经网络的超参数调优,可以理解为黑盒优化的问题。现在使用的各种神经网络,经由数据,通过某种学习算法训练后,便得到了一个可以用来进行预测,估计的模型,如果这个模型表现的不好,有经验的工作者便会调整网络结构,算法中学习率或是每批处理的样本的个数等不通过训练得到的参数,一般称之为超参数。通常是通过大量的实践经验来调整超参数,使得神经网络的模型表现更为优秀,直到神经网络的输出满足需求。比如,本申请应用到神经网络的超参数调优时,本申请所涉及的超参 数组合可以包括神经网络的全部或者部分超参数的值。在神经网络训练时候,每个神经元的权重会随着损失函数的值来优化从而减小损失函数的值。这样便可以通过算法来优化参数得到模型。而超参数是用来调节整个网络训练过程的,如前述的卷积神经网络的隐藏层的数量,核函数的大小或数量等等。超参数并不直接参与到训练的过程中,而只作为配置变量。
通常,可以采用贝叶斯优化来进行黑盒优化。贝叶斯优化基于高斯模型。根据已知样本对位置的目标函数进行建模得到均值函数以及均值函数的置信度。对于某个点来说,置信度范围越大,表明对该点的建模的不确定性越低,即真实值偏离均值在该点的预测值的概率越大。贝叶斯优化根据均值和置信度来决定下一次尝试对哪个点进行建模。贝叶斯优化方法一般会对目标问题进行连续性假设,比如超参数值越大,则预测结果越大。但是如果目标问题不符合该连续性假设,则贝叶斯优化的建模效果较差,采样效率也会降低。
神经预测器采用神经网络,相比于贝叶斯优化,采用神经网络替代高斯模型,对目标问题建模。但是神经网络的方式,需要较多的训练数据才能训练出有泛化性的神经预测器。在黑盒优化场景中,单次评估的开销较大,因此获得的训练样本数较少,导致训练得到的神经预测器泛化性较差,导致搜索到的超参数并非评估结果最优的超参数。
基于此,本申请实施例提供一种数据处理方法,结合训练样本来辅助目标超参数组合的预测,由于训练样本中超参数组合对应的评估结果都是经过用户验证过的,准确度较高。针对目标超参数组合的预测采用用户验证的训练样本的辅助,相比不使用用户验证的训练样本的辅助,本申请实施例采用的方案对目标超参数组合进行预测的预测结果准确度更高。进一步地,为了得到泛化性较好的神经预测器,相比不使用用户验证的训练样本的辅助的方案,本申请实施例采用的方案采用较少的训练样本得到泛化性较好的神经预测器。
本申请实施例可以用于多种复杂***的超参数优化。本申请实施例可以应用的场景可以包括分子设计、材料科学、工厂调试、芯片设计、神经网络结构设计、神经网络训练调优等。比如,用于优化物理产品或生产物理产品的过程(诸如,例如合金、超材料、混凝土混合物、浇注混凝土的过程、药物混合物或执行治疗的过程)的可调参数(例如,组分或成分类型或量、生产顺序、生产定时)。再比如,用于优化神经网络结构的设计参数,例如卷积层数、卷积核大小、扩张大小、修正线性单元(rectified linear unit,ReLU)的位置等参数。
本申请实施例提供的数据处理方法可以由执行设备执行。执行设备可以由一个或者多个计算设备实现。示例性地,参见图3所示,为本申请实施例提供的一种***架构300。***架构300中包括执行设备210。执行设备210可以由一个或者多个计算设备实现。执行设备210可以布置在一个物理站点上,或者分布在多个物理站点上。***架构300还包括数据存储***250。可选地,执行设备210与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备。执行设备210可以使用数据存储***250中的数据,或者调用数据存储***250中的程序代码实现本申请提供的数据处理方法。一个或者多个计算设备可以部署于云网络中。一种示例中,本申请实施例提供的数据处理方法以服务的形式部署云网络的一个或者多个计算设备中,用户设备通过网络访问云服务。另一种示例中,本申请实施例提供的数据处理方法可以以软件工具形式部署于在本地的一个或者多个计算设备上。
用户可以操作各自的用户设备(例如本地设备301和本地设备302)与执行设备210进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能 手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备210进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。
在另一种实现中,执行设备210的一个方面或多个方面可以由每个本地设备实现,例如,本地设备301可以为执行设备210提供本地数据或反馈评估结果。
需要注意的,执行设备210的所有功能也可以由本地设备实现。例如,本地设备301实现执行设备210的功能并为自己的用户提供服务,或者为本地设备302的用户提供服务。
下面结合附图对本申请实施例提供的数据处理方法进行详细说明。
参见图4所示,为本申请实施例提供的一种神经预测器的结构示意图。神经预测器的输入为训练集中的多个样本(也可以称为超参样本或者辅助超参样本)、多个样本分别对应的评估指标以及需要预测的目标超参数组合。神经预测器的输出为需要预测的目标超参数组合的预测指标。多个辅助超参样本用于辅助所述神经预测器对需要预测的目标超参数组合的预测指标的预测。
下面结合神经预测器的结构,对本申请实施例提供的数据处理方法进行详细说明。
参阅图5A所示,为本申请提供的一种数据处理方法的流程示意图。该方法可以由执行设备来执行,比如图3中的执行设备210。
201,获取多个超参数组合。
超参数搜索空间中包括用户任务所需的超参数。可以从超参数搜索空间采样超参数得到多个超参数的值,作为超参数组合。应理解的是,一个超参数组合可以包括一个或者多个超参数的值。
示例性地,获取多个超参数组合可以从用户设备接收超参数信息。超参数信息用于指示用户任务对应的超参数搜索空间。从而可以从超参数搜索空间中采样来得到多个超参数组合。
一些实施例中,用户设备可以通过调用服务的方式将超参数信息发送给执行设备210。
具体地,超参数搜索空间中可以包括用户任务所需的多种超参数,每种超参数的值可以是连续分布的值,也可以是离散分布的值。例如,超参数搜索空间中可以包括超参数A的取值范围为[1,20],超参数B的取值可以包括:2、3、6、7等。因此,在超参数搜索空间中采样时,可以从连续分布的值中任意取一个值,或者在离散分布的值中任意取一个值,得到一组超参数组合。可选地,对超参数搜索空间进行采样的方式也可以有多种,比如可以是随机采样的方式,还可以采用概率分布采样的方式。如下步骤以针对一个超参数组合进行预测为例。比如以第一超参数组合的预测为例,第一超参数组合为多个超参数组合中的任一个,则每个超参数组合的预测均可以对第一超参数组合进行预测的方式。
202,将第一超参数组合、训练集中的多个超参样本以及多个超参样本的评估指标作为神经预测器的输入,通过神经预测器确定第一超参数组合对应的预测指标。为了便于区分,将多个超参样本称为辅助超参样本。其中,多个辅助超参样本用于辅助所述神经预测器对所述第一超参数组合的预测指标的预测。用于辅助神经预测器对第一超参数组合进行预测的辅助超参样本也可以称为“支撑样本”,也可以采用其它的名称,本申请实施例对此不作具体限定。
针对多个超参数组合来说,可以通过多次迭代评估,进而得到多个超参数组合分别对 应的预测指标。一个辅助超参样本可以理解为一个超参数组合。为了便于区分将训练集辅助超参样本对应的超参数组合成为辅助超参数组合。该辅助超参数组合也是从超参数搜索空间采样得到的。多个超参数组合与多个辅助超参数组合均不相同。辅助超参数组合对应的评估指标可以是通过用户任务对辅助超参数组合进行评估得到的。训练集中包括的各个超参样本对应的评估指标也可以是通过其它方式评估得到的。
需要说明的是,本申请实施例中为了便于描述,将通过神经预测器预测得到的超参数组合的指标结果称为预测指标,将用户任务评估得到的超参数组合的指标结果称为评估指标。
一些实施例中,参见图5B所示,在执行步骤202获得多个超参数组合分别对应的预测指标后,执行设备可以执行步骤203和步骤204。
203,执行设备可以从多个超参数组合中确定K个超参数组合。具体的,从多个超参数组合中获取预测指标最优(或者最高)的K个超参数组合,可以理解为,K个超参数组合分别对应的预测指标均高于所述多个超参数组合中除所述K个超参数组合以外的任意预测超参数组合对应的预测指标,K为正整数。
204,执行设备可以向用户设备发送K个超参数组合。
在一种可能的实现方式中,本申请实施例还可以针对训练集进行更新。可以基于针对多个超参数组合进行多次迭代评估的结果来确定将哪些超参数组合更新到训练集中。
执行设备向用户设备发送给K个超参数组合后,用户设备可以触发用户任务。用户可以根据接收到的超参数组合执行具体的用户任务,例如,做视频分类、文字识别、图像美化、语音识别等任务。用户任务可以分别针对该K个超参数组合分别进行评估,将K个超参数组合的评估指标再发送给执行设备。执行设备将K个超参数组合以及分别对应的评估指标作为辅助超参样本加入所述训练集。
应理解的是,本实施例中以执行设备的功能由部署于云网络的一个或者多个计算设备实现为例。即上述发送K个超参数组合和接收评估结果的动作在云网络的一个或者多个计算设备与用户设备之间。一些场景中,数据处理方法部署于本地的一个或者多个计算设备上时,上述发送K个超参数组合和接收评估结果的动作可以是计算设备的不同组件之间,或者不同的计算设备之间,也可以是运行在计算设备软件程序执行时从本计算设备的存储空间中获取K个超参数组合的评估结果。例如,本地的计算设备中用于执行迭代采样流程的组件向用于评估超参数组合的组件发送K个超参数组合。然后用于评估超参数组合的组件执行评估操作,并将评估得到的评估指标发送给用于执行迭代采样流程的组件。
在本申请实施例中,为了便于描述,将步骤201-步骤204称为迭代采样流程。执行设备可以执行多轮包括步骤201-步骤204的迭代采样流程。
执行设备在将K个超参数组合以及分别对应的评估指标作为辅助超参样本加入所述训练集后,更新后的训练集可以用于进行下一轮的迭代采样流程。
在执行迭代采样流程时,在满足迭代采样停止条件时,停止迭代采样流程。示例性地,迭代采样停止条件可以包括如下至少一项:
(1)迭代采样轮数达到N,N为大于1的整数。在迭代采样执行N轮后,停止迭代采样流程。
(2)在连续M轮迭代采样过程中训练集中的最优评估指标未发生变化。在每次更新训练集时,可以记录训练集包括的最优评估指标。比如,经过第i轮迭代采样后,训练集 包括的最优评估指标为A,第i+1轮迭代采样后,训练集包括的最优评估指标依然为A,依次类推,第i+M-1轮迭代采样后,训练集包括的最优评估指标依然为A。从第i轮迭代采样开始到第i+M-1轮迭代采样,训练集包括的最优评估指标并未发生变化。因此不再执行下一轮的迭代采样流程。可选地,当迭代采样的轮数达到设定的最大采样轮数后,未满足条件(2),不再执行下一轮的迭代采样流程。
(3)所述训练集中的最优评估指标达到设定指标。具体的,经过多轮迭代采样流程后,训练集中的最优评估指标达到了设定指标,从而不再执行下一轮的迭代采样流程。可选地,当迭代采样的轮数达到设定的最大采样轮数后,训练集中的最优评估指标还未达到设定指标,也不再执行下一轮的迭代采样流程。
本申请实施例中的神经预测器是采用训练集进行训练得到的。参见图6所示,在每次迭代训练流程中,从训练集中选择一个样本作为目标样本。目标样本可以理解为训练集中的一个目标超参数组合。训练集中还包括目标超参数组合对应的评估指标。为了便于区分将目标超参数组合以及目标超参数组合对应的评估指标称为目标样本组合{目标超参数组合,评估指标}。神经预测器的输入除了包括目标超参数组合以外,还需要从训练集中选择多个超参样本作为辅助超参样本。训练集中还包括辅助超参样本对应的评估指标。为了便于描述将辅助超参样本(即辅助超参数组合)以及对应的评估指标称为辅助样本组合{辅助超参数组合,评估指标}。需要说明的是,从训练集中选择的多个辅助超参样本与目标样本不同。图6中,以输入的辅助超参样本数量为T为例。将T个辅助样本组合和目标超参数组合输入到神经预测器,神经预测器输出为目标超参数组合的预测指标。进而将目标超参数组合的预测指标与目标超参数组合对应评估指标进行比较,计算损失值,然后基于损失值来更新神经预测器的权重。
计算损失值时可以采用损失函数。损失函数也就是权重的优化过程中的目标函数。通常,损失函数的值越小,表示神经预测器输出的结果越准确。神经预测器的训练过程可以理解为最小化损失函数的过程。常用的损失函数可以包括对数损失函数、平方损失函数、指数损失函数等。
在基于损失值来更新神经预测器的权重时,可以采用梯度下降算法、随机梯度下降算法、动量梯度下降算法或者自适应矩估计(adaptive moment estimation,Adam)等优化算法来进行权重的优化。
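结合图6所示的训练流程,下面给出一段基于PyTorch的迭代训练示意;其中神经预测器predictor的调用方式、平方损失函数以及Adam优化器的选择均为本示例的假设,并非本申请实施例的限定实现:

```python
import random
import torch

def train_predictor(predictor, train_set, num_iters, T, lr=1e-3):
    """train_set为[(超参数组合张量, 评估指标), ...],T为辅助超参样本数量(均为假设的表示)。"""
    optimizer = torch.optim.Adam(predictor.parameters(), lr=lr)    # 也可替换为SGD等优化算法
    loss_fn = torch.nn.MSELoss()                                    # 此处以平方损失函数为例
    for _ in range(num_iters):
        target_x, target_y = random.choice(train_set)               # 从训练集中选择一个目标样本
        support = random.sample([s for s in train_set if s[0] is not target_x], T)
        sup_x = torch.stack([s[0] for s in support])                # T个辅助超参样本
        sup_y = torch.tensor([float(s[1]) for s in support])        # 对应的评估指标
        pred = predictor(sup_x, sup_y, target_x)                    # 输出目标超参数组合的预测指标
        loss = loss_fn(pred, torch.tensor(float(target_y)))         # 与目标样本的评估指标比较计算损失值
        optimizer.zero_grad()
        loss.backward()                                             # 基于损失值更新神经预测器的权重
        optimizer.step()
```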
在本申请实施例中,神经预测器的训练可以提前进行,也可以在每轮的迭代采样流程中执行。比如在执行迭代评估流程之前,先通过多次迭代训练来训练神经预测器;或者说每次在训练集更新后执行神经预测器的训练,然后再基于本轮采样流程中训练完成的神经预测器来对多个超参数组合进行预测。也就是说,迭代评估流程与迭代训练流程交叉进行。
参见图7所示,为本申请实施例提供的另一种数据处理方法流程示意图。图7中以在每轮迭代采样流程中迭代评估流程与迭代训练流程交叉进行为例。以每轮采样流程中,在训练集中更新K个超参样本为例。最大采样轮数为N。
401,初始化神经预测器,执行402。可以理解的是,初始训练集为空。
402,从超参数搜索空间中采样K个超参数组合,执行403。
403,在第i轮迭代采样流程中,向用户设备发送K个超参数组合。
404,接收用户设备发送的针对K个超参数组合进行评估得到的评估指标。用户设备可以触发用户任务执行评估获得评估指标。用户任务的评估过程以及评估结果,本申请实施例不作具体限定。用户任务评估可以是人工评估,也可以通过用户设备评估。
应理解的是,本实施例中以数据处理方法部署于云网络的一个或者多个计算设备上为例。即上述发送K个超参数组合和接收评估结果的动作在云网络的一个或者多个计算设备与用户设备之间进行。一些场景中,数据处理方法部署于本地的一个或者多个计算设备上时,上述发送K个超参数组合和接收评估结果的动作可以在计算设备的不同组件之间进行,或者在不同的计算设备之间进行,也可以是计算设备上运行的软件程序执行时从本计算设备的存储空间中获取K个超参数组合的评估指标。例如,步骤403中可以替换为:本地的计算设备中用于执行迭代采样流程的组件向用于评估超参数组合的组件发送K个超参数组合,然后步骤404可以替换为:用于评估超参数组合的组件执行评估操作,并将评估得到的评估指标发送给用于执行迭代采样流程的组件。
405,将K个超参数组合以及K个超参数组合对应的评估指标作为超参样本更新到训练集中。
406,根据训练集对神经预测器进行多次迭代训练获得第i轮采样流程得到的神经预测器。迭代训练的轮数,本申请实施例不作具体限定。训练流程可以参见图6对应的实施例的描述,此处不再赘述。
407,从超参数搜索空间中采样L个超参数组合。可以理解,该L个超参数组合与之前采样的超参数组合均不同,即该L个超参数组合均为尚未评估过的超参数组合。L大于K。L可以是K的倍数。比如,K=16,L=1000。再比如,K=20,L=1500。本申请实施例对此不作具体限定,可以根据需求来设置。
408,基于训练集通过第i轮采样流程得到的神经预测器对L个超参数组合分别进行预测得到L个超参数组合分别对应的预测指标。具体的,针对每个超参数组合进行预测时,从训练集中选择T个超参样本作为辅助超参样本,输入到神经预测器,由神经预测器输出该超参数组合的预测指标。经过L轮迭代评估后,得到L个超参数组合分别对应的预测指标。
409,执行i=i+1,判断i>N?(i值是否大于N)。若是,结束采样迭代流程,否则,执行410。此处以迭代采样停止条件为迭代采样的轮数达到最大采样轮数为例。
410,从L个超参数组合中选择预测指标最优的K个超参数组合,继续执行403。
需要说明的是,上述在每轮迭代采样流程中执行训练神经预测器以及L个超参数组合的预测。在一些实施例中,训练神经预测器的轮数可以小于迭代采样的轮数。比如N轮迭代采样流程中的前a轮迭代采样流程中执行训练神经预测器和L个超参数组合的预测。之后的N-a轮迭代采样流程中不再执行训练神经预测器,仅执行L个超参数组合的预测。
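下面以一段Python草稿概括图7所示的迭代采样流程;其中沿用前文示例中的sample_combination、select_top_k与train_predictor,evaluate_fn与predict_fn为假设的可调用对象,分别对应用户任务评估(步骤403/404)与单个超参数组合的预测(步骤408),仅作流程示意,并非限定实现:

```python
def iterative_sampling(search_space, predictor, evaluate_fn, predict_fn, N, K, L, T):
    """图7所示迭代采样流程的示意性草稿(非限定实现)。"""
    train_set = []                                                        # 初始训练集为空
    candidates = [sample_combination(search_space) for _ in range(K)]     # 步骤402:采样K个超参数组合
    for _ in range(N):                                                    # 最多N轮迭代采样
        metrics = evaluate_fn(candidates)                                 # 步骤403/404:用户任务评估
        train_set.extend(zip(candidates, metrics))                        # 步骤405:更新训练集
        train_predictor(predictor, train_set, num_iters=100, T=T)         # 步骤406:训练神经预测器
        pool = [sample_combination(search_space) for _ in range(L)]       # 步骤407:采样L个超参数组合
        scores = [predict_fn(predictor, train_set, c, T) for c in pool]   # 步骤408:逐个预测
        candidates = select_top_k(pool, scores, K)                        # 步骤410:取预测指标最优的K个
    return max(train_set, key=lambda s: s[1])                             # 返回评估指标最优的超参样本
```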
下面通过举例的方式来对本申请实施例的神经预测器的处理方式进行描述。
参见图8A所示,为本申请实施例提供的一种神经预测器的处理流程示意图。从训练集中选择多个超参样本作为辅助超参样本。训练集中还包括辅助超参样本对应的评估指标。神经预测器的输入包括T个辅助超参样本、T个辅助超参样本对应的评估指标以及目标超参数组合。图8A将T个辅助超参样本分别称为辅助超参样本1~T,辅助超参样本1~T分别对应的评估指标称为评估指标1~T;即{辅助超参样本1,评估指标1},……,{辅助超参样本T,评估指标T}。神经预测器对T个辅助超参样本以及目标超参数组合进行联合编码得到T+1个特征。为了便于区分,将T个辅助超参样本对应的编码后的特征称为辅助特征,将目标超参数组合对应编码后的特征称为目标特征。
示例性地,参见图8B所示,神经预测器包括至少一个编码层(encoder layer)。图8B中以两层编码层为例。例如,编码层可以采用transformer结构中的编码模块。编码层由注意力层(Attention Layer)和前馈层(Feed-forward layer)组合而成。其中注意力层用于将由T个辅助超参样本和目标超参数组合进行组合得到的T+1个超参数组合(包括辅助超参样本对应的超参数组合以及目标超参数组合)两两进行相似度计算得到相似度矩阵,再根据相似度矩阵对T+1个超参数组合进行加权,得到T+1个特征,T+1个特征被送入前馈层中进行特征变换,最终该编码层输出T+1个编码后的特征。通过至少一个编码层来对T+1个超参数组合进行融合编码。
进一步地,神经预测器分别确定目标特征与T个辅助特征的相似度,然后根据所述目标特征与所述T个辅助特征分别对应的相似度确定T个辅助超参样本分别对应的权重。所述神经预测器根据T个辅助超参样本分别对应的权重对所述T个辅助超参样本包括的评估指标进行加权得到目标超参数组合的预测指标。
示例性地,参见图8B所示,神经预测器对目标特征与T个辅助特征分别进行内积处理得到所述目标特征与T个辅助特征分别对应的相似度。然后,神经预测器通过softmax函数来将所述目标特征与所述T个辅助特征分别对应的相似度转换为T个辅助超参样本分别对应的权重。
图8A-图8B中所示的神经预测器的处理流程既适用于训练流程也适用于评估流程。若在训练流程中,则目标超参数组合也来自于训练集。进而神经预测器输出的一个目标超参数组合对应的预测指标与所述目标超参数组合对应在训练集中的评估指标的比较结果用于调整神经预测器的权重。
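作为理解图8A-图8B处理流程的一个极简示意,下面给出一段假设性的PyTorch实现;编码层借用torch.nn内置的Transformer编码层,特征维度等均为本示例任意设定,并非本申请实施例的限定实现:

```python
import torch
import torch.nn as nn

class SimplePredictor(nn.Module):
    """图8A-图8B流程的示意:联合编码后以目标特征与辅助特征的相似度加权评估指标。"""
    def __init__(self, hp_dim, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(hp_dim, d_model)                     # 将超参数组合编码为特征
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)     # 至少一个编码层

    def forward(self, sup_x, sup_y, target_x):
        # sup_x: (T, hp_dim)辅助超参样本;sup_y: (T,)评估指标;target_x: (hp_dim,)目标超参数组合
        tokens = torch.cat([sup_x, target_x.unsqueeze(0)], dim=0)         # T+1个超参数组合
        feats = self.encoder(self.embed(tokens).unsqueeze(0)).squeeze(0)  # 联合编码得到T+1个特征
        aux_feats, tgt_feat = feats[:-1], feats[-1]                       # 辅助特征与目标特征
        sim = aux_feats @ tgt_feat                                        # 内积得到相似度
        weights = torch.softmax(sim, dim=0)                               # softmax转换为权重
        return (weights * sup_y).sum()                                    # 加权评估指标得到预测指标
```

例如,predictor = SimplePredictor(hp_dim=8)即可与前文train_predictor示意配合使用;此处的调用约定同样只是本示例的假设。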
参见图9A所示,为本申请实施例提供的另一种神经预测器的处理流程示意图。神经预测器的输入包括T个辅助超参样本、T个辅助超参样本对应的评估指标以及目标超参数组合。图9A将T个辅助超参样本分别称为辅助超参样本1~T,辅助超参样本1~T分别对应的评估指标称为评估指标1~T;即{辅助超参样本1,评估指标1},……,{辅助超参样本T,评估指标T}。
进一步地,神经预测器基于目标超参数组合、辅助超参样本1~T、辅助超参样本1~T分别对应的评估指标1~T以及两个锚点特征确定目标超参数组合对应的预测指标。两个锚点特征用于标定目标任务的最低预测指标的编码特征以及最高预测指标的编码特征。
一些实施例中,在确定目标超参数组合对应的预测指标时,神经预测器可以对T个辅助超参样本以及目标超参数组合进行联合编码得到T+1个特征。为了便于区分,将T个辅助超参样本对应的编码后的特征称为辅助特征,将目标超参数组合对应编码后的特征称为目标特征。示例性地,参见图9B所示,神经预测器包括至少一个编码层(encoder layer),神经预测器通过编码层对T+1个超参数组合(由T个辅助超参样本和目标超参数组合进行组合得到)进行联合编码,具体方式可以参见图8B所示,此处不再赘述。
神经预测器分别确定目标特征与所述T个辅助特征的相似度以及所述目标特征与两个锚点特征的相似度。图9A中将最低预测指标的编码特征对应的锚点特征称为锚点特征1,将最高预测指标的编码特征对应的锚点特征称为锚点特征2。参见图9B所示,在确定相似度时可以采用内积的方式。然后,神经预测器根据目标特征与T个辅助特征以及两个锚点特征分别对应的相似度确定T个辅助超参样本以及两个锚点特征分别对应的权重。神经预测器根据T个辅助超参样本以及两个锚点分别对应的权重对T个辅助超参样本包括的评估指标以及两个锚点对应的预测指标进行加权得到所述神经预测器输出的目标超参数组合的预测指标。示例性地,参见图9B所示,锚点特征1对应的预测指标可以配置为0,锚点特征2对应的预测指标可以配置为1。作为一种举例,在将相似度转换为权重时,可以采用Softmax函数。
一些实施例中,两个锚点特征是可学习的,该两个锚点特征可以理解为神经预测器的可学习的参数。经过初始配置后,在训练神经预测器时,每次更新神经预测器的权重时,可以同时更新该两个锚点特征。图9A-图9B中所示的神经预测器的处理流程既适用于训练流程也适用于评估流程。若在训练流程中,则目标超参数组合也来自于训练集。进而神经预测器输出的一个目标超参数组合对应的预测指标与所述目标超参数组合对应在超参样本中的评估指标的比较结果用于调整神经预测器的权重以及所述两个锚点特征。
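图9A-图9B的方案可以理解为在前文SimplePredictor示意的基础上,引入两个可学习的锚点特征参与相似度计算与加权,其对应的指标分别配置为0和1。下面的片段沿用前文SimplePredictor的定义及其import,仅作假设性示意:

```python
class AnchorPredictor(SimplePredictor):
    """在SimplePredictor示意的基础上加入两个可学习的锚点特征(假设性实现)。"""
    def __init__(self, hp_dim, d_model=64, **kwargs):
        super().__init__(hp_dim, d_model, **kwargs)
        self.anchors = nn.Parameter(torch.randn(2, d_model))   # 锚点特征1、锚点特征2,属于可学习参数

    def forward(self, sup_x, sup_y, target_x):
        tokens = torch.cat([sup_x, target_x.unsqueeze(0)], dim=0)
        feats = self.encoder(self.embed(tokens).unsqueeze(0)).squeeze(0)
        aux_feats, tgt_feat = feats[:-1], feats[-1]
        keys = torch.cat([aux_feats, self.anchors], dim=0)      # T个辅助特征 + 2个锚点特征
        weights = torch.softmax(keys @ tgt_feat, dim=0)         # 内积相似度经softmax得到T+2个权重
        anchor_y = torch.tensor([0.0, 1.0])                     # 锚点对应的指标分别配置为0和1
        values = torch.cat([sup_y, anchor_y], dim=0)            # T+2个评估指标
        return (weights * values).sum()                         # 加权得到目标超参数组合的预测指标
```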
参见图10A所示,为本申请实施例提供的又一种神经预测器的处理流程示意图。神经预测器的输入包括T个辅助超参样本、T个辅助超参样本对应的评估指标、目标超参数组合以及目标超参数组合对应的目标预测指标掩码。图10A将T个辅助超参样本分别称为辅助超参样本1~T,辅助超参样本1~T分别对应的评估指标称为评估指标1~T;即{辅助超参样本1,评估指标1},……,{辅助超参样本T,评估指标T}。
示例性地,每个辅助超参样本和对应的评估指标在输入时,可以将该辅助超参样本和对应的评估指标连接后得到连接参数信息,然后将连接参数信息输入到神经预测器中。以{辅助超参样本1,评估指标1}为例,将辅助超参样本1和评估指标1连接后得到连接参数信息1。针对目标超参数组合和目标预测指标掩码连接得到目标连接参数信息。所述目标预测指标掩码用于表征所述目标超参数组合对应的未知预测指标。一些实施例中,该目标预测指标掩码是可学习的。经过初始配置后,在训练神经预测器时,每次更新神经预测器的权重时,可以同时更新该目标预测指标掩码。神经预测器对输入的T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度。进一步地,神经预测器根据所述T+1个连接参数信息中每两个连接参数信息之间的相似度确定目标超参数组合对应的预测指标。
示例性地,参见图10B所示,神经预测器包括多个编码层,图10B中以两层编码层为例。神经预测器还包括FC/sigmoid层。编码层可以是Transformer结构中标准的编码层,由注意力层(Attention Layer)和前馈层(Feed-forward layer)组合而成。注意力层用于将输入的T+1个连接参数信息两两进行相似度计算得到相似度矩阵,再根据相似度矩阵对T+1个连接参数信息进行加权,得到T+1个特征。T+1个特征被送入前馈层,由前馈层对T+1个特征进行特征变换。通过多个编码层,可以对T+1个连接参数信息进行融合,综合对目标预测指标进行预测。进一步,编码层输出的T+1个特征中目标连接参数信息对应的特征输入到FC/sigmoid层。神经预测器通过FC/sigmoid层对目标连接参数信息对应的特征进行降维,得到1维的特征。该特征经过Sigmoid函数归一化到0到1之间,即为预测得到的目标超参数组合对应的目标预测指标。
一些实施例中,目标预测指标掩码是可学习的,目标预测指标掩码可以理解为神经预测器的可学习的参数。经过初始配置后,在训练神经预测器时,每次更新神经预测器的权重时,可以同时更新该目标预测指标掩码。图10A-图10B中所示的神经预测器的处理流程既适用于训练流程也适用于评估流程。若在训练流程中,则目标超参数组合也来自于训练集。进而神经预测器输出的一个目标超参数组合对应的预测指标与所述目标超参数组合对应在超参样本中的评估指标的比较结果用于调整神经预测器的权重以及目标预测指标掩码。
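对于图10A-图10B所示"连接参数信息+目标预测指标掩码"的处理方式,下面给出一个假设性的PyTorch示意;编码层与FC/sigmoid层的具体配置均为本示例设定,并非限定实现:

```python
import torch
import torch.nn as nn

class MaskPredictor(nn.Module):
    """图10A-图10B流程的示意:样本与评估指标连接后联合编码,经FC/Sigmoid输出预测指标。"""
    def __init__(self, hp_dim, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(hp_dim + 1, d_model)              # 对连接参数信息(样本+指标)进行编码
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)  # 多个编码层
        self.mask = nn.Parameter(torch.zeros(1))                 # 目标预测指标掩码,可学习参数
        self.fc = nn.Linear(d_model, 1)                          # FC层将目标特征降到1维

    def forward(self, sup_x, sup_y, target_x):
        sup = torch.cat([sup_x, sup_y.unsqueeze(1)], dim=1)              # T个连接参数信息
        tgt = torch.cat([target_x, self.mask], dim=0).unsqueeze(0)       # 目标连接参数信息
        tokens = torch.cat([sup, tgt], dim=0)                            # 共T+1个连接参数信息
        feats = self.encoder(self.embed(tokens).unsqueeze(0)).squeeze(0) # 融合编码
        out = self.fc(feats[-1])                                         # 取目标对应的特征并降维
        return torch.sigmoid(out).squeeze()                              # 归一化到0~1的预测指标
```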
如下结合具体场景对本申请实施例提供的方案以及效果进行说明。以卷积神经网络(CNN)模型的超参数优化为例。以采用imagenet数据集进行交叉验证为例。当然也可以采用其它的数据集,本申请实施例对此不作具体限定。
定义超参数搜索空间如下,其中,数值型的三个值分别表示超参数的最小值,最大值和步长。
lr:数值型(0.0005,0.02,0.00025),表示学习率;
wd:数值型(0.02,0.4,0.01),表示权重衰减;
optim:选择型(“AdamW”,“LAMB”),表示优化器类型;
dropout:数值型(0.0,0.3,0.025),表示dropout概率;
drop_conn_rate:数值型(0.0,0.4,0.025),表示drop connection概率;
mixup:数值型(0.0,1.0,0.05),表示mixup的分布参数;
color:数值型(0.0,0.5,0.025),表示color数据增强的强度;
re_prob:数值型(0.0,0.4,0.025),表示random erase的概率。
上述超参数空间的定义仅是一个示例,实际应用中可以定义任意需要优化的超参数。
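上述搜索空间可以用如下假设性的Python字典表示(数值型以(最小值, 最大值, 步长)三元组表示,选择型以候选列表表示),仅作示意:

```python
# 本场景中超参数搜索空间的一种假设性表示
search_space = {
    "lr":             ("range",  (0.0005, 0.02, 0.00025)),  # 学习率
    "wd":             ("range",  (0.02, 0.4, 0.01)),         # 权重衰减
    "optim":          ("choice", ["AdamW", "LAMB"]),          # 优化器类型
    "dropout":        ("range",  (0.0, 0.3, 0.025)),          # dropout概率
    "drop_conn_rate": ("range",  (0.0, 0.4, 0.025)),          # drop connection概率
    "mixup":          ("range",  (0.0, 1.0, 0.05)),           # mixup的分布参数
    "color":          ("range",  (0.0, 0.5, 0.025)),          # color数据增强的强度
    "re_prob":        ("range",  (0.0, 0.4, 0.025)),          # random erase的概率
}
```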
需要说明的是,优化器是指用于优化机器学习算法的参数(比如网络权重)的算法。可以采用梯度下降、随机梯度下降、动量梯度下降或者自适应矩估计(adaptive moment estimation,Adam)等优化算法来进行参数优化。学习率是指在优化算法中每次迭代更新参数的幅度,也叫做步长。当步长过大时会导致算法不收敛,模型的目标函数处于震荡的状态;而步长过小会导致模型的收敛速度过慢。
以采用图7所示的数据处理方法为例。K=16,N=10。
A1,初始化神经预测器,执行A2。可以理解的是,初始训练集为空。
A2,从超参数搜索空间中采样16个超参数组合,执行A3。
A3,在第i轮迭代采样流程中,获取用户任务针对16个超参数组合进行评估得到的评估指标。用户任务的评估过程以及评估结果,本申请实施例不作具体限定。用户任务评估可以是人工评估,也可以通过用户设备评估。
A4,将16个超参数组合以及16个超参数组合对应的评估指标作为超参样本更新到训练集中。
A5,根据训练集对神经预测器进行多次迭代训练获得第i轮采样流程得到的神经预测器。迭代训练的轮数,本申请实施例不作具体限定。训练流程可以参见图6对应的实施例的描述,此处不再赘述。
A6,从超参数搜索空间中采样1000个超参数组合。可以理解该1000个超参数组合与之前采样的超参数组合均不同。
A7,基于训练集通过第i轮采样流程得到的神经预测器对1000个超参数组合分别进行预测得到1000个超参数组合分别对应的预测指标。
A8,执行i=i+1,若i>10,结束采样迭代流程,否则,执行A9。此处以迭代采样停止条件为迭代采样的轮数达到最大采样轮数为例。
A9,从1000个超参数组合中选择预测指标最优的16个超参数组合,继续执行A3。
通过上述方式验证,采用本申请实施例提供的方案在达到贝叶斯优化水平时,本申请实施例的采样数量(即人工确认的评估指标的数量)低于贝叶斯优化的采样数量。采用本申请实施例提供的方案,结合训练样本来辅助目标超参数组合的预测,由于训练样本中超参数组合对应的评估结果都是经过用户验证过的,准确度较高。针对目标超参数组合的预测采用经用户验证的训练样本进行辅助,相比现有技术中采用输入仅包括目标超参数组合的普通预测器来说,本申请实施例采用的方案对目标超参数组合进行预测的预测结果的准确度更高。现有技术中神经预测器的输入仅包括目标评估样本,没有其他可参考的样本以及评估指标,需要预先获得很多真实样本的评估指标来训练神经预测器。而本申请中神经预测器的输入包括已经具有评估指标的超参样本以及评估指标,预测目标样本的预测指标时已经参考了超参样本的评估指标,使得预测目标样本的预测指标的准确度有所提高,从而基于预测指标来调整神经预测器的权重时准确度较高,从而减少训练的轮数,进而减少使用的训练样本数量。本申请实施例采用的方案采用较少的训练样本即可得到泛化性较好的神经预测器。
基于与方法实施例同样的发明构思,本申请实施例还提供了一种数据处理装置,该装置具体可以是执行设备中的处理器,或者芯片或者芯片系统,或者是执行设备中的一个模块。示意性的,参见图11所示,该装置可以包括接收单元1101、处理单元1102和发送单元1103。接收单元1101、处理单元1102和发送单元1103用于执行图5A、图7对应的实施例所示的方法步骤。
接收单元1101,用于接收用户设备发送的超参数信息,超参数信息用于指示用户任务对应的超参数搜索空间。
处理单元1102,用于从超参数搜索空间采样多个超参数组合;将第一超参数组合、训练集包括的多个样本和多个样本的评估指标作为神经预测器的输入,通过神经预测器确定第一超参数组合对应的预测指标,第一超参数组合为多个超参数组合的任一个,以得到多个超参数组合对应的多个预测指标。
发送单元1103,用于向用户设备发送K个超参数组合,K为正整数;其中,K个超参数组合对应的K个预测指标为多个预测指标中最高的K个。
在一种可能的实现方式中,接收单元1101,还用于接收用户设备发送的K个超参数组合对应的K个评估指标。处理单元1102,还用于将K个超参数组合作为K个样本,并将K个样本以及对应的K个评估指标加入训练集。
在一种可能的实现方式中,处理单元1102,还用于通过以下方式训练得到神经预测器:从训练集中选择多个样本、多个样本对应的评估指标,以及从训练集中选择一个目标样本。将多个样本、多个样本对应的评估指标以及目标样本作为神经预测器的输入,通过神经预测器确定目标样本对应的预测指标。根据目标样本的预测指标与目标样本对应的评估指标的比较结果,调整神经预测器的网络参数。
在一种可能的实现方式中,处理单元1102,具体用于:将第一超参数组合、训练集包括的多个样本和多个样本的评估指标输入神经预测器。神经预测器基于第一超参数组合、多个样本、多个样本的评估指标以及两个锚点特征确定第一超参数组合对应的预测指标。其中,两个锚点特征用于标定用户任务的最低预测指标的编码特征以及最高预测指标的编码特征。
在一种可能的实现方式中,神经预测器支持输入的样本数量为T,T为正整数;处理单元1102,具体用于:神经预测器对输入的T个样本进行编码得到T个辅助特征,以及对第一超参数组合进行编码得到目标特征;神经预测器确定目标特征与T个辅助特征的相似度以及目标特征与两个锚点特征的相似度;神经预测器根据目标特征与T个辅助特征以及两个锚点特征分别对应的相似度确定T+2个权重,T+2个权重包括T个样本的权重以及两个锚点特征的权重;神经预测器根据T+2个权重对T+2个评估指标进行加权得到第一超参数组合的预测指标;其中,T+2个评估指标包括T个样本的评估指标以及两个锚点特征对应的评估指标。
在一种可能的实现方式中,两个锚点特征属于神经预测器的网络参数。
在一种可能的实现方式中,神经预测器支持输入的样本数量为T;处理单元1102,具体用于:通过神经预测器对输入的T个样本进行编码得到T个辅助特征,对第一超参数组合进行编码得到目标特征。通过神经预测器分别确定目标特征与T个辅助特征的相似度。通过神经预测器根据目标特征与T个辅助特征分别对应的相似度确定T个样本分别对应的权重。通过神经预测器根据T个样本分别对应的权重对T个样本对应的评估指标进行加权得到第一超参数组合的预测指标。
在一种可能的实现方式中,神经预测器支持输入的超参样本的数量为T;处理单元1102,具体用于:将T+1个连接参数信息输入神经预测器;T+1个连接参数信息包括T个样本中每个样本和对应的评估指标连接后得到的连接参数信息,以及第一超参数组合和目标预测指标掩码连接后得到的连接参数信息,目标预测指标掩码用于表征第一超参数组合对应的未知预测指标。通过神经预测器对输入的T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度。通过神经预测器根据T+1个连接参数信息中每两个连接参数信息之间的相似度确定第一超参数组合的预测指标。
本申请实施例还提供该装置另外一种结构,如图12所示,装置1200中可以包括通信接口1210、处理器1220。可选的,装置1200中还可以包括存储器1230。其中,存储器1230可以设置于装置内部,还可以设置于装置外部。一种示例中,上述图11中所示的接收单元1101、处理单元1102和发送单元1103均可以由处理器1220实现。另一种示例中,接收单元1101和发送单元1103的功能由通信接口1210来实现。处理单元1102的功能由处理器1220实现。处理器1220通过通信接口1210接收超参数信息,以及发送超参数组合,并用于实现图5A、图7中所述的方法。在实现过程中,处理流程的各步骤可以通过处理器1220中的硬件的集成逻辑电路或者软件形式的指令完成图5A、图7中所述的方法。
本申请实施例中通信接口1210可以是电路、总线、收发器或者其它任意可以用于进行信息交互的装置。其中,示例性地,该其它装置可以是与装置1200相连的设备。
本申请实施例中处理器1220可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件单元组合执行完成。处理器1220用于实现上述方法所执行的程序代码可以存储在存储器1230中。存储器1230和处理器1220耦合。
本申请实施例中的耦合是装置、单元或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、单元或模块之间的信息交互。
处理器1220可能和存储器1230协同操作。存储器1230可以是非易失性存储器,比如硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)等,还可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM)。存储器1230是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
本申请实施例中不限定上述通信接口1210、处理器1220以及存储器1230之间的具体连接介质。本申请实施例在图12中以存储器1230、处理器1220以及通信接口1210之间通过总线连接为例,总线在图12中以粗线表示;其它部件之间的连接方式仅是示意性说明,并不以此为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图12中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
基于以上实施例,本申请实施例还提供了一种计算机存储介质,该存储介质中存储软件程序,该软件程序在被一个或多个处理器读取并执行时可实现上述任意一个或多个实施例提供的方法。所述计算机存储介质可以包括:U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
基于以上实施例,本申请实施例还提供了一种芯片,该芯片包括处理器,用于实现上述任意一个或多个实施例所涉及的功能,例如获取或处理上述方法中所涉及的信息或者消息。可选地,所述芯片还包括存储器,所述存储器用于保存处理器执行所必需的程序指令和数据。该芯片可以由芯片构成,也可以包含芯片和其他分立器件。
本领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的通信系统的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
本申请的一个实施例提供了一种计算机可读介质,用于存储计算机程序,该计算机程序包括用于执行图5A、图7对应的方法实施例中的方法步骤的指令。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
最后应说明的是:以上,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (20)

  1. 一种数据处理方法,其特征在于,包括:
    接收用户设备发送的超参数信息,所述超参数信息用于指示用户任务对应的超参数搜索空间;
    从所述超参数搜索空间采样多个超参数组合;
    将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为神经预测器的输入,通过所述神经预测器确定所述第一超参数组合对应的预测指标,所述第一超参数组合为所述多个超参数组合的任一个,以得到所述多个超参数组合对应的多个预测指标;
    向所述用户设备发送K个超参数组合,K为正整数;
    其中,所述K个超参数组合对应的K个预测指标为所述多个预测指标中最高的K个。
  2. 如权利要求1所述的方法,其特征在于,还包括:
    接收所述用户设备发送的所述K个超参数组合对应的K个评估指标;
    将所述K个超参数组合作为K个样本,并将所述K个样本以及对应的所述K个评估指标加入所述训练集。
  3. 如权利要求1或2所述的方法,其特征在于,所述神经预测器是通过以下方式训练得到的:
    从所述训练集中选择多个样本、所述多个样本对应的评估指标,以及从所述训练集中选择一个目标样本;
    将所述多个样本、所述多个样本对应的评估指标以及所述目标样本作为所述神经预测器的输入,通过所述神经预测器确定所述目标样本对应的预测指标;
    根据所述目标样本的预测指标与所述目标样本对应的评估指标的比较结果,调整所述神经预测器的网络参数。
  4. 如权利要求1-3任一项所述的方法,其特征在于,将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:
    将所述第一超参数组合、所述训练集包括的多个样本和所述多个样本的评估指标输入所述神经预测器;
    所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标;
    其中,所述两个锚点特征用于标定所述用户任务的最低预测指标的编码特征以及最高预测指标的编码特征。
  5. 如权利要求4所述的方法,其特征在于,所述神经预测器支持输入的样本数量为T,T为正整数;所述通过所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标,包括:
    所述神经预测器对输入的T个样本进行编码得到T个辅助特征,以及对所述第一超参数组合进行编码得到目标特征;
    所述神经预测器确定所述目标特征与所述T个辅助特征的相似度以及所述目标特征与所述两个锚点特征的相似度;
    所述神经预测器根据所述目标特征与所述T个辅助特征以及所述两个锚点特征分别对应的相似度确定T+2个权重,所述T+2个权重包括所述T个样本的权重以及所述两个锚点特征的权重;
    所述神经预测器根据所述T+2个权重对T+2个评估指标进行加权以得到所述第一超参数组合的预测指标;
    其中,所述T+2个评估指标包括所述T个样本的评估指标以及所述两个锚点特征对应的评估指标。
  6. 如权利要求4或5所述的方法,其特征在于,所述两个锚点特征属于所述神经预测器的网络参数。
  7. 如权利要求1-3任一项所述的方法,其特征在于,所述神经预测器支持输入的样本数量为T,T为正整数;将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:
    所述神经预测器对输入的T个样本进行编码得到T个辅助特征,对所述第一超参数组合进行编码得到目标特征;
    所述神经预测器分别确定所述目标特征与所述T个辅助特征的相似度;
    所述神经预测器根据所述目标特征与所述T个辅助特征分别对应的相似度确定所述T个样本分别对应的权重;
    所述神经预测器根据所述T个样本分别对应的权重对所述T个样本对应的评估指标进行加权得到所述第一超参数组合的预测指标。
  8. 如权利要求1-4任一项所述的方法,其特征在于,所述神经预测器支持输入的超参样本的数量为T,T为正整数;将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为所述神经预测器的输入,通过神经预测器确定所述第一超参数组合对应的预测指标,包括:
    将T+1个连接参数信息输入所述神经预测器;所述T+1个连接参数信息包括T个样本中每个样本和对应的评估指标连接后得到的T个连接参数信息,以及所述第一超参数组合和目标预测指标掩码连接后得到的连接参数信息,所述目标预测指标掩码用于表征所述第一超参数组合对应的未知预测指标;
    通过所述神经预测器对输入的所述T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度;
    所述神经预测器根据所述T+1个连接参数信息中每两个连接参数信息之间的相似度确定所述第一超参数组合的预测指标。
  9. 一种数据处理装置,其特征在于,包括:
    接收单元,用于接收用户设备发送的超参数信息,所述超参数信息用于指示用户任务对应的超参数搜索空间;
    处理单元,用于从所述超参数搜索空间采样多个超参数组合;将第一超参数组合、训练集包括的多个样本和所述多个样本的评估指标作为神经预测器的输入,通过所述神经预测器确定所述第一超参数组合对应的预测指标,所述第一超参数组合为所述多个超参数组合的任一个,以得到所述多个超参数组合对应的多个预测指标;
    发送单元,用于向所述用户设备发送K个超参数组合,K为正整数;
    其中,所述K个超参数组合对应的K个预测指标为所述多个预测指标中最高的K个。
  10. 如权利要求9所述的装置,其特征在于,所述接收单元,还用于接收所述用户设备发送的所述K个超参数组合对应的K个评估指标;
    所述处理单元,还用于将所述K个超参数组合作为K个样本,并将所述K个样本以及对应的所述K个评估指标加入所述训练集。
  11. 如权利要求9或10所述的装置,其特征在于,所述处理单元,还用于通过以下方式训练得到所述神经预测器:
    从所述训练集中选择多个样本、所述多个样本对应的评估指标,以及从所述训练集中选择一个目标样本;
    将所述多个样本、所述多个样本对应的评估指标以及所述目标样本作为所述神经预测器的输入,通过所述神经预测器确定所述目标样本对应的预测指标;
    根据所述目标样本的预测指标与所述目标样本对应的评估指标的比较结果,调整所述神经预测器的网络参数。
  12. 如权利要求9-11任一项所述的装置,其特征在于,所述处理单元,具体用于:
    将所述第一超参数组合、所述训练集包括的多个样本和所述多个样本的评估指标输入所述神经预测器;
    所述神经预测器基于所述第一超参数组合、所述多个样本、所述多个样本的评估指标以及两个锚点特征确定所述第一超参数组合对应的预测指标;
    其中,所述两个锚点特征用于标定所述用户任务的最低预测指标的编码特征以及最高预测指标的编码特征。
  13. 如权利要求12所述的装置,其特征在于,所述神经预测器支持输入的样本数量为T,T为正整数;所述处理单元,具体用于:所述神经预测器对输入的T个样本进行编码得到T个辅助特征,以及对所述第一超参数组合进行编码得到目标特征;
    所述神经预测器确定所述目标特征与所述T个辅助特征的相似度以及所述目标特征与所述两个锚点特征的相似度;
    所述神经预测器根据所述目标特征与所述T个辅助特征以及两个锚点特征分别对应的相似度确定T+2个权重,所述T+2个权重包括所述T个样本的权重以及所述两个锚点特征的权重;
    所述神经预测器根据所述T+2个权重对T+2个评估指标进行加权得到所述第一超参数组合的预测指标;
    其中,所述T+2个评估指标包括所述T个样本的评估指标以及所述两个锚点特征对应的评估指标。
  14. 如权利要求12或13所述的装置,其特征在于,所述两个锚点特征属于所述神经预测器的网络参数。
  15. 如权利要求9-11任一项所述的装置,其特征在于,所述神经预测器支持输入的样本数量为T,T为正整数;所述处理单元,具体用于:
    通过所述神经预测器对输入的T个样本进行编码得到T个辅助特征,对所述第一超参数组合进行编码得到目标特征;
    通过所述神经预测器分别确定所述目标特征与所述T个辅助特征的相似度;
    通过所述神经预测器根据所述目标特征与所述T个辅助特征分别对应的相似度确定T个样本分别对应的权重;
    通过所述神经预测器根据T个样本分别对应的权重对所述T个样本对应的评估指标进行加权得到所述第一超参数组合的预测指标。
  16. 如权利要求9-11任一项所述的装置,其特征在于,所述神经预测器支持输入的超参样本的数量为T,T为正整数;所述处理单元,具体用于:
    将T+1个连接参数信息输入所述神经预测器;所述T+1个连接参数信息包括T个样本中每个样本和对应的评估指标连接后得到的T个连接参数信息,以及所述第一超参数组合和目标预测指标掩码连接后得到的连接参数信息,所述目标预测指标掩码用于表征所述第一超参数组合对应的未知预测指标;
    通过所述神经预测器对输入的所述T+1个连接参数信息中每两个连接参数信息进行相似度匹配得到每两个连接参数信息之间的相似度;
    通过所述神经预测器根据所述T+1个连接参数信息中每两个连接参数信息之间的相似度确定所述第一超参数组合的预测指标。
  17. 一种数据处理装置,其特征在于,包括至少一个处理器和存储器;
    所述存储器,用于存储计算机程序或指令;
    所述至少一个处理器,用于执行所述计算机程序或指令,以使得如权利要求1-8中任一项所述的方法被执行。
  18. 一种芯片系统,其特征在于,所述芯片系统包括处理器;所述处理器与存储器相连,用于读取并执行所述存储器中存储的软件程序,以实现如权利要求1-8任一项所述的方法。
  19. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有指令,当所述指令被计算机执行时,使得如权利要求1-8任一项所述的方法被执行。
  20. 一种包含计算机程序或指令的计算机程序产品,其特征在于,当其在计算机上运行时,使得上述权利要求1-8任一项所述的方法被执行。
PCT/CN2023/082786 2022-03-24 2023-03-21 一种数据处理方法及装置 WO2023179609A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210303118.6 2022-03-24
CN202210303118.6A CN116861962A (zh) 2022-03-24 2022-03-24 一种数据处理方法及装置

Publications (1)

Publication Number Publication Date
WO2023179609A1 true WO2023179609A1 (zh) 2023-09-28

Family

ID=88099991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082786 WO2023179609A1 (zh) 2022-03-24 2023-03-21 一种数据处理方法及装置

Country Status (2)

Country Link
CN (1) CN116861962A (zh)
WO (1) WO2023179609A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117764631A (zh) * 2024-02-22 2024-03-26 山东中翰软件有限公司 基于源端静态数据建模的数据治理优化方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657805A (zh) * 2018-12-07 2019-04-19 泰康保险集团股份有限公司 超参数确定方法、装置、电子设备及计算机可读介质
CN109740113A (zh) * 2018-12-03 2019-05-10 东软集团股份有限公司 超参数阈值范围确定方法、装置、存储介质及电子设备
KR20190118937A (ko) * 2018-04-11 2019-10-21 삼성에스디에스 주식회사 하이퍼파라미터의 최적화 시스템 및 방법
WO2020048722A1 (en) * 2018-09-04 2020-03-12 Siemens Aktiengesellschaft Transfer learning of a machine-learning model using a hyperparameter response model


Also Published As

Publication number Publication date
CN116861962A (zh) 2023-10-10

Similar Documents

Publication Publication Date Title
CN110119467B (zh) 一种基于会话的项目推荐方法、装置、设备及存储介质
CN109891897B (zh) 用于分析媒体内容的方法
WO2021120719A1 (zh) 神经网络模型更新方法、图像处理方法及装置
EP3711000B1 (en) Regularized neural network architecture search
CN111353076B (zh) 训练跨模态检索模型的方法、跨模态检索的方法和相关装置
US11776092B2 (en) Color restoration method and apparatus
CN111819580A (zh) 用于密集图像预测任务的神经架构搜索
WO2021218517A1 (zh) 获取神经网络模型的方法、图像处理方法及装置
CN112115352B (zh) 基于用户兴趣的会话推荐方法及***
US11585918B2 (en) Generative adversarial network-based target identification
CN114519469B (zh) 一种基于Transformer框架的多变量长序列时间序列预测模型的构建方法
CN113039555B (zh) 在视频剪辑中进行动作分类的方法、***及存储介质
JP2022507255A (ja) 自動エンコーダを用いる人工画像生成のためのコンピュータアーキテクチャ
WO2021218470A1 (zh) 一种神经网络优化方法以及装置
CN108876044B (zh) 一种基于知识增强神经网络的线上内容流行度预测方法
CN111428854A (zh) 一种结构搜索方法及结构搜索装置
CN114095381B (zh) 多任务模型训练方法、多任务预测方法及相关产品
WO2021129668A1 (zh) 训练神经网络的方法和装置
WO2023179609A1 (zh) 一种数据处理方法及装置
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
US20240046067A1 (en) Data processing method and related device
Ngo et al. Adaptive anomaly detection for internet of things in hierarchical edge computing: A contextual-bandit approach
WO2020112188A1 (en) Computer architecture for artificial image generation
Sun et al. Active learning for image classification: A deep reinforcement learning approach
EP4200746A1 (en) Neural networks implementing attention over object embeddings for object-centric visual reasoning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23773856

Country of ref document: EP

Kind code of ref document: A1