CN116629352A - Hundred-million-level parameter optimization platform - Google Patents


Info

Publication number
CN116629352A
Authority
CN
China
Prior art keywords
model
parameter
data
algorithm
computing
Prior art date
Legal status
Pending
Application number
CN202310371162.5A
Other languages
Chinese (zh)
Inventor
王子田
Current Assignee
Suzhou Huwei Zhisu Technology Co ltd
Original Assignee
Suzhou Huwei Zhisu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Huwei Zhisu Technology Co ltd
Priority to CN202310371162.5A
Publication of CN116629352A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a hundred-million-level parameter optimization platform comprising a parameter space component, a parallel computing framework, a swarm optimization algorithm unit, a deep learning technology component, and a model evaluation and result analysis component, all connected to a master control mechanism. The deep learning technology component models and predicts the parameter space to further accelerate the parameter search, and the model evaluation and result analysis component evaluates the model performance of different parameter combinations. The invention can find optimal parameters quickly and efficiently; the platform uses a series of innovative technologies and algorithms and, compared with existing parameter optimization methods, offers high efficiency, high precision, and scalability.

Description

Hundred-million-level parameter optimization platform
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a hundred-million-level parameter optimization platform.
Background
Today, with the continuous development of deep learning and artificial intelligence technology, ultra-large-scale parameter optimization has become an important problem in many research fields. In fields such as natural language processing, computer vision, and speech recognition, it has been demonstrated that very large neural networks can achieve better performance. However, training these very large networks requires enormous computational resources and time and involves optimizing a vast number of parameters, which is very difficult.
Traditional parameter optimization methods, such as stochastic gradient descent (SGD), face many challenges when dealing with large-scale data and parameters: training is slow, and the optimization easily becomes trapped in local extrema. To address these problems, many improved algorithms have been proposed, such as batch normalization and adaptive gradient methods; however, a number of problems remain.
A high-dimensional problem is one in which the dimension of the parameter space is very high, i.e. the number of parameters is large. For high-dimensional problems, the difficulty of parameter optimization is concentrated in the following aspects:
Curse of dimensionality: as the parameter dimension increases, the size of the parameter space grows exponentially, so the search space becomes very large and the complexity of the parameter search increases. This problem is commonly referred to as the curse of dimensionality.
Local optimal solutions: a high-dimensional parameter space contains a large number of locally optimal solutions that perform well but are not the global optimum. This makes it more difficult to find the globally optimal solution in a high-dimensional parameter space.
High computing resource requirements: searching a high-dimensional parameter space requires a significant amount of computing resources and time; when the parameter space is very large, the requirements may exceed the capabilities of currently available computing resources.
Therefore, existing parameter optimization techniques still need improvement.
Disclosure of Invention
The invention aims to: in order to overcome the above defects, the invention provides a hundred-million-level parameter optimization platform that can find optimal parameters quickly and efficiently.
The technical scheme is as follows: to achieve the above object, the present invention provides a hundred-million-level parameter optimization platform, comprising:
a parameter space component, which is responsible for defining the range and values of the parameter space and the dependency relationships among different parameters;
a parallel computing framework, which uses distributed computing and multi-process concurrency techniques to decompose the parameter space into a plurality of subspaces and distributes the parameter subspaces to different computing nodes for parallel computation; a swarm optimization algorithm unit and a deep learning technology component are arranged in the parallel computing framework;
the swarm optimization algorithm unit, which uses a series of advanced swarm optimization algorithms to quickly search for optimal parameters; these algorithms can avoid becoming trapped in local optima and improve search precision and efficiency;
the deep learning technology component, which models and predicts the parameter space using deep learning techniques to further accelerate the parameter search process;
a model evaluation and result analysis component, which is responsible for evaluating the model performance corresponding to different parameter combinations and summarizing and analyzing the results so that a user can better understand the search results; it can also provide auxiliary tools such as charts and data reports to help the user analyze and interpret the results;
the parameter space component, the parallel computing framework, the swarm optimization algorithm unit, the deep learning technology component, and the model evaluation and result analysis component are all connected to a master control mechanism.
In the parallel computing framework, load balancing, performance optimization, programming, error handling, and fault recovery must all be considered to ensure the framework's efficiency and robustness. The specific process of the parallel computing framework is as follows:
1) Problem decomposition: a large-scale task is decomposed into a plurality of small tasks so that they can be processed in parallel;
2) Task allocation: the subtasks are assigned to different processors or computing units so that they can be processed simultaneously;
3) Running and calculating: each processor or computing unit independently performs its assigned tasks so that multiple tasks are processed in parallel;
4) Combining the calculation results: the results of each processor or computing unit are combined to obtain the final calculation result;
5) Task completion: when all processors or computing units have completed their tasks, the entire computing task is complete.
In the parameter optimization process, the parallel computing framework divides the parameter space into a plurality of subspaces and then distributes different subspaces to different computing nodes for computation, which is an effective mode of parallel computing; it can use distributed computing and multi-process concurrency techniques to make full use of computing resources and improve computing efficiency.
The specific implementation is as follows:
a. Dividing subspaces: the parameter space must first be divided into a plurality of subspaces; the division may be uniform, random, or otherwise, with the specific method chosen according to the situation; the divided subspaces should be of equal size and should cover the parameter space without overlap or omission, to ensure the accuracy of the calculation result;
b. Allocating computing nodes: the different subspaces must be distributed to different computing nodes for computation; communication and cooperation among the computing nodes can be implemented with a distributed computing framework such as MPI or Hadoop; when allocating computing nodes, the allocation should reflect each node's computing capacity and load, so as to make full use of computing resources and improve computing efficiency;
c. Parallel computing: the allocated subspaces are computed in parallel on each computing node; within a node, parallelism can be achieved with multi-process concurrency techniques such as multithreading or coroutines; during parallel computation, attention should be paid to data synchronization and communication to ensure the correctness and consistency of the results;
d. Summarizing the calculation results: the results computed by each node are aggregated to obtain the optimal parameter combination; aggregation functions provided by the distributed computing framework, such as MPI reduction operations or Hadoop's MapReduce framework, can be used; during aggregation, data transmission and computational efficiency should be considered to ensure the accuracy and efficiency of the aggregated result. A minimal sketch of steps a-d follows.
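The following is a minimal single-machine sketch of steps a-d, assuming Python's multiprocessing in place of a real MPI/Hadoop cluster and a toy two-parameter objective; the function names, bounds, and sample counts are illustrative, not the platform's API.

```python
# Minimal sketch of steps a-d on one machine; a real deployment would
# replace Pool with an MPI/Hadoop cluster as described above.
from multiprocessing import Pool
import numpy as np

def objective(params):
    # Hypothetical stand-in for training/evaluating a model.
    x, y = params
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def search_subspace(bounds, samples=1000, seed=0):
    # c. Parallel computing: each worker samples only its own subspace.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds[0]), np.array(bounds[1])
    pts = rng.uniform(lo, hi, size=(samples, len(lo)))
    scores = [objective(p) for p in pts]
    best = int(np.argmin(scores))
    return scores[best], tuple(pts[best])

def split_uniform(lo, hi, parts):
    # a. Dividing subspaces: equal, non-overlapping slices along axis 0.
    edges = np.linspace(lo[0], hi[0], parts + 1)
    return [((edges[i], lo[1]), (edges[i + 1], hi[1])) for i in range(parts)]

if __name__ == "__main__":
    subspaces = split_uniform((0.0, 0.0), (1.0, 1.0), parts=4)
    with Pool(4) as pool:                     # b. Allocating computing nodes
        results = pool.map(search_subspace, subspaces)
    print("optimum:", min(results))           # d. Summarizing the results
```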
Parameter space decomposition in the invention is a systems-engineering analysis method: the parameter space of a system is decomposed into a plurality of subspaces according to certain rules so that the system can be studied in more detail and more comprehensively. The specific decomposition rules and method are as follows:
a. Determining system parameters: first, determine the parameters of the system, i.e. the parameters that have a decisive influence on system behavior;
b. Dividing parameter value domains: for each parameter, determine its value domain, i.e. the range of values the parameter may take;
c. Dividing subspaces: the parameter space is divided into a plurality of subspaces, each containing a portion of the parameter points. There are several ways to divide the subspaces, for example:
- Uniform partitioning: the parameter value domain is divided into several equal parts, each corresponding to one subspace;
- Hierarchical partitioning: the parameter space is divided layer by layer from high dimension to low dimension; each layer corresponds to one dimension, and each dimension divides its subspace according to a certain rule;
- Center partitioning: the parameter space is divided into a plurality of spherical subspaces centered on the midpoint of the parameter value domain;
d. Defining subspaces: each subspace is defined, using natural language or mathematical formulas;
e. Analyzing subspaces: each subspace is analyzed to obtain the behavioral characteristics of the system within it;
f. Comprehensive analysis: the analysis results of all subspaces are integrated to obtain the overall behavioral characteristics of the system.
The specific calculation process by which parallel computation allocates parameter subspaces to different computing nodes is as follows:
a: first, determine how the parameter space is divided into parameter subspaces, dividing the space into a plurality of subspaces uniformly or randomly;
b: determine the number of computing nodes and the allocation mode; according to system performance and task requirements, decide how many computing nodes to use and how to allocate them to different computers or processors;
c: determine the parameter subspace each computing node is to calculate; after determining each node's subspaces from the number of nodes and the allocation mode, allocate the subspaces to the different nodes;
d: each computing node calculates its allocated parameter subspace and transmits its result to the other nodes; throughout the calculation, the nodes must communicate and synchronize to ensure the accuracy and consistency of the results;
e: merge the calculation results; after all nodes finish, the results of each node must be merged to obtain the final result for the whole parameter space, which usually requires data reduction and aggregation. A distributed sketch of steps a-e follows.
The specific principles by which the parameter space is divided into a plurality of subspaces can be determined according to the application requirements, and generally include the following aspects:
a. Characteristics of the objective function: if the objective function exhibits different characteristics, such as unimodal, multimodal, or locally optimal behavior, in different parameter regions, the parameter space can be partitioned according to these characteristics;
b. Constraints of the problem: in practical applications there are often various constraint conditions, such as feasibility constraints and resource limits; the parameter space can be divided according to these constraints, with all parameter combinations satisfying a given constraint placed in the same subspace.
The division process can adopt a divide-and-conquer approach, continually splitting the parameter space into smaller subspaces until the number of parameter combinations in each subspace is easy to process and solve; this can be implemented with bisection or K-D tree methods, as in the sketch below. Note that when dividing the parameter space and selecting subspaces, the computational complexity and search efficiency of the algorithm must be considered, to avoid excessive computation time or low efficiency caused by dividing the parameter space too finely or searching too many times.
The algorithms adopted in the swarm optimization algorithm unit are chosen for fast search of optimal parameters and include genetic algorithms, particle swarm algorithms, model-based optimization algorithms, and distributed differential evolution algorithms; other algorithms can be selected according to user needs.
The selection of a specific algorithm should take the following principles into account:
1) Characteristics of the problem: different types of problems require different types of algorithms, so the features and requirements of the problem must be clarified first; for example, some problems require optimizing continuous functions and others discrete functions;
2) Complexity of the algorithm: complexity is one of the important factors in evaluating an algorithm; a high-complexity algorithm can lead to long computation times and may be unable to handle large-scale problems;
3) Stability of the algorithm: stability refers to the algorithm's sensitivity to the initial solution and parameters; a highly stable algorithm produces similar optimization results under different initial solutions and parameters;
4) Existing experience: prior experience is an important basis for algorithm selection; if similar problems have been solved before, that experience can guide the choice of a suitable algorithm;
5) Interpretability of the algorithm: interpretability refers to the transparency and understandability of the algorithm's calculation process, which allows the process to be better understood and explained and the algorithm to be effectively adjusted and optimized.
The deep learning technology component models and predicts the parameter space using deep learning techniques and can learn the interactions among parameters, leading to a better understanding of the characteristics and rules of the parameter space and further optimizing the parameter search process.
Modeling the parameter space with deep learning generally uses a neural network model: each parameter in the parameter space is taken as an input, and the network maps the input to an output, which is the corresponding prediction. The network is trained on a large amount of data to minimize the error between predicted and actual results and thereby obtain optimal parameter values.
Predicting an unexplored parameter space means predicting unknown parameter values from existing data and the model: unknown parameters are input and the corresponding prediction is computed by the neural network. The criterion for judging predictions can be determined by the specific application scenario; error indices such as mean squared error (MSE) or cross-entropy can be used to evaluate the model's predictive performance. The smaller the model's prediction error, the stronger its predictive ability and the better it can predict unknown parameter values.
The deep learning technology component is implemented as follows:
1) Data preparation: a set of labeled training data must be prepared, containing model performance data for different parameter combinations; the training data should be representative and diverse so as to cover different regions and characteristics of the parameter space; it must also be preprocessed and normalized to ensure its reliability and accuracy;
2) Model design: a deep learning model must be designed to learn the characteristics and rules of the parameter space; it can adopt common neural network structures such as convolutional or recurrent neural networks, or custom network structures adapted to different application scenarios; the model should include an input layer, hidden layers, and an output layer, where the input is the parameter combination and the output is the model performance data;
3) Model training: the deep learning model is trained on the training data; optimization algorithms such as stochastic gradient descent with backpropagation can be used to minimize the prediction error; during training, attention should be paid to the split and cross-validation of the training data to prevent overfitting and underfitting;
4) Parameter prediction: the trained model is used to model and predict the parameter space; it can predict the model performance of known parameter combinations as well as unexplored regions of the parameter space, and the predictions can guide the parameter search, improving search efficiency and accuracy. A hedged sketch of these four steps follows.
The specific process of data preprocessing and normalization in step 1) above is as follows:
1) Data cleaning: remove duplicate, missing, and outlier data to ensure data quality and accuracy;
2) Feature selection: select the features that influence the prediction result, reducing dimensionality and improving model performance;
3) Feature scaling: scale the numerical ranges of the various features to the same range, to prevent features with very large or very small values from adversely affecting training. Common feature scaling methods are standardization (z-score normalization) and min-max scaling;
4) Data normalization: uniformly scale the data into the range 0 to 1 so that the data are comparable; common normalization methods are min-max scaling and z-score standardization;
5) Data set partitioning: divide the data set into a training set, a validation set, and a test set for training, tuning, and testing the model;
6) Classification label conversion: convert classification labels into numerical values for convenient processing by machine learning algorithms;
7) Feature engineering: further process the features, for example by combination, decomposition, and discretization, to improve model performance and prediction accuracy;
8) Data dimensionality reduction: use algorithms such as PCA to reduce high-dimensional data to low-dimensional data, reducing the complexity of model training. (Several of these steps are sketched below.)
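An illustrative scikit-learn sketch of steps 3), 4), 5), and 8), assuming random data in place of real parameter-performance records.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 10))  # assumed raw features

X_std = StandardScaler().fit_transform(X)       # 3) z-score standardization
X_01 = MinMaxScaler().fit_transform(X)          # 4) scale into [0, 1]

# 5) split into training / validation / test sets (60 / 20 / 20)
X_train, X_tmp = train_test_split(X_01, test_size=0.4, random_state=0)
X_val, X_test = train_test_split(X_tmp, test_size=0.5, random_state=0)

X_low = PCA(n_components=3).fit_transform(X_std)  # 8) dimensionality reduction
print(X_train.shape, X_val.shape, X_test.shape, X_low.shape)
```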
The modeling process of deep learning in the implementation of the deep learning technology component is as follows:
1) Data preprocessing: preprocess the raw data, including data cleaning, data normalization, and data segmentation;
2) Model selection: select a suitable model, such as a convolutional neural network, a recurrent neural network, or a deep autoencoder, according to the characteristics of the problem and the data structure;
3) Model construction: build the model structure, including the input layer, hidden layers, output layer, and activation functions;
4) Parameter initialization: initialize the model parameters so that they can be adjusted more easily during training;
5) Loss function selection: select a suitable loss function, such as cross-entropy loss or mean squared error;
6) Model training: train the model on the training set, adjusting parameters with an optimization algorithm such as gradient descent or Adam;
7) Model evaluation: evaluate the trained model using indices such as accuracy, recall, F1 score, and AUC;
8) Model tuning: optimize the model according to the evaluation results, including adjusting parameters, adding network layers, or adopting different optimization algorithms;
9) Model application: apply the trained model to practical problems such as image classification and speech recognition.
The training process of the deep learning model is as follows:
1) Data preparation: preprocess the data and divide it into training and test sets, ensuring data quality and applicability;
2) Model construction: select a suitable model structure and parameters according to the task requirements and data characteristics, and build the deep learning model;
3) Loss function definition: select a suitable loss function according to the task requirements to measure the difference between the model's predictions and the true results;
4) Model training: train the model on the training set, continually updating the model parameters with an optimization algorithm (e.g. gradient descent) to minimize the loss function;
5) Model evaluation: evaluate the model on the test set, computing indices such as the loss value and accuracy to judge the model's performance and generalization ability (an evaluation sketch follows this list);
6) Parameter adjustment: adjust the model structure and parameters according to the evaluation results to optimize performance, and check the model's generalization ability with cross-validation and similar methods on the training and test sets;
7) Model preservation and use: save the optimal model parameters for subsequent prediction, classification, clustering, or other tasks.
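A small sketch of the step 5) evaluation indices using sklearn.metrics; the labels and scores are made up for illustration.

```python
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]             # assumed test-set labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("accuracy:", accuracy_score(y_true, y_pred))
print("recall:  ", recall_score(y_true, y_pred))
print("F1:      ", f1_score(y_true, y_pred))
print("AUC:     ", roc_auc_score(y_true, y_score))
```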
As can be seen from the above technical scheme, the invention has the following beneficial effects:
1. The hundred-million-level parameter optimization platform of the invention is an ultra-large-scale parameter optimization platform that can find optimal parameters quickly and efficiently. The platform uses a series of innovative technologies and algorithms, including distributed computing, multi-process concurrency, swarm optimization, and deep learning. Compared with existing parameter optimization methods, the platform has the following advantages. High efficiency: the platform can perform parallel computation on multiple computers or cloud servers, greatly shortening the parameter search time.
High precision: the platform uses advanced parameter optimization algorithms that quickly search for optimal parameters, improving the performance of algorithms and models.
Scalability: the platform's architecture is highly extensible and can meet parameter optimization tasks of different scales and complexity.
2. The parallel computing framework of the invention uses distributed computing and multi-process concurrency techniques to decompose the parameter space into a plurality of subspaces and distribute different subspaces to different computing nodes for computation. This greatly reduces computation time and improves the utilization of computing resources.
3. The swarm optimization algorithm component uses a series of advanced swarm optimization algorithms, such as genetic algorithms and particle swarm algorithms, to quickly search for optimal parameters. These algorithms avoid becoming trapped in local optima and improve search precision and efficiency.
4. The deep learning technology component models and predicts the parameter space using deep learning techniques to further accelerate the parameter search process. The component can learn the interactions among parameters and predict unexplored regions of the parameter space, improving search efficiency.
Each parameter in the parameter space is taken as an input, and the neural network maps the input to an output, which is the corresponding prediction. The network is trained on a large amount of data to minimize the error between predicted and actual results and obtain optimal parameter values. Predicting an unexplored parameter space means predicting unknown parameter values from existing data and the model: an unknown parameter value is input and the corresponding prediction is computed by the neural network; the smaller the model's prediction error, the stronger its predictive ability and the better it can predict unknown parameter values.
5. The model evaluation and result analysis component is responsible for evaluating the model performance corresponding to different parameter combinations and summarizing and analyzing the results so that the user can better understand the search results. The component can also provide auxiliary tools such as charts and data reports to help the user analyze and interpret the results.
Drawings
FIG. 1 is a schematic diagram of the hundred-million-level parameter optimization platform according to the present invention;
FIG. 2 is a decomposition diagram of a multi-objective space in the present invention;
FIG. 3 is a schematic graph of a boundary crossing method in accordance with the present invention;
FIG. 4 is a schematic graph of a boundary crossing method with penalty term in the present invention;
FIG. 5 is a flowchart of a genetic algorithm according to the present invention;
FIG. 6 is a flow chart of a particle swarm algorithm in the present invention;
FIG. 7 is a flowchart of a differential evolution algorithm according to the present invention;
FIG. 8 is a schematic diagram of the module connections of the hundred-million-level parameter optimization platform of the present invention.
Detailed Description
The invention is further elucidated below in connection with the drawings and the specific embodiments.
Example 1
A hundred-million-level parameter optimization platform, comprising:
a parameter space component, which is responsible for defining the range and values of the parameter space and the dependency relationships among different parameters;
a parallel computing framework, which uses distributed computing and multi-process concurrency techniques to decompose the parameter space into a plurality of subspaces and distributes the parameter subspaces to different computing nodes for parallel computation; a swarm optimization algorithm unit and a deep learning technology component are arranged in the parallel computing framework;
the swarm optimization algorithm unit, which uses a series of advanced swarm optimization algorithms to quickly search for optimal parameters; these algorithms can avoid becoming trapped in local optima and improve search precision and efficiency;
the deep learning technology component, which models and predicts the parameter space using deep learning techniques to further accelerate the parameter search process;
a model evaluation and result analysis component, which is responsible for evaluating the model performance corresponding to different parameter combinations and summarizing and analyzing the results so that a user can better understand the search results; it can also provide auxiliary tools such as charts and data reports to help the user analyze and interpret the results;
the parameter space component, the parallel computing framework, the swarm optimization algorithm unit, the deep learning technology component, and the model evaluation and result analysis component are all connected to a master control mechanism.
In this embodiment, the parameter space component is responsible for defining the range and values of the parameter space and the dependencies among different parameters. For example, for a deep learning model, the parameter space may include parameters such as the learning rate, batch size, and weight decay. The specific range and values of the parameter space vary with the specific deep learning model and task. Generally, the learning rate can be between 0.001 and 0.1, the batch size between 16 and 512, and the weight decay between 0.0001 and 0.1.
There are certain relationships between the different parameters. For example, the values of the learning rate and batch size affect the convergence rate and generalization ability of the model. Generally speaking, with a larger learning rate the model converges faster but is prone to overfitting; with a smaller learning rate the model converges more slowly but more easily attains good generalization. With a larger batch size the model's update direction is more accurate, but computational complexity and memory consumption increase; with a smaller batch size the update direction is more random, but computational complexity and memory consumption decrease. Weight decay can be used to prevent overfitting, but excessive weight decay may cause the model to underfit. Adjustments must therefore be made according to the specific situation to find the most suitable parameter combination, as in the sketch below.
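A hedged sketch of how the parameter space component might express these ranges and dependencies in code; the dictionary schema, field names, and the dependency rule are illustrative assumptions rather than the patent's interface.

```python
# Hypothetical parameter-space definition for the example above.
PARAM_SPACE = {
    "learning_rate": {"type": "float", "low": 0.001,  "high": 0.1, "scale": "log"},
    "batch_size":    {"type": "int",   "low": 16,     "high": 512},
    "weight_decay":  {"type": "float", "low": 0.0001, "high": 0.1, "scale": "log"},
}

def check_dependencies(cfg):
    # Illustrative dependency rule (an assumption, not from the patent):
    # reject large learning rates paired with very small batches.
    if cfg["learning_rate"] > 0.05 and cfg["batch_size"] < 32:
        return False
    return True

cfg = {"learning_rate": 0.01, "batch_size": 128, "weight_decay": 0.001}
assert check_dependencies(cfg)
```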
In this embodiment, load balancing, performance optimization, programming, error handling, and fault recovery must all be considered in the parallel computing framework to ensure its efficiency and robustness. The specific process of the parallel computing framework is as follows:
1) Problem decomposition: a large-scale task is decomposed into a plurality of small tasks so that they can be processed in parallel;
2) Task allocation: the subtasks are assigned to different processors or computing units so that they can be processed simultaneously;
3) Running and calculating: each processor or computing unit independently performs its assigned tasks so that multiple tasks are processed in parallel;
4) Combining the calculation results: the results of each processor or computing unit are combined to obtain the final calculation result;
5) Task completion: when all processors or computing units have completed their tasks, the entire computing task is complete.
In the parameter optimization process, the parallel computing framework divides the parameter space into a plurality of subspaces and then distributes different subspaces to different computing nodes for computation, which is an effective mode of parallel computing; it can use distributed computing and multi-process concurrency techniques to make full use of computing resources and improve computing efficiency.
The specific implementation is as follows:
a. Dividing subspaces: the parameter space must first be divided into a plurality of subspaces; the division may be uniform, random, or otherwise, with the specific method chosen according to the situation; the divided subspaces should be of equal size and should cover the parameter space without overlap or omission, to ensure the accuracy of the calculation result;
b. Allocating computing nodes: the different subspaces must be distributed to different computing nodes for computation; communication and cooperation among the computing nodes can be implemented with a distributed computing framework such as MPI or Hadoop; when allocating computing nodes, the allocation should reflect each node's computing capacity and load, so as to make full use of computing resources and improve computing efficiency;
c. Parallel computing: the allocated subspaces are computed in parallel on each computing node; within a node, parallelism can be achieved with multi-process concurrency techniques such as multithreading or coroutines; during parallel computation, attention should be paid to data synchronization and communication to ensure the correctness and consistency of the results;
d. Summarizing the calculation results: the results computed by each node are aggregated to obtain the optimal parameter combination; aggregation functions provided by the distributed computing framework, such as MPI reduction operations or Hadoop's MapReduce framework, can be used; during aggregation, data transmission and computational efficiency should be considered to ensure the accuracy and efficiency of the aggregated result.
It should be noted that distributed computing refers to the process of distributing and coordinating tasks among multiple computer nodes to complete a large-scale computing task.
The method divides the computing task and its data into a plurality of parts, distributes them to different computers for calculation, and finally aggregates the results. The advantages of distributed computing systems are that computing power can be increased and that high availability and fault tolerance can be achieved.
Multi-process concurrency refers to the simultaneous execution of multiple processes in the same program; each process executes its tasks independently while sharing resources. When multiple tasks must be processed simultaneously, multi-process concurrency can improve program execution efficiency and reduce waiting time. It requires coordination and synchronization among the processes to ensure the correctness and stability of the program.
In general, both distributed computing and multi-process concurrency aim to improve computing efficiency and program operating efficiency. Distributed computing emphasizes coordinating multiple nodes to accomplish a task cooperatively, while multi-process concurrency emphasizes coordination and sharing among multiple processes on the same computer. In practical applications the two can be combined to achieve more efficient computation and processing.
Parameter space decomposition in this embodiment is a systems-engineering analysis method: the parameter space of a system is decomposed into a plurality of subspaces according to certain rules so that the system can be studied in more detail and more comprehensively. The specific decomposition rules and method are as follows:
a. Determining system parameters: first, determine the parameters of the system, i.e. the parameters that have a decisive influence on system behavior;
b. Dividing parameter value domains: for each parameter, determine its value domain, i.e. the range of values the parameter may take;
c. Dividing subspaces: the parameter space is divided into a plurality of subspaces, each containing a portion of the parameter points. There are several ways to divide the subspaces, for example:
- Uniform partitioning: the parameter value domain is divided into several equal parts, each corresponding to one subspace;
- Hierarchical partitioning: the parameter space is divided layer by layer from high dimension to low dimension; each layer corresponds to one dimension, and each dimension divides its subspace according to a certain rule;
- Center partitioning: the parameter space is divided into a plurality of spherical subspaces centered on the midpoint of the parameter value domain;
d. Defining subspaces: each subspace is defined, using natural language or mathematical formulas;
e. Analyzing subspaces: each subspace is analyzed to obtain the behavioral characteristics of the system within it;
f. Comprehensive analysis: the analysis results of all subspaces are integrated to obtain the overall behavioral characteristics of the system.
The specific calculation process by which parallel computation allocates parameter subspaces to different computing nodes in this embodiment is as follows:
a: first, determine how the parameter space is divided into parameter subspaces, dividing the space into a plurality of subspaces uniformly or randomly;
b: determine the number of computing nodes and the allocation mode; according to system performance and task requirements, decide how many computing nodes to use and how to allocate them to different computers or processors;
c: determine the parameter subspace each computing node is to calculate; after determining each node's subspaces from the number of nodes and the allocation mode, allocate the subspaces to the different nodes;
d: each computing node calculates its allocated parameter subspace and transmits its result to the other nodes; throughout the calculation, the nodes must communicate and synchronize to ensure the accuracy and consistency of the results;
e: merge the calculation results; after all nodes finish, the results of each node must be merged to obtain the final result for the whole parameter space, which usually requires data reduction and aggregation.
The specific principles by which the parameter space is divided into a plurality of subspaces in this embodiment can be determined according to the application requirements, and generally include the following aspects:
a. Characteristics of the objective function: if the objective function exhibits different characteristics, such as unimodal, multimodal, or locally optimal behavior, in different parameter regions, the parameter space can be partitioned according to these characteristics;
b. Constraints of the problem: in practical applications there are often various constraint conditions, such as feasibility constraints and resource limits; the parameter space can be divided according to these constraints, with all parameter combinations satisfying a given constraint placed in the same subspace.
The division process can adopt a divide-and-conquer approach, continually splitting the parameter space into smaller subspaces until the number of parameter combinations in each subspace is easy to process and solve; this can be implemented with bisection or K-D tree methods. Note that when dividing the parameter space and selecting subspaces, the computational complexity and search efficiency of the algorithm must be considered, to avoid excessive computation time or low efficiency caused by dividing the parameter space too finely or searching too many times.
The algorithms adopted in the swarm optimization algorithm unit in this embodiment are chosen for fast search of optimal parameters and include genetic algorithms, particle swarm algorithms, model-based optimization algorithms, and distributed differential evolution algorithms; other algorithms can be selected according to user needs.
The selection of a specific algorithm should take the following principles into account:
1) Characteristics of the problem: different types of problems require different types of algorithms, so the features and requirements of the problem must be clarified first; for example, some problems require optimizing continuous functions and others discrete functions;
2) Complexity of the algorithm: complexity is one of the important factors in evaluating an algorithm; a high-complexity algorithm can lead to long computation times and may be unable to handle large-scale problems;
3) Stability of the algorithm: stability refers to the algorithm's sensitivity to the initial solution and parameters; a highly stable algorithm produces similar optimization results under different initial solutions and parameters;
4) Existing experience: prior experience is an important basis for algorithm selection; if similar problems have been solved before, that experience can guide the choice of a suitable algorithm;
5) Interpretability of the algorithm: interpretability refers to the transparency and understandability of the algorithm's calculation process, which allows the process to be better understood and explained and the algorithm to be effectively adjusted and optimized.
Genetic algorithm: a genetic algorithm is a heuristic search algorithm applicable to parameter optimization. In a distributed computing environment, the population can be divided into multiple sub-populations, and the operations of the genetic algorithm, such as selection, crossover, and mutation, can be performed in parallel on different computing nodes to improve search efficiency.
Particle swarm algorithm: particle swarm optimization is an optimization algorithm based on swarm intelligence and is applicable to parameter optimization. In a distributed computing environment, the particle population can be divided into multiple sub-populations and computed in parallel on different computing nodes to speed up the optimization process; a compact sketch follows.
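A compact particle swarm sketch under the usual inertia/cognitive/social formulation; in the distributed setting described above, each node would run an independent copy of this loop on its own sub-population. The coefficients and the objective are illustrative.

```python
import numpy as np

def pso(f, dim=2, particles=30, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, (particles, dim))       # positions
    v = np.zeros_like(x)                          # velocities
    pbest = x.copy()                              # personal bests
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()            # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, 0, 1)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

print(pso(lambda p: ((p - 0.3) ** 2).sum()))
```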
Model-based optimization algorithm: model-based optimization algorithms typically transform the parameter search problem into a function optimization problem and then perform the optimization in a distributed computing environment using efficient optimizers such as Bayesian optimization or Gaussian process regression.
Parallel hill climbing algorithm: the parallel hill climbing algorithm is an optimization algorithm based on local search and can be applied to parameter optimization. In a distributed computing environment, a search space may be divided into multiple subspaces, and then local search operations are performed in parallel on different computing nodes to improve search efficiency.
Distributed differential evolution algorithm: the distributed differential evolution algorithm is a heuristic optimization algorithm applicable to parameter optimization. In a distributed computing environment, the population can be divided into multiple sub-populations, and the operations of the differential evolution algorithm, such as selection, crossover, and mutation, can be performed in parallel on different computing nodes to improve search efficiency; a compact sketch follows.
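A compact DE/rand/1/bin sketch; in a distributed setting each node would evolve one sub-population and periodically exchange migrants. The control parameters F and CR and the toy objective are illustrative.

```python
import numpy as np

def differential_evolution(f, dim=2, pop=20, iters=200, F=0.8, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (pop, dim))
    fX = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        for i in range(pop):
            idx = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            a, b, c = X[idx]
            mutant = np.clip(a + F * (b - c), 0, 1)        # mutation
            cross = rng.random(dim) < CR                   # binomial crossover
            cross[rng.integers(dim)] = True                # force one gene
            trial = np.where(cross, mutant, X[i])
            ft = f(trial)
            if ft < fX[i]:                                 # greedy selection
                X[i], fX[i] = trial, ft
    return X[fX.argmin()], float(fX.min())

print(differential_evolution(lambda p: ((p - 0.3) ** 2).sum()))
```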
The deep learning technology component in this embodiment models and predicts the parameter space using deep learning techniques and can learn the interactions among parameters, leading to a better understanding of the characteristics and rules of the parameter space and further optimizing the parameter search process.
Modeling the parameter space with deep learning generally uses a neural network model: each parameter in the parameter space is taken as an input, and the network maps the input to an output, which is the corresponding prediction. The network is trained on a large amount of data to minimize the error between predicted and actual results and thereby obtain optimal parameter values.
Predicting an unexplored parameter space means predicting unknown parameter values from existing data and the model: unknown parameters are input and the corresponding prediction is computed by the neural network. The criterion for judging predictions can be determined by the specific application scenario; error indices such as mean squared error (MSE) or cross-entropy can be used to evaluate the model's predictive performance. The smaller the model's prediction error, the stronger its predictive ability and the better it can predict unknown parameter values.
The deep learning technology component is implemented as follows:
1) Data preparation: a set of labeled training data must be prepared, containing model performance data for different parameter combinations; the training data should be representative and diverse so as to cover different regions and characteristics of the parameter space; it must also be preprocessed and normalized to ensure its reliability and accuracy;
2) Model design: a deep learning model must be designed to learn the characteristics and rules of the parameter space; it can adopt common neural network structures such as convolutional or recurrent neural networks, or custom network structures adapted to different application scenarios; the model should include an input layer, hidden layers, and an output layer, where the input is the parameter combination and the output is the model performance data;
3) Model training: the deep learning model is trained on the training data; optimization algorithms such as stochastic gradient descent with backpropagation can be used to minimize the prediction error; during training, attention should be paid to the split and cross-validation of the training data to prevent overfitting and underfitting;
4) Parameter prediction: the trained model is used to model and predict the parameter space; it can predict the model performance of known parameter combinations as well as unexplored regions of the parameter space, and the predictions can guide the parameter search, improving search efficiency and accuracy.
The specific process of data preprocessing and normalization in step 1) of this embodiment is as follows:
1) Data cleaning: remove duplicate, missing, and outlier data to ensure data quality and accuracy;
2) Feature selection: select the features that influence the prediction result, reducing dimensionality and improving model performance;
3) Feature scaling: scale the numerical ranges of the various features to the same range, to prevent features with very large or very small values from adversely affecting training. Common feature scaling methods are standardization (z-score normalization) and min-max scaling;
4) Data normalization: uniformly scale the data into the range 0 to 1 so that the data are comparable; common normalization methods are min-max scaling and z-score standardization;
5) Data set partitioning: divide the data set into a training set, a validation set, and a test set for training, tuning, and testing the model;
6) Classification label conversion: convert classification labels into numerical values for convenient processing by machine learning algorithms;
7) Feature engineering: further process the features, for example by combination, decomposition, and discretization, to improve model performance and prediction accuracy;
8) Data dimensionality reduction: use algorithms such as PCA to reduce high-dimensional data to low-dimensional data, reducing the complexity of model training.
The modeling process of deep learning in this embodiment is as follows:
1) Data preprocessing: preprocess the raw data, including data cleaning, data normalization, and data segmentation;
2) Model selection: select a suitable model, such as a convolutional neural network, a recurrent neural network, or a deep autoencoder, according to the characteristics of the problem and the data structure;
3) Model construction: build the model structure, including the input layer, hidden layers, output layer, and activation functions;
4) Parameter initialization: initialize the model parameters so that they can be adjusted more easily during training;
5) Loss function selection: select a suitable loss function, such as cross-entropy loss or mean squared error;
6) Model training: train the model on the training set, adjusting parameters with an optimization algorithm such as gradient descent or Adam;
7) Model evaluation: evaluate the trained model using indices such as accuracy, recall, F1 score, and AUC;
8) Model tuning: optimize the model according to the evaluation results, including adjusting parameters, adding network layers, or adopting different optimization algorithms;
9) Model application: apply the trained model to practical problems such as image classification and speech recognition.
the training process of the deep learning model is as follows:
1) Data preparation: preprocessing data, dividing a training set and a testing set, and ensuring the quality and applicability of the data;
2) Model construction: selecting proper model structures and parameters according to task requirements and data characteristics, and constructing a deep learning model;
3) Loss function definition: and selecting a proper loss function according to task requirements to measure the difference between the predicted result and the real result of the model.
4) Model training: the model is trained using a training set, with model parameters being continually updated by an optimization algorithm (e.g., gradient descent) to minimize the loss function.
5) Model evaluation: and evaluating the model by using the test set, calculating the loss value and the accuracy and other indexes of the model on the test set, and judging the performance and the generalization capability of the model.
6) Parameter adjustment: according to the model evaluation result, adjusting the model structure and parameters to optimize model performance, and checking the generalization ability of the model by methods such as cross-validation across the training set and the test set;
7) Model preservation and use: the optimal model parameters are saved for subsequent prediction, classification, clustering, or other tasks.
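The modeling and training steps above can be condensed into a short, runnable sketch. The version below assumes PyTorch and synthetic data; the layer sizes, the choice of Adam and cross-entropy loss, and the file name are illustrative assumptions, not requirements of the platform:

import torch
import torch.nn as nn

# Synthetic data standing in for a preprocessed, already-split dataset.
X_train, y_train = torch.randn(256, 10), torch.randint(0, 3, (256,))
X_test,  y_test  = torch.randn(64, 10),  torch.randint(0, 3, (64,))

# Model construction: input layer -> hidden layer -> output layer.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()                  # loss function selection
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):                          # model training
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

with torch.no_grad():                            # model evaluation
    accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean()
    print(f"test accuracy: {accuracy:.3f}")

torch.save(model.state_dict(), "best_model.pt")  # model preservation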
Example 2
The hundred million-level parameter optimizing platform in this embodiment has the same composition and structure as that in embodiment 1; the differences are as follows:
the parameter optimization strategy in the parameter optimization platform in this embodiment is described as follows:
in reality, many optimization problems require several objectives to be considered simultaneously: in chip design, performance, power consumption and cost must all be weighed, and in financial activities, benefits and risks must be balanced at the same time. Such problems can be expressed as multi-objective optimization problems of the form below.
min_{x ∈ Ω} F(x) = (f_1(x), f_2(x), …, f_d(x))    (1)

For a multi-objective optimization problem with d objective functions, consider the objective function vectors u = (u_1, …, u_d) and v = (v_1, …, v_d) of two solutions in the objective space. If u and v satisfy (2), the solution corresponding to u is said to dominate the solution corresponding to v:

∀i ∈ {1, …, d}: u_i ≤ v_i,  and  ∃j ∈ {1, …, d}: u_j < v_j    (2)
If no objective function vector of any point in the whole parameter space dominates the objective function vector of a given point, that point is called a Pareto optimal point (Pareto Optimal) of the multi-objective optimization problem, and the objective function vectors of the set of all Pareto optimal points form the Pareto front (Pareto Front) of the multi-objective optimization problem.
Efficiently finding as many Pareto optimal solutions as possible, approaching the Pareto front, is the standard for measuring the performance of a multi-objective global optimization algorithm. Unlike the single-objective problem, where a unique numerical standard (the objective function value) measures the merit of all parameters, the multi-objective optimization problem must determine the dominance relationship by comparing points pairwise according to equation (2).
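The dominance test of formula (2) translates directly into code. The sketch below (pure Python; the sample points are hypothetical) checks whether one objective vector dominates another for a minimization problem and filters a point set down to its non-dominated subset:

def dominates(u, v):
    # u dominates v (minimization): no worse in all objectives, strictly better in at least one.
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    # Keep only the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(points))  # (3.0, 3.0) is dominated by (2.0, 2.0) and is filtered out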
1. Multi-objective optimization algorithm based on objective space decomposition
Currently, there are mainly two types of multi-objective optimization algorithms based on evolutionary computation. One type, represented by NSGA-II and NSGA-III, is based on non-dominated sorting: the algorithm uses non-dominated sorting to determine the quality of each individual solution in the population, layers the individuals according to the dominance relationship in the objective space, assigns each solution a unique rank, and retains the non-dominated, highly ranked individuals. However, non-dominated sorting involves considerable computation; the algorithm has a complexity of O(MN²), where N is the number of individuals and M is the number of objectives.
The other type, the multi-objective optimization method based on objective decomposition, is represented by MOEA/D. Its basic idea is not to calculate the dominance relationship between individuals directly; instead, a reference point and a group of weight vectors uniformly distributed in the objective space are used to construct reference directions, the multi-objective optimization problem is converted into scalar sub-problems along each reference direction, and the pareto front of the objective space is approached by solving the sub-problems. Taking a two-objective problem as an example, the objective space decomposition method is shown in fig. 2.
The boundary crossing method and the boundary crossing method with penalty term are two common target space decomposition modes. The idea of the boundary crossing method is to find the crossing point of the pareto front and the weight vector, and convert the multi-objective optimization problem into a sub-problem in the form of formula (3) by using the weight vector and the reference point.
min g^{bi}(x | λ, z*) = d,  subject to z* − F(x) = d·λ,  x ∈ Ω    (3)

where λ is the weight vector of the sub-problem and z* is the reference point.
The boundary crossing method with penalty term removes the equality constraint in equation (3), its sub-problem form is shown in equation (4), which is easier to implement in practice.
min g^{pbi}(x | λ, z*) = d_1 + θ·d_2,  with d_1 = ‖(F(x) − z*)ᵀλ‖ / ‖λ‖ and d_2 = ‖F(x) − (z* + d_1·λ/‖λ‖)‖    (4)

where θ is the penalty parameter.
The optimization of the sub-problems formed by the two methods can both approach the intersection of the weight vector and the pareto front, as shown in fig. 3 and fig. 4 respectively. Ideally, with a reasonable reference point in the objective space and reference vectors that cover the pareto front well, solving the sub-problems generated by the objective space decomposition finally yields an approximation of the pareto front; this is the basic idea of the decomposition-based multi-objective optimization algorithm. Assuming m optimization objectives, a set of uniformly distributed weight vectors can be obtained by presetting an integer H: each component of an m-dimensional weight vector takes its value sequentially from {0/H, 1/H, …, H/H}, and all combinations whose components sum to 1 are generated, giving C(H+m−1, m−1) uniformly distributed weight vectors, which are used to calculate the reference direction vectors (a sketch of this enumeration follows).
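A sketch of the weight-vector enumeration just described, in Python; the values of m and H are illustrative:

def weight_vectors(m, H):
    # All m-dimensional vectors whose components come from {0/H, ..., H/H} and sum to 1.
    def compositions(total, parts):
        if parts == 1:
            yield (total,)
            return
        for i in range(total + 1):
            for rest in compositions(total - i, parts - 1):
                yield (i,) + rest
    return [tuple(c / H for c in comp) for comp in compositions(H, m)]

vectors = weight_vectors(m=3, H=4)
print(len(vectors))  # C(H + m - 1, m - 1) = C(6, 2) = 15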
Initialize: global parameters />Number of working nodes KGlobal iteration number TCommunication interval MLearning rate />Number of differential evolution iterations L
for t = 0 , 1 , ... , T-1 do
Reading current global model parameters
form = 0 , 1 , ... , M-1do
Randomly extracting or online obtaining samples (or small batches) from the training set S />
Updating />
end for
Synchronous communication to obtain parameters on all nodes />
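A serial Python simulation of this training strategy is sketched below, under the assumptions that the local update is a stochastic gradient step on a least-squares objective and that synchronous communication averages the parameters of all K nodes; in an actual deployment the nodes would run concurrently under MPI or a comparable framework.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=1000)

K, T, M, lr = 4, 50, 10, 0.05          # nodes, global rounds, communication interval, learning rate
w_global = np.zeros(5)

for t in range(T):
    local = []
    for k in range(K):                  # simulated in sequence; real nodes run concurrently
        w = w_global.copy()             # read current global parameters
        for m in range(M):
            i = rng.integers(len(X))    # random sample from training set S
            grad = 2 * (X[i] @ w - y[i]) * X[i]
            w -= lr * grad              # local SGD update
        local.append(w)
    w_global = np.mean(local, axis=0)   # synchronous communication: average node parameters

print("error:", np.linalg.norm(w_global - w_true))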
The specific contents of the various algorithms are as follows:
the genetic algorithm is an optimization algorithm based on the theory of biological evolution; it finds a global optimal solution by simulating genetic and evolutionary processes. Advantages: it has high search accuracy and scalability, and can solve large-scale high-dimensional optimization problems.
The calculation process of the genetic algorithm generally comprises the following steps:
1) Initializing a population: randomly generating a set of initial solutions, i.e., populations, the population numbers typically being tens to hundreds;
2) Fitness evaluation: evaluating the fitness value of each individual through a fitness function; the fitness value represents how well the individual solves the problem;
3) Selection operation: selecting a certain number of individuals as parents of the next generation population, wherein the selection method comprises roulette selection, tournament selection and the like;
4) Crossover operation: performing crossover operation on the selected individuals to generate new individual solutions;
5) Mutation operation: performing mutation operation on the newly generated individuals to generate new solutions;
6) Replacement operation: replacing the original individuals with the newly generated individuals to generate a new population;
7) Judging termination conditions: returning to the optimal solution if the termination condition (such as fitness value, iteration number, etc.) is reached; otherwise, repeating the steps;
the above is a general calculation process of the genetic algorithm, and the specific implementation also needs to make corresponding adjustment and optimization according to different problems.
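A compact sketch of this calculation process on a toy objective follows; all hyperparameter values are illustrative, and tournament selection stands in for the selection methods listed in step 3):

import random

def fitness(ind):
    # Higher is better: the negative sphere function, maximized at the origin.
    return -sum(x * x for x in ind)

def genetic_algorithm(pop_size=50, dim=5, generations=100, cx_rate=0.8, mut_rate=0.1):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]  # 1) initialize
    for _ in range(generations):
        def tournament():                              # 3) selection
            a, b = random.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b  # 2) fitness evaluation
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if random.random() < cx_rate:              # 4) one-point crossover
                cut = random.randrange(1, dim)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [g + random.gauss(0, 0.1) if random.random() < mut_rate else g
                     for g in child]                   # 5) Gaussian mutation
            children.append(child)
        pop = children                                 # 6) replacement
    return max(pop, key=fitness)                       # 7) best individual found

print(genetic_algorithm())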
The particle swarm algorithm is an optimization algorithm based on swarm intelligence, and a global optimal solution is found by simulating the behavior of a biological swarm. The advantages are that: the method has higher convergence rate and search precision, and can overcome the problem of local optimal solution. Application scene: the method is suitable for high-dimensional optimization problems requiring a globally optimal solution, such as machine learning, neural networks and the like.
The particle swarm algorithm is calculated as follows:
1) Initializing a population of particles
Randomly generating a certain number of particles, where each particle has a position and a velocity; the position represents a feasible solution and the velocity represents the direction and distance of the particle's movement;
2) Evaluating fitness of particles
Substituting the position of each particle into the objective function and calculating a fitness score;
3) Updating globally optimal solutions and individually optimal solutions
Comparing the fitness of all particles with the global optimal solution and the individual optimal solution, and updating the optimal solution;
4) Updating speed and position of particles
Recalculating the speed and the position of each particle according to the global optimal solution and the individual optimal solution;
5) Repeating steps 2-4 until the end condition is satisfied
Circularly executing the steps 2) to 4) until reaching the end condition by setting the conditions such as the maximum iteration times or the target fitness value;
6) Outputting the optimal solution: the best solution found during the iterative process is the solution obtained by the particle swarm algorithm.
It should be noted that in performing steps 2) and 3), optimization of the fitness function may be required to improve the efficiency and accuracy of the algorithm.
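The steps above map one-to-one onto the following sketch, which minimizes a toy objective; the inertia weight and acceleration coefficients are common textbook defaults, not values prescribed by the platform:

import random

def pso(objective, dim=2, swarm=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]  # 1) initialize
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]                        # individual best positions
    gbest = min(pos, key=objective)[:]                 # global best position
    for _ in range(iters):                             # 5) repeat until the end condition
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))  # 4) update velocity
                pos[i][d] += vel[i][d]                            #    and position
            if objective(pos[i]) < objective(pbest[i]):           # 2)+3) evaluate and update bests
                pbest[i] = pos[i][:]
                if objective(pos[i]) < objective(gbest):
                    gbest = pos[i][:]
    return gbest                                       # 6) output the optimal solution

print(pso(lambda p: sum(x * x for x in p)))  # should approach the origin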
The parallel hill climbing algorithm is a parallel optimization algorithm based on the hill climbing algorithm, and the searching efficiency is improved through parallel searching of a plurality of subprocesses. The advantages are that: the method has higher searching speed and searching precision, and can solve the problem of large-scale high-dimensional optimization.
Application scene: the method is suitable for high-dimensional optimization problems such as optimization parameter adjustment, logistics optimization and the like which need to quickly search local optimal solutions.
The parallel hill climbing method comprises the following calculation processes:
1) Initializing a set of initial solutions, assuming n solutions;
2) Evaluating each solution in parallel to obtain an objective function value of each solution;
3) Selecting an optimal solution according to the objective function value, stopping calculation if the objective function value of the optimal solution meets a termination condition, and outputting the optimal solution; otherwise, go to step 4);
4) Randomly selecting a group of solutions, and slightly changing each solution to obtain a new group of solutions, wherein m solutions are assumed;
5) Evaluating each new solution in parallel to obtain an objective function value of each new solution;
6) Selecting an optimal solution according to the objective function value, if the optimal solution is better than the previous optimal solution, updating the optimal solution, and returning to the step 3); otherwise, returning to the step 4);
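A sketch of the parallel hill-climbing loop is given below, using a thread pool for the parallel evaluations of steps 2) and 5); for a CPU-bound objective, a process pool or a distributed framework would be used instead, and the toy objective and step size are illustrative:

import random
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Toy objective to minimize; in practice this would be an expensive model evaluation.
    return sum(v * v for v in x)

def perturb(x, step=0.1):
    # Slightly change a solution to obtain a neighbor.
    return [v + random.uniform(-step, step) for v in x]

def parallel_hill_climb(dim=5, n=8, rounds=200):
    solutions = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]  # 1) initialize
    with ThreadPoolExecutor(max_workers=n) as pool:
        scores = list(pool.map(objective, solutions))             # 2) evaluate in parallel
        best = solutions[min(range(n), key=lambda j: scores[j])]  # 3) select the best
        for _ in range(rounds):
            candidates = [perturb(best) for _ in range(n)]        # 4) a new group of solutions
            scores = list(pool.map(objective, candidates))        # 5) evaluate in parallel
            i = min(range(n), key=lambda j: scores[j])
            if scores[i] < objective(best):                       # 6) keep any improvement
                best = candidates[i]
    return best

print(parallel_hill_climb())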
the distributed differential evolution algorithm is a distributed optimization algorithm based on the differential evolution algorithm, and the calculation time is effectively reduced by parallel solution on a plurality of calculation nodes. The advantages are that: the method has higher parallelism and expandability, and can solve the problem of large-scale high-dimensional optimization. Disadvantages: the method has high requirement on computing resources, a large number of computing nodes are needed, and the algorithm complexity is high. Application scene: the method is suitable for high-dimensional optimization problems requiring a large amount of computation resources, such as neural network training, image processing, data mining and the like.
The calculation process of the distributed differential evolution algorithm is as follows:
1) Initializing a population
Firstly, the population is initialized: the number of attributes of each individual is determined according to the characteristics of the problem, and initial individuals are randomly generated. The population size is generally preset and usually depends on the scale of the problem; meanwhile, the number of evolution generations, the mutation factor and the crossover factor also need to be set;
2) Mutation operation
The mutation operation is the core operation of the differential evolution algorithm. Three different individuals are selected, denoted r1, r2 and r3, where r1 and r2 are randomly selected and r3 is a neighbor of the current individual; the differential vector v = r1 − r2 is then calculated, and the mutation operation is performed with the current individual to obtain the mutated individual u = x_i + F·v, where F is the mutation factor, x_i is the current individual and u is the mutated individual;
3) Crossover operation
Crossover operation is the process of generating a new individual by using a variant individual u and a current individual xi, and a crossover factor CR (which is usually a value between [0,1 ]) is selected to control the generation of the new individual, namely, some genes of the new individual are taken from u, and some genes retain the value of the original individual xi;
4) Selection operation
The selection operation stores newly generated individuals into a population, and selects the optimal individuals as an initial population for the next evolution;
5) Termination condition:
stopping the iteration of the algorithm when a certain iteration number is reached or the fitness value meets a certain requirement;
6) Outputting a result:
and outputting the optimal solution and the corresponding fitness value.
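A single-process sketch of the differential evolution loop is given below; it uses the classic DE/rand/1 scheme (mutant = r3 + F·(r1 − r2)), a common variant of the mutation described above, and omits the distribution across computing nodes, which would follow the parallel computing framework of this embodiment:

import random

def differential_evolution(objective, dim=5, pop_size=30, gens=200, F=0.5, CR=0.9):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]  # 1) initialize
    for _ in range(gens):                                  # 5) termination by generation count
        for i in range(pop_size):
            r1, r2, r3 = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            # 2) mutation: differential vector scaled by the mutation factor F.
            mutant = [r3[d] + F * (r1[d] - r2[d]) for d in range(dim)]
            # 3) crossover: each gene comes from the mutant with probability CR.
            trial = [mutant[d] if random.random() < CR else pop[i][d] for d in range(dim)]
            # 4) selection: keep the better of the trial and the current individual.
            if objective(trial) < objective(pop[i]):
                pop[i] = trial
    return min(pop, key=objective)                         # 6) output the result

print(differential_evolution(lambda x: sum(v * v for v in x)))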
The model-based optimization algorithm is an optimization algorithm based on a mathematical model, and an optimal solution is found by constructing a mathematical model of an objective function and using the optimization algorithm. The advantages are that: the complex high-dimensional problem can be searched efficiently, and the convergence speed and the searching precision are high. Disadvantages: the accuracy requirement on the model is high, and inaccuracy of the model can lead to large result deviation. Application scene: the method is suitable for the problem that the objective function needs to be accurately modeled, such as the optimization problem in the engineering, economic and financial fields.
The calculation process of the model-based optimization algorithm can be divided into the following steps:
1) Determining a model: selecting an appropriate model, such as a linear model, a nonlinear model, etc., according to the actual problem;
2) Establishing an objective function: determining objective functions, such as minimizing error, maximizing profit, etc., based on the actual problem;
3) Determining an optimization algorithm: selecting a proper optimization algorithm, such as gradient descent, genetic algorithm and the like, according to the model and the objective function;
4) Setting initial parameters: setting initial parameter values, such as regression coefficients in linear regression, according to the model and the objective function;
5) Iterative solution: gradually approaching the optimal solution through continuous iteration of the optimization algorithm;
6) Judging convergence: judging whether the optimization algorithm is converged or not, and if not, continuing iteration;
7) Outputting a result: after obtaining the optimal solution, outputting results, such as regression coefficients, minimizing errors, maximizing profits, and the like;
in general, the calculation process of the model-based optimization algorithm is to build a model and an objective function according to the actual problem, and obtain an optimal solution and output a result by iteratively solving the optimization algorithm.
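A minimal one-dimensional sketch of this process: an expensive objective is sampled, a quadratic surrogate model is fitted, and the cheap surrogate is optimized in its place. The objective, sample count and polynomial degree are illustrative assumptions:

import numpy as np

def expensive_objective(x):
    # Stand-in for a costly simulation or experiment.
    return (x - 2.0) ** 2 + 0.1 * np.sin(5 * x)

# 1)-2) Sample the objective and build a mathematical model (quadratic surrogate).
xs = np.linspace(-5, 5, 15)
ys = expensive_objective(xs)
coeffs = np.polyfit(xs, ys, deg=2)

# 3)-5) Optimize the cheap surrogate instead of the expensive function.
grid = np.linspace(-5, 5, 10001)
best_x = grid[np.argmin(np.polyval(coeffs, grid))]
print("surrogate minimizer:", best_x)  # close to the true minimum near x = 2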
The hundred million-level parameter optimizing platform described in this embodiment is not only suitable for very large scale neural networks; parameter optimization is also a key problem in training many models in the fields of machine learning and artificial intelligence. In practice, researchers and engineers often need to optimize parameters in various models, including Support Vector Machines (SVMs), decision trees, logistic regression and random forests. In these models, the purpose of parameter optimization is to best fit the given training data by optimizing the parameters of the model, while maintaining generalization ability on new data. The goal of the optimization may be to minimize training error, maximize prediction accuracy, or other metrics. Parameter optimization is typically achieved by mathematical optimization techniques such as gradient descent and Newton's method.
The following are some common application scenarios:
1. machine learning
Machine learning is an important artificial intelligence technology and can be applied to the fields of data mining, image recognition, natural language processing and the like. In machine learning, model parameters need to be optimized to achieve better prediction and classification results. The ultra-large scale parameter optimization technology can help to accelerate training and optimization of the machine learning model, and improve accuracy and efficiency of the model.
The specific application process is as follows:
1) Data preprocessing: firstly, data preprocessing is needed, including data cleaning, data normalization, feature extraction and the like.
2) Model construction: suitable models are selected for construction, such as deep neural networks, support vector machines, decision trees, etc., and loss functions and optimizers are defined.
3) Ultra-large scale parameter optimization: selecting a suitable optimization algorithm for parameter optimization; common optimization algorithms include stochastic gradient descent, stochastic gradient descent with momentum, stochastic gradient descent with adaptive learning rate, and the like.
4) Model evaluation: the model is evaluated using a validation set or test set, and the primary indicators include accuracy, recall, F1 score, and the like.
5) Parameter adjustment: model performance is improved by adjusting hyper-parameters such as learning rate, batch size, regularization terms, etc.
6) Deployment model: and finally, deploying the optimized model into a production environment.
It should be noted that in practice, very large scale parameter optimization is a process that consumes considerable time and computing resources, and techniques such as distributed computing and GPU acceleration are required to improve efficiency.
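Of the optimization algorithms mentioned in step 3), stochastic gradient descent with momentum can be sketched in a few lines; the noisy quadratic gradient and all constants here are illustrative:

import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.01, beta=0.9, steps=1000):
    # v keeps an exponentially weighted sum of past gradients,
    # damping oscillation and speeding convergence along consistent directions.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad_fn(w)
        w = w - lr * v
    return w

# Hypothetical stochastic gradient: gradient of ||w||^2 plus sampling noise.
rng = np.random.default_rng(0)
grad = lambda w: 2.0 * w + 0.1 * rng.normal(size=w.shape)
print(sgd_momentum(grad, [3.0, -4.0]))  # approaches the origin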
2. Computer vision
Computer vision is an important computer science field, and can be applied to aspects such as image recognition, target tracking, face recognition and the like. In computer vision, image features need to be extracted and classified to realize functions such as image recognition and target tracking. The ultra-large scale parameter optimization technology can help to accelerate the extraction and classification of image features and improve the accuracy and efficiency of a computer vision system.
The application process of the ultra-large scale parameter optimization in the field of computer vision is as follows:
1) Data preparation: first, large-scale image and annotation data, which typically include pixel values of the picture and corresponding annotations, such as annotations for classification, detection, segmentation, etc., need to be prepared.
2) Model design: corresponding model structures, such as convolutional neural networks (Convolutional Neural Network, CNN) and recurrent neural networks (Recurrent Neural Network, RNN), are designed according to the specific task and data characteristics.
3) Parameter initialization: the model parameters are initialized, typically using a random initialization method.
4) Training a model: the model is trained using large scale annotation data, and parametric optimization is typically performed using an optimization algorithm such as random gradient descent (Stochastic Gradient Descent, SGD).
5) Super-parameter adjustment: for some parameters that need to be set manually, such as learning rate, regularization coefficients, etc., adjustments need to be made to obtain better results.
6) Model evaluation: and evaluating the trained model by using a verification set or a test set, and calculating indexes such as precision, recall rate and the like.
7) Deployment and application: the trained model is deployed into practical application scenes, such as image classification, target detection, face recognition and the like.
During the application process, fine-tuning or retraining of the model may be required to meet the actual requirements. Meanwhile, the problems of interpretability, robustness and the like of the model also need to be considered.
3. Natural language processing
Natural language processing is a technology for converting human language into computer language, and can be applied to the fields of text classification, emotion analysis, machine translation, question-answering systems, speech recognition and the like. In natural language processing, a language model needs to be trained and optimized to achieve better language understanding and generation functions. The ultra-large scale parameter optimization technology can help to accelerate the training and optimization of the language model, and improve the accuracy and efficiency of the natural language processing system.
The following is an application process using very large scale parameter optimization in the field of natural language processing:
1) Data collection and preprocessing: first, a sufficient data set needs to be collected and preprocessed, e.g., data cleaning, word segmentation, stemming and feature extraction.
2) Model selection: models appropriate for the task are selected and trained from the data set, such as deep neural networks, logistic regression, support vector machines, naive bayes, and the like.
3) Super-parameter selection: super-parameters are parameters in the model that need to be set manually, such as learning rate, iteration number, batch size, regularization coefficient, etc. The optimal super parameters are usually found by cross-validation or the like.
4) Model training: training the model by using the training set, and monitoring and adjusting the training process.
5) Model evaluation: the test set is used to evaluate the trained models, such as indexes of accuracy, recall, and F1 score.
6) Model optimization: the model performance is further improved by adjusting the model parameters and architecture, and steps 3-5 are repeated until the optimal performance is reached.
7) Deployment: the trained model is deployed into a production environment, for example, to build an API interface for other applications to call.
In summary, very large scale parameter optimization is one of the key steps in achieving an optimal natural language processing model, and model performance can be continuously improved through repeated iteration.
4. Genomics
Genomics is a science of researching biological genes and genomes, and can be applied to aspects of disease diagnosis, drug research and development, gene editing and the like. In genomics, a large amount of genetic data needs to be analyzed and classified to achieve better bioinformatic analysis and prediction functions. The ultra-large scale parameter optimization technology can help to accelerate analysis and classification of gene data and improve accuracy and efficiency of genomics analysis and prediction.
Very large scale parametric optimization techniques can solve a number of problems in genomics, such as gene prediction, gene assembly and gene expression analysis. The following is the application of the very large scale parameter optimization in genomics:
1) Data preprocessing: first, useful information needs to be extracted from genomic data. For example, gene prediction requires conversion of DNA sequences into protein encoded information, gene assembly requires splicing of fragmented DNA sequences into contiguous sequences, and gene expression analysis requires comparison of gene expression levels with genomic annotation information. These pretreatment steps may be accomplished using various genomic tools.
2) Feature selection: very large scale parameter optimization requires selection of features for training the model. In genomics, features can include sequence, amount of expression, tissue specificity, etc. of genes. Feature selection may use statistical methods and machine learning algorithms.
3) Model training: and training a model by using a very large scale parameter optimization algorithm according to the selected characteristics. Typically, the problems in genomics are multi-class classification problems, and therefore algorithms suitable for multi-class classification, such as logistic regression, decision trees, and support vector machines, need to be used.
4) Model evaluation: the trained model performance is evaluated using the test dataset. In genomics, cross-validation is often used to evaluate the performance of models.
5) Model application: the trained models can be applied to various genomics problems. For example, gene prediction models may be used to predict new gene sequences, and gene expression analysis models may be used to interpret differential expression of tissues or diseases.
In summary, very large scale parameter optimization is widely used in genomics, can accelerate the progress of genomics research, and provides useful information and insight.
Notably, there are some differences in parameter optimization across different models. For example, in deep neural networks, the number of parameters to be processed is often very large and many non-convex problems arise; these typically need to be solved with specialized algorithms such as stochastic gradient descent and adaptive gradient methods. In other models, parameter optimization may be easier to handle because the parameter scale is smaller or the optimization problem is simpler.
In summary, parameter optimization is an important issue in the fields of machine learning and artificial intelligence, not only for very large scale neural networks, but also for other models. The optimization parameters can help the model to better fit data and improve prediction accuracy, so that the performance and application value of the model are improved.
The foregoing is merely a preferred embodiment of the invention, and it should be noted that modifications could be made by those skilled in the art without departing from the principles of the invention, which modifications would also be considered to be within the scope of the invention.

Claims (10)

1. A hundred million-level parameter optimizing platform, characterized in that it comprises:
the parameter space component is responsible for defining the range and the value of a parameter space and the dependency relationship among different parameters;
a parallel computing framework, which utilizes a distributed computing and multi-process concurrency technology to decompose a parameter space into a plurality of parameter subspaces, and the parameter subspaces are distributed to different computing nodes for computing by parallel computing;
the crowd optimization algorithm unit uses a series of advanced crowd optimization algorithms to quickly search optimal parameters, and the algorithms can avoid sinking into a local optimal solution and improve the searching precision and efficiency;
The deep learning technology component is used for modeling and predicting the parameter space by using a deep learning technology so as to further accelerate the parameter searching process;
the model evaluation and result analysis component is responsible for evaluating the model performance corresponding to different parameter combinations, summarizing and analyzing the results so that a user can better understand the search results, and can also provide auxiliary tools such as charts, data reports and the like so that the user can better analyze and interpret the results;
the parameter space component, the parallel computing framework, the crowd optimization algorithm unit, the deep learning technology component and the model evaluation and result analysis component are all connected with a master control mechanism.
2. The hundred million-level parameter optimizing platform of claim 1, wherein: in the specific process of the parallel computing framework, load balancing, performance optimization, programming, error handling and fault recovery must be considered to ensure the efficiency and robustness of the parallel computing framework; the specific process of the parallel computing framework is as follows: 1): Decomposing the problem: decomposing a large-scale task into a plurality of small tasks so that they can be processed in parallel;
2): and (3) task allocation: assigning the subtasks to different processors or computing units so that the processing can be performed simultaneously;
3): running and calculating: each processor or computing unit independently performs its assigned tasks to process multiple tasks in parallel;
4): combining the calculation results: combining the calculation results of each processor or calculation unit to obtain a final calculation result;
5): the task is completed: when all processors or computing units complete their tasks, the entire computing task is completed.
3. The hundred million-level parameter optimizing platform of claim 2, wherein: in the parameter optimization process, the parallel computing framework divides the parameter space into a plurality of subspaces and then distributes different subspaces to different computing nodes for computation, which is an effective parallel computing mode; it can utilize distributed computing and multi-process concurrency technology to make full use of computing resources and improve computing efficiency;
the specific implementation is as follows: a. dividing subspaces: the parameter space needs to be divided into a plurality of subspaces; the division may be uniform, random, or similar, and the specific method can be selected according to the circumstances; the divided subspaces should be of the same size and should neither overlap nor leave gaps, so as to ensure the accuracy of the calculation result;
b. Distributing computing nodes: different subspaces are required to be distributed to different computing nodes for computation, and communication and cooperation among the computing nodes can be realized by using a distributed computing framework such as MPI and Hadoop; when the computing nodes are distributed, reasonable distribution is carried out according to the computing capacity and the load condition of the computing nodes so as to fully utilize computing resources and improve computing efficiency;
c. Parallel computing: the allocated subspace is calculated in parallel on each computing node, and the parallel computation inside the computing node can be realized by adopting a multi-process concurrency technology such as multithreading, coroutine and the like; in the parallel computing process, attention should be paid to data synchronization and communication, so that the correctness and consistency of computing results are ensured;
d. Summarizing the calculation result: summarizing the results obtained by calculation of each calculation node to obtain an optimal parameter combination, wherein a summarizing function provided by a distributed calculation framework, such as a reduce function of MPI and a MapReduce framework of Hadoop, can be used to realize summarizing of calculation results; in the summarizing process, the problems of data transmission and calculation efficiency should be considered to ensure the accuracy and efficiency of the summarizing result.
4. The hundred million-level parameter optimizing platform as defined in claim 3, wherein: parameter space decomposition is a system engineering analysis method in which the parameter space of a system is decomposed into a plurality of subspaces according to certain rules, so that the system can be studied in more detail and more comprehensively; the specific decomposition rules and methods are as follows:
a. Determining system parameters: firstly, determining parameters of a system, namely determining parameters which have decisive influence on the system behavior;
b. Dividing parameter value fields: for each parameter, the value range, namely the range of possible values of the parameter, needs to be determined;
c. Dividing subspaces: dividing the parameter space into a plurality of subspaces, each subspace containing a part of parameter points;
there are several ways to divide the subspace, such as:
-homogeneous partitioning method: equally dividing the parameter value range into a plurality of parts, wherein each part corresponds to one subspace;
-hierarchical partitioning: dividing the parameter space layer by layer from high dimension to low dimension, wherein each layer corresponds to one dimension, and each dimension divides the subspace according to a certain rule;
-centre division: dividing a parameter space into a plurality of spherical subspaces by taking a parameter value range central point as a center;
d. Defining a subspace: defining each subspace, wherein natural language and mathematical formula modes can be adopted;
e. Analysis subspace: analyzing each subspace to obtain the behavior characteristics of the system under the subspace;
f. Comprehensive analysis: and integrating analysis results of all subspaces to obtain the overall behavior characteristic of the system.
5. The hundred million-level parameter optimizing platform of claim 4, wherein: the specific calculation process of allocating the parameter subspaces to different computing nodes for parallel computation is as follows: a: firstly, determining the division mode of the parameter space and the parameter subspaces, dividing the parameter space into a plurality of subspaces by uniform division or random division;
b: determining the number and distribution modes of the computing nodes, and determining how many computing nodes are used and how the nodes are distributed to different computers or processors for calculation according to the performance of the system and the requirements of tasks;
c: determining parameter subspaces to be calculated of each computing node, and after determining the parameter subspaces to be processed of each node according to the number of the computing nodes and the allocation mode, allocating the subspaces to different computing nodes;
d: each calculation node calculates according to the parameter subspace allocated by the calculation node, and transmits the calculation result to other nodes, and in the whole calculation process, the different nodes need to communicate and synchronize so as to ensure the accuracy and consistency of the calculation result;
e: and merging the calculation results, wherein after all calculation nodes finish calculation, the calculation results of each node are required to be merged to obtain the final result of the whole parameter space, and data reduction and aggregation are usually required.
6. The hundred million-level parameter optimizing platform as defined in claim 3, wherein: the specific principle for dividing the parameter space into a plurality of subspaces can be determined according to specific application requirements and generally involves the following aspects:
a characteristics of the objective function: if the objective function exhibits different characteristics, such as unimodal, multimodal, locally optimal, etc., in different parameter areas, the parameter space may be partitioned according to these characteristics;
constraint on problem b: in practical application, various constraint conditions, such as feasibility constraint and resource constraint, are often present, the parameter space can be divided according to the various constraint conditions, and all parameter combinations meeting the constraint conditions are divided into the same subspace.
7. The hundred million-level parameter optimizing platform of claim 1, wherein: the algorithms adopted in the crowd optimization algorithm unit are chosen to meet the requirement of quickly searching for optimal parameters and comprise a genetic algorithm, a particle swarm algorithm, a model-based optimization algorithm and a distributed differential evolution algorithm; other algorithms can be selected according to the user's needs;
the selection of a specific algorithm should take into account the following principles: 1): Characteristics of the problem: different types of problems require different types of algorithms to solve, so the features and requirements of the problem need to be clarified first; for example, some problems require optimizing continuous functions while others require optimizing discrete functions;
2): complexity of the algorithm: the complexity of the algorithm is one of important factors for evaluating the advantages and disadvantages of the algorithm, and the algorithm with high complexity can lead to longer calculation time and even can not process large-scale problems;
3) Stability of algorithm: the stability of the algorithm refers to the sensitivity degree of the algorithm to the initial solution and parameters, and the algorithm with high stability can generate similar optimization results under different initial solutions and parameters;
4) Experience has been: the existing experience is one of important basis for selecting the algorithm, and if the similar problem solving experience exists, the previous experience can be used for selecting the proper algorithm;
5) Interpretive nature of the algorithm: the interpretability of the algorithm refers to transparency and understandability of the algorithm calculation process, so that the algorithm calculation process can be better understood and interpreted, and the algorithm can be effectively adjusted and optimized.
8. The hundred million-level parameter optimizing platform of claim 1, wherein: the deep learning technology component models and predicts the parameter space by using deep learning technology and can learn the mutual influence among parameters, so as to better understand the characteristics and rules of the parameter space and further optimize the parameter searching process;
The process of modeling the parameter space by the deep learning technology generally uses a neural network model, specifically: taking each parameter in the parameter space as input, mapping the input into output through a neural network, wherein the output is a corresponding prediction result;
the training process of the neural network is to optimize through a large amount of data, so that the error between the predicted result and the actual result is minimized, and the optimal parameter value is obtained;
predicting an unexplored parameter space refers to predicting an unknown parameter value through existing data and a model, and specifically includes: obtaining a corresponding prediction result by inputting unknown parameters and calculating through a neural network;
the prediction judgment standard can be determined according to specific application scenes, error indexes such as Mean Square Error (MSE) or Cross entropy (Cross-entropy) can be used for evaluating the prediction performance of the model, if the prediction error of the model is smaller, the prediction capability of the model is considered to be stronger, and unknown parameter values can be predicted better;
the implementation mode of the deep learning technical component is as follows:
1): data preparation: a group of marked training data is needed to be prepared, wherein the training data comprises model performance data corresponding to different parameter combinations, and the training data has good representativeness and diversity so as to cover different areas and characteristics of a parameter space; meanwhile, the training data needs to be preprocessed and normalized to ensure the reliability and accuracy of the training data;
2): model design: a deep learning model is needed to be designed for learning the characteristics and rules of the parameter space, and can adopt some common neural network structures, such as a convolutional neural network and a cyclic neural network, and can also self-define some special network structures so as to adapt to different application scenes; the deep learning model should include an input layer, a hidden layer and an output layer, wherein the input layer is a parameter space, and the output layer is model performance data;
3): training a model: training the deep learning model using training data may employ some optimization algorithm, such as a random gradient descent algorithm, a back propagation algorithm, etc., to minimize prediction errors; during the training process, attention is paid to the distribution and cross-validation of training data to prevent the problems of over-fitting and under-fitting;
4): parameter prediction: the parameter space is modeled and predicted by using the trained deep learning model, the model performance of the known parameter combination can be predicted by using the deep learning model, the unexplored parameter space can be predicted by using the deep learning model, and the prediction result can be used for guiding the parameter searching process, so that the searching efficiency and accuracy are improved.
9. The hundred million-level parameter optimizing platform of claim 8, wherein: the specific process of data preprocessing and normalization in step 1) of the implementation of the deep learning technology component is as follows:
1) Data cleaning: deleting repeated, missing and abnormal value data, and ensuring the quality and accuracy of the data;
2) Feature selection: selecting characteristics influencing the prediction result, reducing dimensionality and improving model performance;
3) Feature scaling: scaling the numerical ranges of the features to the same range, so that model training is not adversely affected by features whose values are too large or too small;
common feature scaling methods are: standardization (z-score standardization) and min-max scaling (min-max scaling);
4) Data normalization: uniformly scaling the data to a range between 0 and 1 so that the data is comparable; the usual normalization methods are: min-max scaling (min-max scaling) and z-score normalization;
5) Data set partitioning: dividing the data set into a training set, a verification set and a test set for training, adjusting and testing the model;
6) Classification tag conversion: the classification labels are converted into numerical values, so that the machine learning algorithm is convenient to process;
7) Characteristic engineering: the characteristics are further processed, such as combination, decomposition and discretization, so that the performance and the prediction accuracy of the model are improved;
8) Data dimension reduction: and the PCA and other algorithms are used for reducing the high-dimensional data into low-dimensional data, so that the complexity of model training is reduced.
10. The hundred million-level parameter optimizing platform of claim 8, wherein: the modeling process of the deep learning in the implementation of the deep learning technology component is as follows:
1) Data preprocessing: preprocessing the original data, including data cleaning, data normalization and data segmentation;
2) Model selection: selecting a suitable model, such as a convolutional neural network, a recurrent neural network or a deep autoencoder, according to the characteristics of the problem and the data structure;
3) Model construction: building a model structure, wherein the model structure comprises an input layer, a hidden layer, an output layer and an activation function;
4) Parameter initialization: initializing model parameters, so that the model parameters can be more easily adjusted in the training process;
5) Loss function selection: selecting a suitable loss function, such as cross entropy loss, mean square error;
6) Training a model: training the model by using a training set, and adjusting parameters, such as gradient descent and Adam, by adopting an optimization algorithm;
7) Model evaluation: evaluating the trained model, wherein the model comprises indexes such as accuracy, recall rate, F1 value, AUC and the like;
8) And (3) model tuning: optimizing the model according to the evaluation result, including adjusting parameters, increasing network layers and adopting different optimization algorithms;
9) Model application: applying the trained model to actual problems, such as image classification and speech recognition;
the training process of the deep learning model is as follows:
1) Data preparation: preprocessing data, dividing a training set and a testing set, and ensuring the quality and applicability of the data;
2) Model construction: selecting proper model structures and parameters according to task requirements and data characteristics, and constructing a deep learning model;
3) Loss function definition: selecting a proper loss function according to task demands to measure the difference between a model predicted result and a real result;
4) Model training: training the model by using a training set, and continuously updating model parameters through an optimization algorithm (such as gradient descent) so as to minimize a loss function;
5) Model evaluation: evaluating the model by using a test set, calculating the loss value and the accuracy and other indexes of the model on the test set, and judging the performance and the generalization capability of the model;
6) Parameter adjustment: according to the model evaluation result, adjusting the model structure and parameters to optimize model performance, and checking the generalization ability of the model by methods such as cross-validation across the training set and the test set;
7) Model preservation and use: the optimal model parameters are saved for subsequent prediction, classification, clustering, or other tasks.
CN202310371162.5A 2023-04-10 2023-04-10 Hundred million-level parameter optimizing platform Pending CN116629352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310371162.5A CN116629352A (en) 2023-04-10 2023-04-10 Hundred million-level parameter optimizing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310371162.5A CN116629352A (en) 2023-04-10 2023-04-10 Hundred million-level parameter optimizing platform

Publications (1)

Publication Number Publication Date
CN116629352A true CN116629352A (en) 2023-08-22

Family

ID=87612346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310371162.5A Pending CN116629352A (en) 2023-04-10 2023-04-10 Hundred million-level parameter optimizing platform

Country Status (1)

Country Link
CN (1) CN116629352A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116956747A (en) * 2023-08-28 2023-10-27 西湾智慧(广东)信息科技有限公司 Method for building machine learning modeling platform based on AI (advanced technology attachment) capability
CN117827192A (en) * 2023-12-26 2024-04-05 合肥锦上汇赢数字科技有限公司 Three-dimensional model generation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination