CN112633494A - Automatic neural network structure searching method based on automatic machine learning - Google Patents

Automatic neural network structure searching method based on automatic machine learning

Info

Publication number
CN112633494A
CN112633494A
Authority
CN
China
Prior art keywords
search
neural network
automatic
cell
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011492102.1A
Other languages
Chinese (zh)
Inventor
陈波
史特
左御丁
王庆先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011492102.1A
Publication of CN112633494A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an automatic neural network structure searching method based on automatic machine learning, which comprises the following steps: S1: determining a search space S based on the cells of a neural network; S2: selecting a specific structure s from the search space S by using a search strategy; S3: based on the search space S, evaluating the specific structure s and returning the result to the search strategy to complete the automatic search of the neural network structure. The search method has strong generalization capability and can be applied to scenarios such as computer vision and natural language processing; its effect is especially pronounced in intelligent customer service for the financial sector, where it markedly reduces the time cost of modeling and improves modeling efficiency.

Description

Automatic neural network structure searching method based on automatic machine learning
Technical Field
The invention belongs to the technical field of neural networks, and particularly relates to an automatic neural network structure searching method based on automatic machine learning.
Background
With the rapid development of artificial intelligence technology, machine learning is widely applied in fields such as natural language processing and computer vision, and the structures of the corresponding neural networks have grown increasingly complex. In the deep learning field, the neural network structure normally has to be designed by hand, and obtaining a structure with the desired performance consumes a great deal of human and material resources. To reduce this manual effort, research on the automatic search of neural network structures has attracted growing attention. Against this background, the invention provides an automatic neural network structure searching method based on automatic machine learning.
Disclosure of Invention
The invention aims to solve the problems of overlong search time, the depth gap in the search process, and performance loss in neural network structure search, and provides an automatic neural network structure searching method based on automatic machine learning.
The technical scheme of the invention is as follows: an automatic neural network structure searching method based on automatic machine learning comprises the following steps:
S1: determining a search space S by using the cells of a neural network, based on the model generation stage of automatic machine learning;
S2: selecting a specific structure s from the search space S by using a search strategy;
S3: based on the search space S, evaluating the specific structure s and returning the result to the search strategy to complete the automatic search of the neural network structure.
The invention has the beneficial effects that:
(1) The method has strong generalization capability and can be applied to scenarios such as computer vision and natural language processing; its effect is especially pronounced in intelligent customer service for the financial sector, where it markedly reduces the time cost of modeling and improves modeling efficiency.
(2) The search time is reduced. The invention uses a cell-based structure and performs the search in stages; in each stage, candidate operations with lower scores are pruned according to their scores, only the higher-scoring candidates are retained, and the search enters the next stage. This reduces the number of operations, so the search space becomes comparatively simple. In addition, the number of epochs set for each stage is reduced to a certain extent. Together these measures lower GPU memory usage and thereby shorten the search time.
(3) The performance loss problem is addressed. During the search in each stage, when the scores of the candidate operations are evaluated, a 0-1 loss function replaces the original cross-entropy loss, reducing the discrepancy introduced when the continuous encoding is discretized; the scores of different candidate operations are pushed closer to 0 or 1, and the differences between them become more pronounced. This lets the search select the required operation from the candidates more accurately, so the performance loss of the searched network is kept under control.
(4) The depth gap problem is addressed. The proposed method performs the search in stages, alleviating the depth gap that arises because the network used during search differs from, and is much smaller than, the network used during evaluation. The staged procedure makes the network searched in the last stage close to the network structure used at evaluation, avoids the architecture overfitting caused by searching directly in a large network, and, by approximating the search space in this way, relieves the depth gap to a certain extent.
(5) In controlling the candidate operations, search-space regularization and an early-stop mechanism are used: if more than two skip connections appear in a cell, the search of the cell is stopped. This directly controls the number of skip connections and improves search performance.
Further, in step S1, cells are stacked to obtain the search space S; each cell consists of two input nodes, intermediate nodes, an output node, and edges; each edge of the cell represents the candidate operations.
The beneficial effects of this further scheme are as follows: the cell-based search space has two advantages: first, it effectively reduces the size of the search space S; second, it makes model migration easier.
Further, step S3 includes the following sub-steps:
S31: based on the search space S, performing the first-stage search on the specific structure s with the search strategy;
S32: normalizing the edges of the cell to obtain the weights of the edges in the cell;
S33: sorting the weights of the edges in the cell and screening out the candidate operations with the highest weights;
S34: according to the candidate operations with the highest weights, performing the second-stage and third-stage searches in turn, and returning the search result to the search strategy to complete the automatic search of the neural network structure.
The beneficial effects of this further scheme are as follows: a staged method is adopted, i.e., the search process is divided into three stages; the number of cells in each stage grows progressively and the depth of the search network increases gradually, while the search space shrinks step by step. This addresses the depth gap caused by searching in a shallow network but testing in a deeper one.
Further, in step S31, the structure searched in the first stage includes 5 cells, and the candidate operations on every edge of each cell in the first stage are: max_pool_3x3, avg_pool_3x3, skip_connect, sep_conv_3x3, sep_conv_5x5, dil_conv_3x3, dil_conv_5x5, and none.
The beneficial effects of this further scheme are as follows: the search network in the first stage is small, with only 5 cells, but the number of candidate operations on each edge of a cell is the largest, i.e., all operations are included.
Further, in step S32, the weight w_{i,j}(x_i) of an edge in the cell is calculated as:

w_{i,j}(x_i) = \sum_{b \in B_{i,j}} \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} \, b(x_i)

where the fraction \exp(\alpha_b^{(i,j)}) / \sum_{b' \in B_{i,j}} \exp(\alpha_{b'}^{(i,j)}) denotes the softmax operation; B_{i,j} denotes the operation space of edge (i, j), i and j denote the nodes connected by the edge, b denotes a candidate function (operation) on the edge, \alpha_b^{(i,j)} denotes the structural parameter of candidate b, \alpha denotes the weight matrix of all edges, \exp(\cdot) denotes the exponential function, b' indexes the candidate functions in the normalizing sum, \alpha_{b'}^{(i,j)} denotes the structural parameter of candidate b', and x_i denotes node i.
The beneficial effects of this further scheme are as follows: the weight of each operation on an edge is normalized over all candidate operations using the softmax function.
Further, in step S33, the screening of candidate operations is modified by a 0-1 loss function, calculated as:

L_{total} = l_{val}\left(\omega^{*}(\alpha), \alpha\right) + \omega_{0\text{-}1} L

L = -\frac{1}{N} \sum_{i,j} \sum_{b \in B_{i,j}} \left( \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} - \frac{1}{2} \right)^{2}

\omega^{*} = \arg\min_{\omega} l_{train}(\omega, \alpha)

where L_{total} denotes the total 0-1 loss, L the 0-1 loss term, l_{val} the loss value on the validation set, \alpha the weight matrix of all edges, \omega the parameter matrix, \omega^{*} the optimal parameter matrix after the search space has been relaxed to be continuous, N the total number of edges, \exp(\cdot) the exponential function, i and j the edge indices, \omega_{0\text{-}1} the weight coefficient of L, B_{i,j} the operation space of edge (i, j), \alpha_b^{(i,j)} the structural parameters, \arg\min(\cdot) the variable value at which the objective function attains its minimum, l_{train} the loss value on the training set, b' the index over candidate functions, and \alpha_{b'}^{(i,j)} the structural parameter of candidate b'.
The beneficial effects of this further scheme are as follows: a suitable operation is selected from the candidate operations according to the continuous structural weight scores, but the large gap between these scores and the discrete values 0 and 1 easily biases the selection, so a 0-1 loss function is used to mitigate this. The 0-1 loss reduces the discrepancy of discretizing the continuous encoding, makes the differences between the weight parameters of different operations more pronounced, enlarges their relative gaps, and pushes them toward 0 or 1, so that operation selection is more accurate and convenient. Meanwhile, to control the weight of the loss term, \omega_{0\text{-}1} is used as its weight coefficient; l_{val} is determined by \omega and \alpha.
Further, in step S33, skip connections are partially cut off during the weight screening by a regularized dropout method.
The beneficial effects of this further scheme are as follows: as the number of epochs (one epoch is one pass of all data through the neural network for forward computation and backpropagation) increases, competition among the weights gradually intensifies, so the weight of skip-connect grows ever higher and skip-connect ends up being selected. An excess of skip-connections amounts to overfitting in the search process and harms the performance of automatic neural network structure search. To counter this overfitting, the number of skip-connections is controlled by search-space regularization and early stopping, with the aim of improving search performance.
After the skip-connect operation, a regularization method (dropout) is added at the operation level to partially cut off skip-connect, so that the algorithm explores other candidate operations. A stronger dropout is used at the start of training, its probability is gradually decayed during training, and a lighter dropout is used in the later stage so that the final learning of the network structure parameters is not affected.
Further, in step S3, the search process is controlled by an early-stop mechanism, as follows: if the number of skip connections in a cell exceeds two, the search is stopped.
The beneficial effects of this further scheme are as follows: an intuitive method of controlling the number of skip-connections is used during the search: if more than two skip-connections occur in a normal cell, the search is stopped.
Controlling skip-connections through both regularization and early stopping alleviates the performance degradation caused by having too few trainable parameters.
Further, in step S34, the second stage uses 5 candidate operations and 11 cells, and the third stage uses 3 candidate operations and 17 cells.
The beneficial effects of this further scheme are as follows: the number of cells increases while the number of candidate operations decreases; by gradually increasing the number of cells, the finally searched network approaches the deeper network used in the final evaluation.
Drawings
FIG. 1 is a flow chart of an automatic search method for neural network structures based on automatic machine learning;
FIG. 2 is a block diagram of a cell;
FIG. 3 shows the structures of the first to third stages.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
Before describing specific embodiments of the present invention, in order to make the solution of the present invention more clear and complete, the definitions of the abbreviations and key terms appearing in the present invention will be explained first:
dil_conv_3x3 and dil_conv_5x5: dilated separable convolutions with 3×3 and 5×5 kernels; both are candidate operations.
As shown in fig. 1, the present invention provides an automatic neural network structure search method based on automatic machine learning, which includes the following steps:
S1: determining a search space S by using the cells of a neural network, based on the model generation stage of automatic machine learning;
S2: selecting a specific structure s from the search space S by using a search strategy;
S3: based on the search space S, evaluating the specific structure s and returning the result to the search strategy to complete the automatic search of the neural network structure.
In the embodiment of the present invention, as shown in FIG. 2, in step S1, cells are stacked to obtain the search space S; each cell consists of two input nodes, intermediate nodes, an output node, and edges; each edge of the cell represents the candidate operations.
In the invention, the cell-based search space has two advantages: first, it effectively reduces the size of the search space S; second, it makes model migration easier.
In the embodiment of the present invention, as shown in FIG. 2, the cell is a directed acyclic graph; each cell is composed of N nodes, and each node is a layer in the neural network. Input nodes: for a convolutional network the input nodes are the outputs of the previous two layers, and for a recurrent network they are the input of the current layer and the state of the previous layer; here C_{k-1} and C_{k-2} denote the input nodes. Intermediate nodes: each intermediate node is obtained by applying the operations on its incoming edges to its predecessors and summing the results; here N_0, N_1, N_2 and N_3 together form the intermediate nodes. Output node: obtained by concatenating the intermediate nodes; C_k denotes the output node. Edges: the edges represent the candidate operations, and every edge contains the candidate operations, such as a 3×3 convolution.
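To make this concrete, the following is a minimal sketch of such a cell in PyTorch. It is an illustrative assumption rather than the patent's own code: the module names (MixedEdge, Cell) and the identity-op toy usage are hypothetical.

```python
# Illustrative sketch (assumed PyTorch implementation, not the patent's code):
# a cell as a DAG whose edges are softmax-weighted mixtures of candidate ops.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_OPS = ["max_pool_3x3", "avg_pool_3x3", "skip_connect", "sep_conv_3x3",
                 "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5", "none"]

class MixedEdge(nn.Module):
    """One edge of the cell: a softmax-weighted sum of all candidate ops."""
    def __init__(self, ops: nn.ModuleList):
        super().__init__()
        self.ops = ops  # one module per name in CANDIDATE_OPS

    def forward(self, x, alpha_edge):
        w = F.softmax(alpha_edge, dim=-1)            # normalized edge weights
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

class Cell(nn.Module):
    """Two input nodes (C_{k-2}, C_{k-1}), 4 intermediate nodes, concat output."""
    def __init__(self, edges: nn.ModuleList, n_inter: int = 4):
        super().__init__()
        self.edges, self.n_inter = edges, n_inter    # 2+3+4+5 = 14 edges

    def forward(self, c_km2, c_km1, alphas):
        states, k = [c_km2, c_km1], 0
        for _ in range(self.n_inter):
            # each intermediate node sums the edge outputs of all predecessors
            s = sum(self.edges[k + j](h, alphas[k + j])
                    for j, h in enumerate(states))
            k += len(states)
            states.append(s)
        return torch.cat(states[-self.n_inter:], dim=1)  # output node C_k

# toy usage: identity candidates keep shapes, so the wiring can be checked
make_ops = lambda: nn.ModuleList(nn.Identity() for _ in CANDIDATE_OPS)
edges = nn.ModuleList(MixedEdge(make_ops()) for _ in range(14))
alphas = torch.randn(14, len(CANDIDATE_OPS))
out = Cell(edges)(torch.randn(1, 8, 4, 4), torch.randn(1, 8, 4, 4), alphas)
print(out.shape)  # torch.Size([1, 32, 4, 4]): 4 intermediate nodes concatenated
```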
In the embodiment of the present invention, as shown in FIG. 1, step S3 includes the following sub-steps:
S31: based on the search space S, performing the first-stage search on the specific structure s with the search strategy;
S32: normalizing the edges of the cell to obtain the weights of the edges in the cell;
S33: sorting the weights of the edges in the cell and screening out the candidate operations with the highest weights;
S34: according to the candidate operations with the highest weights, performing the second-stage and third-stage searches in turn, and returning the search result to the search strategy to complete the automatic search of the neural network structure.
In the invention, a staged method is adopted, i.e., the search process is divided into three stages; the number of cells in each stage grows progressively and the depth of the search network increases gradually, while the search space shrinks step by step. This addresses the depth gap caused by searching in a shallow network but testing in a deeper one.
In the embodiment of the present invention, as shown in FIG. 1, in step S31, the structure searched in the first stage includes 5 cells, and the candidate operations on every edge of each cell in the first stage are: max_pool_3x3, avg_pool_3x3, skip_connect, sep_conv_3x3, sep_conv_5x5, dil_conv_3x3, dil_conv_5x5, and none.
In the present invention, the search network in the first stage is small, with only 5 cells, but the number of candidate operations on each edge of a cell is the largest, i.e., all operations are included.
In the embodiment of the present invention, as shown in FIG. 1, in step S32, the weight w_{i,j}(x_i) of an edge in the cell is calculated as:

w_{i,j}(x_i) = \sum_{b \in B_{i,j}} \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} \, b(x_i)

where the fraction \exp(\alpha_b^{(i,j)}) / \sum_{b' \in B_{i,j}} \exp(\alpha_{b'}^{(i,j)}) denotes the softmax operation; B_{i,j} denotes the operation space of edge (i, j), i and j denote the nodes connected by the edge, b denotes a candidate function (operation) on the edge, \alpha_b^{(i,j)} denotes the structural parameter of candidate b, \alpha denotes the weight matrix of all edges, \exp(\cdot) denotes the exponential function, b' indexes the candidate functions in the normalizing sum, \alpha_{b'}^{(i,j)} denotes the structural parameter of candidate b', and x_i denotes node i.
In the present invention, the weight of each operation on an edge is normalized over all candidate operations using the softmax function.
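As a small numeric illustration of this normalization (an assumption of how the formula above would be computed in PyTorch, with made-up parameter values):

```python
# Softmax normalization of one edge's architecture parameters; the values
# below are invented for illustration only.
import torch
import torch.nn.functional as F

alpha_edge = torch.tensor([0.3, -1.2, 0.8, 0.1, -0.5, 0.0, 0.4, -2.0])
weights = F.softmax(alpha_edge, dim=-1)  # exp(a_b) / sum_{b'} exp(a_{b'})
print(weights, weights.sum())            # weights of the 8 candidates; sum = 1
```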
In the embodiment of the present invention, as shown in FIG. 1, in step S33, the screening of candidate operations is modified by a 0-1 loss function, calculated as:

L_{total} = l_{val}\left(\omega^{*}(\alpha), \alpha\right) + \omega_{0\text{-}1} L

L = -\frac{1}{N} \sum_{i,j} \sum_{b \in B_{i,j}} \left( \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} - \frac{1}{2} \right)^{2}

\omega^{*} = \arg\min_{\omega} l_{train}(\omega, \alpha)

where L_{total} denotes the total 0-1 loss, L the 0-1 loss term, l_{val} the loss value on the validation set, \alpha the weight matrix of all edges, \omega the parameter matrix, \omega^{*} the optimal parameter matrix after the search space has been relaxed to be continuous, N the total number of edges, \exp(\cdot) the exponential function, i and j the edge indices, \omega_{0\text{-}1} the weight coefficient of L, B_{i,j} the operation space of edge (i, j), \alpha_b^{(i,j)} the structural parameters, \arg\min(\cdot) the variable value at which the objective function attains its minimum, l_{train} the loss value on the training set, b' the index over candidate functions, and \alpha_{b'}^{(i,j)} the structural parameter of candidate b'.
In the present invention, a suitable operation is selected from the candidate operations according to the continuous structural weight scores, but the large gap between these scores and the discrete values 0 and 1 easily biases the selection, so a 0-1 loss function is used to mitigate this. The 0-1 loss reduces the discrepancy of discretizing the continuous encoding, makes the differences between the weight parameters of different operations more pronounced, enlarges their relative gaps, and pushes them toward 0 or 1, so that operation selection is more accurate and convenient. Meanwhile, to control the weight of the loss term, \omega_{0\text{-}1} is used as its weight coefficient; l_{val} is determined by \omega and \alpha.
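A hedged sketch of this 0-1 regularization follows; it assumes the reconstructed form of L above (a penalty rewarding softmax scores near 0 or 1), and zero_one_loss and total_loss are hypothetical names.

```python
# Sketch of the 0-1 loss under the reconstructed formula above: scores far
# from 0 or 1 are penalized, sharpening the differences between candidates.
import torch
import torch.nn.functional as F

def zero_one_loss(alphas: torch.Tensor) -> torch.Tensor:
    """alphas: (num_edges, num_ops) architecture parameters."""
    w = F.softmax(alphas, dim=-1)
    # minimizing -(w - 0.5)^2 pushes each weight toward 0 or 1
    return -((w - 0.5) ** 2).sum(dim=-1).mean()

def total_loss(l_val: torch.Tensor, alphas: torch.Tensor,
               coeff: float = 1.0) -> torch.Tensor:
    return l_val + coeff * zero_one_loss(alphas)  # coeff plays the role of w_{0-1}
```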
In the embodiment of the present invention, as shown in FIG. 1, in step S33, skip connections are partially cut off during the weight screening by a regularized dropout method.
In the invention, as the number of epochs (one epoch is one pass of all data through the neural network for forward computation and backpropagation) increases, competition among the weights gradually intensifies, so the weight of skip-connect grows ever higher and skip-connect ends up being selected. An excess of skip-connections amounts to overfitting in the search process and harms the performance of automatic neural network structure search. To counter this overfitting, the number of skip-connections is controlled by search-space regularization and early stopping, with the aim of improving search performance.
After the skip-connect operation, a regularization method (dropout) is added at the operation level to partially cut off skip-connect, so that the algorithm explores other candidate operations. A stronger dropout is used at the start of training, its probability is gradually decayed during training, and a lighter dropout is used in the later stage so that the final learning of the network structure parameters is not affected.
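The sketch below shows one way such a decaying operation-level dropout could wrap the skip-connect path; the linear decay schedule and the specific rates are illustrative assumptions, not values given by the patent.

```python
# Hypothetical wrapper: dropout on the skip-connect path whose rate decays
# from a strong initial value to a light final value over training.
import torch.nn as nn

class DropSkip(nn.Module):
    def __init__(self, p_start: float = 0.6, p_end: float = 0.05,
                 total_epochs: int = 25):
        super().__init__()
        self.p_start, self.p_end, self.total = p_start, p_end, total_epochs
        self.drop = nn.Dropout(p_start)

    def set_epoch(self, epoch: int) -> None:
        t = min(epoch / max(self.total - 1, 1), 1.0)  # training progress in [0, 1]
        self.drop.p = self.p_start + t * (self.p_end - self.p_start)

    def forward(self, x):
        return self.drop(x)  # the identity path, partially cut off by dropout
```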
In the embodiment of the present invention, as shown in FIG. 1, in step S3, the search process is controlled by an early-stop mechanism, as follows: if the number of skip connections in a cell exceeds two, the search is stopped.
In the invention, an intuitive method of controlling the number of skip-connections is used during the search: if more than two skip-connections occur in a normal cell, the search is stopped.
Controlling skip-connections through both regularization and early stopping alleviates the performance degradation caused by having too few trainable parameters.
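A minimal sketch of this early-stop check (an assumed helper; the (num_edges, num_ops) layout of the architecture parameters follows the earlier cell sketch):

```python
# Early-stop check: derive the op each edge would select (argmax of its
# weights) and stop if the cell contains more than two skip connections.
import torch

OPS = ["max_pool_3x3", "avg_pool_3x3", "skip_connect", "sep_conv_3x3",
       "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5", "none"]
SKIP_IDX = OPS.index("skip_connect")

def should_stop(alphas: torch.Tensor, max_skips: int = 2) -> bool:
    """alphas: (num_edges, num_ops) parameters of one normal cell."""
    chosen = alphas.argmax(dim=-1)               # best op per edge
    return int((chosen == SKIP_IDX).sum()) > max_skips
```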
In the embodiment of the present invention, as shown in FIG. 3, in step S34, the second stage uses 5 candidate operations and 11 cells, and the third stage uses 3 candidate operations and 17 cells.
In the invention, the number of cells increases while the number of candidate operations decreases; by gradually increasing the number of cells, the finally searched network approaches the deeper network used in the final evaluation.
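The three-stage schedule can be summarized as follows; build_supernet and train_and_score are hypothetical helpers standing in for the supernet construction and staged training described above.

```python
# Progressive schedule: cells grow 5 -> 11 -> 17 while the candidate set
# shrinks 8 -> 5 -> 3, keeping only the highest-scoring operations.
STAGES = [(5, 8), (11, 5), (17, 3)]  # (number of cells, candidate ops used)

def staged_search(ops, build_supernet, train_and_score):
    """ops starts as the full list of 8 candidates (see step S31)."""
    for num_cells, num_ops in STAGES:
        ops = ops[:num_ops]                       # keep the top-ranked ops
        net = build_supernet(num_cells=num_cells, candidate_ops=ops)
        scores = train_and_score(net)             # dict: op name -> score
        ops = sorted(ops, key=scores.get, reverse=True)  # rank for next stage
    return ops
```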
In the invention, the technical scheme can be applied to the field of automatic machine learning (AutoML). AutoML comprises four stages: data preprocessing, feature engineering, model generation, and model evaluation. Automatic search of the neural network structure is the core link of model generation.
The success of deep learning in perceptual tasks is mainly attributable to its automation of the feature engineering process: hierarchical feature extractors are learned from data in an end-to-end fashion rather than designed by hand. However, this success has been accompanied by rising demands on network architectures, with ever more complex neural network architectures designed manually. Manual design of network structures is time-consuming and error-prone. By contrast, automatic neural network structure search can find the best-performing structure by traversal, and it can also break the limits of human thinking to find structural organizations that humans have not conceived. Automatic neural network structure search can be regarded as a subfield of AutoML, intersecting hyper-parameter optimization and meta-learning, and is a natural development direction for automatic machine learning. It has also shown its strength in deep learning fields such as image and natural language processing, and plays an important role in fields related to deep learning, such as intelligent financial risk control and autonomous driving.
The working principle and process of the invention are as follows: the search strategy selects a specific structure s from the predefined search space S, and the structure is passed to the performance evaluation stage. After evaluation, the performance result of the specific structure s is returned to the search strategy, which then guides the selection of the next structure. During the search, a 0-1 loss function makes the differentiation of the candidate operations more pronounced and thus improves the accuracy of candidate selection, and search-space regularization and an early-stop mechanism are added to further improve the performance of the searched model.
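Schematically, this feedback loop can be expressed as below; strategy.propose, strategy.update and evaluate are hypothetical interfaces for illustration, not APIs defined by the patent.

```python
# Sketch of the search-strategy / performance-evaluation feedback loop.
def nas_loop(search_space, strategy, evaluate, budget: int = 50):
    best_s, best_score = None, float("-inf")
    for _ in range(budget):
        s = strategy.propose(search_space)   # step S2: pick a structure s
        score = evaluate(s)                  # performance-evaluation stage
        strategy.update(s, score)            # feedback guides the next pick
        if score > best_score:
            best_s, best_score = s, score
    return best_s
```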
The invention has the beneficial effects that:
(1) The method has strong generalization capability and can be applied to scenarios such as computer vision and natural language processing; its effect is especially pronounced in intelligent customer service for the financial sector, where it markedly reduces the time cost of modeling and improves modeling efficiency.
(2) The search time is reduced. The invention uses a cell-based structure and performs the search in stages; in each stage, candidate operations with lower scores are pruned according to their scores, only the higher-scoring candidates are retained, and the search enters the next stage. This reduces the number of operations, so the search space becomes comparatively simple. In addition, the number of epochs set for each stage is reduced to a certain extent. Together these measures lower GPU memory usage and thereby shorten the search time.
(3) The performance loss problem is addressed. During the search in each stage, when the scores of the candidate operations are evaluated, a 0-1 loss function replaces the original cross-entropy loss, reducing the discrepancy introduced when the continuous encoding is discretized; the scores of different candidate operations are pushed closer to 0 or 1, and the differences between them become more pronounced. This lets the search select the required operation from the candidates more accurately, so the performance loss of the searched network is kept under control.
(4) The depth gap problem is addressed. The proposed method performs the search in stages, alleviating the depth gap that arises because the network used during search differs from, and is much smaller than, the network used during evaluation. The staged procedure makes the network searched in the last stage close to the network structure used at evaluation, avoids the architecture overfitting caused by searching directly in a large network, and, by approximating the search space in this way, relieves the depth gap to a certain extent.
(5) In controlling the candidate operations, search-space regularization and an early-stop mechanism are used: if more than two skip connections appear in a cell, the search of the cell is stopped. This directly controls the number of skip connections and improves search performance.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (9)

1. An automatic neural network structure searching method based on automatic machine learning, characterized by comprising the following steps:
S1: determining a search space S by using the cells of a neural network, based on the model generation stage of automatic machine learning;
S2: selecting a specific structure s from the search space S by using a search strategy;
S3: based on the search space S, evaluating the specific structure s and returning the result to the search strategy to complete the automatic search of the neural network structure.
2. The automatic neural network structure searching method based on automatic machine learning of claim 1, wherein in step S1, cells are stacked to obtain the search space S; each cell consists of two input nodes, intermediate nodes, an output node, and edges; each edge of the cell represents the candidate operations.
3. The automatic machine learning-based neural network structure searching method of claim 2, wherein step S3 includes the following sub-steps:
S31: based on the search space S, performing the first-stage search on the specific structure s with the search strategy;
S32: normalizing the edges of the cell to obtain the weights of the edges in the cell;
S33: sorting the weights of the edges in the cell and screening out the candidate operations with the highest weights;
S34: according to the candidate operations with the highest weights, performing the second-stage and third-stage searches in turn, and returning the search result to the search strategy to complete the automatic search of the neural network structure.
4. The automatic search method for neural network structure based on automatic machine learning of claim 3, wherein in step S31, the structure searched in the first stage includes 5 cells, and the candidate operations on every edge of each cell in the first stage are: max_pool_3x3, avg_pool_3x3, skip_connect, sep_conv_3x3, sep_conv_5x5, dil_conv_3x3, dil_conv_5x5, and none.
5. The automatic machine learning-based neural network structure searching method of claim 3, wherein in step S32, the weight w_{i,j}(x_i) of an edge in the cell is calculated as:

w_{i,j}(x_i) = \sum_{b \in B_{i,j}} \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} \, b(x_i)

where the fraction \exp(\alpha_b^{(i,j)}) / \sum_{b' \in B_{i,j}} \exp(\alpha_{b'}^{(i,j)}) denotes the softmax operation; B_{i,j} denotes the operation space of edge (i, j), i and j denote the nodes connected by the edge, b denotes a candidate function (operation) on the edge, \alpha_b^{(i,j)} denotes the structural parameter of candidate b, \alpha denotes the weight matrix of all edges, \exp(\cdot) denotes the exponential function, b' indexes the candidate functions in the normalizing sum, \alpha_{b'}^{(i,j)} denotes the structural parameter of candidate b', and x_i denotes node i.
6. The automatic machine learning-based neural network structure searching method of claim 3, wherein in step S33, the screening of candidate operations is modified by a 0-1 loss function, calculated as:

L_{total} = l_{val}\left(\omega^{*}(\alpha), \alpha\right) + \omega_{0\text{-}1} L

L = -\frac{1}{N} \sum_{i,j} \sum_{b \in B_{i,j}} \left( \frac{\exp\left(\alpha_b^{(i,j)}\right)}{\sum_{b' \in B_{i,j}} \exp\left(\alpha_{b'}^{(i,j)}\right)} - \frac{1}{2} \right)^{2}

\omega^{*} = \arg\min_{\omega} l_{train}(\omega, \alpha)

where L_{total} denotes the total 0-1 loss, L the 0-1 loss term, l_{val} the loss value on the validation set, \alpha the weight matrix of all edges, \omega the parameter matrix, \omega^{*} the optimal parameter matrix after the search space has been relaxed to be continuous, N the total number of edges, \exp(\cdot) the exponential function, i and j the edge indices, \omega_{0\text{-}1} the weight coefficient of L, B_{i,j} the operation space of edge (i, j), \alpha_b^{(i,j)} the structural parameters, \arg\min(\cdot) the variable value at which the objective function attains its minimum, l_{train} the loss value on the training set, b' the index over candidate functions, and \alpha_{b'}^{(i,j)} the structural parameter of candidate b'.
7. The automatic machine learning-based neural network structure searching method of claim 3, wherein in step S33, skip connections are partially cut off during the weight screening by a regularized dropout method.
8. The automatic neural network structure searching method based on automatic machine learning of claim 3, wherein in step S33, an early-stop mechanism is used to control the search process, as follows: if the number of skip connections in a cell exceeds two, the search is stopped.
9. The automatic search method for neural network structure based on automatic machine learning of claim 3, wherein in step S34, the second stage uses 5 candidate operations and 11 cells, and the third stage uses 3 candidate operations and 17 cells.
CN202011492102.1A 2020-12-17 2020-12-17 Automatic neural network structure searching method based on automatic machine learning Pending CN112633494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011492102.1A CN112633494A (en) 2020-12-17 2020-12-17 Automatic neural network structure searching method based on automatic machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011492102.1A CN112633494A (en) 2020-12-17 2020-12-17 Automatic neural network structure searching method based on automatic machine learning

Publications (1)

Publication Number Publication Date
CN112633494A (en) 2021-04-09

Family

ID=75316216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011492102.1A Pending CN112633494A (en) 2020-12-17 2020-12-17 Automatic neural network structure searching method based on automatic machine learning

Country Status (1)

Country Link
CN (1) CN112633494A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256593A (en) * 2021-06-07 2021-08-13 四川国路安数据技术有限公司 Tumor image detection method based on task self-adaptive neural network architecture search


Similar Documents

Publication Publication Date Title
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
Belouadah et al. Scail: Classifier weights scaling for class incremental learning
CN114240892B (en) Knowledge distillation-based unsupervised industrial image anomaly detection method and system
CN111461325B (en) Multi-target layered reinforcement learning algorithm for sparse rewarding environmental problem
CN110674965A (en) Multi-time step wind power prediction method based on dynamic feature selection
CN115424177A (en) Twin network target tracking method based on incremental learning
CN114282443A (en) Residual service life prediction method based on MLP-LSTM supervised joint model
CN112633494A (en) Automatic neural network structure searching method based on automatic machine learning
CN113467481B (en) Path planning method based on improved Sarsa algorithm
CN112922609B (en) Intelligent tunneling method of shield tunneling machine
CN116701665A (en) Deep learning-based traditional Chinese medicine ancient book knowledge graph construction method
CN111340637A (en) Medical insurance intelligent auditing system based on machine learning feedback rule enhancement
CN115936303A (en) Transient voltage safety analysis method based on machine learning model
CN115719478A (en) End-to-end automatic driving method for accelerated reinforcement learning independent of irrelevant information
CN111539989B (en) Computer vision single target tracking method based on optimized variance reduction
CN111259860B (en) Multi-order characteristic dynamic fusion sign language translation method based on data self-driving
CN115101136A (en) Large-scale aluminum electrolysis cell global anode effect prediction method
CN114019846A (en) Intelligent ventilation control design method and system for long road tunnel
CN109409306B (en) Active video behavior detection system and method based on deep reinforcement learning
CN114626284A (en) Model processing method and related device
CN105528681B (en) A kind of smelter by-product energy resource system method of real-time adjustment based on hidden tree-model
CN111860776A (en) Lightweight time convolution network oriented to time sequence data rapid prediction
Chetoui et al. Course recommendation model based on Knowledge Graph Embedding
CN111950691A (en) Reinforced learning strategy learning method based on potential action representation space
Peng et al. SCLIFD: Supervised contrastive knowledge distillation for incremental fault diagnosis under limited fault data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210409)