WO2023071592A1 - Network structure search method for ultra-large search space, system and medium - Google Patents

Network structure search method for ultra-large search space, system and medium

Info

Publication number
WO2023071592A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
search space
network structure
network
model
Prior art date
Application number
PCT/CN2022/119120
Other languages
French (fr)
Chinese (zh)
Inventor
谭明奎
国雍
陈耀佛
Original Assignee
华南理工大学
人工智能与数字经济广东省实验室(广州)
Priority date
Filing date
Publication date
Application filed by 华南理工大学, 人工智能与数字经济广东省实验室(广州)
Publication of WO2023071592A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention relates to the field of artificial intelligence, and in particular to a network structure search method, system and medium for an ultra-large search space.
  • neural network search attempts to automatically design efficient neural network architectures.
  • in order to search for efficient network architectures, existing methods must explore the search space by sampling a sufficient number of network structures.
  • however, the search space is often very large (often containing billions of candidate network architectures), and under limited computing resources only a small fraction of the network structures can be sampled from it, which greatly restricts the efficiency of exploring the search space.
  • in order to find a better network structure, it is necessary to explore the entire search space as much as possible; given a limited number of network structure samples, this places very high requirements on the sampling accuracy of the search process.
  • the purpose of the present invention is to provide a network structure search method, system and medium for an ultra-large search space.
  • a network structure search method for an ultra-large search space, comprising the following steps:
  • in the process of gradually expanding the search space, a reinforcement learning algorithm is used to search for an efficient network structure;
  • the network structure obtained by searching is trained on the target data set to obtain the final network structure.
  • the target data set for training the neural network is constructed, including:
  • said determining the neural network search space for the target task includes:
  • the reinforcement learning algorithm is used to search for an efficient network structure, including:
  • A1. Use a single-layer bidirectional long short-term memory (LSTM) network to construct a meta-controller that generates network architectures α ~ π(α, θ), where α is the generated network structure, π is the policy learned by the meta-controller, and θ denotes the meta-controller's network parameters;
  • A2. Construct a hypernetwork model oriented to image recognition tasks; the hypernetwork model is obtained by stacking several computing units, and each computing unit contains multiple candidate operations between its input features and output features;
  • A3. Construct the initial search space from the candidate operations, gradually add candidate operations during the search, and expand each new search space on the basis of the previous one; the whole search process includes K search stages, each stage corresponding to a search space Ω_i of a different size;
  • A4. Use the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space, activate the candidate-operation weights w_α corresponding to the sub-network in the hypernetwork model, and train the hypernetwork model on the target data set;
  • A5. Use the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space, obtain the sub-network weights w_α by inheriting the weights of the corresponding operations in the hypernetwork model, test the performance index R(α, w_α) on the held-out validation set, and use it as a reward to update the meta-controller weights θ;
  • A6. Repeat steps A3 to A5 until all K candidate operations have been added to the search space.
  • search network structure obtained by searching is trained according to the target data set to obtain the final search network structure, including:
  • the obtained network structure model is trained to obtain the final search network structure model.
  • the training of the obtained network structure includes:
  • the stochastic gradient descent algorithm is used to train the obtained network structure model until the network structure model converges.
  • step A3 specifically includes:
  • a candidate operation is randomly selected from all candidate operations to construct the initial search space
  • a candidate operation is randomly selected from the remaining candidate operations and added to the search space ⁇ i-1 to construct a new search space ⁇ i .
  • a network structure search system for an ultra-large search space, comprising:
  • a search space determination module, configured to determine the neural network search space for the target task;
  • a space search module, configured to search for an efficient network structure with a reinforcement learning algorithm in the process of gradually expanding the search space;
  • a model training module, configured to train the searched network structure on the target data set to obtain the final network structure.
  • a network structure search system for an ultra-large search space, comprising:
  • At least one memory for storing at least one program
  • when the at least one program is executed by the at least one processor, the at least one processor implements the above method.
  • a storage medium stores a processor-executable program therein, and the processor-executable program is used to execute the above method when executed by a processor.
  • the beneficial effects of the present invention are: the method of gradually increasing the search space adopted by the present invention can effectively reduce the search difficulty and improve the search efficiency and performance.
  • Fig. 1 is a flow chart of the steps of a network structure search method for a super large search space in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a gradually expanded network structure search space in an embodiment of the present invention
  • Fig. 3 is a schematic diagram of the comparison of the search methods between the standard neural network search method and the automatic network structure search method for the super large search space proposed by the embodiment of the present invention.
  • orientation descriptions such as up, down, front, back, left and right indicate orientations or positional relationships based on those shown in the drawings; they serve only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
  • the first step is to construct the target data set and set the search space of the target task. An initial, relatively small search space is then built and searched; the search space is enlarged step by step by gradually adding candidate computing operations, and the previously learned network structures are used to sample more accurately in the larger search space, so that the space can be explored more fully within a limited number of samples.
  • this approach reduces the difficulty of the search and makes it possible to find better network structures.
  • the network structure obtained from the search is trained on the target data set until convergence, and the model can be deployed and applied to the target task.
  • S1 Collect and label the target dataset for training the neural network.
  • S1-1 Collect pictures from the target task scene and classify the pictures to build a data set
  • S1-2 Divide the labeled data set into three parts: training set, verification set, and test set.
  • the computing units of the deep convolutional neural network model can be divided into two categories: standard computing units and downsampling computing units.
  • the search space of these two types of computing units is the same, but the standard computing unit maintains the spatial resolution of the input features, while the downsampling computing unit reduces the spatial resolution of the output features to half of the input.
  • a complete convolutional neural network can be constructed by stacking computing units several times.
  • each computing unit contains 7 nodes, including 2 input nodes, 4 intermediate nodes, and 1 output node.
  • there can be K = 8 different candidate operations between two nodes: 3×3 depthwise separable convolution, 5×5 depthwise separable convolution, 3×3 max pooling, 3×3 average pooling, 3×3 dilated convolution, 5×5 dilated convolution, a skip connection and the empty (none) operation.
  • each convolution operation is followed by batch normalization and a ReLU activation function.
  • S3-1 Construct a meta-controller using a single-layer bidirectional long short-term memory network to generate network architectures α ~ π(α, θ), where α is the network structure and π is the policy learned by the controller; initialize the meta-controller network parameters θ;
  • S3-2 Construct a hypernetwork model for image recognition tasks, where the weight parameters are denoted w.
  • the entire hypernetwork model is obtained by stacking several computing units, including standard computing units and downsampling computing units; as shown in Figure 2, each computing unit (i.e., a node in the figure) contains multiple candidate operations between its input and output features (i.e., the connecting lines between nodes in the figure); only one candidate operation per edge is activated during the training and testing phases of the network model and is eventually used for the target task;
  • the whole search process includes K search stages, and each stage corresponds to a search space Ω_i of a different size. If the initial search space Ω_0 has not been constructed, a candidate operation is randomly selected from all candidate operations to construct it. Because a hypernetwork model without parameters cannot be trained, it is stipulated that the first operation added must carry parameters (such as a convolution). If the current search space has already been constructed, a candidate operation is randomly selected from the remaining candidate operations and added to the search space Ω_{i-1} to construct a new search space Ω_i. Refer to Figure 2 for a schematic diagram of this process, where connecting lines of different colors between nodes represent different candidate operations.
  • in the initial stage there is only one candidate operation in the entire search space; the space is relatively small, so it is easier to find a network structure with excellent performance in it. As the search stages advance, one operation at a time is randomly selected from the remaining candidate operations and added to the search space to form a new search space;
  • the super network is trained by randomly sampling the network structure, and each computing operation is trained with equal probability.
  • candidate network structures that incorporate new computational operations can achieve comparable performance to other network structures.
  • the search process becomes more stable and the search performance is significantly improved.
  • a sub-network architecture α ~ p(α, Ω_i) is randomly sampled from the search space, where p represents the uniform distribution and Ω_i represents the search space of the i-th stage.
  • the parameter weights of the sub-network model are shared with the corresponding weights of the hypernetwork model.
  • the weights w_α of the generated sub-network model are trained on the partitioned training set by the stochastic gradient descent algorithm.
  • S3-4 Train the hypernetwork model: use the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space, and activate its corresponding candidate-operation weights w_α in the hypernetwork model. Select a batch of sample data from the partitioned training data set, and train the generated sub-network model with the stochastic gradient descent algorithm.
  • S4 Derive the network structure and train the model on the target data set.
  • a high-performance network structure can be derived. Given the K candidate operations, select the policy π(α, θ) learned in the final stage (i.e., the stage with the largest search space) as the final policy for sampling network structures. Based on this policy, sample 10 network structures, then select the one with the highest classification accuracy on the validation set as the final searched network structure.
  • S4-2 Use the stochastic gradient descent algorithm to train the searched network architecture model to convergence on the divided training data set.
  • This scheme searches in the gradually expanding search space, which is essentially a multi-stage search process, and the candidate subspace will evolve adaptively with the search process. Since the candidate subspace is relatively small in each search stage, this method can perform more accurate sampling and achieve high-performance network structures in a larger search space.
  • the automatic network structure search method for ultra-large search spaces proposed in this embodiment reduces the difficulty of neural architecture search in the initial stage; as the search progresses, this solution gradually enlarges the search space and uses the structures found in the previous stage to assist the search in the next stage.
  • Table 1 and Table 2 show the comparison results with the best existing methods on the CIFAR-10 dataset and the ImageNet dataset, respectively. After applying this scheme, the search cost can be reduced and the search performance can be improved on two commonly used image recognition datasets.
  • This embodiment also provides a network structure search system oriented to a super large search space, including:
  • a search space determination module is used to determine the neural network search space for the target task
  • the space search module is used to search for an efficient network structure using a reinforcement learning algorithm in the process of gradually expanding the search space;
  • the model training module is used to train the search network structure obtained by searching according to the target data set to obtain the final search network structure.
  • a network structure search system oriented to an ultra-large search space in this embodiment can execute the network structure search method oriented to an ultra-large search space provided by the method embodiments of the present invention, and can perform any combination of the implementation steps of the method embodiments, with the corresponding functions and beneficial effects of the method.
  • This embodiment also provides a network structure search system oriented to a super large search space, including:
  • At least one memory for storing at least one program
  • when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in FIG. 1 .
  • a network structure search system oriented to a very large search space in this embodiment can execute the network structure search method oriented to a very large search space provided by the method embodiments of the present invention, and can perform any combination of the implementation steps of the method embodiments, with the corresponding functions and beneficial effects of the method.
  • the embodiment of the present application also discloses a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device can read the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method shown in FIG. 1 .
  • This embodiment also provides a storage medium, which stores an instruction or program capable of executing a super-large search space-oriented network structure search method provided by the method embodiment of the present invention.
  • when the instruction or program is executed, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
  • the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams.
  • two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/operations involved.
  • the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
  • if the functions described above are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the essence of the technical solution of the present invention, the part that contributes to the prior art, or a part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present invention.
  • the aforementioned storage media include media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks and optical discs.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate or transmit a program for use in or in conjunction with an instruction execution system, device or device.
  • computer-readable media include the following: electrical connection with one or more wires (electronic device), portable computer disk case (magnetic device), random access memory (RAM), Read Only Memory (ROM), Erasable and Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer-readable medium may even be paper or other suitable medium on which the program can be printed, since the program can be read, for example, by optically scanning the paper or other medium, followed by editing, interpretation or other suitable processing if necessary.
  • the program is processed electronically and stored in computer memory.

Abstract

A network structure search method for an ultra-large search space, and a corresponding system and medium. The method comprises the following steps: constructing a target data set for training a neural network; determining a neural network search space for the target task; while gradually expanding the search space, searching for an efficient network structure with a reinforcement learning algorithm; and training the searched network structure on the target data set to obtain the final network structure. Gradually enlarging the search space effectively reduces search difficulty, improves search efficiency and performance, and can be widely used in the field of artificial intelligence.

Description

Network structure search method, system and medium for an ultra-large search space
Technical field
The invention relates to the field of artificial intelligence, and in particular to a network structure search method, system and medium for an ultra-large search space.
Background art
In recent years, deep neural networks have been widely applied to many different tasks. With growing parameter counts and network depth, together with effective use of GPUs, neural network models have improved markedly in both accuracy and efficiency. However, most current neural networks are designed by hand; the design process depends on rich experience in network architecture design, which incurs high design costs. This makes deep neural network models difficult to apply to many real-world engineering tasks.
To address this problem, neural architecture search attempts to design efficient neural network architectures automatically. To find an efficient architecture, existing methods must explore the search space by sampling a sufficient number of network structures. However, the search space is often extremely large (frequently containing billions of candidate architectures), and with limited computing resources only a tiny fraction of the network structures can be sampled from it, which severely limits how efficiently the space can be explored. To find a better network structure, the entire search space should be explored as thoroughly as possible; given a limited sampling budget, this places very high demands on the sampling accuracy of the search process.
Summary of the invention
To solve, at least to some extent, one of the technical problems in the prior art, the purpose of the present invention is to provide a network structure search method, system and medium for an ultra-large search space.
The technical scheme adopted by the present invention is:
A network structure search method for an ultra-large search space, comprising the following steps:
constructing a target data set for training a neural network;
determining a neural network search space for the target task;
in the process of gradually expanding the search space, using a reinforcement learning algorithm to search for an efficient network structure;
training the searched network structure on the target data set to obtain the final network structure.
Further, constructing the target data set for training the neural network comprises:
collecting pictures from the target task scene and labeling them by class to build the target data set;
dividing the labeled target data set into three parts: a training set, a validation set and a test set.
Further, determining the neural network search space for the target task comprises:
dividing the computing units that make up the deep convolutional neural network model into standard computing units and downsampling computing units;
setting the search space of the computing units.
Further, using a reinforcement learning algorithm to search for an efficient network structure while gradually expanding the search space comprises:
A1. Using a single-layer bidirectional long short-term memory (LSTM) network to construct a meta-controller that generates network architectures α ~ π(α, θ), where α is the generated network structure, π is the policy learned by the meta-controller, and θ denotes the meta-controller's network parameters;
A2. Constructing a hypernetwork model oriented to image recognition tasks, the hypernetwork model being obtained by stacking several computing units, each computing unit containing multiple candidate operations between its input features and output features;
A3. Constructing the initial search space from the candidate operations, gradually adding candidate operations during the search, and expanding each new search space on the basis of the previous one; the whole search process comprises K search stages, each stage corresponding to a search space Ω_i of a different size;
A4. Using the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space, activating the candidate-operation weights w_α corresponding to the sub-network in the hypernetwork model, and training the hypernetwork model on the target data set;
A5. Using the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space, obtaining the sub-network weights w_α by inheriting the weights of the corresponding operations in the hypernetwork model, testing the performance index R(α, w_α) on the held-out validation set, and using it as a reward to update the meta-controller weights θ;
A6. Repeating steps A3 to A5 until all K candidate operations have been added to the search space.
Further, training the searched network structure on the target data set to obtain the final network structure comprises:
obtaining a high-performance network structure from the trained meta-controller model;
training the obtained network structure model on the target data set to obtain the final network structure model.
Further, training the obtained network structure comprises:
training the obtained network structure model with the stochastic gradient descent algorithm until the model converges.
Further, step A3 specifically comprises:
if the initial search space Ω_0 has not yet been built, randomly selecting one candidate operation from all candidate operations to build the initial search space;
if the current search space has already been built, randomly selecting one candidate operation from the remaining candidate operations and adding it to the search space Ω_{i-1} to build a new search space Ω_i.
Another technical scheme adopted by the present invention is:
A network structure search system for an ultra-large search space, comprising:
a data set construction module, configured to construct a target data set for training a neural network;
a search space determination module, configured to determine a neural network search space for the target task;
a space search module, configured to search for an efficient network structure with a reinforcement learning algorithm in the process of gradually expanding the search space;
a model training module, configured to train the searched network structure on the target data set to obtain the final network structure.
Another technical scheme adopted by the present invention is:
A network structure search system for an ultra-large search space, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor implements the method described above.
Another technical scheme adopted by the present invention is:
A storage medium storing a processor-executable program, the program being used, when executed by a processor, to perform the method described above.
The beneficial effects of the present invention are: the strategy of gradually enlarging the search space adopted by the present invention effectively reduces search difficulty and improves search efficiency and performance.
Description of drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings of the embodiments or of related prior-art solutions are introduced below. It should be understood that these drawings illustrate only some embodiments of the technical solutions of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the steps of a network structure search method for an ultra-large search space in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the gradually expanding network-structure search space in an embodiment of the present invention;
Fig. 3 is a schematic comparison between the search process of a standard neural architecture search method and that of the automatic network structure search method for an ultra-large search space proposed in an embodiment of the present invention.
具体实施方式Detailed ways
下面详细描述本发明的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,仅用于解释本发明,而不能理解为对本发明的限制。对于以下实施例中的步骤编号,其仅为了便于阐述说明而设置,对步骤之间的顺序不做任何限定,实施例中的各步骤的执行顺序均可根据本领域技术人员的理解来进行适应性调整。Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention. For the step numbers in the following embodiments, it is only set for the convenience of illustration and description, and the order between the steps is not limited in any way. The execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art sexual adjustment.
在本发明的描述中,需要理解的是,涉及到方位描述,例如上、下、前、后、左、右等指示的方位或位置关系为基于附图所示的方位或位置关系,仅是为了便于描述本发明和简化描述,而不是指示或暗示所指的装置或元件必须具有特定的方位、以特定的方位构造和操作,因此不能理解为对本发明的限制。In the description of the present invention, it should be understood that the orientation descriptions, such as up, down, front, back, left, right, etc. indicated orientations or positional relationships are based on the orientations or positional relationships shown in the drawings, and are only In order to facilitate the description of the present invention and simplify the description, it does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
在本发明的描述中,若干的含义是一个或者多个,多个的含义是两个以上,大于、小于、超过等理解为不包括本数,以上、以下、以内等理解为包括本数。如果有描述到第一、第二只是用于区分技术特征为目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量或者隐含指明所指示的技术特征的先后关系。In the description of the present invention, several means one or more, and multiple means two or more. Greater than, less than, exceeding, etc. are understood as not including the original number, and above, below, within, etc. are understood as including the original number. If the description of the first and second is only for the purpose of distinguishing the technical features, it cannot be understood as indicating or implying the relative importance or implicitly indicating the number of the indicated technical features or implicitly indicating the order of the indicated technical features relation.
本发明的描述中,除非另有明确的限定,设置、安装、连接等词语应做广义理解,所属技术领域技术人员可以结合技术方案的具体内容合理确定上述词语在本发明中的具体含义。In the description of the present invention, unless otherwise clearly defined, words such as setting, installation, and connection should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above words in the present invention in combination with the specific content of the technical solution.
The overall scheme of the present invention is shown in Fig. 1. The first step is to construct the target data set and set the search space of the target task. An initial, relatively small search space is then built and searched; the search space is enlarged step by step by gradually adding candidate computing operations, and the previously learned network structures are used to sample more accurately in the larger space, so that the search space can be explored more fully within a limited number of samples. This approach lowers the difficulty of the search and makes it possible to find better network structures. Finally, the network structure obtained by the search is trained on the target data set until convergence, after which the model can be deployed to the target task. Each step in the figure is described in detail below with reference to Fig. 1:
S1: Collect and label the target data set used to train the neural network.
S1-1: Collect pictures from the target task scene and label them by class to build the data set.
S1-2: Divide the labeled data set into three parts: a training set, a validation set and a test set.
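By way of illustration (the patent prescribes no code), the split of step S1-2 can be realized in a few lines. The sketch below assumes PyTorch/torchvision; the directory name and the 8:1:1 ratio are illustrative assumptions, not values fixed by the patent.
```python
# Hypothetical illustration of step S1-2: splitting a labeled image
# dataset into training / validation / test subsets.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    root="target_task_images",                    # assumed directory of labeled pictures
    transform=transforms.Compose([transforms.Resize((32, 32)),
                                  transforms.ToTensor()]))

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)       # 8:1:1 split is an assumed ratio
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))   # fixed seed for reproducibility
```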
S2: Set the neural network search space for the target task.
S2-1: For computer vision tasks, the computing units that make up a deep convolutional neural network model can be divided into two categories: standard computing units and downsampling computing units. The two share the same search space, but a standard computing unit preserves the spatial resolution of its input features, whereas a downsampling computing unit halves it. A complete convolutional neural network is built by stacking such computing units several times.
S2-2: Set the search space of the computing unit: each computing unit contains 7 nodes, namely 2 input nodes, 4 intermediate nodes and 1 output node. Between any two nodes there can be K = 8 different candidate operations: 3×3 depthwise separable convolution, 5×5 depthwise separable convolution, 3×3 max pooling, 3×3 average pooling, 3×3 dilated convolution, 5×5 dilated convolution, a skip connection and the empty (none) operation. Every convolution operation is followed by batch normalization and a ReLU activation function.
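The cell-level search space of step S2-2 maps naturally onto an operation table. The following sketch assumes PyTorch; the ordering convolution, then batch normalization, then ReLU, and dilation 2 for the dilated convolutions follow the text above, while the names and the simplified stride handling for the skip and empty operations are illustrative assumptions.
```python
# Illustrative PyTorch sketch of the K = 8 candidate operations of step S2-2.
# Each factory maps (channels C, stride s) to a module.
import torch.nn as nn

def sep_conv(C, k, stride):
    # depthwise separable convolution: depthwise k x k then pointwise 1 x 1
    return nn.Sequential(
        nn.Conv2d(C, C, k, stride, k // 2, groups=C, bias=False),
        nn.Conv2d(C, C, 1, bias=False),
        nn.BatchNorm2d(C), nn.ReLU())

def dil_conv(C, k, stride):
    # dilated convolution with dilation 2; padding k - 1 preserves the
    # spatial size at stride 1
    return nn.Sequential(
        nn.Conv2d(C, C, k, stride, k - 1, dilation=2, bias=False),
        nn.BatchNorm2d(C), nn.ReLU())

class Zero(nn.Module):
    # the empty (none) operation: outputs zeros, effectively removing the edge
    def forward(self, x):
        return x * 0.0

CANDIDATE_OPS = {
    "sep_conv_3x3": lambda C, s: sep_conv(C, 3, s),
    "sep_conv_5x5": lambda C, s: sep_conv(C, 5, s),
    "max_pool_3x3": lambda C, s: nn.MaxPool2d(3, s, 1),
    "avg_pool_3x3": lambda C, s: nn.AvgPool2d(3, s, 1),
    "dil_conv_3x3": lambda C, s: dil_conv(C, 3, s),
    "dil_conv_5x5": lambda C, s: dil_conv(C, 5, s),
    # stride-2 handling for the last two ops is omitted for brevity
    "skip_connect": lambda C, s: nn.Identity(),
    "none":         lambda C, s: Zero(),
}
```
In a downsampling computing unit the same operations would be instantiated with stride 2, so that the output resolution is half the input, consistent with step S2-1.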
S3: Use a reinforcement learning algorithm to search for an efficient network structure in the gradually expanding search space.
S3-1: Use a single-layer bidirectional long short-term memory (LSTM) network to construct a meta-controller that generates network architectures α ~ π(α, θ), where α is the network structure and π is the policy learned by the controller. Initialize the meta-controller network parameters θ.
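A minimal sketch of such a meta-controller, assuming PyTorch: one embedding token per cell edge is fed through a single-layer bidirectional LSTM, and a linear head produces a categorical distribution over candidate operations, masked to the current search space Ω_i. The sizes (14 edges, i.e. 2+3+4+5 for the 4 intermediate nodes of the 7-node cell, and hidden width 64) are assumptions, not values from the patent.
```python
# Sketch of the meta-controller of step S3-1 (assumed PyTorch implementation).
import torch
import torch.nn as nn

class MetaController(nn.Module):
    def __init__(self, num_ops=8, num_edges=14, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_edges, hidden)   # one token per edge
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_ops)     # logits over candidate ops
        self.num_edges = num_edges

    def sample(self, active_ops):
        """Sample an architecture alpha ~ pi(alpha, theta) restricted to the
        operation indices in `active_ops` (the current search space Omega_i)."""
        tokens = torch.arange(self.num_edges).unsqueeze(0)
        h, _ = self.lstm(self.embed(tokens))
        logits = self.head(h).squeeze(0)               # (num_edges, num_ops)
        mask = torch.full_like(logits, float("-inf"))
        mask[:, list(active_ops)] = 0.0                # forbid ops outside Omega_i
        dist = torch.distributions.Categorical(logits=logits + mask)
        alpha = dist.sample()                          # one op index per edge
        return alpha, dist.log_prob(alpha).sum()
```
Sampling returns both the architecture α (one operation index per edge) and its log-probability, which the policy-gradient update sketched later consumes.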
S3-2: Construct a hypernetwork model for image recognition tasks, with weight parameters denoted w. The whole hypernetwork model is obtained by stacking several computing units, including standard computing units and downsampling computing units. As shown in Fig. 2, each computing unit contains multiple candidate operations between nodes (the connecting lines between nodes in the figure); only one candidate operation per edge is activated during the training and testing phases of the network model and is ultimately used for the target task.
S3-3: The whole search process comprises K search stages, each corresponding to a search space Ω_i of a different size. If the initial search space Ω_0 has not yet been built, one candidate operation is chosen at random from all candidate operations to build it. Because a hypernetwork model without parameters cannot be trained, the first operation added is required to carry parameters (e.g. a convolution). If the current search space has already been built, one candidate operation is chosen at random from the remaining candidates and added to Ω_{i-1} to build the new search space Ω_i. This process is illustrated in Fig. 2, where connecting lines of different colors between nodes represent different candidate operations. In the initial stage the whole search space contains only one candidate operation; the space is small, so it is comparatively easy to find a well-performing network structure in it. As the search stages advance, one operation at a time is randomly drawn from the remaining candidates and added to form a new search space.
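The stage construction of S3-3 reduces to a small sampling routine. The sketch below reuses the operation names from the earlier sketch and enforces the stated rule that the first operation must carry parameters; the seed handling is an illustrative choice.
```python
# Sketch of the progressive construction of the search spaces
# Omega_0 ... Omega_{K-1} described in step S3-3.
import random

ALL_OPS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "avg_pool_3x3",
           "dil_conv_3x3", "dil_conv_5x5", "skip_connect", "none"]
PARAMETRIC = {"sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"}

def build_stages(seed=0):
    rng = random.Random(seed)
    remaining = list(ALL_OPS)
    # Omega_0: one randomly chosen operation that has trainable parameters
    first = rng.choice([op for op in remaining if op in PARAMETRIC])
    remaining.remove(first)
    stages = [[first]]
    # Omega_i = Omega_{i-1} plus one operation drawn from the remainder
    while remaining:
        nxt = rng.choice(remaining)
        remaining.remove(nxt)
        stages.append(stages[-1] + [nxt])
    return stages  # K = 8 nested search spaces

for i, omega in enumerate(build_stages()):
    print(f"stage {i}: {omega}")
```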
Warm up the weights of newly added candidate operations: to make the competition between different computing operations fairer, the supernetwork is trained by randomly sampling network structures, so that every computing operation is trained with equal probability. In this way, candidate network structures containing the newly added operation can reach performance comparable to that of the other structures. With this operation warm-up, the search process becomes more stable and search performance improves significantly. Specifically, a sub-network architecture α ~ p(α, Ω_i) is sampled from the search space, where p denotes the uniform distribution and Ω_i the search space of the i-th stage. The parameters of the sub-network are shared with the corresponding weights of the hypernetwork. After the sub-network structure is randomly sampled, its weights w_α are trained on the partitioned training set with the stochastic gradient descent algorithm.
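A sketch of this warm-up, under the assumption that the hypernetwork exposes a weight-sharing interface supernet(x, alpha) that activates only the sampled path; the step count and learning rate are illustrative hyperparameters.
```python
# Warm-up sketch for a newly enlarged space Omega_i (step S3-3):
# subnetworks are drawn uniformly so every candidate operation is
# trained with equal probability before the controller takes over.
import torch

def warmup(supernet, omega, train_loader, steps=200, num_edges=14, lr=0.025):
    opt = torch.optim.SGD(supernet.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _, (x, y) in zip(range(steps), train_loader):
        # alpha ~ p(alpha, Omega_i): uniform over the current space
        alpha = torch.randint(len(omega), (num_edges,))
        opt.zero_grad()
        loss = loss_fn(supernet(x, alpha), y)  # only the shared weights w_alpha are active
        loss.backward()
        opt.step()
```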
S3-4: Train the hypernetwork model: use the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space and activate its corresponding candidate-operation weights w_α in the hypernetwork. Select a batch of samples from the partitioned training set and train the generated sub-network with the stochastic gradient descent algorithm.
S3-5: Train the meta-controller: use the meta-controller to generate a sub-network architecture α ~ π(α, θ) in the current search space and obtain the sub-network weights w_α by inheriting the weights of the corresponding operations in the hypernetwork. Select a batch of samples from the partitioned validation set and measure on it the performance index R(α, w_α) of the generated sub-network on the target task; use R(α, w_α) as the reward to update the meta-controller weights θ with a reinforcement learning policy-gradient algorithm.
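The controller half of steps S3-4/S3-5 can be sketched as a REINFORCE step in which validation accuracy serves as the reward R(α, w_α). The moving-average baseline is a common variance-reduction choice that the patent does not spell out, and the supernet(x, alpha) weight-sharing interface is assumed as before.
```python
# Sketch of the meta-controller update of step S3-5 (REINFORCE-style).
import torch

def train_controller(controller, supernet, omega, val_loader,
                     ctrl_opt, baseline=None, decay=0.95):
    alpha, log_prob = controller.sample(active_ops=range(len(omega)))
    x, y = next(iter(val_loader))             # one batch of validation data
    with torch.no_grad():
        acc = (supernet(x, alpha).argmax(1) == y).float().mean()
    reward = acc.item()                       # R(alpha, w_alpha)
    baseline = reward if baseline is None else decay * baseline + (1 - decay) * reward
    ctrl_opt.zero_grad()
    loss = -(reward - baseline) * log_prob    # policy-gradient estimator
    loss.backward()                           # updates only controller weights theta
    ctrl_opt.step()
    return baseline
```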
S3-6: Repeat steps S3-3 to S3-5 until all K candidate operations have been added to the search space.
S4: Derive the network structure and train the model on the target data set.
S4-1: From a trained meta-controller model, a high-performance network structure can be derived. Given the K candidate operations, take the policy π(α, θ) learned in the final stage (i.e., the stage with the largest search space) as the final policy for sampling network structures. Sample 10 network structures from this policy and select the one with the highest classification accuracy on the validation set as the final searched network structure.
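A sketch of this derivation step, reusing the assumed controller and supernet interfaces from the earlier sketches; evaluate is a hypothetical helper that measures validation accuracy with inherited weights.
```python
# Sketch of step S4-1: sample 10 architectures from the final-stage policy
# and keep the one with the highest validation accuracy.
import torch

def evaluate(supernet, alpha, val_loader):
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (supernet(x, alpha).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def derive_architecture(controller, supernet, val_loader, num_ops=8, n=10):
    candidates = [controller.sample(active_ops=range(num_ops))[0]
                  for _ in range(n)]
    return max(candidates, key=lambda a: evaluate(supernet, a, val_loader))
```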
S4-2: Train the searched network architecture model to convergence on the partitioned training set with the stochastic gradient descent algorithm.
In summary, compared with the prior art, the method of this embodiment of the present invention has the following beneficial effects:
Existing neural architecture search methods usually search directly in a fixed, extremely large search space; because of the enormous search difficulty this creates, they often find only suboptimal network structures. Unlike existing methods, the present scheme gradually enlarges the search space, which effectively reduces search difficulty and improves search efficiency and performance. As shown in Fig. 3, once some good network structures have been found in a small search space, gradually expanding the space makes it more likely to reach a candidate subspace (gray circles) highly similar to the best structure found in the previous subspace. Sampling in the new subspace is therefore more likely to yield a good structure (at least one performing similarly to the previous subspace's best) and less likely to waste sampling opportunities on very poor structures, improving sampling precision. Searching in a gradually expanding space is essentially a multi-stage search process in which the candidate subspace evolves adaptively with the search. Because the candidate subspace is comparatively small at each stage, the method can sample more precisely and thus find high-performance network structures within a very large overall search space.
The beneficial effects of the method of this embodiment are demonstrated below with experimental data.
The automatic network structure search method for an ultra-large search space proposed in this embodiment reduces the difficulty of neural architecture search in the initial stage; as the search progresses, the scheme gradually enlarges the search space and uses the structures found in the previous stage to assist the search in the next stage. Table 1 and Table 2 compare this scheme with the best existing methods on the CIFAR-10 and ImageNet data sets, respectively. Applying this scheme reduces the search cost and improves search performance on both commonly used image recognition data sets.
Table 1 (comparison with the best existing methods on CIFAR-10; provided as image PCTCN2022119120-appb-000001 in the original document and not reproduced here)
Table 2 (comparison with the best existing methods on ImageNet; provided as images PCTCN2022119120-appb-000002 and PCTCN2022119120-appb-000003 in the original document and not reproduced here)
This embodiment also provides a network structure search system for an ultra-large search space, comprising:
a data set construction module, configured to construct a target data set for training a neural network;
a search space determination module, configured to determine a neural network search space for the target task;
a space search module, configured to search for an efficient network structure with a reinforcement learning algorithm in the process of gradually expanding the search space;
a model training module, configured to train the searched network structure on the target data set to obtain the final network structure.
The network structure search system for an ultra-large search space of this embodiment can execute the network structure search method for an ultra-large search space provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
This embodiment also provides a network structure search system for an ultra-large search space, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in Fig. 1.
The network structure search system for an ultra-large search space of this embodiment can execute the network structure search method for an ultra-large search space provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
An embodiment of the present application also discloses a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method shown in Fig. 1.
This embodiment also provides a storage medium storing instructions or a program capable of executing the network structure search method for an ultra-large search space provided by the method embodiments of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. In addition, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to give a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein; alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention; given the attributes, functions and internal relationships of the various functional modules of the devices disclosed herein, the actual implementation of the modules is within the routine skill of an engineer. Those skilled in the art can therefore implement the invention set forth in the claims using ordinary techniques without undue experimentation. It should also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the invention, which is determined by the appended claims and the full scope of their equivalents.
If the functions described above are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical scheme of the present invention, the part that contributes to the prior art, or a part of the technical scheme can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks and optical discs.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber-optic devices, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that the parts of the present invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any of the following techniques known in the art, or a combination thereof: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
In the description of this specification, references to the terms "one embodiment/example", "another embodiment/example" or "certain embodiments/examples" mean that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the invention is defined by the claims and their equivalents.
The above is a detailed description of the preferred implementation of the present invention, but the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are all within the scope defined by the claims of the present application.

Claims (10)

  1. A network structure search method for an ultra-large search space, characterized in that it comprises the following steps:
    constructing a target dataset for training a neural network;
    determining a neural network search space for a target task;
    searching for an efficient network structure with a reinforcement learning algorithm while the search space is gradually expanded;
    training the network structure obtained by the search on the target dataset to obtain a final network structure.
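For orientation, the four claimed steps can be read as a simple driver pipeline. The following is a minimal Python sketch under that reading; every function name in it (build_dataset, define_search_space, curriculum_search, train_final) is a hypothetical placeholder introduced here for illustration, not a name from the application.

```python
# Minimal sketch of the claimed four-step pipeline; all function
# names are hypothetical placeholders for the claimed steps.

def network_structure_search(task_images):
    # Step 1: build the target dataset (claim 2: train/val/test).
    train_set, val_set, test_set = build_dataset(task_images)

    # Step 2: fix the search space for the target task (claim 3).
    candidate_ops = define_search_space()

    # Step 3: reinforcement-learning search while the search space
    # is gradually enlarged (claim 4, steps A1-A6).
    best_arch = curriculum_search(candidate_ops, train_set, val_set)

    # Step 4: retrain the searched architecture on the target
    # dataset to obtain the final model (claims 5-6).
    return train_final(best_arch, train_set)
```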
  2. The network structure search method for an ultra-large search space according to claim 1, wherein constructing the target dataset for training the neural network comprises:
    collecting images from the target task scenario and annotating them with class labels to construct the target dataset;
    dividing the annotated target dataset into three parts: a training set, a validation set, and a test set.
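As a sketch of this step, a labeled image collection could be shuffled and partitioned with a few lines of Python. The 80/10/10 ratio below is an illustrative assumption; the claim requires only the three-way division, not specific proportions.

```python
import random

def build_dataset(labeled_images, split=(0.8, 0.1, 0.1), seed=0):
    """Split (image, label) pairs into train/validation/test sets.

    The split ratio is an assumption for illustration; the claim
    only requires the three subsets.
    """
    items = list(labeled_images)
    random.Random(seed).shuffle(items)
    n_train = int(split[0] * len(items))
    n_val = int(split[1] * len(items))
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]
    return train_set, val_set, test_set
```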
  3. The network structure search method for an ultra-large search space according to claim 1, wherein determining the neural network search space for the target task comprises:
    dividing the computing units that constitute a deep convolutional neural network model into standard computing units and downsampling computing units;
    setting the search space of the computing units.
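A sketch of how the two cell types and their operation search space might be declared in PyTorch follows. The particular operation set (separable convolution, pooling, skip connection) is an assumption borrowed from common cell-based NAS practice; the claim fixes only the standard/downsampling split, not the operations themselves.

```python
import torch.nn as nn

# Hypothetical candidate operation set; the claim does not enumerate
# the operations, only that a cell-level search space is fixed.
def make_op(name, channels, stride):
    ops = {
        "skip": (nn.Identity() if stride == 1 else
                 nn.Conv2d(channels, channels, 1, stride=stride)),
        "sep_conv_3x3": nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride, 1,
                      groups=channels, bias=False),       # depthwise
            nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
            nn.BatchNorm2d(channels),
            nn.ReLU()),
        "avg_pool_3x3": nn.AvgPool2d(3, stride=stride, padding=1),
        "max_pool_3x3": nn.MaxPool2d(3, stride=stride, padding=1),
    }
    return ops[name]

# Standard cells keep the feature resolution (stride 1); downsampling
# cells halve it (stride 2), matching the two cell types in the claim.
CANDIDATE_OPS = ["skip", "sep_conv_3x3", "avg_pool_3x3", "max_pool_3x3"]
```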
  4. The network structure search method for an ultra-large search space according to claim 1, wherein searching for an efficient network structure with a reinforcement learning algorithm while gradually expanding the search space comprises:
    A1. constructing a meta-controller from a single-layer bidirectional long short-term memory network, used to generate network architectures α~π(α,θ), where α is the generated network structure, π is the policy learned by the meta-controller, and θ denotes the parameters of the meta-controller network;
    A2. constructing a supernetwork model for image recognition tasks, the supernetwork model being obtained by stacking several computing units, with multiple candidate operations between the input features and output features of each computing unit;
    A3. constructing an initial search space from the candidate operations, gradually adding candidate operations during the search, and expanding the search space of the previous step to obtain a new search space; the whole search process comprises K search stages, each stage corresponding to a search space Ω_i of a different size;
    A4. using the meta-controller to generate a sub-network architecture α~π(α,θ) in the current search space, activating the candidate-operation weights w_α corresponding to the sub-network structure in the supernetwork model, and training the supernetwork model on the target dataset;
    A5. using the meta-controller to generate a sub-network architecture α~π(α,θ) in the current search space, obtaining the weights w_α of the sub-network model by inheriting the weights of the corresponding operations in the supernetwork model, testing on the divided validation set to obtain a performance metric R(α, w_α), and using it as a reward value to update the weights θ of the meta-controller;
    A6. repeating steps A3 to A5 until all K candidate operations have been added to the search space.
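Putting steps A1-A6 together, a compressed Python sketch of the search stage might read as follows. It is not the application's implementation: the REINFORCE-style update with a moving-average baseline, the one-shot (non-autoregressive) sampling of all edge decisions, and the train_fn/evaluate_fn callbacks and supernet.num_edges attribute are assumptions added here. What the claim itself fixes is the single-layer bidirectional LSTM meta-controller, weight inheritance from the supernetwork, the validation metric used as the reward, and the K-stage expansion of the search space.

```python
import random
import torch
import torch.nn as nn

class MetaController(nn.Module):
    """Single-layer bidirectional LSTM that emits one operation
    choice per edge of the cell (step A1)."""
    def __init__(self, num_edges, max_ops, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(max_ops, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, max_ops)
        self.num_edges = num_edges

    def sample(self, space_size):
        """Sample alpha ~ pi(alpha, theta), restricted to the first
        space_size candidate operations of the current stage."""
        tokens = torch.zeros(1, self.num_edges, dtype=torch.long)
        states, _ = self.lstm(self.embed(tokens))
        logits = self.head(states)[..., :space_size]
        dist = torch.distributions.Categorical(logits=logits)
        arch = dist.sample()
        return arch.squeeze(0), dist.log_prob(arch).sum()

def curriculum_search(all_ops, supernet, train_fn, evaluate_fn,
                      steps_per_stage=100, lr=3e-4):
    controller = MetaController(supernet.num_edges, len(all_ops))
    opt = torch.optim.Adam(controller.parameters(), lr=lr)
    space, remaining, baseline = [], list(all_ops), 0.0
    while remaining:                                  # A3/A6: K stages
        space.append(remaining.pop(random.randrange(len(remaining))))
        for _ in range(steps_per_stage):
            # A4: sample a sub-network, activate its weights w_alpha
            # in the supernetwork, and train them on the train set.
            arch, _ = controller.sample(len(space))
            train_fn(supernet, arch)
            # A5: sample again, inherit the corresponding weights,
            # evaluate R(alpha, w_alpha) on the validation set, and
            # use it as the reward to update theta.
            arch, log_prob = controller.sample(len(space))
            reward = evaluate_fn(supernet, arch)
            baseline = 0.9 * baseline + 0.1 * reward
            loss = -(reward - baseline) * log_prob    # REINFORCE
            opt.zero_grad()
            loss.backward()
            opt.step()
    return controller
```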
  5. The network structure search method for an ultra-large search space according to claim 1, wherein training the network structure obtained by the search on the target dataset to obtain the final network structure comprises:
    obtaining a high-performance network structure from the trained meta-controller model;
    training the obtained network structure on the target dataset to obtain the final search network structure model.
  6. The network structure search method for an ultra-large search space according to claim 5, wherein training the obtained network structure comprises:
    training the obtained network structure model with a stochastic gradient descent algorithm until the network architecture model converges.
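A sketch of this retraining step using PyTorch's SGD optimizer follows; the cross-entropy loss, cosine learning-rate schedule, and hyperparameter values are illustrative assumptions for a standard image classification setup, and a fixed epoch budget stands in for the claim's "until convergence".

```python
import torch
import torch.nn as nn

def train_final(model, train_loader, epochs=100):
    """Train the searched architecture with stochastic gradient
    descent; hyperparameters are illustrative assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=3e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            opt.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            opt.step()
        sched.step()
    return model
```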
  7. The network structure search method for an ultra-large search space according to claim 4, wherein step A3 specifically comprises:
    if the initial search space Ω_0 has not yet been constructed, randomly selecting one candidate operation from all candidate operations to construct the initial search space;
    if the current search space has already been constructed, randomly selecting one candidate operation from the remaining candidate operations and adding it to the search space Ω_{i-1} to construct a new search space Ω_i.
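Both branches of step A3 reduce to the same small helper, since adding a random candidate to an empty space constructs Ω_0 and adding one to Ω_{i-1} constructs Ω_i. A sketch:

```python
import random

def expand_search_space(current_space, remaining_ops):
    """Step A3: move one randomly chosen candidate operation from the
    remaining pool into the search space. With an empty current_space
    this builds Omega_0; otherwise it grows Omega_{i-1} into Omega_i."""
    op = remaining_ops.pop(random.randrange(len(remaining_ops)))
    return current_space + [op]
```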
  8. A network structure search system for an ultra-large search space, characterized in that it comprises:
    a dataset construction module, configured to construct a target dataset for training a neural network;
    a search space determination module, configured to determine a neural network search space for a target task;
    a space search module, configured to search for an efficient network structure with a reinforcement learning algorithm while the search space is gradually expanded;
    a model training module, configured to train the search network structure model obtained by the search on the target dataset to obtain a final search network structure model.
  9. A network structure search system for an ultra-large search space, characterized in that it comprises:
    at least one processor; and
    at least one memory for storing at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method according to any one of claims 1-7.
  10. A storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by a processor, is used to perform the method according to any one of claims 1-7.
PCT/CN2022/119120 2021-10-27 2022-09-15 Network structure search method for ultra-large search space, system and medium WO2023071592A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111256098.3 2021-10-27
CN202111256098.3A CN114065003A (en) 2021-10-27 2021-10-27 Network structure searching method, system and medium oriented to super large searching space

Publications (1)

Publication Number Publication Date
WO2023071592A1

Family

ID=80235648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/119120 WO2023071592A1 (en) 2021-10-27 2022-09-15 Network structure search method for ultra-large search space, system and medium

Country Status (2)

Country Link
CN (1) CN114065003A (en)
WO (1) WO2023071592A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065003A (en) * 2021-10-27 2022-02-18 华南理工大学 Network structure searching method, system and medium oriented to super large searching space

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210303967A1 (en) * 2020-03-23 2021-09-30 Google Llc Neural architecture search with weight sharing
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN112489012A (en) * 2020-11-27 2021-03-12 大连东软教育科技集团有限公司 Neural network architecture method for CT image recognition
CN114065003A (en) * 2021-10-27 2022-02-18 华南理工大学 Network structure searching method, system and medium oriented to super large searching space

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN XIN; XIE LINGXI; WU JUN; TIAN QI: "Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 1294-1303, XP033723819, DOI: 10.1109/ICCV.2019.00138 *
YONG GUO; YAOFO CHEN; YIN ZHENG; PEILIN ZHAO; JIAN CHEN; JUNZHOU HUANG; MINGKUI TAN: "Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search", arXiv.org, Cornell University Library, 7 July 2020 (2020-07-07), XP081722681 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806573A (en) * 2024-03-01 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Solid state disk searching method, device, equipment and medium

Also Published As

Publication number Publication date
CN114065003A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
US10853726B2 (en) Neural architecture search for dense image prediction tasks
WO2023071592A1 (en) Network structure search method for ultra-large search space, system and medium
EP3711000B1 (en) Regularized neural network architecture search
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
US11544536B2 (en) Hybrid neural architecture search
WO2019222734A1 (en) Learning data augmentation policies
CN111461226A (en) Countermeasure sample generation method, device, terminal and readable storage medium
CN112633010B (en) Aspect-level emotion analysis method and system based on multi-head attention and graph convolution network
CN113168559A (en) Automated generation of machine learning models
WO2021089013A1 (en) Spatial graph convolutional network training method, electronic device and storage medium
WO2021042763A1 (en) Image searches based on word vectors and image vectors
CN111460234B (en) Graph query method, device, electronic equipment and computer readable storage medium
WO2023051369A1 (en) Neural network acquisition method, data processing method and related device
CN110264372B (en) Topic community discovery method based on node representation
JP7226696B2 (en) Machine learning method, machine learning system and non-transitory computer readable storage medium
WO2020044415A1 (en) Hypothesis inference device, hypothesis inference method, and computer-readable recording medium
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN113312175A (en) Operator determining and operating method and device
CN115905687A (en) Cold start-oriented recommendation system and method based on meta-learning graph neural network
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
CN116112563A (en) Dual-strategy self-adaptive cache replacement method based on popularity prediction
CN117034100A (en) Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus
WO2023078009A1 (en) Model weight acquisition method and related system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885479

Country of ref document: EP

Kind code of ref document: A1