CN111602145A - Optimization method of convolutional neural network and related product - Google Patents

Optimization method of convolutional neural network and related product

Info

Publication number
CN111602145A
CN111602145A (application number CN201880083507.4A)
Authority
CN
China
Prior art keywords
model
layer
convolutional layer
loss value
replacement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880083507.4A
Other languages
Chinese (zh)
Inventor
赵睿哲 (Zhao Ruizhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Publication of CN111602145A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method of optimizing a convolutional neural network, and related products. The method comprises: obtaining a pre-trained model M; retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a replacement-layer operation on the initial model M0. The replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3. The replacement-layer operation is executed repeatedly to obtain a plurality of third intermediate models M3 and a plurality of loss values, and the third intermediate model M3 with the smallest loss value is selected as the output model. The method has the advantage of low cost.

Description

Optimization method of convolutional neural network and related product

Technical Field
The invention relates to the technical field of communication and artificial intelligence, in particular to an optimization method of a convolutional neural network and a related product.
Background
In recent years, deep convolutional neural networks, as a class of machine learning models, have achieved excellent results in fields such as computer vision, even surpassing average human performance in some tasks such as image classification and recognition and the game of Go. A convolutional neural network typically comprises a number of convolutional layers interspersed with pooling layers, linear rectification (ReLU) layers and the like, with one or more fully-connected layers near the top of the network and a loss-function layer at the very top for training.
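For orientation only, here is a minimal sketch (assuming PyTorch; the patent itself is framework-agnostic) of the layer structure just described:

```python
# A minimal sketch of the structure described above, assuming PyTorch:
# convolutional layers interspersed with pooling and ReLU layers, a
# fully-connected layer near the top, and a loss function used for training.
import torch
import torch.nn as nn

class SimpleConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # standard convolutional layer
            nn.ReLU(inplace=True),                       # linear rectification layer
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # for 32x32 inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

criterion = nn.CrossEntropyLoss()  # the loss-function layer used during training
```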
Transfer learning is a method for developing and training machine learning models whose aim is to transfer, at low cost, a model M trained on a domain A to a domain B by methods such as retraining. Transfer learning is widely applied to deep convolutional neural networks, but the training time of such networks is long and the cost is high.
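As an illustration of transfer learning at its simplest (assuming torchvision 0.13 or later; the five-class head is an arbitrary example for domain B):

```python
# Minimal transfer-learning sketch, assuming torchvision >= 0.13: reuse a
# model trained on domain A (ImageNet) and retrain only a new head on domain B.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained M
for p in model.parameters():
    p.requires_grad = False                       # freeze features from domain A
model.fc = nn.Linear(model.fc.in_features, 5)     # new classifier for domain B
# Training then updates only model.fc on the domain-B data set.
```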
Disclosure of Invention
Embodiments of the invention provide an optimization method of a convolutional neural network and a related product, which allow a trained model to be retrained simply and applied to a target domain, with the advantage of reduced cost.
In a first aspect, an embodiment of the present invention provides a method for optimizing a convolutional neural network, where the method includes the following steps:
obtaining a pre-trained model M;
retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a replacement-layer operation on the initial model M0;
the replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3;
repeatedly executing the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values; and selecting the third intermediate model M3 with the smallest loss value as the output model.
Optionally, determining, based on the maximum bipartite matching algorithm, the standard convolutional layer e in the initial model M0 that is suitable to be replaced by the efficient convolutional layer specifically comprises:
finding, in the initial model M0, a grouped convolutional layer containing $N_g$ groups such that the change in the importance of the intra-layer connections is minimal:
$$\min_{s}\;\sum_{l=1}^{L}\sum_{c=1}^{C_{l}}\sum_{f=1}^{F_{l}}\gamma^{l}_{c,f}\,s^{l}_{c,f},\qquad s^{l}_{c,f}\in\{0,1\}$$
the importance being the L2 norm of all weights in each connection:
$$\gamma^{l}_{c,f}=\left\|W^{l}_{c,f}\right\|_{2}=\sqrt{\sum_{k}\left(W^{l}_{c,f,k}\right)^{2}}$$
optionally, the loss value includes:
Figure PCTCN2018112569-APPB-000003
wherein Lw is a loss value.
In a second aspect, an apparatus for optimizing a convolutional neural network is provided, the apparatus comprising:
an obtaining unit, configured to obtain a pre-trained model M;
a training unit, configured to retrain the pre-trained model M on a data set D of a specified domain to obtain an initial model M0;
a replacement unit, configured to perform a replacement-layer operation on the initial model M0; the replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3;
a selecting unit, configured to control the replacement unit to repeatedly perform the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
Optionally, the replacement unit is specifically configured to find, in the initial model M0, a grouped convolutional layer containing $N_g$ groups such that the change in the importance of the intra-layer connections is minimal:
$$\min_{s}\;\sum_{l=1}^{L}\sum_{c=1}^{C_{l}}\sum_{f=1}^{F_{l}}\gamma^{l}_{c,f}\,s^{l}_{c,f},\qquad s^{l}_{c,f}\in\{0,1\}$$
the importance being the L2 norm of all weights in each connection:
$$\gamma^{l}_{c,f}=\left\|W^{l}_{c,f}\right\|_{2}=\sqrt{\sum_{k}\left(W^{l}_{c,f,k}\right)^{2}}$$
Optionally, the loss value satisfies:
$$L_{w}=\sum_{l=1}^{L}\left(\lambda\sum_{c,f}\left\|W^{l}_{c,f}\right\|_{2}+\lambda_{g}\sum_{c,f}\left\|\left(1-s^{l}_{c,f}\right)\odot W^{l}_{c,f}\right\|_{2}\right)$$
wherein $L_w$ is the loss value.
In a third aspect, a computer-readable storage medium is provided, which stores a program for electronic data exchange, wherein the program causes a terminal to execute the method provided in the first aspect.
The embodiment of the invention has the following beneficial effects:
it can be seen that the technical solution of the present application provides a completely new solution for optimizing the convolutional neural network by replacing the convolutional layer. The prior art has difficulty in selecting which convolutional layers need to be replaced and training the replaced model. Optimization schemes that are not based on layer replacement tend to use significant GPU computing resources and training times are often long. By using the scheme, an optimized convolutional neural network model can be obtained within a few hours on the premise of only using one NVidia Titan Xp GPU, so that the time is saved, the efficiency is improved, and the cost is reduced.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an optimization method of a convolutional neural network provided in the present application.
Fig. 2 is a schematic diagram of initialization of parameters in a replacement layer provided by the present application.
Fig. 3 is a schematic structural diagram of an optimization apparatus of a convolutional neural network provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of the invention and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The present application provides an optimization method of a convolutional neural network, based on transfer learning and convolutional-layer replacement. The aim of the optimization method is to reduce the resource occupation and the amount of computation of a convolutional neural network model on a specific domain D (also called the target domain) while losing as little task performance as possible. The method takes as input a pre-trained deep convolutional neural network model and a data set of the target domain, and outputs an optimized convolutional neural network model that has undergone layer replacement and has been trained on the data set of the target domain, so that it can be used on the target domain D.
As shown in Fig. 1, the input of the method is a pre-trained model M, i.e. a model that has been trained on a large data set and can solve a general problem. The optimization method includes the following steps:
Step S101, obtaining a pre-trained model M;
Step S102, retraining the pre-trained model M on a data set D of the specified domain to obtain an initial model M0, and performing a replacement-layer operation on the initial model M0; the replacement-layer operation includes the following steps S103 to S106;
Step S103, determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, and the effect gain brought by the replacement, to obtain a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer;
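As context for what makes the replacement target "efficient", the following sketch (assuming PyTorch; the channel sizes are arbitrary) compares the parameter counts of a standard convolution, a grouped convolution, and a depthwise separable convolution with the same channel configuration:

```python
# Parameter counts, assuming PyTorch: a grouped convolution with N_g groups
# uses 1/N_g of the weights of a standard convolution with the same channels.
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

std = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
grp = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=4, bias=False)
dws = nn.Sequential(  # depthwise separable: depthwise 3x3 then pointwise 1x1
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False),
    nn.Conv2d(64, 128, kernel_size=1, bias=False),
)
print(n_params(std), n_params(grp), n_params(dws))  # 73728, 18432, 8768
```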
the core problem that needs to be solved by the present application is two: how to select the standard convolutional layers and replacement targets that need to be replaced, and how to train the layer-replaced model on the target dataset.
Because a deep convolutional neural network model often contains dozens of convolutional layers and the possible replacement choices are numerous, an enumeration algorithm runs into a severe combinatorial-explosion problem; for example, with 50 convolutional layers and only four candidate replacements per layer there are already 4^50 (about 10^30) configurations to evaluate. Enumeration therefore greatly increases the overhead and is inefficient.
The technical solution of the present application uses a method based on maximum bipartite matching to determine which standard convolutional layers are suitable to be replaced by efficient convolutional layers and the effect gains brought by the replacement. The problem to be solved by the maximum bipartite matching algorithm can be formally described as formula (1).
$$\min_{s}\;\sum_{l=1}^{L}\sum_{c=1}^{C_{l}}\sum_{f=1}^{F_{l}}\gamma^{l}_{c,f}\,s^{l}_{c,f},\qquad s^{l}_{c,f}\in\{0,1\}\qquad(1)$$
subject to the connections that are retained ($s^{l}_{c,f}=0$) forming a grouped convolution with $N_g$ groups in each replaced layer. In the formula, $N_g$ denotes the number of groups of the grouped convolution in the replacement target, $L$ is the number of layers in the whole network, $C_l$ and $F_l$ are the numbers of input and output channels of the $l$-th layer, $\gamma^{l}_{c,f}$ is the importance of the connection between the $c$-th input channel and the $f$-th output channel of the $l$-th layer, and $s^{l}_{c,f}$ indicates whether the connection between the $c$-th input channel and the $f$-th output channel of the $l$-th layer should be deleted.
This formula describes an optimization problem whose goal is to find the target of a layer replacement, i.e. a grouped convolutional layer comprising $N_g$ groups (when $N_g$ equals the number of channels, the grouped convolutional layer is a depthwise separable convolutional layer), such that the change in the importance of the intra-layer connections is minimal. The measure of importance is given by formula (2) as the L2 norm of all weights in each connection.
$$\gamma^{l}_{c,f}=\left\|W^{l}_{c,f}\right\|_{2}=\sqrt{\sum_{k}\left(W^{l}_{c,f,k}\right)^{2}}\qquad(2)$$
In the formula, the meaning of $\gamma^{l}_{c,f}$ is unchanged, $W^{l}_{c,f}$ refers to the connection weights between the $c$-th input channel and the $f$-th output channel of the $l$-th layer, and $W^{l}_{c,f,k}$ refers to the $k$-th element of those weights.
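Purely for illustration, the sketch below (assuming PyTorch) computes the connection importances of formula (2); `round_robin_groups` and `grouping_cost` are hypothetical greedy stand-ins for the maximum bipartite matching, shown only to make the objective of formula (1) concrete, not to reproduce the exact algorithm:

```python
# Illustrative sketch, assuming PyTorch. The grouping below is a greedy
# approximation; the patent's actual selection uses maximum bipartite matching.
import torch
import torch.nn as nn

def connection_importance(conv: nn.Conv2d) -> torch.Tensor:
    """Formula (2): gamma[f, c] = L2 norm of the kernel joining input
    channel c to output channel f."""
    return conv.weight.flatten(2).norm(p=2, dim=2)   # shape (F, C)

def round_robin_groups(scores: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Deal channels into groups in descending order of importance."""
    order = torch.argsort(scores, descending=True)
    groups = torch.empty_like(order)
    groups[order] = torch.arange(scores.numel()) % num_groups
    return groups

def grouping_cost(gamma: torch.Tensor, num_groups: int):
    """Greedy stand-in for formula (1): s[f, c] = 1 marks a deleted
    connection; the cost is the total importance of deleted connections."""
    in_groups = round_robin_groups(gamma.sum(dim=0), num_groups)   # per input channel
    out_groups = round_robin_groups(gamma.sum(dim=1), num_groups)  # per output channel
    s = (out_groups[:, None] != in_groups[None, :]).float()
    return s, (gamma * s).sum()

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
gamma = connection_importance(conv)
s, cost = grouping_cost(gamma, num_groups=4)
```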
Step S104, reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2;
Step S105, initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3;
Step S106, calculating a loss value of the third intermediate model M3;
the calculation method of the loss value may include:
Figure PCTCN2018112569-APPB-000015
wherein the content of the first and second substances,
Figure PCTCN2018112569-APPB-000016
the loss value is obtained by summing the loss values of all L layers. The loss value for each layer is taken as a weighted average of two terms: l2 norm of all weights (first term) and L2 norm of the remaining weights after layer replacement (second term).
Figure PCTCN2018112569-APPB-000017
The value 0 or 1 is taken to determine whether the kth connection between the c-th input channel and the f-th output channel of the l-th layer should be deleted. Lambda and lambdagThe weight of the weighted average of the two terms.
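Purely as an illustration (assuming PyTorch; the mask layout is an assumption, and the reduction follows the description of formula (3) above), the loss value could be computed as follows:

```python
# Illustrative sketch of formula (3), assuming PyTorch. `convs` is a list of
# nn.Conv2d layers; masks[l][f, c, k] is 0 or 1, with 1 marking a weight
# deleted by the layer replacement (k indexes the flattened kernel).
import torch
import torch.nn as nn

def replacement_loss(convs, masks, lam: float = 1.0, lam_g: float = 1.0):
    total = torch.tensor(0.0)
    for conv, s in zip(convs, masks):
        w = conv.weight.flatten(2)                      # (F, C, kH*kW)
        all_norms = w.norm(p=2, dim=2)                  # first term of (3)
        kept_norms = ((1.0 - s) * w).norm(p=2, dim=2)   # remaining weights, second term
        total = total + lam * all_norms.sum() + lam_g * kept_norms.sum()
    return total

conv = nn.Conv2d(8, 16, kernel_size=3)
mask = torch.zeros(16, 8, 9)        # no weights deleted yet
loss = replacement_loss([conv], [mask])
```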
Step S107, repeatedly executing the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values, and selecting the third intermediate model M3 with the smallest loss value as the output model.
The layer-replaced model is initialized using the connections given by the maximum bipartite matching, the initialization parameters being the original parameters of the pre-trained model. Prior to initialization, the method may additionally perform a parameter reshaping to ensure that the initialized model can be trained as quickly as possible. The present application also rearranges the channel order of the output by adding a pointwise convolution layer (a 1×1 convolution) after the replaced convolutional layer. Referring to Fig. 2, Fig. 2 is a schematic diagram of the initialization of the parameters in a replacement layer.
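A minimal sketch (assuming PyTorch; the permutation source and layer sizes are illustrative assumptions, not the exact construction of Fig. 2) of appending a pointwise convolution that rearranges the channel order of the replaced layer's output:

```python
# Illustrative sketch, assuming PyTorch: a grouped convolution followed by a
# fixed 1x1 (pointwise) convolution whose weights encode a channel permutation.
import torch
import torch.nn as nn

def permutation_pointwise(perm: torch.Tensor) -> nn.Conv2d:
    """Build a 1x1 convolution that reorders channels according to `perm`."""
    n = perm.numel()
    pw = nn.Conv2d(n, n, kernel_size=1, bias=False)
    with torch.no_grad():
        pw.weight.zero_()
        for out_c, in_c in enumerate(perm.tolist()):
            pw.weight[out_c, in_c, 0, 0] = 1.0  # output channel out_c copies input in_c
    return pw

grouped = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=4)  # efficient layer
perm = torch.randperm(128)   # stands in for the matching-derived channel order
block = nn.Sequential(grouped, permutation_pointwise(perm))
```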
The technical solution of the present application provides a brand-new scheme for optimizing a convolutional neural network by replacing convolutional layers. In the prior art it is difficult to select which convolutional layers should be replaced and to train the replaced model, and optimization schemes not based on layer replacement tend to consume substantial GPU computing resources and long training times. With the present scheme, an optimized convolutional neural network model can be obtained within a few hours using only a single NVIDIA Titan Xp GPU.
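Putting steps S101 to S107 together, the outer loop of the method can be summarized as in the following sketch, where the callables are hypothetical stand-ins for the retraining, replacement, initialization and loss-computation operations described above:

```python
# Illustrative outer loop of steps S101-S107; all callables are hypothetical
# stand-ins injected by the caller, not APIs defined by the patent.
import copy
from typing import Callable, List, Tuple

def optimize(pretrained_model,                 # S101: pre-trained model M
             retrain: Callable,                # S102/S105: retrain on data set D
             replace_and_reshape: Callable,    # S103/S104: matching, replacement, reshaping
             initialize: Callable,             # S105: initialization (see Fig. 2)
             compute_loss: Callable,           # S106: loss value, formula (3)
             num_trials: int = 10):
    m0 = retrain(copy.deepcopy(pretrained_model))
    candidates: List[Tuple[float, object]] = []
    for _ in range(num_trials):                      # S107: repeat the operation
        m2 = replace_and_reshape(m0)                 # second intermediate model M2
        m3 = retrain(initialize(m2))                 # third intermediate model M3
        candidates.append((compute_loss(m3), m3))
    return min(candidates, key=lambda t: t[0])[1]    # smallest loss value wins
```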
Referring to Fig. 3, Fig. 3 provides an apparatus for optimizing a convolutional neural network, the apparatus comprising:
an obtaining unit 301, configured to obtain a pre-trained model M;
a training unit 302, configured to retrain the pre-trained model M on a data set D of a specified domain to obtain an initial model M0;
a replacement unit 303, configured to perform a replacement-layer operation on the initial model M0; the replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3;
a selecting unit 304, configured to control the replacement unit to repeatedly perform the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
Embodiments of the present invention also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the optimization methods of a convolutional neural network as described in the above method embodiments.
Embodiments of the present invention also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the methods for optimization of convolutional neural networks as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules illustrated are not necessarily required to practice the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The above embodiments of the present invention are described in detail, and the principle and the implementation of the present invention are explained by applying specific embodiments, and the above description of the embodiments is only used to help understanding the method of the present invention and the core idea thereof; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (8)

  1. A method for optimizing a convolutional neural network, the method comprising the steps of:
    obtaining a pre-trained model M;
    retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a replacement-layer operation on the initial model M0;
    the replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3;
    repeatedly executing the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values; and selecting the third intermediate model M3 with the smallest loss value as the output model.
  2. The method of claim 1, wherein determining, based on the maximum bipartite matching algorithm, the standard convolutional layer e in the initial model M0 that is suitable to be replaced by the efficient convolutional layer specifically comprises:
    finding, in the initial model M0, a grouped convolutional layer containing $N_g$ groups such that the change in the importance of the intra-layer connections is minimal:
    $$\min_{s}\;\sum_{l=1}^{L}\sum_{c=1}^{C_{l}}\sum_{f=1}^{F_{l}}\gamma^{l}_{c,f}\,s^{l}_{c,f},\qquad s^{l}_{c,f}\in\{0,1\}$$
    wherein the importance is the L2 norm of all weights in each connection:
    $$\gamma^{l}_{c,f}=\left\|W^{l}_{c,f}\right\|_{2}=\sqrt{\sum_{k}\left(W^{l}_{c,f,k}\right)^{2}}$$
  3. The method of claim 1 or 2, wherein the loss value satisfies:
    $$L_{w}=\sum_{l=1}^{L}\left(\lambda\sum_{c,f}\left\|W^{l}_{c,f}\right\|_{2}+\lambda_{g}\sum_{c,f}\left\|\left(1-s^{l}_{c,f}\right)\odot W^{l}_{c,f}\right\|_{2}\right)$$
    wherein $L_w$ is the loss value.
  4. An apparatus for optimizing a convolutional neural network, the apparatus comprising:
    an obtaining unit, configured to obtain a pre-trained model M;
    a training unit, configured to retrain the pre-trained model M on a data set D of a specified domain to obtain an initial model M0;
    a replacement unit, configured to perform a replacement-layer operation on the initial model M0; the replacement-layer operation comprises: determining, based on a maximum bipartite matching algorithm, a standard convolutional layer e in the initial model M0 that is suitable to be replaced by an efficient convolutional layer, together with the effect gain brought by the replacement, and obtaining a first intermediate model M1 in which the standard convolutional layer e is replaced by the efficient convolutional layer; reshaping the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3;
    a selecting unit, configured to control the replacement unit to repeatedly perform the replacement-layer operation to obtain a plurality of third intermediate models M3 and a plurality of loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
  5. The apparatus of claim 4, wherein the replacement unit is specifically configured to find, in the initial model M0, a grouped convolutional layer containing $N_g$ groups such that the change in the importance of the intra-layer connections is minimal:
    $$\min_{s}\;\sum_{l=1}^{L}\sum_{c=1}^{C_{l}}\sum_{f=1}^{F_{l}}\gamma^{l}_{c,f}\,s^{l}_{c,f},\qquad s^{l}_{c,f}\in\{0,1\}$$
    wherein the importance is the L2 norm of all weights in each connection:
    $$\gamma^{l}_{c,f}=\left\|W^{l}_{c,f}\right\|_{2}=\sqrt{\sum_{k}\left(W^{l}_{c,f,k}\right)^{2}}$$
  6. The apparatus of claim 4 or 5, wherein the loss value satisfies:
    $$L_{w}=\sum_{l=1}^{L}\left(\lambda\sum_{c,f}\left\|W^{l}_{c,f}\right\|_{2}+\lambda_{g}\sum_{c,f}\left\|\left(1-s^{l}_{c,f}\right)\odot W^{l}_{c,f}\right\|_{2}\right)$$
    wherein $L_w$ is the loss value.
  7. A computer-readable storage medium storing a program for electronic data exchange, wherein the program causes a terminal to perform the method as provided in any one of claims 1-3.
  8. A computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform the method as provided in any one of claims 1 to 3.
CN201880083507.4A 2018-10-30 2018-10-30 Optimization method of convolutional neural network and related product Pending CN111602145A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/112569 WO2020087254A1 (en) 2018-10-30 2018-10-30 Optimization method for convolutional neural network, and related product

Publications (1)

Publication Number Publication Date
CN111602145A true CN111602145A (en) 2020-08-28

Family

ID=70463304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880083507.4A Pending CN111602145A (en) 2018-10-30 2018-10-30 Optimization method of convolutional neural network and related product

Country Status (2)

Country Link
CN (1) CN111602145A (en)
WO (1) WO2020087254A1 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6042274B2 (en) * 2013-06-28 2016-12-14 株式会社デンソーアイティーラボラトリ Neural network optimization method, neural network optimization apparatus and program
CN105844653B (en) * 2016-04-18 2019-07-30 深圳先进技术研究院 A kind of multilayer convolutional neural networks optimization system and method
CN106485324A (en) * 2016-10-09 2017-03-08 成都快眼科技有限公司 A kind of convolutional neural networks optimization method
CN108319988B (en) * 2017-01-18 2021-12-24 华南理工大学 Acceleration method of deep neural network for handwritten Chinese character recognition

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128670A (en) * 2021-04-09 2021-07-16 南京大学 Neural network model optimization method and device
CN113128670B (en) * 2021-04-09 2024-03-19 南京大学 Neural network model optimization method and device
CN114648671A (en) * 2022-02-15 2022-06-21 成都臻识科技发展有限公司 Detection model generation method and device based on deep learning

Also Published As

Publication number Publication date
WO2020087254A1 (en) 2020-05-07

Similar Documents

Publication Publication Date Title
US9619749B2 (en) Neural network and method of neural network training
CN108491765B (en) Vegetable image classification and identification method and system
CN107273936B (en) GAN image processing method and system
CN109948149B (en) Text classification method and device
CN110245741A (en) Optimization and methods for using them, device and the storage medium of multilayer neural network model
US20160012330A1 (en) Neural network and method of neural network training
JP7376731B2 (en) Image recognition model generation method, device, computer equipment and storage medium
WO2017214507A1 (en) Neural network and method of neural network training
CN106096727A (en) A kind of network model based on machine learning building method and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
US11113601B1 (en) Method and system for balanced-weight sparse convolution processing
CN109543029B (en) Text classification method, device, medium and equipment based on convolutional neural network
CN113825978B (en) Method and device for defining path and storage device
CN111784699B (en) Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
CN109145107B (en) Theme extraction method, device, medium and equipment based on convolutional neural network
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN110874626B (en) Quantization method and quantization device
CN109325530B (en) Image classification method, storage device and processing device
CN114511042A (en) Model training method and device, storage medium and electronic device
CN111602145A (en) Optimization method of convolutional neural network and related product
CN114091597A (en) Countermeasure training method, device and equipment based on adaptive group sample disturbance constraint
CN112132281B (en) Model training method, device, server and medium based on artificial intelligence
US20210133552A1 (en) Neural network learning device, neural network learning method, and recording medium on which neural network learning program is stored
CN113378866B (en) Image classification method, system, storage medium and electronic device
EP4226286A1 (en) Method and system for convolution with workload-balanced activation sparsity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination