CN109359727B - Method, device and equipment for determining structure of neural network and readable medium - Google Patents

Publication number: CN109359727B
Authority: CN (China)
Legal status: Active (granted)
Application number: CN201811494899.1A
Other languages: Chinese (zh)
Other versions: CN109359727A
Inventor: 胡耀全
Assignee (original and current): Beijing ByteDance Network Technology Co Ltd
Priority application: CN201811494899.1A
Publication of application: CN109359727A
Publication of grant: CN109359727B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology


Abstract

The embodiments of the disclosure disclose a method, an apparatus, a device and a readable medium for determining the structure of a neural network, used for image processing. The method comprises the following steps: sampling a current network structure through a sampler; converting the network parameters and the input data of the current network structure into fixed-point format; providing the network structure, the fixed-point network parameters and the input data to a terminal; running the current network structure on the terminal, obtaining the running duration of the current network structure, and calculating the objective function value of the current network structure according to that running duration; adjusting the parameters of the sampler according to the objective function value; and returning to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold. Because the computing capability of a terminal is limited, the embodiments of the disclosure save the terminal's running time and computing resources during image processing.

Description

Method, device and equipment for determining structure of neural network and readable medium
Technical Field
The disclosed embodiments relate to computer vision technologies, and in particular, to a method, an apparatus, a device, and a readable medium for determining a structure of a neural network.
Background
With the development of computer vision, data such as images and sounds can be processed through a neural network, for example, object detection, object tracking, segmentation and classification can be performed on objects in the images.
With the increase of user requirements and the development of terminal technologies, higher requirements are placed on the accuracy and speed of data processing, and a neural network with a better processing effect is required. In the prior art, relatively mature neural networks such as RCNN, YOLO and SSD are mostly used for image processing. However, the inventor found during research that these mature neural networks are not suitable for processing all data: for some images their processing effect is poor, they require large computing resources and long running times, and they are inconvenient to apply on terminals with limited computing capability.
Disclosure of Invention
The embodiment of the disclosure provides a method, a device, equipment and a readable medium for determining a structure of a neural network, so as to realize automatic search of a network structure, and save the running time and computing resources of a terminal in image processing.
In a first aspect, an embodiment of the present disclosure provides a method for determining a structure of a neural network, which is used for image processing, and includes:
sampling a current network structure through a sampler;
converting the network parameters and the input data of the current network structure into fixed-point format; providing the network structure, the fixed-point network parameters and the input data to a terminal; running the current network structure on the terminal, obtaining the running duration of the current network structure, and calculating the objective function value of the current network structure according to the running duration;
adjusting parameters of the sampler according to the objective function value;
and returning to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
In a second aspect, an embodiment of the present disclosure further provides a structure determining apparatus of a neural network, for image processing, including:
the sampling module is used for sampling the current network structure through the sampler;
the calculation module is used for converting the network parameters and the input data of the current network structure into fixed-point format; providing the network structure, the fixed-point network parameters and the input data to a terminal; running the current network structure on the terminal, obtaining the running duration of the current network structure, and calculating the objective function value of the current network structure according to the running duration;
the adjusting module is used for adjusting the parameters of the sampler according to the objective function values;
and the returning module is used for returning to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processing devices;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to implement the method for determining the structure of a neural network according to any of the embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements the method for determining a structure of a neural network according to any one of the embodiments.
In the embodiments of the disclosure, a current network structure is sampled through a sampler; the network parameters and the input data of the current network structure are converted into fixed-point format; the network structure, the fixed-point network parameters and the input data are provided to a terminal; the current network structure is run on the terminal, its running duration is obtained, and the objective function value of the current network structure is calculated according to that running duration, so that the quality of the network structure is characterized by the objective function value. The parameters of the sampler are adjusted according to the objective function value, and the sampling operation is performed again until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold. Automatic search of the network structure is thereby realized, and by repeatedly adjusting the parameters of the sampler, the sampler can sample network structures of higher quality, which yield better results in subsequent use. The embodiments of the disclosure convert the network parameters and the input data of the current network structure uniformly into fixed-point format, which improves the applicability of the neural network for image processing, avoids poor processing effects on some images, and addresses the problem that most existing network parameters and input data are in floating-point format, requiring large computing resources and long running times.
Drawings
Fig. 1 is a flowchart of a method for determining a structure of a neural network according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for determining a structure of a neural network according to a second embodiment of the present disclosure;
fig. 3a is a flowchart of a method for determining a structure of a neural network according to a third embodiment of the present disclosure;
fig. 3b is a schematic diagram of predefined network layer coding provided in the third embodiment of the present disclosure;
fig. 3c is a schematic structural diagram of a sampler provided in the third embodiment of the present disclosure;
fig. 3d is a schematic diagram of a network structure provided in the third embodiment of the present disclosure;
fig. 3e is a schematic diagram of another network structure provided in the third embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a structure determining apparatus of a neural network according to a fourth embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the disclosure and do not limit it. It should further be noted that, for convenience of description, the drawings show only the structures relevant to the present disclosure, not all structures. Optional features and examples are provided in each of the following embodiments; the features described in the embodiments may be combined to form multiple alternatives, and each numbered embodiment should not be regarded as a single technical solution.
Example one
Fig. 1 is a flowchart of a method for determining a structure of a neural network according to an embodiment of the present disclosure, where the present embodiment is applicable to a case of determining a structure of a neural network, and the method may be performed by a device for determining a structure of a neural network, where the device may be formed by hardware and/or software and integrated in an electronic device, and the electronic device may be a server or a terminal. With reference to fig. 1, the method provided by the embodiment of the present disclosure specifically includes the following operations:
and S110, sampling the current network structure through a sampler. Execution continues with S120.
The sampler is used for sampling out the network structure of the neural network. Each run of the sampler may sample at least one current network structure.
And S120, calculating an objective function value of the current network structure. Execution continues with S130.
The objective function value is used to characterize the quality of the current network structure. Taking the recognition model as an example, the higher the objective function value is, the more accurate the recognition result of the recognition model is. Taking the classification model as an example, the higher the objective function value, the greater the confidence of the classification model.
Optionally, the objective function is constructed from at least one element, and the objective function value of the current network structure is calculated accordingly. The elements used to construct the objective function are determined according to the actual requirements on the network structure: for example, if the network structure needs to run on a terminal and must occupy little space, the elements include the space occupied by the network structure; if the network structure must have high accuracy, the elements include the accuracy of the network structure.
S130, judging whether the objective function value reaches a preset function value and/or whether the number of adjustments reaches a count threshold. If yes, jump to S140; if not, jump to S150.
This embodiment adopts at least one of two factors, the preset function value and the count threshold, as the cutoff condition of the loop. Optionally, only one factor is used for the judgment and the other is discarded: for example, it is judged whether the objective function value reaches the preset function value, or whether the number of adjustments reaches the count threshold. Optionally, the two factors are combined: it is judged whether the objective function value reaches the preset function value and whether the number of adjustments reaches the count threshold; if either judgment is yes, jump to S140; if both judgments are no, that is, the objective function value has not reached the preset function value and the number of adjustments has not reached the count threshold, jump to S150.
In one embodiment, if the objective function value reaches the preset function value, the current network structure meets the actual requirement, and the current network structure is determined as the final network structure for subsequent use. Conversely, if the objective function value does not reach the preset function value, the current network structure does not meet the actual requirement; the parameters of the sampler can then be adjusted according to the objective function value and the next network structure sampled, or it can first be judged whether the number of adjustments has reached the count threshold. This is because, in some cases, the objective function value still cannot reach the preset function value even after many adjustments; to save time and computing resources, if the number of adjustments reaches the count threshold, the current network structure is determined as the final network structure. If the number of adjustments has not reached the count threshold, the parameters of the sampler are adjusted according to the objective function value and the next network structure is sampled.
And S140, determining the current network structure and finishing the operation.
And S150, adjusting the parameters of the sampler according to the objective function value. Return to execution S110.
Optionally, if the preset function value is a small value, that is, the smaller the objective function value the higher the quality of the current network structure, the parameters of the sampler are adjusted by minimizing the objective function value; if the preset function value is a large value, that is, the larger the objective function value the higher the quality of the current network structure, the parameters of the sampler are adjusted by maximizing the objective function value. Optionally, the parameter adjustment methods include, but are not limited to, the policy gradient algorithm, gradient descent, and the like.
By adjusting the parameters of the sampler, the objective function value of the next network structure is closer to the preset function value than that of the current network structure, so that the network structure with higher quality can be sampled.
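By way of illustration only, the overall loop of S110-S150 can be summarized in a short sketch. Here `sampler`, `evaluate` and `update` are hypothetical placeholders (not named by the disclosure) for the sampler, the objective-value computation and the parameter adjustment, and a larger objective value is assumed to mean a better structure:

```python
# A minimal sketch of the search loop under the stated assumptions.
def search(sampler, evaluate, update, preset_value, count_threshold):
    adjustments = 0
    while True:
        net = sampler.sample()          # S110: sample the current structure
        q = evaluate(net)               # S120: objective function value
        # S130: cutoff on preset function value and/or count threshold
        if q >= preset_value or adjustments >= count_threshold:
            return net                  # S140: keep the current structure
        update(sampler, q)              # S150: adjust the sampler parameters
        adjustments += 1                # return to S110
```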
In the embodiments of the disclosure, a current network structure is sampled through a sampler and the objective function value of the current network structure is calculated, so that the quality of the network structure is characterized by the objective function value. The parameters of the sampler are adjusted according to the objective function value, and the sampling operation is performed again until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold. Automatic search of the network structure is thereby achieved, and by repeatedly adjusting the parameters of the sampler, the sampler can sample network structures of higher quality, which yield better results in subsequent use. In a specific application scenario, the embodiments of the disclosure are not limited to a fixed network structure but obtain a new network structure of higher quality through the objective function value, and are applicable to data processing in almost any scenario.
Example two
Fig. 2 is a flowchart of a method for determining a structure of a neural network according to a second embodiment of the present disclosure. This embodiment further optimizes the optional implementations of the above embodiment. Optionally, the operation "calculating an objective function value of the current network structure" is refined into "calculating the accuracy and/or the running duration of the current network structure; obtaining an objective function value according to the accuracy and/or the running duration of the current network structure", so as to sample network structures with higher accuracy or shorter running duration. Optionally, after the operation "adjusting the parameters of the sampler according to the objective function value", the operations "initializing the network parameters of the current network structure; calculating the network parameters of the current network structure through the data set" are added, so that not only the network structure but also suitable network parameters are obtained, yielding a final neural network that can be applied directly to subsequent data processing. With reference to fig. 2, the method provided in this embodiment specifically includes the following operations:
and S210, sampling the current network structure through a sampler.
And S220, calculating the accuracy and/or the running duration of the current network structure.
Optionally, the accuracy, or the running duration, or both the accuracy and the running duration of the current network structure are calculated. The accuracy refers to the accuracy on the validation set, and the running duration refers to the running duration on the target device, such as a terminal.
To calculate the accuracy of the current network structure, the network parameters of the current network structure are first trained with a training set; the accuracy of the network structure is then calculated with a validation set. Optionally, the entire training set is divided into two parts, one part, e.g. 70%, used as the training set and the other part, e.g. 30%, used as the validation set. Optionally, when there are two or more network structures, the training set is used to train the network parameters of each network structure, and the validation set is then used to calculate the accuracy of each network structure; for example, the accuracies of three network structures might be 80%, 85% and 90%. In some application scenarios, the accuracy calculated on the validation set may also be called a score. Further, the highest accuracy is selected from the two or more accuracies, or the two or more accuracies are averaged to obtain the accuracy used for calculating the objective function value. If the highest accuracy is selected, only the network structure and network parameters corresponding to the highest accuracy are retained, and the other network structures and their network parameters are deleted.
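By way of illustration only, the sketch below shows this selection step. The callbacks `train_fn` and `score_fn` are hypothetical (not part of the disclosure); they stand for training a candidate structure and scoring it on the held-out data:

```python
# A sketch: split the data 70/30, train each sampled structure, score it
# on the validation part, and keep only the most accurate structure.
def select_best(structures, train_fn, score_fn, data, labels):
    n = int(0.7 * len(data))                 # 70% training / 30% validation
    train = (data[:n], labels[:n])
    val = (data[n:], labels[n:])
    scored = []
    for net in structures:
        params = train_fn(net, *train)       # train the network parameters
        scored.append((score_fn(net, params, *val), net, params))
    # keep the structure and parameters with the highest accuracy; the
    # other structures and their parameters are discarded
    return max(scored, key=lambda s: s[0])
```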
In some cases, the network parameters obtained from the training set cannot achieve a high accuracy and require further optimization. Optionally, for each current network structure, before the validation set is used to calculate its accuracy, the validation set is first used to calculate a loss value of the current network structure: if the network structure is a classification model, the loss value is obtained from the cross entropy between the output value and the reference value; if the network structure is a regression model, the loss value is obtained from the Euclidean distance between the output value and the reference value. The network parameters are then adjusted according to the loss value, specifically by minimizing the loss value so that it drops below a loss threshold; the loss-value calculation is performed again until the loss value is below the loss threshold.
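This optional refinement amounts to a simple loop. In the sketch below, `loss_fn` and `adjust` are hypothetical placeholders (not functions named by the disclosure) for the loss computation (cross entropy or Euclidean distance) and the parameter update:

```python
# A sketch of the loss-driven refinement described above.
def refine(params, loss_fn, adjust, loss_threshold):
    loss = loss_fn(params)
    while loss >= loss_threshold:
        params = adjust(params, loss)   # adjust parameters to lower the loss
        loss = loss_fn(params)          # return to the loss calculation
    return params                       # loss is now below the threshold
```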
To calculate the running duration of the current network structure, optionally, the current network structure is run on the terminal and its running duration is obtained. Optionally, the network parameters of the current network structure may be initial values or default values. Any current network structure, or the current network structure with the highest accuracy, together with the corresponding network parameters and input data, is provided to the terminal so that the terminal can process the input data through the current network structure and its network parameters.
In most cases, the network parameters and input data are in floating-point format, which requires large computing resources and long running times. Considering that the computing capability of a terminal is limited, in order to save the terminal's running time and computing resources, the network parameters and the input data of the current network structure are converted into fixed-point format before the terminal runs the current network structure and its running duration is obtained; the network structure, the fixed-point network parameters and the input data are then provided to the terminal.
Specifically, the input data, output data and network parameters of each layer in the current network structure are obtained, and the maximum data range −2^fl to +2^fl covering the input data, output data and network parameters of each layer is determined, yielding the exponent fl. The network parameters and input data of the current network structure are divided by 2^fl, the quotients are rounded, and the results, now in fixed-point format, are provided to the terminal. Optionally, the terminal runs the current network structure several times, and the shortest running duration, or the average running duration, is taken as the running duration used for calculating the objective function value.
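The disclosure specifies only that an exponent fl is determined and that values are divided by 2^fl and rounded; the sketch below fills in one common dynamic fixed-point scheme for illustration, with an assumed 8-bit word length, clipping and function name:

```python
# A sketch of one plausible fixed-point conversion consistent with the
# text; bit width, clipping and names are illustrative assumptions.
import numpy as np

def to_fixed_point(x, bits=8):
    max_abs = float(np.max(np.abs(x))) or 1.0
    # choose fl so that the representable range covers max(|x|)
    fl = int(np.ceil(np.log2(max_abs))) - (bits - 1)
    step = 2.0 ** fl
    q = np.round(x / step)                       # divide by 2**fl and round
    q = np.clip(q, -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return q.astype(np.int32), fl                # terminal uses q * 2**fl

params = np.array([0.31, -1.7, 0.05, 2.4], dtype=np.float32)
q, fl = to_fixed_point(params)
print(q, fl, q * 2.0 ** fl)          # codes, exponent, dequantized values
```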
Optionally, to calculate both the accuracy and the running duration of the current network structure: the network parameters of the current network structure are trained with the training set; the accuracy of the network structure is calculated with the validation set; and the current network structure is run on the terminal to obtain its running duration. The details follow the descriptions of calculating the accuracy and the running duration above. The difference is that when the terminal runs the current network structure, the network parameters of the current network structure have already been obtained with the training set, or continuously adjusted according to the loss value of the network structure. The network parameters obtained through training are generally floating-point; before the terminal runs the current network structure and its running duration is obtained, the network parameters and the input data of the current network structure are converted into fixed-point format, and the network structure, the fixed-point network parameters and the input data are provided to the terminal.
And S230, obtaining an objective function value according to the accuracy and/or the running duration of the current network structure.
In an alternative embodiment, the reciprocal of the running duration, or the accuracy, of the current network structure is directly used as the objective function value: the shorter the running duration, or the higher the accuracy, the higher the objective function value, and the parameters of the sampler are adjusted by maximizing the objective function value. Conversely, if the reciprocal of the accuracy, or the running duration itself, is directly used as the objective function value, the parameters of the sampler need to be adjusted by minimizing the objective function value.
In another alternative, the objective function value q (m) of the current network structure m is calculated according to formula (1).
Q(m) = acc(m) × (T(m)/T)^r (1)
wherein acc(m) is the accuracy of the current network structure m, T(m) is the running duration of the current network structure m, r is a preset exponent, and T is a constant representing the running-duration threshold. Optionally, the value of r is detailed in formula (2).
r = α if T(m) ≤ T, and r = β if T(m) > T (2)

where α and β are preset constants.
It can be seen that the smaller T(m) and the larger acc(m), the larger Q(m), so the parameters of the sampler need to be adjusted by maximizing the objective function value. When r takes the piecewise form, the curve of Q(m) against T(m) is not smooth enough, so r is set to a constant value. After a number of tests, r = −0.07 was found to determine the network structure quickly while giving high accuracy and a short running duration.
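For concreteness, the sketch below evaluates formula (1) with the constant r = −0.07 chosen above; the numeric values are illustrative only:

```python
# Formula (1) with the constant exponent r = -0.07 described above.
def objective(acc, t, T, r=-0.07):
    """acc: validation accuracy; t: measured running duration;
    T: running-duration threshold (a constant)."""
    return acc * (t / T) ** r

print(objective(acc=0.90, t=80.0, T=100.0))   # faster than T: Q > acc
print(objective(acc=0.90, t=120.0, T=100.0))  # slower than T: Q < acc
```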
S240, judging whether the objective function value reaches a preset function value and/or whether the number of adjustments reaches a count threshold. If yes, jump to S250; if not, jump to S260.
S250, initializing the network parameters of the current network structure; execution continues with S251.
And S251, calculating the network parameters of the current network structure through the data set.
The validation set and the training set described in S220 are divided from the whole training set; their purpose is to calculate the accuracy and loss value of the network structure so as to select a better network structure, so the requirements on them are not high. However, after the better network structure is determined, more suitable network parameters must be calculated anew in order to improve the network accuracy. Thus, the network parameters are initialized, e.g. set to initial values; the network parameters of the current network structure are then trained with the whole training set; finally, the accuracy of the network structure is calculated with a validation set formed from sample data and labels of the current application scenario, so as to verify the accuracy of the neural network in that scenario.
And S260, adjusting the parameters of the sampler according to the objective function value, and returning to execute S210.
In the embodiments of the disclosure, the accuracy and/or the running duration of the current network structure are calculated, and the objective function value is obtained from them, so that network structures with higher accuracy or shorter running duration are sampled. The network parameters of the current network structure are initialized and then calculated through the data set, so that not only the network structure but also suitable network parameters are obtained, yielding a final neural network that can be applied directly to subsequent data processing with improved accuracy.
EXAMPLE III
Fig. 3a is a flowchart of a method for determining a structure of a neural network according to a third embodiment of the present disclosure. This embodiment further optimizes the optional implementations of the foregoing embodiments. Optionally, "sampling a current network structure through a sampler" is refined into "inputting a predefined network layer code into the sampler to obtain a current network structure code; constructing the current network structure according to the current network structure code", thereby providing a method for sampling the network structure. With reference to fig. 3a, the method provided in this embodiment specifically includes the following operations:
and S310, inputting the predefined network layer code into a sampler to obtain the current network structure code.
A network unit comprises at least one network layer, such as a convolutional layer, a pooling layer and a connection layer. If the stacking count of a network unit is N, the N network units are connected in sequence, and the network layers included in the N network units and their connection relations are the same. N network units connected in series form a module; some network structures include only one module, while others include at least two modules, each comprising a plurality of network units connected in series, possibly with different numbers of network units and different network layers. In summary, the current network structure comprises: the network layers in the network units, the connection relations of the network layers, and the stacking counts of the network units.
In this embodiment, the network structure is sampled through a network layer encoder and a sampler. In one example, fig. 3b shows predefined network layer codes, including a 1×1 convolutional layer code, a 3×3 convolutional layer code, a 5×5 convolutional layer code, a pooling layer code, a connection layer code, and the stacking count N of network units (cells), where N is a preset value, e.g. 3 or 5.
The predefined network layer codes are input into the sampler to obtain a plurality of interconnected network structure sub-codes, which form the current network structure code. The current network structure code comprises: the codes of the network layers in the network units, the connection relations of the network layers, and the stacking counts of the network units.
As shown in fig. 3c, the sampler comprises a plurality of series-connected long short-term memory (LSTM) networks; optionally, the parameters of each LSTM network (i.e., the parameters of the sampler) may be the same or different, and the output end of each LSTM network is connected to an output layer. Based on this, the predefined network layer codes are first input into the first LSTM network, and each network structure sub-code is obtained from the output layer connected to each LSTM network; optionally, the output layer is a softmax layer used to select the largest network structure sub-code from its output. With reference to fig. 3c, inputting the network layer codes shown in fig. 3b into the first LSTM network yields the operation result A1 = [0.9 0.2 0.3 0.5 0.1 0]^T, and the largest network structure sub-code, 0.9 (the 1×1 convolutional layer code), is selected via the output layer connected to the first LSTM. A1 is then input into the second LSTM network, giving A2 = [0 0.8 0.3 0.5 0.1 0]^T, from which the largest sub-code, 0.8 (the 3×3 convolutional layer code), is selected via the output layer connected to the second LSTM. A2 is input into the third LSTM network, giving A3 = [0 0.8 0.3 0.5 0.1 0.9]^T, from which the largest sub-code, 0.9 (the stacking count), is selected, so that the network unit formed by the preceding 1×1 and 3×3 convolutional layer codes is stacked N times, e.g. 2 times. A3 is then input into the fourth LSTM network, giving A4 = [0 0.8 0.3 0.5 0.1 0.1]^T, from which 0.8 (the 3×3 convolutional layer code) is selected. Finally, A4 is input into the fifth LSTM network, giving A5 = [0 0.6 0.3 0.5 0.1 0.9]^T, from which 0.9 (the stacking count) is selected; the sub-codes between the previous stacking count and the current one form a network unit, and that unit is stacked N times, e.g. 2 times. In this example the second network unit contains only the 3×3 convolutional layer code, so that code is stacked 2 times. It can be seen that in this example the current network structure comprises two modules, each comprising 2 network units.
Next, as shown in fig. 3c, the network structure sub-codes are sorted according to the serial order of the LSTM networks to obtain the current network structure code. Each network structure sub-code comprises either the code of a network layer in a network unit or the stacking count of a network unit.
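By way of illustration only, the walk-through above can be condensed into the sketch below. The LSTM weights are random placeholders and the six-way code vector mirrors fig. 3b; this is an illustration, not the disclosure's trained sampler:

```python
# A sketch of the chained-LSTM sampler of fig. 3c: each step's softmax
# output layer picks the largest sub-code, and the operation result A_k
# is fed into the next LSTM network.
import numpy as np

rng = np.random.default_rng(0)
CODES = ["conv1x1", "conv3x3", "conv5x5", "pool", "concat", "stack_N"]
H = len(CODES)  # hidden size set equal to the code length for simplicity

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def lstm_step(x, h, c, W):
    z = np.concatenate([x, h])
    i, f, o = sigmoid(W["i"] @ z), sigmoid(W["f"] @ z), sigmoid(W["o"] @ z)
    c = f * c + i * np.tanh(W["g"] @ z)
    h = o * np.tanh(c)
    return h, c

W = {k: rng.standard_normal((H, 2 * H)) * 0.5 for k in "ifog"}
W_out = rng.standard_normal((H, H))       # shared softmax output layer

x = rng.standard_normal(H)                # predefined network layer coding
h, c = np.zeros(H), np.zeros(H)
sub_codes = []
for _ in range(5):                        # five series-connected LSTMs
    h, c = lstm_step(x, h, c, W)
    sub_codes.append(CODES[int(softmax(W_out @ h).argmax())])
    x = h                                 # A_k becomes the next input
print(sub_codes)                          # the current network structure code
```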
And S320, constructing the current network structure according to the current network structure code.
In an alternative embodiment, the current network structure is constructed directly from the current network structure code: each code is replaced with the corresponding network layer, and each network unit is stacked N times, as shown in fig. 3d.
In another alternative embodiment, to reduce the amount of data, an initial network structure is constructed from the current network structure code, and a downsampling layer is inserted into the initial network structure to form the current network structure. Optionally, since convolutional layers involve a large amount of computation, a downsampling layer is inserted before the convolutional layers; for example, a downsampling layer with stride 2 is inserted at the head of the first network unit of each module, so as to gradually reduce the data amount while maintaining the accuracy of the network structure, as shown in fig. 3e.
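As a small illustration of this construction step, the sketch below assumes the structure code is given as a list of modules (each a list of layer names, an assumed representation), stacks each unit N times, and inserts a stride-2 downsampling layer at the head of each module:

```python
# A sketch under an assumed list-of-codes representation of the structure.
def build(modules, n_stack=2):
    layers = []
    for unit in modules:
        layers.append("downsample(stride=2)")  # head of each module
        for _ in range(n_stack):               # stack the unit N times
            layers.extend(unit)
    return layers

print(build([["conv1x1", "conv3x3"], ["conv3x3"]]))
# ['downsample(stride=2)', 'conv1x1', 'conv3x3', 'conv1x1', 'conv3x3',
#  'downsample(stride=2)', 'conv3x3', 'conv3x3']
```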
And S330, calculating an objective function value of the current network structure.
S340, judging whether the objective function value reaches a preset function value and/or whether the number of adjustments reaches a count threshold. If yes, jump to S350; if not, jump to S360.
And S350, determining the current network structure and finishing the operation.
And S360, adjusting the parameters of the sampler according to the objective function value. Return to execution S310.
In the embodiments of the disclosure, a predefined network layer code is input into a sampler to obtain a current network structure code, and the current network structure is constructed according to the current network structure code, providing a method for sampling the network structure. Moreover, this layer-by-layer sampling conforms to the multi-layer structure of a neural network, so that a better network structure is more easily sampled.
Example four
Fig. 4 is a schematic structural diagram of a structure determining apparatus of a neural network according to a fourth embodiment of the present disclosure, including: a sampling module 41, a calculation module 42, an adjusting module 43 and a returning module 44.
A sampling module 41, configured to sample a current network structure through a sampler;
a calculation module 42, configured to calculate an objective function value of the current network structure;
an adjusting module 43, configured to adjust a parameter of the sampler according to the objective function value;
and a returning module 44, configured to return to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
In the embodiments of the disclosure, a current network structure is sampled through a sampler and the objective function value of the current network structure is calculated, so that the quality of the network structure is characterized by the objective function value. The parameters of the sampler are adjusted according to the objective function value, and the sampling operation is performed again until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold, thereby realizing automatic search of the network structure; by repeatedly adjusting the parameters of the sampler, the sampler can sample network structures of higher quality, which yield better results in subsequent use. In a specific application scenario, the embodiments of the disclosure are not limited to a fixed network structure but obtain a new network structure of higher quality through the objective function value, and are applicable to data processing in almost any scenario.
Optionally, when calculating the objective function value of the current network structure, the calculation module 42 is specifically configured to: calculate the accuracy and/or the running duration of the current network structure; and obtain the objective function value according to the accuracy and/or the running duration of the current network structure.
Optionally, when calculating the accuracy and the running duration of the current network structure, the calculation module 42 is specifically configured to: train the network parameters of the current network structure with a training set; calculate the accuracy of the network structure with a validation set; and run the current network structure on the terminal and obtain its running duration.
Optionally, the apparatus further includes a network parameter adjusting module, configured to: before the validation set is used to calculate the accuracy of the current network structure, calculate a loss value of the current network structure with the validation set; adjust the network parameters according to the loss value; and perform the loss-value calculation again until the loss value is below the loss threshold.
Optionally, the apparatus further includes a fixed-point conversion module, configured to convert the network parameters and the input data of the current network structure into fixed-point format before the terminal runs the current network structure and its running duration is obtained, and to provide the network structure, the fixed-point network parameters and the input data to the terminal.
Optionally, when obtaining the objective function value according to the accuracy and/or the running duration of the current network structure, the calculation module 42 is specifically configured to: according to the formula

Q(m) = acc(m) × (T(m)/T)^r

calculate the objective function value Q(m) of the current network structure m, wherein acc(m) is the accuracy of the current network structure m, T(m) is the running duration of the current network structure m, T is a constant, and r is a preset exponent.
Optionally, when sampling a current network structure through the sampler, the sampling module 41 is specifically configured to: input a predefined network layer code into the sampler to obtain a current network structure code; and construct the current network structure according to the current network structure code. The current network structure code comprises the codes of the network layers in the network units, the connection relations of the network layers and the stacking counts of the network units; the current network structure comprises the network layers in the network units, the connection relations of the network layers and the stacking counts of the network units.
Optionally, the sampler comprises a plurality of series-connected long short-term memory (LSTM) networks, the output end of each being connected to an output layer. When inputting the predefined network layer code into the sampler to obtain the current network structure code, the sampling module 41 is specifically configured to: input the predefined network layer codes into the first LSTM network, and obtain each network structure sub-code from the output layer connected to each LSTM network; and sort the network structure sub-codes according to the serial order of the LSTM networks to obtain the current network structure code. Each network structure sub-code comprises either the code of a network layer in a network unit or the stacking count of a network unit.
Optionally, when constructing the current network structure according to the current network structure code, the sampling module 41 is specifically configured to: constructing an initial network structure according to the current network structure code; and inserting a down-sampling layer into the initial network structure to form the current network structure.
Optionally, the apparatus further includes a network parameter calculation module, configured to initialize the network parameters of the current network structure after the operation of sampling a current network structure through the sampler has been performed repeatedly until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold, and to calculate the network parameters of the current network structure through the data set.
The apparatus for determining the structure of a neural network provided by the embodiments of the disclosure can execute the method for determining the structure of a neural network provided by any embodiment of the disclosure, and has functional modules and beneficial effects corresponding to the executed method.
EXAMPLE five
Referring now to FIG. 5, a block diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like, or various forms of servers such as a stand-alone server or a server cluster. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. Various programs and data necessary for the operation of the electronic device 500 are also stored in the RAM 503. The processing device 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for performing the methods illustrated by the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When executed by the processing device 501, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory device (RAM), a read-only memory device (ROM), an erasable programmable read-only memory device (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory device (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the processing device, cause the electronic device to: sample a current network structure through a sampler; calculate an objective function value of the current network structure; adjust parameters of the sampler according to the objective function value; and return to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the sampling module may also be described as "a module for sampling the current network structure".
The foregoing description presents only the preferred embodiments of the present disclosure and illustrates the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in this disclosure.

Claims (11)

1. A method for determining a structure of a neural network, used for image processing, comprising:
sampling a current network structure through a sampler;
converting the network parameters and the input data of the current network structure into fixed-point format; providing the network structure, the fixed-point network parameters and the input data to a terminal; running the current network structure on the terminal, obtaining the running duration of the current network structure, and calculating the objective function value of the current network structure according to the running duration, wherein the objective function value is used for characterizing the quality of the network structure, and the objective function is constructed from at least one element determined according to the actual requirement of the network structure;
adjusting parameters of the sampler according to the objective function value;
and returning to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
2. The method of claim 1, wherein the calculating the objective function value of the current network structure comprises:
training the network parameters of the current network structure by adopting a training set;
calculating the accuracy of the current network structure with a validation set;
running the current network structure on a terminal and obtaining the running duration of the current network structure;
and obtaining the objective function value according to the accuracy and the running duration of the current network structure.
3. The method of claim 2, further comprising, before the calculating the accuracy of the current network structure with the validation set:
calculating a loss value of the current network structure with the validation set;
adjusting the network parameters according to the loss value;
and performing the loss-value calculation again until the loss value is below a loss threshold.
4. The method of claim 2, wherein the obtaining the objective function value according to the accuracy and the running duration of the current network structure comprises:
according to the formula

Q(m) = acc(m) × (T(m)/T)^r

calculating the objective function value Q(m) of the current network structure m;
wherein acc(m) is the accuracy of the current network structure m, T(m) is the running duration of the current network structure m, T is a constant, and r is a preset exponent.
5. The method of claim 1, wherein sampling a current network structure through the sampler comprises:
inputting predefined network layer codes into the sampler to obtain a current network structure code;
constructing the current network structure according to the current network structure code;
wherein the current network structure code comprises: the codes of the network layers in a network unit, the connection relations of the network layers, and the stacking count of the network unit; and the current network structure comprises: the network layers in the network unit, the connection relations of the network layers, and the stacking count of the network unit.
6. The method of claim 5, wherein the sampler comprises a plurality of long short-term memory (LSTM) networks connected in series, the output of each LSTM network being connected to an output layer;
and inputting the predefined network layer codes into the sampler to obtain the current network structure code comprises:
inputting the predefined network layer codes into the first LSTM network, and obtaining a network structure sub-code from the output layer connected to each LSTM network;
ordering the network structure sub-codes according to the series order of the LSTM networks to obtain the current network structure code;
wherein each network structure sub-code comprises: the code of a network layer in the network unit, or the stacking count of the network unit.
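By way of illustration only: one plausible reading of the serial-LSTM sampler of claims 5 and 6, sketched in PyTorch. The vocabulary sizes, hidden width, and wiring are assumptions rather than the patent's design: each step's cell feeds its own output layer, one sub-code is sampled per step, and the sub-codes collected in series order form the current network structure code.

import torch
import torch.nn as nn

class StructureSampler(nn.Module):
    # One LSTM cell per decision step; each step's output layer emits one
    # structure sub-code (a layer code or a stacking count).
    def __init__(self, vocab_sizes, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(max(vocab_sizes), hidden)
        self.cells = nn.ModuleList(nn.LSTMCell(hidden, hidden) for _ in vocab_sizes)
        self.heads = nn.ModuleList(nn.Linear(hidden, v) for v in vocab_sizes)

    def forward(self, start_token=0):
        h = torch.zeros(1, self.embed.embedding_dim)
        c = torch.zeros_like(h)
        token = torch.tensor([start_token])        # predefined network layer code
        codes = []
        for cell, head in zip(self.cells, self.heads):   # series order
            h, c = cell(self.embed(token), (h, c))
            probs = torch.softmax(head(h), dim=-1)       # per-step output layer
            token = torch.multinomial(probs, 1).squeeze(1)
            codes.append(int(token))                     # one sub-code per step
        return codes                                     # ordered structure code

# e.g. StructureSampler([5, 5, 4])() might return [2, 0, 3]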
7. The method of claim 5, wherein constructing the current network structure according to the current network structure code comprises:
constructing an initial network structure according to the current network structure code;
and inserting down-sampling layers into the initial network structure to form the current network structure.
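By way of illustration only: claim 7 could be realized by interleaving stride-2 down-sampling layers into the block list built from the structure code, as in this invented sketch.

import torch.nn as nn

def build_with_downsampling(blocks, insert_every=2):
    # Insert a stride-2 pooling (down-sampling) layer after every
    # `insert_every` blocks of the initial structure.
    layers = []
    for i, block in enumerate(blocks, 1):
        layers.append(block)
        if i % insert_every == 0:
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# e.g. build_with_downsampling([nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
#                               nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()])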
8. The method according to any one of claims 1 to 7, further comprising, after returning to the operation of sampling a current network structure through the sampler until the objective function value reaches the preset function value and/or the number of adjustments reaches the count threshold:
initializing the network parameters of the current network structure;
and training the network parameters of the current network structure on a data set.
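By way of illustration only: a placeholder for claim 8's final step, rebuilding the winning structure, re-initializing its parameters, and then retraining from scratch. The code-to-layer table is invented and the training loop is omitted.

import torch.nn as nn

def build_model(structure_code):
    # Invented constructor: map each sub-code to a fresh layer.
    table = {0: lambda: nn.Conv2d(16, 16, 3, padding=1),
             1: lambda: nn.ReLU(),
             2: lambda: nn.MaxPool2d(2)}
    return nn.Sequential(*(table[c]() for c in structure_code))

def finalize(structure_code):
    model = build_model(structure_code)
    for m in model.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()        # initialize the network parameters
    return model                        # then train from scratch on the data set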
9. An apparatus for determining a structure of a neural network, used for image processing, comprising:
a sampling module, configured to sample a current network structure through a sampler;
a computing module, configured to convert the network parameters of the current network structure and the input data into a fixed-point format, provide the current network structure, the fixed-point network parameters and the input data to a terminal, run the current network structure on the terminal, obtain the running duration of the current network structure, and calculate an objective function value of the current network structure according to the running duration, wherein the objective function value characterizes the quality of the network structure, and the objective function is constructed from at least one element determined according to the actual requirements on the network structure;
an adjusting module, configured to adjust parameters of the sampler according to the objective function value;
and a returning module, configured to return to the operation of sampling a current network structure through the sampler until the objective function value reaches a preset function value and/or the number of adjustments reaches a count threshold.
10. An electronic device, comprising:
one or more processing devices; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the method for determining a structure of a neural network according to any one of claims 1-8.
11. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, implements the method for determining a structure of a neural network according to any one of claims 1-8.
CN201811494899.1A 2018-12-07 2018-12-07 Method, device and equipment for determining structure of neural network and readable medium Active CN109359727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811494899.1A CN109359727B (en) 2018-12-07 2018-12-07 Method, device and equipment for determining structure of neural network and readable medium


Publications (2)

Publication Number Publication Date
CN109359727A CN109359727A (en) 2019-02-19
CN109359727B true CN109359727B (en) 2022-01-11

Family

ID=65331758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811494899.1A Active CN109359727B (en) 2018-12-07 2018-12-07 Method, device and equipment for determining structure of neural network and readable medium

Country Status (1)

Country Link
CN (1) CN109359727B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905880B * 2019-03-22 2020-05-29 Suzhou Inspur Intelligent Technology Co., Ltd. Network partitioning method, system, electronic device and storage medium
CN110084172B * 2019-04-23 2022-07-29 Beijing ByteDance Network Technology Co., Ltd. Character recognition method and device and electronic equipment
CN112283889A * 2020-10-10 2021-01-29 Guangdong Midea HVAC Equipment Co., Ltd. Method, device and equipment for controlling pre-starting time of air conditioner and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760933A * 2016-02-18 2016-07-13 Tsinghua University Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
WO2017142397A1 * 2016-02-19 2017-08-24 Scyfer B.V. Device and method for generating a group equivariant convolutional neural network
CN108228325A * 2017-10-31 2018-06-29 Shenzhen SenseTime Technology Co., Ltd. Application management method and device, electronic equipment, computer storage media
CN108229647A * 2017-08-18 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Generation method and device of neural network structure, electronic equipment, storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9390370B2 (en) * 2012-08-28 2016-07-12 International Business Machines Corporation Training deep neural network acoustic models using distributed hessian-free optimization
US11720795B2 (en) * 2014-11-26 2023-08-08 Canary Capital Llc Neural network structure and a method thereto
CN108009625B * 2016-11-01 2020-11-06 Xilinx, Inc. Fine-tuning method and device after artificial neural network fixed-point conversion
CN107480770B * 2017-07-27 2020-07-28 Institute of Automation, Chinese Academy of Sciences Neural network quantization and compression method and device with adjustable quantization bit width
CN107909583B * 2017-11-08 2020-01-10 Vivo Mobile Communication Co., Ltd. Image processing method and device and terminal
CN108564165B * 2018-03-13 2024-01-23 Shanghai Jiao Tong University Method and system for optimizing convolutional neural network by fixed point
CN108921210B * 2018-06-26 2021-03-26 Nanjing University of Information Science and Technology Cloud classification method based on convolutional neural network


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BlockQNN: Efficient Block-wise Neural Network Architecture Generation; Zhao Zhong et al.; arXiv; 2018-08-19; pp. 1-14 *
GPU-based parallel optimization of immune convolutional neural network and embedded system; Tao Gong et al.; Engineering Applications of Artificial Intelligence; 2017-06-30; vol. 62, pp. 384-395 *
Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks; Darryl D. Lin et al.; arXiv; 2016-07-08; pp. 1-5 *
Research on Adaptive Control Design for Nonlinear Uncertain Systems Based on Extended Neural Networks; Chen Haoguang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-08-15 (No. 08); p. I140-18 *
Comprehensive Research on Hardware Architectures for Floating-Point Fourier Transforms; Feng Gan; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-31 (No. 01); p. I138-37 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant