CN110942090A - Model training method, image processing method, device, electronic equipment and storage medium - Google Patents

Model training method, image processing method, device, electronic equipment and storage medium

Info

Publication number
CN110942090A
Authority
CN
China
Prior art keywords
network model
preset
network structure
target
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911097179.6A
Other languages
Chinese (zh)
Other versions
CN110942090B (en)
Inventor
刘泽春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201911097179.6A priority Critical patent/CN110942090B/en
Publication of CN110942090A publication Critical patent/CN110942090A/en
Application granted granted Critical
Publication of CN110942090B publication Critical patent/CN110942090B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model training method, an image processing method, a device, an electronic device and a storage medium. The model training method comprises: acquiring an initial network model and a target limiting condition; under the target limiting condition, iteratively updating the network structure hyper-parameters and the conventional parameters of the network model based on a preset alternate updating mode and the initial network structure hyper-parameters and conventional parameters of the initial network model until convergence, to obtain a target network model, wherein the preset alternate updating mode comprises: performing T iterations of updating the network structure hyper-parameters based on a preset evolution strategy for every S iterations of updating the conventional parameters. By implementing the method, when an image processing model for a specific purpose is trained, the network structure hyper-parameters in the network model can be iteratively updated automatically based on the preset evolution strategy, which reduces the cost and improves the efficiency of the model training process, and in turn reduces the cost and improves the efficiency of the whole image processing process.

Description

Model training method, image processing method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for model training and image processing, an electronic device, and a storage medium.
Background
With the development of science and technology and the continuous progress of human society, image processing is widely applied in various industries, and the use of image processing technology becomes more important. At present, in some image processing scenarios, an image processing model for realizing a specific purpose is usually trained based on an initial network model, and an image to be processed is processed by the image processing model, wherein the model training process mainly involves updating of network structure hyper-parameters and conventional parameters of the model.
In the prior art, updating the network structure hyper-parameters of a network model is mainly done manually, which makes the model training process costly and inefficient, and in turn makes the whole image processing process costly and inefficient.
Disclosure of Invention
The embodiment of the invention provides a model training method, an image processing method, a model training device, an image processing device, electronic equipment and a storage medium, and aims to solve the technical problems of high cost and low efficiency in an image processing process in the prior art.
According to a first aspect of the invention, a method of model training is disclosed, the method comprising:
acquiring a preset initial network model and a target limiting condition, wherein the preset initial network model comprises initial network structure hyper-parameters and initial conventional parameters;
under the target limiting condition, carrying out iterative updating on the network structure hyperparameter and the conventional parameters of the network model based on a preset alternate updating mode, the initial network structure hyperparameter and the initial conventional parameters until the network structure hyperparameter and the conventional parameters of the network model are converged to obtain a target network model;
wherein the preset alternate updating mode comprises: performing T iterations of updating the network structure hyper-parameters based on a preset evolution strategy for every S iterations of updating the conventional parameters, wherein T is less than S, and the target network model comprises target network structure hyper-parameters and target conventional parameters.
Optionally, as an embodiment, the updating process of the network structure hyper-parameter includes:
extracting the initial network structure hyper-parameters a_1, a_2, …, a_N of the preset initial network model, wherein a_i is the i-th initial network structure hyper-parameter in the preset initial network model, 1 ≤ i ≤ N, and N is the number of initial network structure hyper-parameters in the preset initial network model;
generating a network structure hyper-parameter vector p based on a_1, a_2, …, a_N, wherein p = (a_1, a_2, …, a_N);
generating, based on p, a directional derivative g_P for updating the network structure hyper-parameters, wherein
g_P = [loss(p + Δp) - loss(p)] / ‖Δp‖,
loss is a preset loss function, and Δp is a perturbation;
obtaining a perturbation set {Δp_1, Δp_2, …, Δp_M}, wherein Δp_j is the j-th perturbation, 1 ≤ j ≤ M, and M is the number of perturbations in the perturbation set;
generating a directional derivative set {g_P1, g_P2, …, g_PM} based on {Δp_1, Δp_2, …, Δp_M} and g_P, wherein
g_Pj = [loss(p + Δp_j) - loss(p)] / ‖Δp_j‖,
and g_Pj is the j-th directional derivative in the directional derivative set;
processing {g_P1, g_P2, …, g_PM} based on a preset gradient descent algorithm to obtain a set of update directions {D_1, D_2, …, D_M} of p with respect to {Δp_1, Δp_2, …, Δp_M}, wherein D_j is the update direction of p with respect to Δp_j;
and determining the target network structure hyper-parameters based on {D_1, D_2, …, D_M}.
Optionally, as an embodiment, determining the target network structure hyper-parameters based on {D_1, D_2, …, D_M} comprises:
determining the update directions in {D_1, D_2, …, D_M} that satisfy a preset condition;
and calculating the mean of the network structure hyper-parameters corresponding to the update directions that satisfy the preset condition, and determining the mean as the target network structure hyper-parameters.
Optionally, as an embodiment, the network structure hyper-parameter includes: the number of output channels of each layer in the network model, the resolution of the input image of the network model and the network depth of the network model.
Optionally, as an embodiment, the target defining condition includes any one of:
the total parameter number of the network model is lower than a preset number threshold, the total calculated amount of the network model is lower than a preset calculated amount threshold, and the running time of the network model on the specific equipment is lower than a preset time threshold.
According to the second aspect of the present invention, an image processing method is also disclosed, which is used for performing image processing based on the target network model obtained by the training of the model training method, and the method includes:
receiving an image to be processed;
converting the image to be processed into input data matched with the target network model;
inputting the input data into the target network model for processing to obtain an output result of the target network model;
and determining the output result of the target network model as the image processing result of the image to be processed.
According to a third aspect of the present invention, there is also disclosed a model training apparatus, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a preset initial network model and a target limiting condition, and the preset initial network model comprises initial network structure hyper-parameters and initial conventional parameters;
the training module is used for carrying out iterative updating on the network structure hyperparameter and the conventional parameter of the network model based on a preset alternate updating mode, the initial network structure hyperparameter and the initial conventional parameter under the target limiting condition until the network structure hyperparameter and the conventional parameter of the network model are converged to obtain a target network model;
wherein the preset alternate updating mode comprises: performing T iterations of updating the network structure hyper-parameters based on a preset evolution strategy for every S iterations of updating the conventional parameters, wherein T is less than S, and the target network model comprises target network structure hyper-parameters and target conventional parameters.
Optionally, as an embodiment, the training module includes: a network structure hyper-parameter updating sub-module, wherein the network structure hyper-parameter updating sub-module comprises:
an extraction unit, configured to extract the initial network structure hyper-parameters a_1, a_2, …, a_N of the preset initial network model, wherein a_i is the i-th initial network structure hyper-parameter in the preset initial network model, 1 ≤ i ≤ N, and N is the number of initial network structure hyper-parameters in the preset initial network model;
a first generating unit, configured to generate a network structure hyper-parameter vector p based on a_1, a_2, …, a_N, wherein p = (a_1, a_2, …, a_N);
a second generating unit, configured to generate, based on p, a directional derivative g_P for updating the network structure hyper-parameters, wherein
g_P = [loss(p + Δp) - loss(p)] / ‖Δp‖,
loss is a preset loss function, and Δp is a perturbation;
an acquisition unit, configured to acquire a perturbation set {Δp_1, Δp_2, …, Δp_M}, wherein Δp_j is the j-th perturbation, 1 ≤ j ≤ M, and M is the number of perturbations in the perturbation set;
a third generating unit, configured to generate a directional derivative set {g_P1, g_P2, …, g_PM} based on {Δp_1, Δp_2, …, Δp_M} and g_P, wherein
g_Pj = [loss(p + Δp_j) - loss(p)] / ‖Δp_j‖,
and g_Pj is the j-th directional derivative in the directional derivative set;
a processing unit, configured to process {g_P1, g_P2, …, g_PM} based on a preset gradient descent algorithm to obtain a set of update directions {D_1, D_2, …, D_M} of p with respect to {Δp_1, Δp_2, …, Δp_M}, wherein D_j is the update direction of p with respect to Δp_j;
a determining unit, configured to determine the target network structure hyper-parameters based on {D_1, D_2, …, D_M}.
Optionally, as an embodiment, the determining unit includes:
a first determining subunit, configured to determine the update directions in {D_1, D_2, …, D_M} that satisfy a preset condition;
the calculating subunit is used for calculating the mean value of the network structure hyperparameters corresponding to the updating directions meeting the preset conditions;
and the second determining subunit is used for determining the average value as the target network structure hyperparameter.
Optionally, as an embodiment, the network structure hyper-parameter includes: the number of output channels of each layer in the network model, the resolution of the input image of the network model and the network depth of the network model.
Optionally, as an embodiment, the target defining condition includes any one of:
the total parameter number of the network model is lower than a preset number threshold, the total calculated amount of the network model is lower than a preset calculated amount threshold, and the running time of the network model on the specific equipment is lower than a preset time threshold.
According to a fourth aspect of the present invention, an image processing apparatus is further disclosed, which is configured to perform image processing based on a target network model trained by any one of the above model training apparatuses, and includes:
the receiving module is used for receiving the image to be processed;
the input module is used for converting the image to be processed into input data matched with the target network model;
the processing module is used for inputting the input data into the target network model for processing to obtain an output result of the target network model;
and the determining module is used for determining the output result of the target network model as the image processing result of the image to be processed.
According to a fifth aspect of the present invention, there is also disclosed an electronic apparatus comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of any of the above-described model training methods, or implements the steps of any of the above-described image processing methods.
According to a sixth aspect of the present invention, there is also disclosed a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the above-described model training methods, or which, when executed by a processor, implements the steps of any of the above-described image processing methods.
In the embodiment of the invention, when the image processing model for realizing the specific purpose is trained based on the initial network model and the given limiting conditions, the network structure hyper-parameters in the network model can be automatically updated iteratively based on the preset evolution strategy, so that the cost of the model training process is reduced, the model training efficiency is improved, the cost of the whole image processing process is further reduced, and the image processing efficiency is improved. In addition, the network structure hyper-parameters and the conventional parameters in the network model are alternately updated, so that the precision of the final model obtained by training can be ensured, and the image processing precision and efficiency are further improved.
Drawings
FIG. 1 is a flow diagram of a model training method of one embodiment of the present invention;
FIG. 2 is a flow diagram of a network fabric hyper-parameter update process according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an image processing method of one embodiment of the invention;
FIG. 4 is a block diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
At present, in some image processing scenarios, an image processing model for realizing a specific purpose is usually trained based on an initial network model, and an image to be processed is processed by means of the image processing model, wherein the model training process mainly involves updating of network structure hyper-parameters and conventional parameters of the model.
In the prior art, updating the network structure hyper-parameters of the network model is mainly done manually, specifically in one of the following two ways: the first is to design the network structure hyper-parameters from manual experience, and the second is to prune the channels of each layer of the network using a pruning algorithm, which still requires manually determining the number of channels in each layer. Therefore, the model training process in the prior art is costly and inefficient, and the whole image processing process is in turn costly and inefficient.
In order to solve the above technical problem, embodiments of the present invention provide a model training method, an image processing method, a model training apparatus, an image processing apparatus, an electronic device, and a storage medium.
First, a model training method provided by the embodiment of the present invention is described below.
It should be noted that the model training method provided by the embodiment of the present invention is applicable to an electronic device. In practical applications, the electronic device may include mobile terminals such as smartphones, tablet computers and personal digital assistants, and may also include computer devices such as notebook computers, desktop computers and servers, which is not limited in the embodiments of the present invention.
FIG. 1 is a flow chart of a model training method according to an embodiment of the present invention, which, as shown in FIG. 1, may include the steps of: step 101 and step 102, wherein,
in step 101, a preset initial network model and a target definition condition are obtained, wherein the preset initial network model includes an initial network structure hyper-parameter and an initial conventional parameter.
In one example, the initial network model may be a network model that the user is using, and the target constraints may be constraints given by the user.
In the embodiment of the present invention, the conventional parameters are the weight coefficients in the network model, and the network structure hyper-parameters may include: the number of output channels of each layer in the network model, the resolution of the input image of the network model, and the network depth of the network model. In addition, given the continuous improvement and proposal of machine-learning algorithms, network models constructed on the basis of such algorithms vary widely; in such cases the network structure hyper-parameters are not limited to the above three hyper-parameters, but may also include other hyper-parameters, which is not limited in this embodiment of the present invention.
In the embodiment of the present invention, the target definition condition may include any one of the following: the total number of parameters of the network model is lower than a preset number threshold, the total computation of the network model is lower than a preset computation threshold, and the running time of the network model on a specific device is lower than a preset time threshold. In addition, as the hardware configuration and software system of the electronic device are upgraded, its computing capability becomes stronger while, at the same time, the user's performance requirements on the model become higher; in such cases the target definition condition is not limited to the above three defining conditions, but may also be another defining condition, which is not limited in the embodiment of the present invention.
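Purely as an illustration (not part of the disclosed method), the following is a minimal sketch of how two of the conditions above might be checked for a candidate network; the helper names, the PyTorch dependency and the timing approach are assumptions introduced here.

```python
import time
import torch

def count_params(model: torch.nn.Module) -> int:
    """Total number of parameters in the network model."""
    return sum(p.numel() for p in model.parameters())

def mean_latency(model: torch.nn.Module, example: torch.Tensor, runs: int = 10) -> float:
    """Average forward-pass time of the model on one example input, in seconds."""
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
    return (time.perf_counter() - start) / runs

def satisfies_target_condition(model, example, condition: dict) -> bool:
    """Check one target defining condition, e.g. {"kind": "params", "threshold": 5e6}."""
    kind, threshold = condition["kind"], condition["threshold"]
    if kind == "params":
        return count_params(model) < threshold
    if kind == "latency":
        return mean_latency(model, example) < threshold
    raise ValueError(f"unsupported target defining condition: {kind}")
```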
In step 102, under a target limiting condition, iteratively updating the network structure hyperparameters and the conventional parameters of the network model based on a preset alternate updating mode, the initial network structure hyperparameters and the initial conventional parameters until the network structure hyperparameters and the conventional parameters of the network model are converged to obtain a target network model; wherein, presetting the alternate update mode includes: and carrying out iterative updating on the network structure hyperparameter T times based on a preset evolution strategy every time S times of iterative updating of the conventional parameters is carried out, wherein T is less than S, and the target network model comprises the target network structure hyperparameter and the target conventional parameters.
In the embodiment of the present invention, considering that the number of network structure hyper-parameters in the network model is smaller than the number of conventional parameters and that they converge faster during iterative updating, more iterations may be devoted to updating the conventional parameters. Preferably, for a given initial network model, the network structure hyper-parameters are updated for 10 iterations after every 2000 iterations of updating the conventional parameters, that is, S is 2000 and T is 10.
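For intuition only, a minimal sketch of the alternating schedule just described (S iterations on the conventional parameters followed by T evolution-strategy iterations on the network structure hyper-parameters, repeated until convergence). The callables passed in, and the way the model is rebuilt from new hyper-parameters, are placeholders assumed here rather than details taken from the disclosure.

```python
from itertools import cycle
from typing import Callable, Iterable, Tuple

def alternating_training(
    model,
    hyperparams: Tuple[float, ...],
    batches: Iterable,
    train_step: Callable,    # one conventional-parameter update (e.g. an SGD step on the weights)
    es_update: Callable,     # one evolution-strategy update of the hyper-parameter vector p
    rebuild: Callable,       # re-instantiate the network from the updated hyper-parameters
    converged: Callable,     # stopping test covering both parameter groups
    S: int = 2000,
    T: int = 10,
):
    """Alternate S conventional-parameter iterations with T hyper-parameter iterations (T < S)."""
    batch_iter = cycle(batches)
    while not converged(model, hyperparams):
        for _ in range(S):                    # S iterations updating the conventional parameters
            train_step(model, next(batch_iter))
        for _ in range(T):                    # T evolution-strategy iterations updating p
            hyperparams = es_update(model, hyperparams)
            model = rebuild(model, hyperparams)
    return model, hyperparams
```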
In the embodiment of the present invention, when the conventional parameters in the network model are updated, any one of the related techniques may be adopted for updating, for example, the training samples may be obtained, the feature data of the training samples may be extracted, the initial network model may be trained based on the feature data of the training samples, and the conventional parameters may be updated through a training process. It should be noted that, according to the usage of the target network model, a training sample for model training may be selected, and corresponding feature data is extracted, for example, when the target network model is used for class identification, the feature data is a class label of the training sample.
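As one possible form of the train_step callable sketched above (the disclosure leaves the conventional-parameter update to any existing technique), a standard supervised step is shown; the cross-entropy objective and the optimizer choice are assumptions.

```python
import torch
import torch.nn.functional as F

def make_train_step(optimizer: torch.optim.Optimizer):
    """Build a train_step callable performing one conventional-parameter (weight) update."""
    def train_step(model: torch.nn.Module, batch):
        inputs, labels = batch                          # e.g. images and their class labels
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), labels)   # assumed classification objective
        loss.backward()
        optimizer.step()
        return loss.item()
    return train_step
```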
Considering that manual updating of the network structure hyper-parameters is costly, and that the network model is a whole in which the number of output channels of each layer, the resolution of the input image and the network depth need to be considered jointly, while existing algorithms cannot take them into account in a coordinated manner, existing algorithms cannot obtain a good combination of network structure hyper-parameters.
That is, for a given initial network model, the network structure is determined (i.e., the connection mode is determined), the operation type of each layer is also determined (e.g., whether the operation type is 1x1 convolution or 3x3 convolution), and the optimal network structure hyper-parameter combination satisfying the target constraint condition is found by adjusting the network structure hyper-parameters (the number of output channels of each layer, the resolution of the input image and the depth of the network) through the evolution strategy.
Accordingly, in an embodiment provided by the present invention, as shown in fig. 2, fig. 2 is a flowchart of a network structure hyper-parameter updating process according to an embodiment of the present invention, and may include the following steps: step 201, step 202, step 203, step 204, step 205, step 206 and step 207, wherein,
In step 201, the initial network structure hyper-parameters a_1, a_2, …, a_N of a preset initial network model are extracted, wherein a_i is the i-th initial network structure hyper-parameter in the preset initial network model, 1 ≤ i ≤ N, and N is the number of initial network structure hyper-parameters in the preset initial network model.
In the embodiment of the invention, when the network structure hyper-parameters are updated, all the network structure hyper-parameters in the preset initial network model are extracted, namely all the initial network structure hyper-parameters are extracted.
In one example, the initial network model is a network model constructed based on mobilenetv1, and if there are 13 convolutional layers in mobilenetv1, the number of output channels of each layer of the 13 convolutional layers, the resolution of the input image of the entire network model, and the total depth of the network model are extracted in this step, that is, 15 network structure hyper-parameters are extracted.
In step 202, a network structure hyper-parameter vector p is generated based on the initial network structure hyper-parameters a_1, a_2, …, a_N, wherein p = (a_1, a_2, …, a_N).
In the embodiment of the invention, after all the network structure hyper-parameters in the preset initial network model are extracted, the extracted network structure hyper-parameters are constructed into a vector p.
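Continuing the mobilenetv1-style example above, a hypothetical construction of the vector p is shown below; the concrete channel counts and the 224x224 resolution are assumed values, not taken from the disclosure.

```python
# Hypothetical values for the 13 per-layer output channel counts of a mobilenetv1-style model,
# plus the input image resolution and the network depth, giving a 15-element vector p.
channel_counts = [64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]
input_resolution = 224
network_depth = 13

p = tuple(channel_counts) + (input_resolution, network_depth)
assert len(p) == 15
```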
In step 203, a directional derivative g_P for updating the network structure hyper-parameters is generated based on the network structure hyper-parameter vector p, wherein
g_P = [loss(p + Δp) - loss(p)] / ‖Δp‖,
loss is a preset loss function, and Δp is a perturbation.
In the embodiment of the invention, optimal values of the network structure hyper-parameters could be obtained through differentiation and a gradient descent algorithm; however, because the network structure hyper-parameters cannot be differentiated directly the way the conventional parameter values in the network model can, a derivative g_P of the network structure hyper-parameters needs to be constructed. Specifically, the derivative g_P of the network structure hyper-parameters can be constructed using the definitional formula of the derivative.
In step 204, a perturbation set {Δp_1, Δp_2, …, Δp_M} is obtained, wherein Δp_j is the j-th perturbation, 1 ≤ j ≤ M, and M is the number of perturbations in the perturbation set.
In the embodiment of the invention, after the derivative g_P of the network structure hyper-parameters is obtained, some slight perturbations {Δp_1, Δp_2, …, Δp_M} can be added to the initial network structure hyper-parameters so that they deviate from the original value p, and the influence of each perturbation is then considered; each perturbation Δp_j is a vector, and different perturbations represent different update directions.
Preferably, in the embodiment of the present invention, in order to ensure the reliability of the update direction, 100 different perturbations may be randomly selected, that is, the perturbation set is {Δp_1, Δp_2, …, Δp_100}.
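As an illustrative sketch only, the 100 perturbations could be drawn as small random offsets around p; the Gaussian distribution and its scale are assumptions not specified in the disclosure.

```python
import numpy as np

def sample_perturbations(p: np.ndarray, M: int = 100, scale: float = 0.02, seed: int = 0) -> list:
    """Draw M small random perturbation vectors Δp_1, ..., Δp_M around the hyper-parameter vector p."""
    rng = np.random.default_rng(seed)
    # Each perturbation is a vector of the same length as p; its entries are small relative to p.
    return [rng.normal(0.0, scale * np.maximum(np.abs(p), 1.0)) for _ in range(M)]
```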
In step 205, a directional derivative set {g_P1, g_P2, …, g_PM} is generated based on the perturbation set {Δp_1, Δp_2, …, Δp_M} and the directional derivative g_P.
In the embodiment of the present invention,
g_Pj = [loss(p + Δp_j) - loss(p)] / ‖Δp_j‖,
where g_Pj is the j-th directional derivative in the directional derivative set.
In the embodiment of the invention, after the slight perturbations {Δp_1, Δp_2, …, Δp_M} are added to the initial network structure hyper-parameters, the directional derivative corresponding to each perturbation can be obtained, and these directional derivatives are then collected into the directional derivative set {g_P1, g_P2, …, g_PM}.
In one example, if the perturbation set contains 100 perturbations {Δp_1, Δp_2, …, Δp_100}, then the directional derivative set likewise contains 100 directional derivatives {g_P1, g_P2, …, g_P100}.
In step 206, the directional derivative set {g_P1, g_P2, …, g_PM} is processed based on a preset gradient descent algorithm to obtain a set of update directions {D_1, D_2, …, D_M} of the network structure hyper-parameter vector p with respect to {Δp_1, Δp_2, …, Δp_M}, wherein D_j is the update direction of p with respect to Δp_j.
In the embodiment of the invention, after the directional derivative set {g_P1, g_P2, …, g_PM} is obtained, a gradient descent algorithm is adopted to obtain the set of update directions {D_1, D_2, …, D_M} of the network structure hyper-parameter vector p with respect to {Δp_1, Δp_2, …, Δp_M}.
In step 207, the target network structure hyper-parameters are determined based on the set of update directions {D_1, D_2, …, D_M}.
In an embodiment of the present invention, a subset of update directions that meet a condition may be selected and averaged, so as to obtain a more reliable gradient descent direction and, in turn, more reliable target network structure hyper-parameters. In this case, step 207 may include the following steps (not shown in the figure): step 2071 and step 2072, wherein,
in step 2071, the update directions in {D_1, D_2, …, D_M} that satisfy a preset condition are determined;
in step 2072, the mean of the network structure hyper-parameters corresponding to the update directions that satisfy the preset condition is calculated, and the mean is determined as the target network structure hyper-parameters.
In one example, the update directions satisfying the preset condition are D_1, D_2 and D_3. Since the perturbation corresponding to D_1 is Δp_1, the perturbation corresponding to D_2 is Δp_2, and the perturbation corresponding to D_3 is Δp_3, the network structure hyper-parameters corresponding to D_1, D_2 and D_3 are (p + Δp_1), (p + Δp_2) and (p + Δp_3), respectively, and the target network structure hyper-parameters are p + (Δp_1 + Δp_2 + Δp_3)/3.
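Putting steps 203 to 207 together, a minimal numerical sketch under stated assumptions: the finite-difference quotient above stands in for the directional derivative, the preset condition is taken here to be that the perturbed loss decreases (an assumption), and the surviving perturbations are averaged exactly as in the example of steps 2071 and 2072.

```python
import numpy as np

def es_hyperparam_update(p: np.ndarray, loss, perturbations) -> np.ndarray:
    """One evolution-strategy update of the hyper-parameter vector p (steps 203-207).

    `loss` evaluates the preset loss function on the network built from a hyper-parameter vector;
    the selection rule (keep directions along which the loss decreases) is an assumption.
    """
    base = loss(p)
    kept = []
    for dp in perturbations:
        g_pj = (loss(p + dp) - base) / np.linalg.norm(dp)   # finite-difference directional derivative
        if g_pj < 0:                                         # update direction taken to satisfy the preset condition
            kept.append(dp)
    if not kept:
        return p                                             # no promising direction in this round
    # Mean of the hyper-parameters corresponding to the kept directions: p + mean(Δp), as in the example.
    return p + np.mean(kept, axis=0)
```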
In another embodiment of the present invention, step 207 may include the following steps (not shown in the figure): step 2073 and step 2074, wherein,
in step 2073, a specific update direction in {D_1, D_2, …, D_M} is determined;
in step 2074, the network structure hyper-parameters corresponding to the specific update direction are determined as the target network structure hyper-parameters.
In one example, the specific update direction is D_4. Since the perturbation corresponding to D_4 is Δp_4, the network structure hyper-parameters corresponding to D_4 are (p + Δp_4), and the target network structure hyper-parameters are therefore (p + Δp_4).
Therefore, in the embodiment of the invention, the network structure hyper-parameters can be learned automatically, so that a suitable combination of network structure hyper-parameters can be found at low cost for a given network model and defining condition, while the accuracy is ensured.
In the embodiment of the present invention, the target network model may be used for any one of the following purposes: the method comprises the steps of determining a class to which an image to be processed belongs, identifying a human face in the image to be processed, detecting a specific object in the image to be processed, segmenting the specific object in the image to be processed, and generating a new image, wherein the new image has similar characteristics with the image to be processed.
It can be seen from the above embodiments that, in this embodiment, when an image processing model for realizing a specific purpose is trained based on an initial network model and given defining conditions, a network structure hyper-parameter in the network model can be automatically updated iteratively based on a preset evolution strategy, so that the cost of the model training process is reduced, the model training efficiency is improved, the cost of the whole image processing process is reduced, and the image processing efficiency is improved. In addition, the network structure hyper-parameters and the conventional parameters in the network model are alternately updated, so that the precision of the final model obtained by training can be ensured, and the image processing precision and efficiency are further improved.
Next, an image processing method provided by an embodiment of the present invention will be described.
It should be noted that the image processing method provided by the embodiment of the present invention is applicable to an electronic device. In practical applications, the electronic device may include mobile terminals such as smartphones, tablet computers and personal digital assistants, and may also include computer devices such as notebook computers, desktop computers and servers, which is not limited in the embodiments of the present invention.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the present invention, which performs image processing based on the target network model in the embodiment shown in fig. 1, and as shown in fig. 3, the method may include the following steps: step 301, step 302, step 303 and step 304, wherein,
in step 301, an image to be processed is received.
In step 302, the image to be processed is converted into input data matching the target network model.
Since different network models serve different purposes, their requirements on the format of the input data also differ; for example, some network models require the input data to be an image with a specific resolution, some require the input data to be an image of a specific size, and some require the input data to be a feature map of the original image, and so on.
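For illustration only, one common way such a conversion might look when the target network model expects a fixed input resolution; the 224x224 size and the normalization constants are assumptions, not requirements of the disclosure.

```python
import numpy as np
from PIL import Image

def to_model_input(image_path: str, resolution: int = 224) -> np.ndarray:
    """Convert an image to a normalized NCHW float array matching an assumed model input format."""
    img = Image.open(image_path).convert("RGB").resize((resolution, resolution))
    x = np.asarray(img, dtype=np.float32) / 255.0      # scale pixel values to [0, 1]
    x = (x - 0.5) / 0.5                                # assumed normalization to [-1, 1]
    return np.transpose(x, (2, 0, 1))[None, ...]       # add batch dimension: (1, 3, H, W)
```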
In step 303, the input data is input to the target network model for processing, and an output result of the target network model is obtained.
In step 304, the output result of the target network model is determined as the image processing result of the image to be processed.
In the embodiment of the invention, when the target network model is a network model used for determining the category to which the image to be processed belongs, the image processing result of the image to be processed is the category to which the image to be processed belongs; for example, it is identified whether the image to be processed belongs to an animal or a plant, or which breed of animal the image to be processed belongs to, or the like.
In the embodiment of the invention, when the target network model is a network model for identifying the face in the image to be processed, the image processing result of the image to be processed is the face identification result of the image to be processed; for example, whether the face in the image to be processed is the face of a specific user or not is identified, or some feature of the face in the image to be processed is identified.
In the embodiment of the invention, when the target network model is a network model for detecting a specific object in an image to be processed, the image processing result of the image to be processed is the detection result of the specific object in the image to be processed; for example, defects of products in the image to be processed are detected, or a specific user in the image to be processed is detected.
In the embodiment of the invention, when the target network model is a network model for segmenting a specific object in an image to be processed, the image processing result of the image to be processed is the segmentation result of the specific object in the image to be processed; for example, lane lines in the image to be processed are segmented.
In the embodiment of the invention, when the target network model is a network model used for generating a new image, the image processing result of the image to be processed is the new image, wherein the new image has the characteristics similar to the image to be processed; for example, when the image to be processed is a character image, a new character similar to the character style is generated.
It can be seen from the above embodiments that, in this embodiment, when an image processing model for realizing a specific purpose is trained based on an initial network model and given defining conditions, a network structure hyper-parameter in the network model can be automatically updated iteratively based on a preset evolution strategy, so that the cost of the model training process is reduced, the model training efficiency is improved, the cost of the whole image processing process is reduced, and the image processing efficiency is improved. In addition, the network structure hyper-parameters and the conventional parameters in the network model are alternately updated, so that the precision of the final model obtained by training can be ensured, and the image processing precision and efficiency are further improved.
Fig. 4 is a block diagram of a model training apparatus according to an embodiment of the present invention, and as shown in fig. 4, the model training apparatus 400 may include: an acquisition module 401 and a training module 402, wherein,
an obtaining module 401, configured to obtain a preset initial network model and a target limiting condition, where the preset initial network model includes an initial network structure hyper-parameter and an initial conventional parameter;
a training module 402, configured to perform iterative update on the network structure hyper-parameters and the conventional parameters of the network model based on a preset alternate update mode, the initial network structure hyper-parameters, and the initial conventional parameters under the target limiting condition until the network structure hyper-parameters and the conventional parameters of the network model are both converged, so as to obtain a target network model;
wherein the preset alternate updating mode comprises: performing T iterations of updating the network structure hyper-parameters based on a preset evolution strategy for every S iterations of updating the conventional parameters, wherein T is less than S, and the target network model comprises target network structure hyper-parameters and target conventional parameters.
It can be seen from the above embodiments that, in this embodiment, when an image processing model for realizing a specific purpose is trained based on an initial network model and given defining conditions, a network structure hyper-parameter in the network model can be automatically updated iteratively based on a preset evolution strategy, so that the cost of the model training process is reduced, the model training efficiency is improved, the cost of the whole image processing process is reduced, and the image processing efficiency is improved. In addition, the network structure hyper-parameters and the conventional parameters in the network model are alternately updated, so that the precision of the final model obtained by training can be ensured, and the image processing precision and efficiency are further improved.
Optionally, as an embodiment, the training module 402 may include: a network structure hyper-parameter updating sub-module, wherein the network structure hyper-parameter updating sub-module may include:
an extraction unit, configured to extract the initial network structure hyper-parameters a_1, a_2, …, a_N of the preset initial network model, wherein a_i is the i-th initial network structure hyper-parameter in the preset initial network model, 1 ≤ i ≤ N, and N is the number of initial network structure hyper-parameters in the preset initial network model;
a first generating unit, configured to generate a network structure hyper-parameter vector p based on a_1, a_2, …, a_N, wherein p = (a_1, a_2, …, a_N);
a second generating unit, configured to generate, based on p, a directional derivative g_P for updating the network structure hyper-parameters, wherein
g_P = [loss(p + Δp) - loss(p)] / ‖Δp‖,
loss is a preset loss function, and Δp is a perturbation;
an acquisition unit, configured to acquire a perturbation set {Δp_1, Δp_2, …, Δp_M}, wherein Δp_j is the j-th perturbation, 1 ≤ j ≤ M, and M is the number of perturbations in the perturbation set;
a third generating unit, configured to generate a directional derivative set {g_P1, g_P2, …, g_PM} based on {Δp_1, Δp_2, …, Δp_M} and g_P, wherein
g_Pj = [loss(p + Δp_j) - loss(p)] / ‖Δp_j‖,
and g_Pj is the j-th directional derivative in the directional derivative set;
a processing unit, configured to process {g_P1, g_P2, …, g_PM} based on a preset gradient descent algorithm to obtain a set of update directions {D_1, D_2, …, D_M} of p with respect to {Δp_1, Δp_2, …, Δp_M}, wherein D_j is the update direction of p with respect to Δp_j;
a determining unit, configured to determine the target network structure hyper-parameters based on {D_1, D_2, …, D_M}.
Optionally, as an embodiment, the determining unit may include:
a first determining subunit, configured to determine the update directions in {D_1, D_2, …, D_M} that satisfy a preset condition;
the calculating subunit is used for calculating the mean value of the network structure hyperparameters corresponding to the updating directions meeting the preset conditions;
and the second determining subunit is used for determining the average value as the target network structure hyperparameter.
Optionally, as an embodiment, the network structure hyper-parameter may include: the number of output channels of each layer in the network model, the resolution of the input image of the network model and the network depth of the network model.
Optionally, as an embodiment, the target network model may be used for any one of the following purposes:
the method comprises the steps of determining a class to which the image to be processed belongs, identifying a human face in the image to be processed, detecting a specific object in the image to be processed, segmenting the specific object in the image to be processed, and generating a new image, wherein the new image has similar characteristics with the image to be processed.
Optionally, as an embodiment, the target defining condition may include any one of:
the total parameter number of the network model is lower than a preset number threshold, the total calculated amount of the network model is lower than a preset calculated amount threshold, and the running time of the network model on the specific equipment is lower than a preset time threshold.
Fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the image processing apparatus 500 may include: a receiving module 501, an input module 502, a processing module 503, and a determining module 504, wherein,
a receiving module 501, configured to receive an image to be processed;
an input module 502, configured to convert the image to be processed into input data matched with the target network model;
the processing module 503 is configured to input the input data to the target network model for processing, so as to obtain an output result of the target network model;
a determining module 504, configured to determine an output result of the target network model as an image processing result of the image to be processed.
It can be seen from the above embodiments that, in this embodiment, when an image processing model for realizing a specific purpose is trained based on an initial network model and given defining conditions, a network structure hyper-parameter in the network model can be automatically updated iteratively based on a preset evolution strategy, so that the cost of the model training process is reduced, the model training efficiency is improved, the cost of the whole image processing process is reduced, and the image processing efficiency is improved. In addition, the network structure hyper-parameters and the conventional parameters in the network model are alternately updated, so that the precision of the final model obtained by training can be ensured, and the image processing precision and efficiency are further improved.
Optionally, as an embodiment, the image processing result may include any one of:
the image segmentation method comprises the steps of classifying the image to be processed, identifying the face of the image to be processed, detecting a specific object in the image to be processed, segmenting the specific object in the image to be processed, and obtaining a new image, wherein the new image has the similar characteristic with the image to be processed.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
According to still another embodiment of the present invention, there is also provided an electronic apparatus including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps in the model training method according to any of the embodiments described above.
According to still another embodiment of the present invention, there is also provided an electronic apparatus including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps in the image processing method according to any of the embodiments described above.
According to yet another embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the model training method according to any one of the above embodiments.
According to still another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the image processing method according to any one of the above-mentioned embodiments.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The model training method, the image processing device, the electronic device and the storage medium provided by the invention are described in detail, specific examples are applied in the text to explain the principle and the implementation of the invention, and the description of the examples is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method of model training, the method comprising:
acquiring a preset initial network model and a target limiting condition, wherein the preset initial network model comprises initial network structure hyper-parameters and initial conventional parameters;
under the target limiting condition, carrying out iterative updating on the network structure hyperparameter and the conventional parameters of the network model based on a preset alternate updating mode, the initial network structure hyperparameter and the initial conventional parameters until the network structure hyperparameter and the conventional parameters of the network model are converged to obtain a target network model;
wherein the preset alternate updating mode comprises: performing T iterations of updating the network structure hyper-parameters based on a preset evolution strategy for every S iterations of updating the conventional parameters, wherein T is less than S, and the target network model comprises target network structure hyper-parameters and target conventional parameters.
2. The method of claim 1, wherein the updating procedure of the network configuration hyper-parameter comprises:
extracting the initial network structure hyper-parameters a_1, a_2, …, a_N of the preset initial network model, wherein a_i is the i-th initial network structure hyper-parameter in the preset initial network model, 1 ≤ i ≤ N, and N is the number of initial network structure hyper-parameters in the preset initial network model;
generating a network structure hyper-parameter vector p based on a_1, a_2, …, a_N, wherein p = (a_1, a_2, …, a_N);
generating, based on p, a directional derivative g_P for updating the network structure hyper-parameters, wherein
g_P = [loss(p + Δp) - loss(p)] / ‖Δp‖,
loss is a preset loss function, and Δp is a perturbation;
obtaining a perturbation set {Δp_1, Δp_2, …, Δp_M}, wherein Δp_j is the j-th perturbation, 1 ≤ j ≤ M, and M is the number of perturbations in the perturbation set;
generating a directional derivative set {g_P1, g_P2, …, g_PM} based on {Δp_1, Δp_2, …, Δp_M} and g_P, wherein
g_Pj = [loss(p + Δp_j) - loss(p)] / ‖Δp_j‖,
and g_Pj is the j-th directional derivative in the directional derivative set;
processing {g_P1, g_P2, …, g_PM} based on a preset gradient descent algorithm to obtain a set of update directions {D_1, D_2, …, D_M} of p with respect to {Δp_1, Δp_2, …, Δp_M}, wherein D_j is the update direction of p with respect to Δp_j;
and determining the target network structure hyper-parameters based on {D_1, D_2, …, D_M}.
3. The method of claim 2, wherein determining the target network structure hyper-parameters based on {D_1, D_2, …, D_M} comprises:
determining the update directions in {D_1, D_2, …, D_M} that satisfy a preset condition;
and calculating the mean of the network structure hyper-parameters corresponding to the update directions that satisfy the preset condition, and determining the mean as the target network structure hyper-parameters.
4. The method of claim 1, wherein the network fabric hyper-parameter comprises: the number of output channels of each layer in the network model, the resolution of the input image of the network model and the network depth of the network model.
5. The method of claim 1, wherein the target qualification comprises any of:
the total parameter number of the network model is lower than a preset number threshold, the total calculated amount of the network model is lower than a preset calculated amount threshold, and the running time of the network model on the specific equipment is lower than a preset time threshold.
6. An image processing method for image processing based on the target network model of any one of claims 1 to 5, the method comprising:
receiving an image to be processed;
converting the image to be processed into input data matched with the target network model;
inputting the input data into the target network model for processing to obtain an output result of the target network model;
and determining the output result of the target network model as the image processing result of the image to be processed.
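A minimal sketch of this claim-6 flow is given below, assuming the image to be processed arrives as an HWC uint8 array and that matching the target network model means a nearest-neighbour resize, [0, 1] normalisation and NCHW layout; the actual conversion depends on the target network model.

```python
import numpy as np

def process_image(image, target_model, input_resolution=224):
    """Convert the image into input data matched to the target network model,
    run the model, and return its output as the image processing result."""
    h, w = image.shape[:2]
    ys = (np.arange(input_resolution) * h / input_resolution).astype(int)
    xs = (np.arange(input_resolution) * w / input_resolution).astype(int)
    resized = image[ys][:, xs]                  # nearest-neighbour resize
    x = resized.astype(np.float32) / 255.0      # normalise to [0, 1]
    x = np.transpose(x, (2, 0, 1))[None]        # HWC -> NCHW with batch dim
    return target_model(x)                      # model output = processing result
```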
7. A model training apparatus, the apparatus comprising:
the acquisition module is used for acquiring a preset initial network model and a target limiting condition, wherein the preset initial network model comprises initial network structure hyper-parameters and initial conventional parameters;
the training module is used for iteratively updating the network structure hyper-parameters and the conventional parameters of the network model, under the target limiting condition, based on a preset alternate updating mode, the initial network structure hyper-parameters and the initial conventional parameters, until the network structure hyper-parameters and the conventional parameters of the network model converge, so as to obtain a target network model;
wherein the preset alternate updating mode comprises: performing T iterative updates of the network structure hyper-parameters based on a preset evolution strategy every time S iterative updates of the conventional parameters are performed, wherein T is less than S, and the target network model comprises target network structure hyper-parameters and target conventional parameters.
8. An image processing apparatus for performing image processing based on the target network model of claim 7, the apparatus comprising:
the receiving module is used for receiving the image to be processed;
the input module is used for converting the image to be processed into input data matched with the target network model;
the processing module is used for inputting the input data into the target network model for processing to obtain an output result of the target network model;
and the determining module is used for determining the output result of the target network model as the image processing result of the image to be processed.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the model training method according to any one of claims 1 to 5, or implements the steps of the image processing method according to claim 6.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a processor, carries out the steps of the model training method as defined in any one of claims 1 to 5, or carries out the steps of the image processing method as defined in claim 6.
CN201911097179.6A 2019-11-11 2019-11-11 Model training method, image processing device, electronic equipment and storage medium Active CN110942090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911097179.6A CN110942090B (en) 2019-11-11 2019-11-11 Model training method, image processing device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110942090A true CN110942090A (en) 2020-03-31
CN110942090B CN110942090B (en) 2024-03-29

Family

ID=69907658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911097179.6A Active CN110942090B (en) 2019-11-11 2019-11-11 Model training method, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110942090B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103499835A (en) * 2013-10-13 2014-01-08 中国石油集团西北地质研究所 Method for inverting near-surface velocity model by utilizing preliminary waveforms
DE102017128082A1 (en) * 2017-11-28 2019-05-29 Connaught Electronics Ltd. Meta-architecture design for a CNN network
CN108256555A (en) * 2017-12-21 2018-07-06 北京达佳互联信息技术有限公司 Picture material recognition methods, device and terminal
AU2018101317A4 (en) * 2018-09-07 2018-10-11 Chen, Guoyi Mr A Deep Learning Based System for Animal Species Classification
CN109740657A (en) * 2018-12-27 2019-05-10 郑州云海信息技术有限公司 A kind of training method and equipment of the neural network model for image data classification
CN109767759A (en) * 2019-02-14 2019-05-17 重庆邮电大学 End-to-end speech recognition methods based on modified CLDNN structure
CN110188862A (en) * 2019-04-12 2019-08-30 北京迈格威科技有限公司 Searching method, the device, system of model hyper parameter for data processing
CN110399917A (en) * 2019-07-24 2019-11-01 东北大学 A kind of image classification method based on hyperparameter optimization CNN
CN110415233A (en) * 2019-07-26 2019-11-05 东南大学 Pavement crack rapid extracting method based on two step convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MAX JADERBERG: "Population Based Training of Neural Networks", arXiv *
ZHU Huilong; LIU Xiaoyan; LIU Yao: "Research on population-based hyper-parameter optimization of neural networks" (in Chinese), Information Technology, no. 11 *
YANG Jian: "Training of a neural network spectrum prediction model optimized by a genetic algorithm" (in Chinese), Journal of PLA University of Science and Technology (Natural Science Edition), vol. 17, no. 6 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113555008A (en) * 2020-04-17 2021-10-26 阿里巴巴集团控股有限公司 Parameter adjusting method and device for model
CN112257561A (en) * 2020-10-20 2021-01-22 广州云从凯风科技有限公司 Human face living body detection method and device, machine readable medium and equipment
CN113240107A (en) * 2021-05-08 2021-08-10 北京字跳网络技术有限公司 Image processing method and device and electronic equipment
CN117671385A (en) * 2023-12-20 2024-03-08 北京斯年智驾科技有限公司 Training method, system, device and storage medium for target recognition model

Also Published As

Publication number Publication date
CN110942090B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
US11468262B2 (en) Deep network embedding with adversarial regularization
CN108898186B (en) Method and device for extracting image
CN110942090A (en) Model training method, image processing method, device, electronic equipment and storage medium
CN110210513B (en) Data classification method and device and terminal equipment
US11907675B2 (en) Generating training datasets for training neural networks
CN108229287B (en) Image recognition method and device, electronic equipment and computer storage medium
CN110765882B (en) Video tag determination method, device, server and storage medium
CN110909868A (en) Node representation method and device based on graph neural network model
CN114861836B (en) Model deployment method based on artificial intelligence platform and related equipment
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
CN109241442B (en) Project recommendation method based on predictive value filling, readable storage medium and terminal
CN113095333A (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN112818162A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
Tailanian et al. U-flow: A u-shaped normalizing flow for anomaly detection with unsupervised threshold
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN113282781B (en) Image retrieval method and device
CN113255819B (en) Method and device for identifying information
Tang et al. A hybrid model integrating spatial pattern, spatial correlation, and edge information for image classification
CN118037738B (en) Asphalt pavement crack pouring adhesive bonding performance detection method and equipment
CN115511015B (en) Sample screening method, device, equipment and computer readable storage medium
CN116030347B (en) High-resolution remote sensing image building extraction method based on attention network
US20210081784A1 (en) Device and computer implemented method for training an artificial neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant