CN110598727A - Model construction method based on transfer learning, image identification method and device thereof - Google Patents

Publication number: CN110598727A (application CN201910656794.XA)
Authority: CN (China)
Prior art keywords: learning model, layers, model, training set, net
Legal status: Granted
Application number: CN201910656794.XA
Other languages: Chinese (zh)
Other versions: CN110598727B
Inventor
尉桦
邵新庆
刘强
Current Assignee: Shenzhen Liwei Zhilian Technology Co Ltd; Nanjing ZNV Software Co Ltd
Original Assignee: Shenzhen Liwei Zhilian Technology Co Ltd; Nanjing ZNV Software Co Ltd
Application filed by Shenzhen Liwei Zhilian Technology Co Ltd and Nanjing ZNV Software Co Ltd
Priority to CN201910656794.XA
Publication of CN110598727A; application granted; publication of CN110598727B
Legal status: Active


Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/044: Neural networks; architecture; recurrent networks, e.g. Hopfield networks
    • G06N3/045: Neural networks; architecture; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • Y02T10/40: Climate change mitigation technologies related to transportation; engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The model construction method based on transfer learning comprises the following steps: training a deep neural network on a first training set to obtain a first learning model; fine-tuning the parameters of the network layers of the first learning model, group by group, using a second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing a second learning model; and combining the first training set and the second training set and training the second learning model on them to obtain a third learning model. The third learning model is used to extract feature information from images of objects having the first class of features and/or the second class of features. Because both the first training set and the second training set are used during model construction, the constructed second learning model acquires the feature-learning capability of both training sets and achieves a better convergence effect.

Description

Model construction method based on transfer learning, image identification method and device thereof
Technical Field
The invention relates to the technical field of image processing, and in particular to a model construction method based on transfer learning, an image identification method, and an image identification device.
Background
In recent years, deep learning has received increasing attention and has been applied successfully in many fields. Deep learning algorithms can learn high-level features from massive amounts of data, which gives deep learning an advantage over traditional machine learning.
However, data dependency is one of the most serious problems in deep learning. In contrast to traditional machine learning methods, deep learning depends heavily on large-scale training data, because it needs a large amount of data from which to learn the underlying patterns. Because producing large training data sets is costly, a network trained on one specific data set behaves differently once the data set is replaced, even for the same task. For face images, for example, an Asian face data set differs from a European face data set, and a European face data set differs from an African face data set; the generality of a learning model trained on a single data set is therefore often limited.
At present, transfer learning is an effective way to address this problem. In practice, however, large differences between data sets often cause strong fluctuations during neural network training, and the learning model is then difficult to converge. As a result, a single learning model can recognize only objects whose features differ slightly, not objects whose features differ greatly; this limits the model's range of application and ultimately degrades the object-recognition experience.
Disclosure of Invention
The invention mainly addresses the technical problem of how to improve the transfer-learning capability of existing image recognition models so as to improve the object-recognition experience.
According to a first aspect, an embodiment provides a model construction method based on transfer learning, comprising the following steps:
training a deep neural network on a first training set to obtain a first learning model, the first training set comprising a plurality of images of objects having a first class of features;
fine-tuning the parameters of the network layers of the first learning model, group by group, using a second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing a second learning model, the second training set comprising a plurality of images of objects having a second class of features;
combining the first training set and the second training set and training the second learning model on them to obtain a third learning model, the third learning model being used to extract feature information from images of objects having the first class of features and/or the second class of features.
The step of fine-tuning the parameters of the network layers of the first learning model, group by group, using the second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing the second learning model, comprises: selecting the first k*m layers of the network layers of the first learning model, inputting the second training set into the first k*m layers, fine-tuning the parameters of the first k*m layers, and calculating the result of the loss function corresponding to the first k*m layers, which constitutes one iteration, wherein m is a positive integer greater than 1, k is the iteration index, and k = 1, 2, 3, ...; iterating over the first k*m layers as k changes until the results of the corresponding loss functions no longer decrease or all network layers have been traversed; and constructing the second learning model from the parameters of the first learning model obtained in the last iteration.
In each iteration, the result of the loss function corresponding to the first k*m layers is calculated as follows.
(1) Calculate the transfer-learning loss:
Loss_exc = ||NET(A)'_layer(k*m) - NET(A)_layer(k*m)||_2
where NET denotes the initial deep neural network, A the first data set, B the second data set, and NET(A) the first learning model; NET(A)_layer(k*m) is the output of the k*m-th layer when B is input to NET(A); NET(A)' is the network model after fine-tuning the parameters of the first k*m layers of NET(A), and NET(A)'_layer(k*m) is the output of its k*m-th layer when B is input to NET(A)'.
(2) Calculate the classification loss of the network model NET(A)':
Loss_cls = -Σ_i ŷ_i log(y_i)    (cross-entropy)
or, alternatively,
Loss_cls = ||y - ŷ||_2    (Euclidean distance)
where the former Loss_cls is a cross-entropy loss and the latter a Euclidean-distance loss; y is the output of the last network layer when B is input to NET(A)', and ŷ is the data label of y.
(3) Calculate the result of the loss function corresponding to the first k*m layers of NET(A):
Loss = α × Loss_cls + (1 - α) × Loss_exc
where α is a weight coefficient and α ∈ [0, 1].
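These loss terms can be sketched in a few lines of plain Python (a minimal illustration under stated assumptions, not the patented implementation; the function names and toy vectors are invented for this sketch):

```python
import math

def l2(u, v):
    # Euclidean (L2) distance between two equal-length vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def loss_exc(feat_tuned, feat_frozen):
    # Transfer-learning loss: distance between the layer-(k*m) outputs of
    # the fine-tuned model NET(A)' and the original model NET(A) on input B
    return l2(feat_tuned, feat_frozen)

def loss_cls(probs, labels):
    # Cross-entropy classification loss (the Euclidean-distance variant
    # would simply be l2(probs, labels) instead)
    return -sum(t * math.log(p) for p, t in zip(probs, labels))

def combined_loss(cls, exc, alpha=0.9):
    # Loss = alpha * Loss_cls + (1 - alpha) * Loss_exc, with alpha in [0, 1]
    return alpha * cls + (1 - alpha) * exc
```

With α close to 1 the classification term dominates while the transfer term still anchors the fine-tuned layers' outputs to those of the original model.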
Iterating over the first k*m layers as k changes until the result of the corresponding loss function no longer decreases or all network layers have been traversed comprises:
when k takes the values 1, 2, 3, ..., calculating the Loss value corresponding to the first k*m layers for each k;
determining that the result of the loss function corresponding to the first k*m layers no longer decreases when successive Loss values are equal to the same value or close to 0;
determining that all network layers have been traversed when the number of layers k*m is equal to or greater than the total number of network layers in the first learning model.
Combining the first training set and the second training set and training the second learning model on them to obtain a third learning model comprises: selecting the first h*n layers of the network layers of the second learning model, inputting the combined first and second training sets into the first h*n layers, fine-tuning the parameters of the first h*n layers, and calculating the result of the loss function corresponding to the first h*n layers, which constitutes one iteration, wherein n is a positive integer greater than 1, h is the iteration index, and h = 1, 2, 3, ...; iterating over the first h*n layers as h changes until the results of the corresponding loss functions no longer decrease or all network layers have been traversed; and constructing the third learning model from the parameters of the second learning model obtained in the last iteration.
In each iteration, the result of the loss function corresponding to the first h*n layers is calculated as follows. Let NET(B) denote the second learning model and NET(B)' the network model after fine-tuning the parameters of the first h*n layers of NET(B); the result of the loss function corresponding to the first h*n layers of NET(B) is
Loss_cls' = -Σ_i ŷ'_i log(y'_i)    (cross-entropy)
or, alternatively,
Loss_cls' = ||y' - ŷ'||_2    (Euclidean distance)
where the former Loss_cls' is a cross-entropy loss and the latter a Euclidean-distance loss; y' is the output of the last network layer when B is input to NET(B)', and ŷ' is the data label of y'.
According to a second aspect, an embodiment provides an image recognition method, comprising the steps of:
acquiring an image of an object to be detected, the object to be detected having the first class of features and/or the second class of features;
extracting feature information from the image of the object to be detected using a pre-constructed third learning model, the third learning model being obtained by the model construction method of the first aspect; and
identifying the object to be detected according to the extracted feature information.
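The three steps can be sketched as a small pipeline (illustrative only; `extract`, `identify`, and the stand-in model are assumed names, not the patent's API):

```python
def recognize(image, model):
    # Step 1 is the acquired image itself; step 2 extracts feature
    # information with the pre-constructed third learning model; step 3
    # identifies the object from those features.
    features = model.extract(image)
    return model.identify(features)

class StubThirdModel:
    # Toy stand-in for the third learning model, for illustration only.
    def extract(self, image):
        return [sum(image)]  # a fake one-dimensional "feature"
    def identify(self, features):
        return "first-class object" if features[0] > 0 else "second-class object"
```

The point of the method is that one `model` handles objects with either class of features, so the pipeline never branches on the object type.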
According to a third aspect, an embodiment provides an image recognition apparatus comprising:
an image acquisition unit, configured to acquire an image of an object to be detected, the object to be detected having the first class of features and/or the second class of features;
a feature extraction unit, configured to extract feature information from the image of the object to be detected using a pre-constructed third learning model, the third learning model being obtained by the model construction method of the first aspect; and
an object identification unit, configured to identify the object to be detected according to the extracted feature information.
The image recognition apparatus further comprises a model construction unit connected with the feature extraction unit, the model construction unit comprising:
a first training module, configured to train a deep neural network on a first training set to obtain a first learning model, the first training set comprising a plurality of images of objects having a first class of features;
a second training module, configured to fine-tune the parameters of the network layers of the first learning model, group by group, using a second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing a second learning model, the second training set comprising a plurality of images of objects having a second class of features; and
a third training module, configured to combine the first training set and the second training set and train the second learning model on them to obtain a third learning model.
According to a fourth aspect, an embodiment provides a computer readable storage medium comprising a program executable by a processor to implement the method of the first or second aspect described above.
The beneficial effects of the application are as follows.
According to the model construction method based on transfer learning, the image identification method, and the device of the embodiments, the model construction method comprises: training a deep neural network on a first training set to obtain a first learning model; fine-tuning the parameters of the network layers of the first learning model, group by group, using a second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing a second learning model; and combining the first training set and the second training set and training the second learning model on them to obtain a third learning model, the third learning model being used to extract feature information from images of objects having the first class of features and/or the second class of features. The image recognition method comprises: acquiring an image of an object to be detected, the object to be detected having the first class of features and/or the second class of features; extracting feature information from the image of the object to be detected using the pre-constructed third learning model, obtained by the model construction method based on transfer learning; and identifying the object to be detected according to the extracted feature information.
First, both the first training set and the second training set are used when the model is constructed, so the constructed second learning model acquires the feature-learning capability of both training sets. Second, because the parameters of the network layers of the first learning model are fine-tuned group by group when the second learning model is constructed, the first learning model converges gradually under the action of the second training set, and the calculated result of its loss function ensures that the convergence process is effective. Third, after the second learning model is obtained, the first and second training sets together are used to train it further, yielding a third learning model with a better training effect. Fourth, the technical scheme performs effective transfer learning on the first learning model and the second training set by means of a deep learning network, increasing the model's generality across several different data sets while avoiding the strong fluctuations and convergence difficulties of the training process. Fifth, the claimed image recognition method uses the pre-constructed third learning model to extract feature information from the image of the object to be detected, so the third learning model can extract both the first class and the second class of features of the object; this strengthens the image recognition capability, and a single learning model can recognize objects with different classes of features.
Sixth, in iterative model training, when a new training set is needed to optimize an old model, the old model can be optimized by following the construction process of the second or third learning model in the model construction method of this technical scheme, so that the new model both fits the new data set and retains the characteristics of the old data set.
Drawings
FIG. 1 is a flow chart of a model construction method based on transfer learning in the present application;
FIG. 2 is a flow chart of constructing a second learning model;
FIG. 3 is a schematic diagram of the principle of calculating the loss function corresponding to the first k*m layers of the first learning model;
FIG. 4 is a flow chart of constructing a third learning model;
FIG. 5 is a flow chart of an image recognition method of the present application;
FIG. 6 is a schematic structural diagram of an image recognition apparatus according to the present application;
FIG. 7 is a schematic structural diagram of a model construction unit.
Detailed Description
The present invention is described in further detail below with reference to the detailed description and the accompanying drawings, in which like elements in different embodiments bear like reference numerals. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in particular instances. In some instances, certain operations related to the present application are not shown or described in detail, to avoid burying the core of the application in excessive description; a detailed account of these operations is unnecessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the described methods may also be swapped or reordered in ways apparent to those skilled in the art. The various sequences in the specification and drawings therefore serve only to describe particular embodiments and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The ordinal numbering of components herein, such as "first" and "second", is used only to distinguish the described objects and carries no ordinal or technical meaning. Unless otherwise indicated, "connected" and "coupled" in this application include both direct and indirect connections (couplings).
Embodiment 1
Referring to fig. 1, the present application provides a model building method based on transfer learning, which includes steps S100-S300, which are described below.
Step S100: train a deep neural network on the first training set to obtain a first learning model.
It should be noted that the deep neural network may be a Long Short-Term Memory network (LSTM), a Recurrent Neural Network (RNN), a Generative Adversarial Network (GAN), a Deep Convolutional Neural Network (DCNN), or a Deep Convolutional Inverse Graphics Network (DCIGN); it is not limited here.
It should be noted that the first training set here comprises a plurality of images of objects having a first class of features. For example, the first training set may contain a number of facial images of Asians, where the first class of features corresponding to Asians may include feature information such as yellow skin tone, black eyes, black hair, and a diamond-shaped face contour. As another example, the first training set may contain a number of contour images of cars, where the first class of features corresponding to a car may include feature information such as a long hood, a short rear, four wheels, a raked front windshield, and a low chassis.
Those skilled in the art will appreciate that the first training set may instead comprise a plurality of images of objects such as flowers, trees, animals, buildings, or paintings; for brevity these are not enumerated one by one.
Step S200: fine-tune the parameters of the network layers of the first learning model, group by group, using a second training set until the result of the loss function of the first learning model no longer decreases or all network layers have been traversed, thereby constructing a second learning model.
It should be noted that the second training set here comprises a plurality of images of objects having a second class of features. For example, when the first training set is a set of facial images of Asians, the second training set may be a set of facial images of Europeans, where the second class of features corresponding to Europeans may include feature information such as white skin tone, blue eyes, blond hair, and a square face contour.
In a specific embodiment, referring to FIG. 2, step S200 may include steps S210-S270, each described below.
Step S210: select the first k*m layers of the network layers of the first learning model, where m is a positive integer greater than 1, k is the iteration index, and k = 1, 2, 3, ....
It will be appreciated that a first learning model obtained from a deep neural network often has many network layers arranged in sequence. For a first learning model with 100 layers in total, for example, the first 10 layers may be selected in the first iteration, the first 20 layers in the second iteration, and so on, until the first 100 layers (i.e., all network layers) are selected in the tenth iteration.
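The layer-selection schedule from this example can be written down directly (a sketch; `layers_to_tune` is an assumed helper name, not terminology from the patent):

```python
def layers_to_tune(k, m, total_layers):
    # Number of leading layers selected at iteration k, capped at the total
    return min(k * m, total_layers)

# With m = 10 and a 100-layer first learning model, as in the example above:
schedule = [layers_to_tune(k, 10, 100) for k in range(1, 11)]
# the schedule grows by 10 layers per iteration until all 100 are covered
```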
Step S220: input the second training set into the first k*m layers, fine-tune the parameters of the first k*m layers, and calculate the result of the loss function corresponding to the first k*m layers; this constitutes one iteration.
In this embodiment, referring to FIG. 3, the result of the loss function corresponding to the first k*m layers is calculated in each iteration as follows.
(1) Calculate the transfer-learning loss:
Loss_exc = ||NET(A)'_layer(k*m) - NET(A)_layer(k*m)||_2
where NET denotes the initial deep neural network, A the first data set, B the second data set, and NET(A) the first learning model; NET(A)_layer(k*m) is the output of the k*m-th layer when B is input to NET(A); NET(A)' is the network model after fine-tuning the parameters of the first k*m layers of NET(A), and NET(A)'_layer(k*m) is the output of its k*m-th layer when B is input to NET(A)'.
(2) Calculate the classification loss of the network model NET(A)':
Loss_cls = -Σ_i ŷ_i log(y_i)    (cross-entropy, usually used for classification tasks)
or, alternatively,
Loss_cls = ||y - ŷ||_2    (Euclidean distance, usually used for regression tasks)
where y is the output of the last network layer when B is input to NET(A)', and ŷ is the data label of y.
(3) Calculate the result of the loss function corresponding to the first k*m layers of NET(A):
Loss = α × Loss_cls + (1 - α) × Loss_exc
where α is a weight coefficient and α ∈ [0, 1]. The weight coefficient α can be adjusted to the user's actual needs; in this technical scheme it is preferably set to 0.9 to balance the two loss terms.
Step S230: determine whether all network layers have been traversed; if so, proceed to step S250, otherwise proceed to step S240.
Step S240: determine whether the result of the loss function corresponding to the first k*m layers has stopped decreasing; if so, proceed to step S250, otherwise proceed to step S270.
Step S250: end the iterative computation.
Step S260: construct the second learning model from the parameters of the first learning model obtained in the last iteration.
It will be appreciated that after each iteration the parameters of the first k*m layers of the first learning model NET(A) have been fine-tuned, so the first learning model becomes a learning model with updated parameters; the last such updated model can be used as the second learning model NET(B).
Step S270: when the result of the loss function Loss corresponding to the first k*m layers of the first learning model is still decreasing and not all network layers have been traversed, the iterative computation must continue, so the iteration index k is increased by 1. Those skilled in the art will understand that, after iterating over the first k*m layers as k changes (k = 1, 2, 3, ...), the iterative computation finishes once the result of the loss function corresponding to the first k*m layers no longer decreases or all network layers have been traversed, and the process proceeds to step S250.
In addition, in this specific embodiment, through the cooperation of steps S210-S240 and step S270, the first k*m layers of the first learning model are iterated over as k changes until the result of the corresponding loss function no longer decreases or all network layers have been traversed. Concretely:
(1) When k = 1, 2, 3, ..., the Loss value corresponding to the first k*m layers is calculated for each k.
(2) When successive Loss values are equal to the same value (e.g., 0.3) or close to 0 (e.g., all less than 0.1), it is determined that the result of the loss function Loss corresponding to the first k*m layers no longer decreases.
(3) When the number of layers k*m is equal to or greater than the total number of network layers in the first learning model, it is determined that all network layers have been traversed.
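The two stopping criteria can be sketched as predicates (a hedged illustration: the window size and the 0.1 and 1e-3 thresholds mirror the examples above but are otherwise arbitrary choices, not values fixed by the patent):

```python
def traversed_all_layers(k, m, total_layers):
    # Criterion (3): the first k*m layers cover every network layer
    return k * m >= total_layers

def loss_no_longer_decreases(history, window=3, tol=1e-3):
    # Criterion (2): the last `window` Loss values are essentially equal
    # to the same value, or all of them are close to 0 (here: below 0.1)
    if len(history) < window:
        return False
    recent = history[-window:]
    flat = max(recent) - min(recent) < tol
    near_zero = all(v < 0.1 for v in recent)
    return flat or near_zero
```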
Step S300: combine the first training set and the second training set and train the second learning model on them to obtain a third learning model. In the present application, the third learning model is used to extract feature information from images of objects having the first class of features and/or the second class of features, so that the corresponding objects can subsequently be identified.
In a specific embodiment, see FIG. 4, the step S300 may include steps S310-S370, described below, respectively.
Step S310, select the first h × n layers of each network layer in the second learning model, where n is a positive integer greater than 1, h is an iteration number, and h is 1,2,3 ….
It can be understood that the second learning model obtained by performing the transfer learning or the deep learning according to the deep neural network often has a plurality of network layers arranged in sequence, and the total number of the network layers is consistent with the total number of the first learning model. For example, the second learning model and the first learning model both have a network structure of 100 layers, so when performing iterative computation on the second learning model, the first 10 layers thereof may be selected in the first iteration, the first 20 layers thereof may be selected in the second iteration, and so on, and the first 100 layers thereof (i.e., all network layers) may be selected until the tenth iteration.
The value of n in this embodiment may be the same as or different from m in step S210; this is not limited here.
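The per-iteration layer selection can be sketched as a one-liner; `layers` is a hypothetical list of the model's network layers in order.

```python
def prefix_layers(layers, h, n):
    """Return the first h*n layers for iteration h, capped at the
    total depth. With 100 layers and n = 10: h = 1 selects the first
    10 layers, h = 2 the first 20, ..., h = 10 all 100 layers."""
    return layers[: min(h * n, len(layers))]
```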
Step S320, inputting the first training set and the second training set into the first h × n layers of the second learning model, fine-tuning the parameters of the first h × n layers, and calculating the result of the loss function corresponding to the first h × n layers to perform an iterative calculation.
In this embodiment, in each iterative computation, the result of the loss function corresponding to the first h × n layers is calculated as follows.
Let NET(B) denote the second learning model and NET(B)' denote the network model after fine-tuning the parameters of the first h × n layers of NET(B). The result of the loss function corresponding to the first h × n layers of NET(B) is then
Loss' = Losscls' = -Σ ŷ'·log(y')
or
Loss' = Losscls' = ||y' - ŷ'||²
wherein the former Losscls' is a cross-entropy loss function (usually used for classification tasks) and the latter Losscls' is a Euclidean-distance loss function (usually used for regression tasks); y' represents the output of the last network layer when B is input to NET(B)', and ŷ' represents the data label of y'.
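The two candidate forms of Losscls' can be sketched in NumPy, assuming the standard cross-entropy and squared-Euclidean formulations; here `y` is the network output and `y_hat` its data label.

```python
import numpy as np

def loss_cls_cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy form of Losscls' (classification tasks):
    -sum(y_hat * log(y)), with eps guarding against log(0)."""
    return float(-np.sum(y_hat * np.log(y + eps)))

def loss_cls_euclidean(y, y_hat):
    """Euclidean-distance form of Losscls' (regression tasks):
    squared L2 distance between output and label."""
    return float(np.sum((y - y_hat) ** 2))
```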
In step S330, it is determined whether all network layers have been traversed, and if so, the process proceeds to step S350, otherwise, the process proceeds to step S340.
In step S340, it is determined whether the result of the loss function corresponding to the first h × n layers is no longer decreasing, if yes, the process proceeds to step S350, otherwise, the process proceeds to step S370.
In step S350, the iterative computation ends.
And step S360, constructing and obtaining a third learning model by using the parameters of the second learning model obtained in the last iterative computation.
It can be understood that after each iteration, the parameters of the first h × n layers of the second learning model NET(B) are fine-tuned, so that the second learning model becomes a learning model with updated parameters; the last such updated-parameter learning model can be used as the third learning model NET(A, B).
In step S370, when the result of the loss function Loss' corresponding to the first h × n layers of the second learning model is still decreasing and not all network layers have been traversed, the iterative computation needs to continue, and the iteration number h is increased by 1. Those skilled in the art will understand that after performing multiple iterative computations on the first h × n layers as h takes the values 1, 2, 3, …, the iterative computation is completed once the result of the loss function corresponding to the first h × n layers no longer decreases or all network layers have been traversed, and the process proceeds to step S350.
In addition, in this specific embodiment, through the cooperation of steps S310-S340 and step S370, the first h × n layers of the second learning model NET(B) may be iteratively computed multiple times as h changes, until the result of the loss function corresponding to the first h × n layers no longer decreases or all network layers have been traversed. This can be understood as follows:
(1) When h = 1, 2, 3, …, the Loss' value corresponding to the first h × n layers is calculated for each h.
(2) When the value of Loss' is continuously equal to the same value or remains close to 0, it is determined that the result of the loss function corresponding to the first h × n layers no longer decreases.
(3) When h × n is equal to or greater than the total number of network layers in the second learning model, it is determined that all network layers have been traversed.
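The whole h-indexed loop of steps S310-S370 (and, replacing h and n with k and m, the loop of steps S210-S270) can be condensed into a skeleton like the following; `fine_tune_step` is a hypothetical callback that fine-tunes the first `prefix_len` layers and returns the resulting loss value.

```python
def iterative_finetune(num_layers, n, fine_tune_step):
    """Grow the trainable prefix by n layers per iteration and stop
    once all layers are covered or the loss stops decreasing."""
    h, prev_loss, loss = 1, None, None
    while True:
        prefix_len = min(h * n, num_layers)
        loss = fine_tune_step(prefix_len)      # fine-tune first prefix_len layers
        if prefix_len >= num_layers:           # traversed all network layers
            break
        if prev_loss is not None and loss >= prev_loss:  # no longer decreasing
            break
        prev_loss = loss
        h += 1
    return h, loss
```

For example, with a strictly decreasing loss the loop runs until the prefix covers the whole network; with a flat loss it stops after the second iteration.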
Those skilled in the art will understand that when the model is constructed through the above steps S100-S300, the following effects can be achieved:
(1) Both the first training set and the second training set are utilized, so that the constructed second learning model can have the feature-learning capability of both training sets.
(2) Through step S200, when the second learning model is constructed, the parameters of each network layer of the first learning model are fine-tuned layer by layer, so that the first learning model gradually converges under the action of the second training set, and the effectiveness of the convergence process can be further ensured by the calculated result of the loss function of the first learning model.
(3) In step S300, after the second learning model is obtained, the first training set and the second training set are used together to train it, which is beneficial to further optimization, so as to obtain a third learning model with a better training effect.
(4) The model construction method can effectively transfer-learn from the first learning model and the second training set by means of the deep learning network and, while increasing the model's universality in training on multiple different data sets, can alleviate the problems of large fluctuation and difficult convergence in the model training process.
(5) In an iterative model training process, when a new training set is needed to optimize an old model (for example, when the second training set is used to optimize the first learning model, or when the first and second training sets are used to optimize the second learning model), the specific construction process of the second or third learning model in the model construction method of this technical scheme can be referred to, so that the new model is not only suitable for the new data set but also maintains the characteristics of the old data set.
Embodiment 2
Referring to fig. 5, on the basis of the model construction method disclosed in the first embodiment, the present application further discloses an image recognition method, which includes steps S410-S430, which are described below respectively.
Step S410, an image of an object to be detected is obtained, where the object to be detected is an object having a first type of feature and/or a second type of feature.
For example, if the object to be detected is an Asian person, the first class of features corresponding to that person may include feature information such as yellow skin tone, black eyes, black hair, and a diamond-shaped face contour. If the object to be detected is a European person, the second class of features corresponding to that person may include feature information such as white skin tone, blue eyes, blond hair, and a square face contour. If the object to be detected is a person of mixed European and Asian descent, his or her face may include several pieces of feature information from both the first class and the second class of features.
And step S420, extracting characteristic information in the image of the object to be detected according to a pre-constructed third learning model.
It should be noted that the third learning model is obtained by the model construction method disclosed in the first embodiment, and details are not described here.
It should be noted that the technical means of extracting feature information (such as feature vectors) from an image according to an already-established learning model is widely applied in current image processing work, and a skilled person can perform this work without creative labor, so a detailed description is omitted here.
And step S430, identifying the object to be detected according to the characteristic information extracted in the step S420.
For example, if the object to be detected is a Chinese person, some facial feature information of that person can be extracted well according to the constructed third learning model. This facial feature information is then matched against a database through big-data operations; when the matching result exceeds a standard threshold, the face is considered highly similar to the matched face in the database, and the two faces are judged to correspond to the same person, thereby achieving face recognition. Since such data query and matching processes belong to the prior art, a detailed description is omitted here.
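The threshold matching described above can be sketched as follows; the cosine-similarity metric, the (name, vector) database layout, and the 0.8 threshold are illustrative assumptions, since the source leaves the query-and-matching procedure to the prior art.

```python
import numpy as np

def match_face(query_vec, database, threshold=0.8):
    """Match a query feature vector against (name, vector) pairs by
    cosine similarity; return the best-matching name only if its
    similarity reaches the standard threshold, else None."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_name, best_score = None, -1.0
    for name, vec in database:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```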
As can be understood by those skilled in the art, the image recognition method disclosed in steps S410-S430 uses the pre-constructed third learning model to extract the feature information in the image of the object to be detected. With this third learning model, both the first type and the second type of features of the object can be extracted, so the image recognition capability is enhanced and objects with different types of features can be recognized by a single learning model. In addition, the image recognition method not only improves the transfer-learning capability of the image recognition model but also enhances the experience of object recognition.
Embodiment 3
Referring to fig. 6, on the basis of the image recognition method disclosed in the second embodiment, correspondingly, the present application also discloses an image recognition apparatus 1, which mainly includes an image acquisition unit 11, a feature extraction unit 12 and an object recognition unit 13, which are respectively described below.
The image acquiring unit 11 is configured to acquire an image of an object to be detected, where the object to be detected is an object having a first type of feature and/or a second type of feature. Specifically, the image acquiring unit 11 may acquire an image of the object to be detected by means of an imaging device such as a camera or video camera, or even from media video. For specific functions of the image acquiring unit 11, reference may be made to step S410 in the second embodiment, which is not described herein again.
The feature extraction unit 12 is configured to extract feature information in an image of an object to be detected according to a pre-constructed third learning model NET(A, B). The third learning model is obtained by the model construction method disclosed in the first embodiment. For specific functions of the feature extraction unit 12, reference may be made to step S420 in the second embodiment, which is not described herein again.
The object recognition unit 13 is connected to the feature extraction unit 12, and is configured to recognize the object to be detected according to the extracted feature information. For specific functions of the object identification unit 13, reference may be made to step S430 in embodiment two, which is not described herein again.
Further, referring to fig. 6, the image recognition apparatus 1 of the present embodiment further includes a model construction unit 14 connected to the feature extraction unit. In one embodiment, see FIG. 7, the model building unit 14 includes a first training module 141, a second training module 142, and a third training module 143, each described below.
The first training module 141 is configured to obtain a first learning model through deep neural network training using the first training set; wherein the first training set comprises a plurality of images of objects having a first type of feature.
The second training module 142 is connected to the first training module 141, and configured to perform parameter fine-tuning on each network layer of the first learning model one by using a second training set until a result of a loss function of the first learning model does not decrease or the loss function passes through all network layers, so as to construct and obtain a second learning model; wherein the second training set comprises a plurality of images of objects having a second type of feature;
the third training module 143 is connected to the second training module 142, and is configured to combine the first training set and the second training set, and obtain a third learning model through training of the second learning model.
For specific functions of the first training module 141, the second training module 142, and the third training module 143, reference may be made to step S100, step S200, and step S300 in the first embodiment, which is not described herein again.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described with reference to specific examples, which are intended only to aid understanding of the invention and not to limit it. For persons skilled in the art to which the invention pertains, several simple deductions, modifications, or substitutions may also be made according to the idea of the invention.

Claims (10)

1. The model construction method based on the transfer learning is characterized by comprising the following steps:
training by using a first training set through a deep neural network to obtain a first learning model; the first training set comprises a plurality of images of objects having a first class of features;
utilizing a second training set to perform parameter fine adjustment on each network layer of the first learning model one by one until the result of the loss function of the first learning model does not decline or the loss function of the first learning model passes through all the network layers, and constructing to obtain a second learning model; the second training set comprises a plurality of images of objects having a second type of feature;
combining the first training set and the second training set, and training through the second learning model to obtain a third learning model; the third learning model is used for extracting feature information of the image of the object with the first class of features and/or the second class of features.
2. The model building method according to claim 1, wherein the building of the second learning model by using the second training set to perform parameter fine-tuning on each network layer of the first learning model one by one until the result of the loss function of the first learning model does not decrease or the result goes through all network layers comprises:
selecting the front k x m layers of each network layer in the first learning model, inputting the second training set into the front k x m layers, finely adjusting the parameters of the front k x m layers, and calculating the result of the loss function corresponding to the front k x m layers to perform one-time iterative calculation; wherein m is a positive integer greater than 1, k is an iteration number and k is 1,2,3 …;
performing multiple iterative calculations on the front k × m layers according to the value change of k until the results of the loss functions corresponding to the front k × m layers do not decrease or go through all network layers;
and constructing to obtain a second learning model by using the parameters of the first learning model obtained by the last iterative computation.
3. The model building method of claim 2, wherein, in each iterative computation, the result of the loss function corresponding to the first k × m layers is calculated as follows:
calculating the transfer-learning loss
Lossexc = ||NET(A)'layer(k*m) - NET(A)layer(k*m)||²
wherein NET represents the initial deep neural network model, A represents the first data set, B represents the second data set, NET(A) represents the first learning model, and NET(A)layer(k*m) represents the output of the k × m-th layer when B is input to NET(A); NET(A)' represents the network model after fine-tuning the parameters of the first k × m layers of NET(A), and NET(A)'layer(k*m) represents the output of the k × m-th layer when B is input to NET(A)';
calculating the loss of the network model NET(A)'
Losscls = -Σ ŷ·log(y)
or
Losscls = ||y - ŷ||²
wherein the former Losscls is a cross-entropy loss function and the latter Losscls is a Euclidean-distance loss function; y represents the output of the last network layer when B is input to NET(A)', and ŷ represents the data label of y;
calculating the result of the loss function corresponding to the first k × m layers of NET(A)
Loss = α × Losscls + (1 - α) × Lossexc
wherein α represents a weight coefficient, and α ∈ [0,1].
4. The model building method of claim 3, wherein the iteratively calculating the first k × m layers for a plurality of times according to the value change of k until the result of the loss function corresponding to the first k × m layers does not decrease or traverses all network layers comprises:
when k takes the values 1, 2, 3, …, respectively calculating the Loss values corresponding to the front k × m layers;
determining that the result of the loss function corresponding to said front k × m layers does not decrease when the value of loss is continuously equal to the same value or close to 0;
determining to traverse all network layers when the first k x m layers are equal to or greater than a total number of network layers in the first learning model.
5. The model building method of any one of claims 1-4, wherein said combining the first training set and the second training set, training with the second learning model to obtain a third learning model, comprises:
selecting the front h x n layers of each network layer in the second learning model, inputting the first training set and the second training set into the front h x n layers in a combined mode, finely adjusting the parameters of the front h x n layers, and calculating the result of the loss function corresponding to the front h x n layers to perform one-time iterative calculation; wherein n is a positive integer greater than 1, h is an iteration number and h is 1,2,3 …;
carrying out multiple iterative calculations on the front h x n layers according to the value change of h until the results of the loss functions corresponding to the front h x n layers do not decrease or go through all network layers;
and constructing to obtain a third learning model by using the parameters of the second learning model obtained in the last iterative computation.
6. The model building method of claim 5, wherein, in each iterative computation, the result of the loss function corresponding to the first h × n layers is calculated as follows:
representing the second learning model by NET(B) and the network model after fine-tuning the parameters of the first h × n layers of NET(B) by NET(B)', the result of the loss function corresponding to the first h × n layers of NET(B) is calculated as
Loss' = Losscls' = -Σ ŷ'·log(y')
or
Loss' = Losscls' = ||y' - ŷ'||²
wherein the former Losscls' is a cross-entropy loss function and the latter Losscls' is a Euclidean-distance loss function; y' represents the output of the last network layer when B is input to NET(B)', and ŷ' represents the data label of y'.
7. The image recognition method is characterized by comprising the following steps:
acquiring an image of an object to be detected, wherein the object to be detected is an object with a first type of characteristics and/or a second type of characteristics;
extracting characteristic information in the image of the object to be detected according to a pre-constructed third learning model; the third learning model is obtained by the model construction method according to any one of claims 1 to 6;
and identifying the object to be detected according to the extracted characteristic information.
8. An image recognition apparatus, comprising:
the image acquisition unit is used for acquiring an image of an object to be detected, wherein the object to be detected is an object with a first type of characteristics and/or a second type of characteristics;
the characteristic extraction unit is used for extracting characteristic information in the image of the object to be detected according to a pre-constructed third learning model; the third learning model is obtained by the model construction method according to any one of claims 1 to 6;
and the object identification unit is used for identifying the object to be detected according to the extracted characteristic information.
9. The image recognition apparatus according to claim 8, further comprising a model construction unit connected to the feature extraction unit, the model construction unit including:
the first training module is used for utilizing a first training set to obtain a first learning model through deep neural network training; the first training set comprises a plurality of images of objects having a first class of features;
the second training module is used for carrying out parameter fine adjustment on each network layer of the first learning model one by utilizing a second training set until the result of the loss function of the first learning model does not decline or the loss function of the first learning model passes through all the network layers, and a second learning model is constructed; the second training set comprises a plurality of images of objects having a second type of feature;
and the third training module is used for combining the first training set and the second training set and obtaining a third learning model through the training of the second learning model.
10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-7.
CN201910656794.XA 2019-07-19 2019-07-19 Model construction method based on transfer learning, image recognition method and device thereof Active CN110598727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910656794.XA CN110598727B (en) 2019-07-19 2019-07-19 Model construction method based on transfer learning, image recognition method and device thereof

Publications (2)

Publication Number Publication Date
CN110598727A true CN110598727A (en) 2019-12-20
CN110598727B CN110598727B (en) 2023-07-28

Family

ID=68853012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910656794.XA Active CN110598727B (en) 2019-07-19 2019-07-19 Model construction method based on transfer learning, image recognition method and device thereof

Country Status (1)

Country Link
CN (1) CN110598727B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310647A (en) * 2020-02-12 2020-06-19 北京云住养科技有限公司 Generation method and device for automatic identification falling model
CN111462079A (en) * 2020-03-31 2020-07-28 上海全景医学影像诊断中心有限公司 Automatic migratable artificial intelligence medical image recognition system and recognition method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709418A (en) * 2016-11-18 2017-05-24 北京智慧眼科技股份有限公司 Face identification method based on scene photo and identification photo and identification apparatus thereof
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN108960419A (en) * 2017-05-18 2018-12-07 三星电子株式会社 For using student-teacher's transfer learning network device and method of knowledge bridge
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks


Also Published As

Publication number Publication date
CN110598727B (en) 2023-07-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant