CN115239946A

CN115239946A - Small sample transfer learning training and target detection method, device, equipment and medium

Info

Publication number: CN115239946A
Application number: CN202210755570.6A
Authority: CN
Inventors: 何良雨; 崔健; 刘彤
Original assignee: Fengrui Lingchuang Zhuhai Technology Co ltd
Current assignee: Fengrui Lingchuang Zhuhai Technology Co ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2022-10-25
Anticipated expiration: 2042-06-30
Also published as: CN115239946B

Abstract

The invention provides a method, a device, equipment and a medium for small sample transfer learning training and target detection, wherein the training method comprises the following steps: acquiring a training data set and a detection network model; training a detection network model by adopting the training data set, wherein in the training process: acquiring an auxiliary category loss balance coefficient, an auxiliary confidence coefficient loss balance coefficient and an auxiliary coordinate error balance coefficient; obtaining a coordinate error value, a confidence coefficient error value and a classification error value by the auxiliary category loss balance coefficient, the auxiliary confidence coefficient loss balance coefficient and the auxiliary coordinate error balance coefficient to obtain a focus loss function; and carrying out sample transfer learning on the detection network model according to the focus loss function, and optimizing parameters of the detection network model to obtain a pre-training detection network model. According to the technical scheme provided by the invention, the accuracy of the obtained pre-training detection model can be improved when the detection network model is trained by adopting small sample data.

Description

Small sample transfer learning training and target detection method, device, equipment and medium

Technical Field

The invention relates to the technical field of target detection, in particular to a method, a device, equipment and a medium for small sample transfer learning training and target detection.

Background

In the field of intelligent manufacturing, in order to ensure the quality of products, the products are generally required to be detected in the production process or after finishing so as to judge whether the requirements of leaving factories are met. Particularly in the high-end manufacturing field, such as the fields of semiconductors, ultra-precision optical mirrors, high-purity glass and the like, the quality requirement on the product is very high, and if defects exist on the surface or inside of the product, the performance of the product is affected, and even the product cannot be used directly. Therefore, the defect detection of the product is an essential link in the product manufacturing field.

In recent years, with the development and progress of artificial intelligence deep learning technology, deep learning algorithms are increasingly applied in the field of process material defect detection, and a good detection effect is achieved. The application of the deep learning algorithm in the field of process material defect detection refers to the steps of obtaining a detection network model through deep learning algorithm training, then capturing a defect target in a detection image of a product by adopting the detection network model, and identifying defects on the surface of the product.

However, the current target detection algorithm based on artificial intelligence deep learning depends on a large number of training data sets with label information, so that the deep learning method has a single application scene and poor generalization capability. With the improvement of the manufacturing process, particularly in the high-end manufacturing field of semiconductors and the like, the defective rate of products is remarkably reduced, and a large number of defect samples are often difficult to obtain for training the neural network, so that how to train a high-accuracy pre-training detection network model by using less sample data (namely, small sample data) is an important technical problem to be solved at present.

Disclosure of Invention

The invention aims to provide a method, a device, equipment and a medium for small sample transfer learning training and target detection, which are used for solving the technical problem that the accuracy of an obtained pre-training detection network model is low due to the small quantity of sample data in a training data set.

In order to solve at least the above technical problems, the present invention provides the following technical solutions:

a small sample transfer learning training method comprises the following steps:

acquiring a training data set and a detection network model;

training the detection network model by adopting the training data set, wherein in the training process:

acquiring an auxiliary category loss balance coefficient, an auxiliary confidence coefficient loss balance coefficient and an auxiliary coordinate error balance coefficient;

obtaining a coordinate error value according to the central coordinate error, the boundary frame error and the error balance coefficient of the unit;

obtaining a confidence coefficient error value according to whether a detection target exists in the prediction frame or not and the auxiliary confidence coefficient loss balance coefficient;

obtaining a classification error value according to the target real category probability value, the target quality and the auxiliary classification loss balance coefficient;

obtaining a focus loss function value according to the coordinate error value, the confidence coefficient error value and the classification error value;

and carrying out sample transfer learning on the detection network model according to the focus loss function value, and optimizing parameters of the detection network model to obtain a pre-training detection network model.

According to an embodiment of the present invention, the performing sample transfer learning on the detection network model according to the focus loss function includes:

according to the focus loss function, increasing gradient descent for multiple times among each convolution layer of the detection network model, and optimizing the network weight in the detection network model;

and calculating the average value of each gradient loss according to each network weight in the detection network model, and performing gradient descent processing on the detection network model according to the focus loss function.

A small sample transfer learning target detection method comprises the following steps:

acquiring a detection image;

inputting the detection image into a pre-training detection network model obtained by the small sample transfer learning training method in any one of the embodiments;

and the pre-training detection network model obtains a target detection result according to the detection image.

According to an embodiment of the present invention, the pre-training detection network model has a plurality of convolutional layers, and corresponding asymmetric adaptive feature enhancement layers are disposed between adjacent convolutional layers, wherein the first convolutional layer outputs a low-order feature map, and the last convolutional layer outputs a high-order feature map;

the pre-training detection network model obtaining a target detection result according to the detection image comprises:

the convolution layer in the pre-training detection network model extracts the features of the detection image, the asymmetric adaptive feature enhancement layer performs adaptive feature enhancement processing on the features extracted by the previous convolution layer, and the feature enhancement processing result is input into the next convolution layer;

and the pre-training detection network model fuses the low-order characteristic graph and the high-order characteristic graph to obtain a fused characteristic image, and identifies the fused characteristic image to obtain a target detection result.

According to an embodiment of the present invention, the adaptive feature enhancement layer performs adaptive feature enhancement processing on the feature extracted by the previous convolution layer, and includes:

carrying out asymmetric multi-scale feature extraction on the features extracted by the previous convolution layer to obtain multi-scale features;

performing channel feature enhancement processing on the multi-scale features to obtain enhanced multi-scale features;

and carrying out feature fusion on the enhanced multi-scale features to obtain the feature enhancement processing result.

According to an embodiment of the present invention, the performing channel feature enhancement processing on the multi-scale feature to obtain an enhanced multi-scale feature includes:

carrying out global average pooling treatment on the multi-scale features to obtain a global average pooling value on each channel;

carrying out characteristic value excitation processing on the average pooling values on each channel to obtain weight values endowed by the pooling values of each channel passing through the characteristic diagram;

and calibrating the multi-scale features according to the weight values to obtain the enhanced multi-scale features.

A small sample transfer learning training device, comprising:

the data acquisition module is used for acquiring a training data set and a detection network model;

a model training module for training the detection network model using the training data set, the model training module comprising:

the auxiliary loss coefficient acquisition unit is used for acquiring an auxiliary category loss balance coefficient, an auxiliary confidence coefficient loss balance coefficient and an auxiliary coordinate error balance coefficient;

the coordinate error value acquisition unit is used for obtaining a coordinate error value according to the central coordinate error, the boundary frame error and the error balance coefficient of the unit;

the confidence coefficient error value obtaining unit is used for obtaining a confidence coefficient error value according to whether a detection target exists in the prediction frame or not and the auxiliary confidence coefficient loss balance coefficient;

the classification error value obtaining unit is used for obtaining a classification error value according to the target real category probability value, the target quality and the auxiliary classification loss balance coefficient;

a focus loss function obtaining unit, configured to obtain a focus loss function value according to the coordinate error value, the confidence error value, and the classification error value;

and the transfer learning training unit is used for carrying out sample transfer learning on the detection network model according to the focus loss function value, optimizing the parameters of the detection network model and obtaining a pre-training detection network model.

A small sample transfer learning target detection device includes:

the detection image acquisition module is used for acquiring a detection image;

a detection result obtaining module, configured to input the detection image into a pre-training detection network model obtained by the small sample transfer learning training method according to any one of the embodiments; and the pre-training detection network model obtains a target detection result according to the detection image.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the training method for small sample transfer learning according to any one of the above embodiments or the target detection method for small sample transfer learning according to any one of the above embodiments when executing the computer program.

A computer-readable storage medium, which stores a computer program, when being executed by a processor, to implement the small sample transfer learning training method according to any one of the above embodiments or the small sample transfer learning target detection method according to any one of the above embodiments.

According to the technical scheme provided by the invention, in the process of training the detection network model by adopting the training data set, the focus loss function is obtained according to the coordinate error value, the confidence degree error value and the classification error value, and when the coordinate error value, the confidence degree error value and the classification error value are obtained, the auxiliary coordinate error balance coefficient, the auxiliary confidence degree loss balance coefficient and the auxiliary category loss balance coefficient are respectively introduced, so that the corresponding loss weight value is given to the sample in the training data set, the detection training network model can focus on the sample data in the training data set and the similar point of the detection training network model parameter, and the parameter migration of the detection network model is better realized. Therefore, by adopting the technical scheme provided by the invention, the accuracy of the pre-trained detection network model obtained by training can be improved under the condition that a small amount of sample data is adopted to train the detection network model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.

FIG. 1 is a flow chart of a small sample transfer learning training method according to an embodiment of the present invention;

FIG. 2 is a flowchart of sample transfer learning of a detection network model according to a focus loss function according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of sample transfer learning according to an embodiment of the invention;

FIG. 4 is a flowchart of a small sample transfer learning target detection method according to an embodiment of the present invention;

FIG. 5 is a flowchart of a method for pre-training a detection network model to obtain a target detection result according to a detection image according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a pre-training network according to an embodiment of the present invention;

FIG. 7 is a flow chart of a method for adaptive feature enhancement processing by an asymmetric adaptive feature enhancement layer according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of multi-scale feature extraction of a feature map according to an embodiment of the present invention;

FIG. 9 is a flow diagram of a channel feature enhancement process for multi-scale features according to an embodiment of the present invention;

FIG. 10 is a diagram of a small sample transfer learning training apparatus according to an embodiment of the present invention;

fig. 11 is a schematic diagram of a small sample migration learning target detection apparatus according to an embodiment of the present invention;

fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In the following, the technical solutions in the embodiments of the present invention will be clearly and completely described with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In an embodiment of the present invention, a small sample transfer learning training method is provided, which is used for training a detection network model by using small sample data to obtain a pre-training detection network model with high accuracy. The small sample transfer learning training method in this embodiment is described in detail below with reference to the flow illustrated in fig. 1.

As shown in fig. 1, the small sample transfer learning training method provided in this embodiment includes:

step S101: acquiring a training data set and a detection network model;

step S102: training the detection network model by adopting a training data set, wherein in the training process:

obtaining a confidence coefficient error value according to whether a detection target exists in the prediction frame and the auxiliary confidence coefficient loss balance coefficient;

obtaining a classification error value according to the target real class probability value, the target quality and the auxiliary classification loss balance coefficient;

As an example, the training data set obtained in step S101 stores sample data, which is an image of the object to be tested provided with a label, for example, when the inspection network model is an inspection network for identifying a semiconductor defect, the sample data is an image of a semiconductor, and the semiconductor is provided with a label indicating a semiconductor defect in the image. The detection network model is used for detecting whether the detected object has defects according to the image of the detected object, the network model is provided with a plurality of convolution layers, and each convolution layer can extract the characteristics of the image of the detected object so as to identify the defects of the detected object.

In this example, the training data set and the detection network model may be obtained through data transmission or data unloading, for example, the training data set and the detection network model may be stored in a removable storage medium such as a usb disk or a mobile hard disk, and then the training data set and the detection network model may be obtained from the removable storage medium through data replication; or storing the training data set and the detection network model on a host or a server, then establishing communication connection with the host or the server, and obtaining the training data set and the detection network model in a network data transmission mode.

As an example, the auxiliary class loss balance coefficient, the auxiliary confidence coefficient loss balance coefficient, and the auxiliary coordinate error balance coefficient in step S102 are all corresponding preset values, and are set by a technician according to experience or experimental tests, and when a classification loss function, a confidence coefficient loss function, and a coordinate loss function are obtained, the auxiliary class loss balance coefficient, the auxiliary confidence coefficient loss balance coefficient, and the auxiliary coordinate error balance coefficient may be respectively introduced to assign corresponding loss weight values to samples in a training data set, so that the detection training network model can focus on similar points of sample data in the training data set and parameters of the detection training network model.

As an example, in acquiring the coordinate error value, the center coordinates of each cell and the distance from the border to the center point are first acquired, and then the coordinate error value is calculated according to the following error loss function:

in the above formula (1), (x, y) is the center coordinate of each cell, (w, h) is the distance of the border frame edge from the center point, p _i () Is the distance between the predicted value and the true value corresponding to (x, y, w, h). As described above

The auxiliary coordinate error balance coefficient is tested by experiments to be auxiliary coordinate error balance coefficient

The value of (d) is set to 1. When the distance between the predicted value and the true value of the center coordinate corresponding to a certain point is small, the point is indicated to be easy to fit a boundary box, p _i () Has small value, is passed through

Making the loss value smaller after calculation; when the distance between the predicted value and the true value of the center coordinate corresponding to a certain point is larger, the point is indicated to be difficult to fit a bounding box, p _i () Greater value, through

The loss value is reduced by a small amount after calculation, so that an optimal boundary box loss weight value is given to each sample, the boundary box loss proportion of the sample difficult to detect is improved, and the training effect is improved.

As an example, when calculating the confidence error, it is first determined whether a detection target exists in the prediction frame of the unit, and then the confidence error value is calculated according to the following confidence error function:

in the above-mentioned formula (2),

if there is no defect target in the jth prediction frame of the ith unit cell

Is 0, if there is a defect target in the jth prediction box of the ith cell

Is 1. Lambda _noobj To weigh the confidence loss when there is a defect object in the jth prediction box of the ith cell, λ can be used in this example _noobj The value of (d) is set to 0.5.C _i And obtaining the confidence coefficient according to the coincidence proportion value of the prediction frame and the real frame.

To assist in the confidence loss balancing coefficients, in this example

The value is set to 1. Confidence of a certain point C _i If the confidence coefficient is greater than the set confidence coefficient, the point is easy to judge the target and passes through

The loss value is made smaller after calculation, when the point is divided into a value C _i When the target is smaller, the point is difficult to distinguish, and the target passes through

The loss value after calculation is not reduced by a large extent. By this exampleThe confidence error value in the method endows each sample with an optimal confidence loss weight value, improves the confidence loss proportion of the sample with difficulty in detection, and improves the training effect.

As an example, when calculating the classification error value, first obtaining a probability value that the detected object belongs to a real category and a defect object exists in the prediction box, and then obtaining the classification error value through the following classification error function:

in the above formula (3), p _i (c) A probability value for detecting that the target belongs to a real category;

defining as the ith unit cell, if there is a defect target in the jth prediction box, and if so, determining whether there is a defect target in the jth prediction box

The value of (1) is 1, otherwise the value of (0) is obtained; b is the number of bounding boxes in each grid; the jth bounding box in each mesh;

to assist the class loss balance coefficient, in this example, one would

The value is set to 2. When a certain point scores a value p _i (c) When the score is larger than the set score, the point is easy to classify, and the process is

The point loss value can be made smaller after calculation; when a certain point scores a value p _i (c) Smaller, it indicates that the point is more difficult to classify, by

The loss value after calculation is not reduced by a large extent, andand each sample is endowed with an optimal class loss weight value, so that the loss value proportion of the sample difficult to detect is improved, and the training effect of few samples is facilitated.

As an example, after obtaining the coordinate error value, the confidence error value, and the classification error value, the calculation result of the focus loss function may be obtained by the following calculation formula:

in this example, a focus loss function is obtained according to the coordinate error value, the confidence error value and the classification error value, and loss weight values can be given to different sample data, so that the detection network model in the training process can focus on similar points of parameters of a small number of new samples and an initial detection network model, sample transfer learning of the detection network model is facilitated, and the defect detection capability of the detection network model under a small number of sample data is improved.

In summary, in the technical scheme of this embodiment, when the coordinate error value, the confidence error value, and the classification error value are obtained, an auxiliary coordinate error balance coefficient, an auxiliary confidence loss balance coefficient, and an auxiliary category loss balance coefficient are respectively introduced, so that a corresponding loss weight value is assigned to a sample in the training data set, so that the detection training network model can focus on a similar point between the sample data in the training data set and a parameter of the detection training network model, and the parameter migration of the detection training network model is better achieved. Therefore, by adopting the technical scheme provided by the invention, the accuracy of the pre-trained detection network model obtained by training can be improved under the condition that a small amount of sample data is adopted to train the detection network model.

In an embodiment, the process of performing sample migration learning on the detection network model according to the focus loss function in step S102 is shown in fig. 2, and includes:

step S121: according to the focus loss function, increasing multiple gradient drops among each convolution layer of the detection network model, and optimizing the network weight in the detection network model;

step S122: and calculating the average value of each gradient loss according to each network weight in the detection network model, and performing gradient descent processing on the detection network model according to the focus loss function.

As an example, as shown in fig. 3, the above-described internal gradient update includes: and increasing gradient descent for multiple times among the convolution layers of the detection network model, and updating parameters by increasing gradient descent for multiple times among each convolution layer to optimize the network weight in the detection network model. When the gradient is decreased, the calculation formula is as follows:

wherein, alpha is the learning rate,

for using the parameter theta in the learning task T _i The value of the focal loss function on (d),

the gradient of the focus loss function value with respect to the parameter theta.

As one example, the external gradient update includes: and calculating the average value of each gradient loss according to each network weight in the pre-training detection network model, and then performing gradient descent processing on the pre-training detection network model again. The external gradient update formula is:

wherein beta is the learning rate, I is the total number of learning tasks, P (T) is the task distribution of the meta-learning training task, and L is the task T _i In model f _θ The loss of θ' is a parameter before update.

In the setting mode of the embodiment, the internal gradient update is performed for each task, so as to find the optimal parameter of each task; and the external gradient updating uses the focus loss function value calculated after the internal gradient updating to perform gradient updating, and updates the randomly initialized detection network model parameters by calculating the gradient relative to the optimal parameters in each new task.

In one embodiment, a small sample transfer learning target detection method is provided, and a flow of the method is shown in fig. 4, and includes:

step S201: acquiring a detection image;

step S202: inputting the detection image into a pre-training detection network model, and obtaining a target detection result by the pre-training detection network model according to the detection image; in this embodiment, the pre-training detection network model is obtained by the small sample transfer learning training method according to any one of the above embodiments.

As an example, the detection image obtained in step S201 may be obtained by a machine vision system to collect information on a surface of a product to be detected, such as glass, a component, and an optical lens, or may be received from another device in an information interaction manner. For example, an execution main body of the small sample transfer learning target detection method provided by the application is a machine vision system, and then an image acquisition device in the machine vision system can detect the surface of a product to be detected to obtain a detection image. For another example, the executing main body of the small sample transfer learning target detection method provided by the application is the upper computer, so that the image acquisition device in the machine vision system can detect the surface of the product to be detected to obtain a detection image, and then the detection image is sent to the upper computer through the communication connection between the machine vision system and the upper computer.

As an example, the pre-trained detection network model in step S202 is a network model obtained by the small sample transfer learning training method in each of the above embodiments, and has an advantage of high accuracy, so that in this example, the accuracy of the target detection result can be improved by processing the detection image with the pre-trained detection network model.

In one embodiment, the pre-training detection network model has a plurality of convolutional layers, corresponding asymmetric adaptive feature enhancement layers are arranged between adjacent convolutional layers, a feature map output by the first convolutional layer is used as a low-level feature map, and a feature map output by the last convolutional layer is used as a high-level feature map. In this embodiment, a method for pre-training a detection network model to obtain a target detection result according to a detection image is shown in fig. 5, and includes:

step S221: the method comprises the steps that a convolution layer in a pre-training detection network model extracts features of a detection image, an asymmetric adaptive feature enhancement layer performs adaptive feature enhancement processing on the features extracted by the previous convolution layer, and the feature enhancement processing result is input into the next convolution layer;

step S222: fusing the obtained low-order characteristic diagram and the high-order characteristic diagram by the pre-training detection network model to obtain a fused characteristic image;

step S223: and identifying the fusion characteristic image to obtain a target detection result.

As an example, since the defect target is composed of low-order features such as contours and textures, and shallow features with high fine granularity are important for the pre-training detection network model, the pre-training detection network model in this example is a full convolutional neural network framework built by convolutional layers with consecutive convolutional kernel sizes of 1 × 1 and 3 × 3, as shown in table 1.

TABLE 1

As can be seen from table 1, the pre-training detection network model in this example has 8 convolutional layers, two convolution kernels are provided between each convolutional layer, the sizes of the convolution kernels are 3 × 3 and 1 × 1, and in the forward propagation process of the pre-training detection network model, the convolution kernel with the size of 3 × 3 can implement dimension transformation of the feature map, so that loss of feature information is reduced, high resolution of convolutional output is ensured, and detection accuracy is improved. Feature fusion is carried out on feature maps on different channels with the same depth in the forward propagation process by a convolution kernel with the size of 1 x 1, and the complexity of the model is reduced through dimension compression. In order to improve the detection capability of the defects, in the example, a corresponding asymmetric adaptive feature enhancement layer is arranged between every two adjacent convolutional layers in the pre-training detection network model, and asymmetric adaptive feature enhancement processing is performed on a feature map obtained by every convolutional layer.

In this example, the feature map output by the 8 th convolutional layer of the pre-trained detection network model is taken as the high-level feature map, the feature map output by the 5 th convolutional layer is taken as the low-level feature map, as shown in FIG. 6, where H and W are the length and width of the input feature image of the asymmetric adaptive feature enhancement layer, C is the number of channels of the input feature map of the asymmetric adaptive feature enhancement layer, D is the number of channels of the output feature map, a "-" indicates a channel multiplication operation,

a signature graph stitching operation is shown.

Therefore, the high-order feature map in this example is a feature map obtained by performing 32-fold downsampling detection on the detection image, and the low-order feature map is a feature map obtained by performing 16-fold downsampling detection.

As an example, in step S221, each convolution layer in the pre-trained detection network model performs convolution processing on the detection image to extract features, and then inputs the extracted features into its next asymmetric adaptive feature enhancement layer, and the asymmetric adaptive feature enhancement layer performs adaptive feature enhancement processing on the obtained features, and then inputs the feature enhancement processing result into the next convolution layer. For example, the 1 st convolutional layer performs feature extraction on the detection image, then inputs the extracted features to an asymmetric adaptive feature enhancement layer between the 1 st convolutional layer and the 2 nd convolutional layer, and the asymmetric adaptive feature enhancement layer performs asymmetric adaptive feature enhancement on the features extracted by the 1 st convolutional layer to obtain corresponding feature enhancement processing results, and then inputs the feature enhancement processing results to the 2 nd convolutional layer.

In one example, when the pre-training detection network model fuses the low-order feature map and the high-order feature map in step S222, the high-order feature map is first up-sampled by 2 times, and then the up-sampled processing result is fused with the low-order feature map to obtain a fused feature image, for example, the low-order feature map and the high-order feature map may be convolved by a convolution network to fuse the low-order feature map and the high-order feature map.

In one example, the fused feature image is identified in step S223, and a model with a pattern recognition function may be used to identify a defect object on the fused feature image, so as to obtain the shape, size and type (such as scratch, gap, etc.) of the defect object.

In summary, in this embodiment, the pre-trained detection network model has a plurality of convolutional layers, and a corresponding asymmetric adaptive feature enhancement layer is disposed between adjacent convolutional layers, where the convolutional layers can extract the detection image, and the asymmetric adaptive feature enhancement layer can perform adaptive feature enhancement processing on the features extracted by the convolutional layers, so as to increase the recognition degree of the features in the detection image; and the pre-training detection network model also fuses the low-order characteristic diagram and the high-order characteristic diagram, and performs characteristic extraction on the fused characteristic image, so that the extraction capability of high and fine granularity can be increased, the accuracy of the pre-training detection network model is increased, and the accuracy of a target detection result is improved.

In an embodiment, as shown in fig. 7, the performing, by the asymmetric adaptive feature enhancement layer in step S221, an adaptive feature enhancement on the feature extracted from the previous convolutional layer, and inputting the result of the feature enhancement into the next convolutional layer includes:

step S231: carrying out asymmetric multi-scale feature extraction on the features extracted by the previous convolution layer to obtain multi-scale features;

step S232: performing channel feature enhancement processing on the multi-scale features to obtain enhanced multi-scale features;

step S233: and performing feature fusion on the enhanced multi-scale features to obtain a feature enhancement processing result, and inputting the feature enhancement processing result into the next volume of lamination.

As an example, in step S231, asymmetric multi-scale feature extraction is performed on the feature extracted by the previous convolution layer by using a multi-branch network, so as to obtain a multi-scale feature. For example, when the asymmetric adaptive feature enhancement layer is located between the 1 st convolutional layer and the 2 nd convolutional layer, the multi-branch network performs asymmetric multi-scale feature extraction on the features extracted by the 1 st convolutional layer to obtain multi-scale features.

In this example, three network branches with convolution kernel sizes of 3 × 3, 1 × 3 and 3 × 1 may be set in the multi-branch network, as shown in fig. 8, and each branch network obtains texture features under different receptive fields, and this process may be expressed by the following formula:

x' _i ＝F _3×3 *(x _i-1 )+F _1×3 *(x _i-1 )+F _3×1 *(x _i-1 ) (7)

wherein x is _i-1 Input feature value of asymmetric volume block, F _3×3 、F _1×3 、F _3×1 Denotes a 3 × 3 convolution kernel, a 1 × 3 convolution kernel, and a 3 × 1 convolution kernel, respectively, 'denotes a convolution operation, x' _i And representing feature information obtained by performing feature fusion on features on each channel by using convolution operation.

As an example, the channel feature enhancement processing performed on the multi-scale features in step S202 is feature information x 'obtained by feature fusion of features on the respective channels obtained by the branch network' _i Performing channel feature enhancement processing to increase definition of target texture and improve accuracy of target detection to obtain enhanced multi-scale features U _n 。

As an example, in step S203, when performing feature fusion on the enhanced multi-scale features, the enhanced feature map may be subjected to batch normalization processing and Relu activation processing:

wherein X _n Is the output of the asymmetric adaptive feature enhancement module, U _n Is the input to the asymmetric adaptive feature enhancement module. δ is the activation function of ReLU, var () and E () represent the variance function and expectation of the inputs. Mu.s _n The denominator is not 0, and a very small positive real number is set. Gamma ray _n And epsilon _n Is two trainable parameters of BN layer, and the normalization result can be gamma _n Scaling is performed with epsilon _n Translation is performed.

In an embodiment, as shown in fig. 9, the performing, in step S232, channel feature enhancement processing on the multi-scale feature to obtain an enhanced multi-scale feature includes:

step S241: carrying out global average pooling operation on the multi-scale features to obtain global average pooling values on each channel;

step S242: exciting the global average pooling value on each channel by adopting a sigmoid activation function to obtain the weight given by the pooling value of the feature map on each channel;

step S243: and calibrating the multi-scale features according to the weight to obtain the multi-scale features after the features are enhanced.

As an example, in step S241, when performing a global average pooling operation on the multi-scale features, an adopted formula is as follows:

wherein, X' _(ij) Is characteristic diagram x' _i Characteristic values of each position on the plane, M and N are respectively the length and the width of the characteristic graph, i is more than or equal to 1 and less than or equal to M, j is more than or equal to 1 and less than or equal to N, and channel statistical information Z belongs to R by compressing the dimension of space MxN ⁿ 。

As an example, in step S212, a sigmoid activation function is used to perform excitation operation on the global average pooling value on each channel, so as to obtain a weight given to the pooling value of the feature map on each channel, where the calculation formula is as follows:

in the formula beta _k Pooled value Z expressed as a feature map on the k-th channel _k The assigned weight value, sigma, represents sigmoid function, delta is relu activation function, w is full connection layer, w (delta. Z) _k ) Represents a reaction of Z _k A non-linear full join operation is performed.

As an example, calibrating the multi-scale features according to the weights to obtain the feature-enhanced multi-scale features, where an adopted formula is as follows:

U _n ＝X' _(i,j) ·β _k (11)

in an embodiment, a small sample transfer learning training device is provided, which corresponds to the sample transfer learning training method in the above embodiment one to one. As shown in fig. 10, the apparatus includes a detection image acquisition 11 and a detection result acquisition module 12. The detailed description of each functional module is as follows:

the data acquisition module 11 is used for acquiring a training data set and a detection network model;

model training module 12 is used for adopting the training data set to train the detection network model, and model training module 12 includes:

an auxiliary loss coefficient obtaining unit 121, configured to obtain an auxiliary category loss balance coefficient, an auxiliary confidence loss balance coefficient, and an auxiliary coordinate error balance coefficient;

a coordinate error value obtaining unit 122, configured to obtain a coordinate error value according to a central coordinate error, a bounding box error, and an error balance coefficient of the unit;

a confidence error value obtaining unit 123, configured to obtain a confidence error value according to whether the detection target and the auxiliary confidence loss balance coefficient exist in the prediction frame;

a classification error value obtaining unit 124, configured to obtain a classification error value according to the target real category probability value, the target quality, and the auxiliary classification loss balance coefficient;

a focus loss function obtaining unit 125, configured to obtain a focus loss function according to the coordinate error value, the confidence error value, and the classification error value;

and the transfer learning training unit 126 is configured to perform sample transfer learning on the detection network model according to the focus loss function, and optimize parameters of the detection network model to obtain a pre-training detection network model.

For specific limitations of the small sample migration learning training device, reference may be made to the above limitations of the small sample migration learning training method, which is not described herein again. All or part of each module in the small sample transfer learning training device can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a small sample transfer learning target detection device is provided, and the device corresponds to the sample transfer learning target detection method in the foregoing embodiment one to one. As shown in fig. 11, the apparatus includes a detection image acquisition 21 and a detection result acquisition module 22. The detailed description of each functional module is as follows:

the detection image obtaining module 21 is configured to obtain a detection image;

the detection result obtaining module 22 is configured to input a detection image into a pre-training detection network model, where the pre-training detection network model is obtained by the small sample transfer learning training method in the embodiment; and the pre-training detection network model obtains a target detection result according to the detection image.

For specific limitations of the small sample migration learning target detection device, reference may be made to the above limitations of the small sample migration learning target detection method, which are not described herein again. All or part of the modules in the small sample transfer learning target detection device can be realized by software, hardware and the combination of the software and the hardware. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing and executing a small sample transfer learning training method or a small sample transfer learning target detection method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize a small sample transfer learning training method or a small sample transfer learning target detection method.

In an embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implements the steps of the small sample transfer learning training method in the above-described embodiment, for example, steps S101 to S102 shown in fig. 1, or steps in fig. 2, when executing the computer program, and the processor implements the functions of the modules in the small sample transfer learning training apparatus in the above-described embodiment, for example, the functions of modules 11 to 12 shown in fig. 10. To avoid repetition, the description is omitted here.

In an embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor when executing the computer program implements the steps of the small sample migration learning object detection method in the above-described embodiment, for example, steps S221 to S223 shown in fig. 5, or steps in fig. 7 and 9, and the processor when executing the computer program implements the functions of the modules in the small sample migration learning object detection apparatus in the above-described embodiment, for example, the functions of modules 21 to 22 shown in fig. 11. To avoid repetition, the description is omitted here.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the small sample transfer learning training method in the above-described method embodiment, for example, the steps S101 to S102 shown in fig. 1, or the steps in fig. 2, and when the processor executes the computer program, the processor implements the functions of the modules in the small sample transfer learning training apparatus in the above-described embodiment, for example, the functions of the modules 11 to 12 shown in fig. 10. To avoid repetition, further description is omitted here.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program when being executed by a processor implements the steps of the small sample migratory learning target detection method in the above-mentioned embodiment, for example, the steps S221 to S223 shown in fig. 5, or the steps in fig. 7 and 9, and the processor when executing the computer program implements the functions of the modules in the small sample migratory learning target detection apparatus in the above-mentioned embodiment, for example, the functions of the modules 21 to 22 shown in fig. 11. To avoid repetition, further description is omitted here.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by hardware instructions of a computer program, and the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), rambuS (RambuS) direct RAM (RDRAM), direct RambuS Dynamic RAM (DRDRAM), and RambuS Dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above.

The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims

1. A small sample transfer learning training method is characterized by comprising the following steps:

acquiring a training data set and a detection network model;

2. The small sample transfer learning training method according to claim 1, wherein the performing sample transfer learning on the detection network model according to the focus loss function includes:

3. A small sample transfer learning target detection method is characterized by comprising the following steps:

acquiring a detection image;

inputting the detection image into a pre-training detection network model obtained by the small sample transfer learning training method of claim 1 or 2;

4. The small sample transfer learning target detection method of claim 3, wherein the pre-trained detection network model has a plurality of convolutional layers, and corresponding asymmetric adaptive feature enhancement layers are disposed between adjacent convolutional layers, wherein the first convolutional layer outputs a low-order feature map, and the last convolutional layer outputs a high-order feature map;

the pre-training detection network model obtaining a target detection result according to the detection image comprises the following steps:

5. The small sample transfer learning target detection method of claim 4, wherein the asymmetric adaptive feature enhancement layer performs adaptive feature enhancement on the features extracted by the previous convolution layer, and comprises:

performing asymmetric multi-scale feature extraction on the features extracted by the previous convolution layer to obtain multi-scale features;

6. The method for detecting the small sample transfer learning target according to claim 5, wherein the performing channel feature enhancement processing on the multi-scale features to obtain enhanced multi-scale features comprises:

carrying out characteristic value excitation processing on the average pooling values on the channels to obtain weight values endowed by the pooling values of the general characteristic diagrams;

7. A small sample transfer learning training device is characterized by comprising:

a confidence coefficient error value obtaining unit, configured to obtain a confidence coefficient error value according to whether a detection target and the auxiliary confidence coefficient loss balance coefficient exist in the prediction frame;

the focus loss function obtaining unit is used for obtaining a focus loss function value according to the coordinate error value, the confidence coefficient error value and the classification error value;

8. A small sample transfer learning target detection device is characterized by comprising:

the detection image acquisition module is used for acquiring a detection image;

a detection result obtaining module, configured to input the detection image into a pre-training detection network model obtained by the small sample transfer learning training method according to claim 1 or 2; and the pre-training detection network model obtains a target detection result according to the detection image.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the small sample transfer learning training method according to any one of claims 1-2 or the small sample transfer learning target detection method according to any one of claims 3-6 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when being executed by a processor, implements the training method for small sample transfer learning according to any one of claims 1 to 2 or the target detection method for small sample transfer learning according to any one of claims 3 to 6.