CN110188829B - Neural network training method, target recognition method and related products


Info

Publication number
CN110188829B
CN110188829B (application CN201910472707.5A)
Authority
CN
China
Prior art keywords
neural network
image
domain
processing
image sample
Prior art date
Legal status
Active
Application number
CN201910472707.5A
Other languages
Chinese (zh)
Other versions
CN110188829A (en)
Inventor
葛艺潇 (Yixiao Ge)
陈大鹏 (Dapeng Chen)
沈岩涛 (Yantao Shen)
王晓刚 (Xiaogang Wang)
李鸿升 (Hongsheng Li)
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910472707.5A
Publication of CN110188829A
Application granted
Publication of CN110188829B

Classifications

    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/231: Clustering, hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G06F18/2321: Clustering, non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Clustering, non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
    • G06F18/24: Classification techniques
    • G06N3/045: Neural networks, combinations of networks
    • G06V20/00: Scenes; scene-specific elements
    • G06V2201/07: Indexing scheme for image or video recognition, target detection


Abstract

The embodiment of the application discloses a neural network training method, a target recognition method and related products. The training method comprises the following steps: generating a current pseudo-labeling result for the image samples in the target domain according to the features extracted from those samples by a first neural network; processing the image samples in the joint domain with the first neural network and outputting the processing result of the first neural network, wherein the image samples in the joint domain and their current labeling results comprise the image samples in the source domain with their labeling results and the image samples in the target domain with their current pseudo-labeling results; and adjusting the parameter values of the network parameters of the first neural network according to the processing result of the first neural network and the current labeling results of the image samples in the joint domain.

Description

Neural network training method, target recognition method and related products
Technical Field
The application relates to the field of computer vision technology, and in particular to a neural network training method, a target recognition method and related products.
Background
In the field of computer vision technology, target recognition has long been a research focus, since it underlies many computer vision applications. Target recognition based on domain adaptation, which addresses the distribution difference between domains, has gradually become a research hotspot in the field. Research on this problem is significant mainly because: if sample data from the target domain is added during training, the reusability of the classifier or detector can be improved, the adaptability of the neural network to new environments is effectively enhanced, and the training process of the neural network becomes largely independent of the application scenario.
In the prior art, domain adaptation is generally implemented by performing style conversion on the images of the source domain to match the target domain, and then fine-tuning the neural network. However, this approach places high requirements on the image quality in the source domain, and the fine-tuned neural network only learns the style characteristics of the images in the target domain; it learns neither the interrelations among the images within the target domain nor the relations between the images in the target domain and those in the source domain.
Disclosure of Invention
The embodiments of the application provide a neural network training method, a target recognition method and related products. The method generates pseudo labels for the image samples in the target domain, so that domain adaptation is achieved during neural network training and the trained target neural network has stable recognition performance and high recognition accuracy.
In a first aspect, an embodiment of the present application provides a method for training a neural network, including:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, wherein the processing result comprises:
processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, including:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for a first neural network, the parameter values of the network parameters of the first neural network are adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the method further includes:
before the image samples in the joint domain are processed by the N first neural networks, extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
processing image samples in the joint domain with the N first neural networks, including:
and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, wherein the processing result comprises:
processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, including:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
processing the image samples in the joint domain with a second neural network and a third neural network in each neural network pair, comprising:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the method further comprises:
and processing the images in the test image set by using the N first neural networks after parameter adjustment, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the method further comprises:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
In a second aspect, an embodiment of the present application provides a target identification method, including:
and performing image processing by using a target neural network to identify a target in the image, wherein the target neural network is trained by adopting the method of the first aspect.
In a third aspect, an embodiment of the present application provides a training apparatus for a neural network, including:
the pseudo-annotation generating unit is used for generating a current pseudo-annotation result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
the processing unit is used for processing the image samples in the joint domain by utilizing the first neural network and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and the adjusting unit is used for adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
the processing unit, when processing the image sample in the joint domain by using the first neural network and outputting a processing result of the first neural network, is specifically configured to: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
an adjusting unit, configured to, when adjusting a parameter value of a network parameter of a first neural network according to a processing result of the first neural network and a current labeling result of an image sample in a joint domain, specifically:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in a case where, for a first neural network, a parameter value of a network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the apparatus further includes: a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by the N first neural networks; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
the processing unit, when processing the image sample in the joint domain by using the N first neural networks, is specifically configured to: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the pseudo-annotation generating unit, when generating the current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
the processing unit, when processing the image sample in the joint domain by using the first neural network and outputting a processing result of the first neural network, is specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
an adjusting unit, configured to, when adjusting a parameter value of a network parameter of a first neural network according to a processing result of the first neural network and a current labeling result of an image sample in a joint domain, specifically: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the apparatus further comprises a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, the pseudo-annotation generating unit, when generating the current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the device further comprises a test unit;
and the test unit is used for processing the images in the test image set by using the N first neural networks after the parameters are adjusted, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the device further comprises a test unit;
and the test unit is used for processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
In a fourth aspect, an embodiment of the present application provides an apparatus for target identification, including:
an identifying unit, configured to perform image processing using a target neural network to identify a target in the image, where the target neural network is trained by the method of the first aspect.
In a fifth aspect, the present application provides an electronic device, including a processor and a memory, where the memory is configured to store computer-readable instructions, and the processor is configured to call the instructions stored in the memory to execute the instructions of the steps in the method according to the first aspect or the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program, which when executed by a processor implements the method according to the first or second aspect.
In a seventh aspect, the present application provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first or second aspect.
The embodiment of the application has the following beneficial effects:
it can be seen that, in the embodiment of the present application, by adding a pseudo-annotation result to an image sample in a target domain, the image sample in the target domain also includes supervisory information, so that the image sample in a joint domain composed of a source domain and the target domain all includes the supervisory information, and thus, the image sample in the joint domain can be utilized to perform optimization training on a neural network; meanwhile, because a pseudo-labeling result is added to the image sample in the target domain, the neural network can learn the relation between the image samples of the target domain and the relation between the image sample of the source domain and the image sample of the target domain when the neural network is trained by using the image sample in the joint domain, so that the trained target neural network can adapt to various recognition scenes; and moreover, the image samples in the target domain are used for the optimization training of the neural network, so that the image samples in the training image set are richer, and the identification precision of the target neural network is higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for training a neural network according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another method for training a neural network according to an embodiment of the present disclosure;
fig. 2A is a schematic diagram of a training scenario of a training method of a neural network according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another method for training a neural network according to an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a training scenario of another training method for a neural network according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an apparatus for object recognition according to an embodiment of the present application;
fig. 6 is a block diagram illustrating functional units of a training apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
When domain adaptation is implemented, neural network training is first performed using the image samples in the source domain (which have corresponding labeling results) to obtain an initial neural network. Pseudo labels (pseudo-labeling results) are then added to the image samples in the target domain (target domain), and the source domain and the pseudo-labeled target domain form a joint domain D, so that every image sample in the joint domain D carries a label (labeling result). At this point, the joint domain can be used to perform supervised training on the initial neural network to obtain a neural network with strong adaptability. To add pseudo labels to the image samples in the target domain, the feature information of those samples can be extracted with the initial neural network and clustered, and pseudo labels are assigned to the image samples in the target domain according to the clustering. Assuming the initial neural network has a feature transform function F(·; θ), when the initial neural network is supervised-trained on the joint domain, the corresponding classification loss function is:
L_id(θ) = (1/N) ∑_{i=1}^{N} L_ce(C(F(x_i; θ)), y_i)    (1)
where L_id is the classification loss function, x_i is the i-th image sample in the joint domain D, y_i is the supervisory information corresponding to image sample x_i, C(F(x_i; θ)) is the classification prediction for image sample x_i, N is the total number of image samples in the joint domain D, and L_ce is the cross-entropy loss function.
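As an illustration of formula (1), the following is a minimal sketch in PyTorch, assuming a backbone playing the role of F(·; θ) and a classifier head playing the role of C(·); all function and variable names are assumptions for illustration, not taken from the patent:
```python
import torch.nn.functional as F_t

def joint_domain_id_loss(backbone, classifier, images, labels):
    """Cross-entropy over a batch of joint-domain samples, where target-domain
    samples carry clustering-derived pseudo labels (sketch of formula (1))."""
    features = backbone(images)        # F(x_i; theta)
    logits = classifier(features)      # C(F(x_i; theta))
    # cross_entropy averages over the batch, i.e. the (1/N) sum of L_ce terms
    return F_t.cross_entropy(logits, labels)
```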
Because the supervisory information of part of the image samples in the joint domain D consists of pseudo labels obtained by clustering, feature learning is insufficient and noise (i.e., erroneous labels) is inevitably introduced, resulting in low recognition accuracy of the optimized neural network. To overcome these defects, the technical solution of the application is proposed to suppress the noise generated when pseudo labels are added to the image samples in the target domain and to implement domain adaptation.
Referring to fig. 1, fig. 1 is a method for training a neural network according to an embodiment of the present application, where the method includes:
step 101: and generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network.
The first neural network is obtained by training on the image samples in a training image set and their labeling results, which comprise the image samples in the source domain and their labeling results; that is, initially the first neural network is trained only on the image samples in the source domain.
Optionally, the current pseudo-labeling result of the image samples in the target domain may be generated as follows: cluster the extracted features of each image sample in the target domain to obtain a clustering result, and generate the pseudo-labeling results of the image samples based on that clustering result. That is, encode each cluster in the clustering result to obtain the pseudo-labeling result corresponding to that cluster, then determine the cluster to which each image sample in the target domain belongs and take the corresponding pseudo-labeling result as that sample's pseudo-labeling result. The pseudo-labeling result of each image sample is therefore only used to distinguish the image samples within the target domain and does not represent the real identity corresponding to the sample. For example, if clustering yields 500 clusters, the pseudo-labeling results of the 500 clusters can be encoded as the numbers "1", "2", …, "500", and the image samples in the target domain can be distinguished by these numbers.
The clustering algorithm used may be, among others, the K-means clustering algorithm, the expectation-maximization (EM) clustering algorithm, or hierarchical agglomerative clustering (HAC).
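The following is a hedged sketch of this pseudo-label generation step, assuming a PyTorch feature extractor and scikit-learn's K-means; the function name, loader interface, and the cluster count of 500 (taken from the example above) are illustrative assumptions:
```python
import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def generate_pseudo_labels(backbone, target_loader, num_clusters=500):
    """Extract features for every target-domain image sample with the first
    neural network, cluster them, and use the cluster index as the current
    pseudo-labeling result."""
    backbone.eval()
    feats = [backbone(images).cpu().numpy() for images in target_loader]
    feats = np.concatenate(feats, axis=0)
    # the cluster id only distinguishes samples within the target domain;
    # it does not represent a real identity
    return KMeans(n_clusters=num_clusters).fit_predict(feats)
```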
The image of the source domain and the image of the target domain may contain the same object; for example, for pedestrian re-identification, the same pedestrian may appear both in an image of the source domain and in an image of the target domain. However, the images of the two domains may differ in time or environment; for example, the source-domain image may be captured in a region in the daytime while the target-domain image is captured at night.
For example, for pedestrian re-identification, the image samples in the source domain and the target domain are independently distributed; the image samples in the source domain carry manual labeling results reflecting the real identity information corresponding to each sample, while the image samples in the target domain carry no labeling results. Therefore, neural network training is first performed using only the image samples in the source domain to obtain a pedestrian recognition network. To optimize the pedestrian recognition network with the image samples in the target domain, labels first need to be added to those samples; the image samples in the target domain and those in the source domain then form a joint domain, and the image samples in the joint domain are used to optimize the pedestrian recognition network, so that it can identify people's identities in different environments. For example, if two cameras capture two images of the same pedestrian at two places with different ambient brightness, the optimized pedestrian recognition network, when recognizing the two images, outputs that they show the same pedestrian.
Step 102: and processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network.
The processing result is a classification result of the first neural network on the image sample, that is, a probability value of the image sample belonging to each category.
The image samples in the joint domain and the current labeling result thereof comprise the image samples in the source domain and the labeling result thereof, and the image samples in the target domain and the current pseudo labeling result thereof.
The image samples in the joint domain consist of the image samples in the source domain and those in the target domain, and the labeling results of the image samples in the joint domain comprise the manual labeling results of the source-domain samples and the pseudo-labeling results added, based on the first neural network, to the target-domain samples.
Step 103: and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
Further, in the process of training the first neural network with the image samples in the joint domain, after all image samples in the joint domain have participated in one round of training (for example, if the joint domain contains 100 image samples and 10 are taken per training step, then after 10 steps all samples have participated once), the trained neural network is used to regenerate pseudo-labeling results for the image samples in the target domain. Therefore, in some embodiments, the image samples and labeling results in the training image set include, in addition to the image samples in the source domain and their labeling results, the image samples in the target domain and their previous pseudo-labeling results, i.e., the pseudo-labeling results added to the image samples in the target domain during the previous round of training.
It can be understood that, after the parameter values of the network parameters of the first neural network are adjusted, the adjusted first neural network is used to extract features from the image samples in the target domain and to generate pseudo-labeling results for them again, forming the labeling results of the image samples in a new joint domain. The image samples in the joint domain are processed again with the first neural network, the processing result of the first neural network is output again, and the parameter values of the network parameters are adjusted using the newly output processing result and the labeling results of the image samples in the new joint domain. After repeating similar operations several times, when the performance of the first neural network is stable and optimal, the training process is stopped.
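A sketch of this alternating loop, under the same assumptions as the snippets above (`make_joint_loader` is a hypothetical helper that pairs source samples with their manual labels and target samples with the freshly generated pseudo labels; the `backbone`/`classifier` attributes are likewise assumed):
```python
def train_with_pseudo_labels(model, source_set, target_set, optimizer,
                             num_rounds=5, epochs_per_round=1):
    for _ in range(num_rounds):
        # regenerate the pseudo-labeling results with the current network
        pseudo = generate_pseudo_labels(model.backbone, target_set.loader)
        joint_loader = make_joint_loader(source_set, target_set, pseudo)  # hypothetical helper
        model.train()
        for _ in range(epochs_per_round):
            for images, labels in joint_loader:
                loss = joint_domain_id_loss(model.backbone, model.classifier,
                                            images, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return model
```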
Referring to fig. 2, fig. 2 is a diagram of another training method for a neural network according to an embodiment of the present application, where the method is applied to a training scenario in which N first neural networks are provided, the N first neural networks have the same network structure, and parameter values of corresponding network parameters of the N first neural networks are different, and the method includes:
step 201: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
Optionally, each of the N first neural networks performs feature extraction on an image sample in the target domain, yielding N features for that sample; the average of the N features is taken as the sample's average feature, and clustering is then performed on the average features of all image samples to obtain each sample's pseudo-labeling result. The clustering method follows the process shown in step 101 and is not described again here.
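A minimal sketch of the feature-averaging step (assumed interface; `backbones` stands for the feature extractors of the N first neural networks):
```python
import torch

@torch.no_grad()
def averaged_features(backbones, images):
    # N x B x d stack of the features each first neural network extracts
    feats = torch.stack([net(images) for net in backbones], dim=0)
    return feats.mean(dim=0)  # per-sample average feature used for clustering
```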
Step 202: and processing the image samples in the joint domain by using the N first neural networks, and outputting the processing results of the N first neural networks.
Step 203: and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
Optionally, the parameter values of the network parameters of the first neural network may be adjusted as follows: determine a first classification loss of the first neural network according to its processing result and the current labeling results of the image samples in the joint domain; determine a second classification loss according to its processing result and the processing results of part or all of the other N-1 first neural networks; sum the first and second classification losses to obtain the target classification loss of the first neural network; and adjust the parameter values of its network parameters based on the target classification loss, e.g., by processing the target classification loss with a gradient descent method.
Referring to fig. 2A, fig. 2A is a training scenario of a training method of a neural network provided in an embodiment of the present application, as shown in fig. 2A, the number of the first neural networks is 2, the two first neural networks are two neural networks for mutual supervised learning, and a process of adjusting parameter values of network parameters is specifically described below by taking the two first neural networks shown in fig. 2A as an example.
Suppose the first neural network 1 has a feature transform function F(·; θ1) and the first neural network 2 has a feature transform function F(·; θ2). The first classification loss corresponding to the first neural network 1 is shown in formula (2), and the second classification loss in formula (3):
L_id(θ1) = (1/N) ∑_{i=1}^{N} L_ce(C1(F(x_i; θ1)), y_i)    (2)
where L_id(θ1) is the first classification loss corresponding to the first neural network 1, L_ce is the cross-entropy loss function, x_i is the i-th image sample in the joint domain, θ1 is the network parameter of the first neural network 1, y_i is the labeling result of the i-th image sample, C1(F(x_i; θ1)) is the processing result of the first neural network 1 for the i-th image sample, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
L_soft(θ1, θ2) = -(1/N) ∑_{i=1}^{N} C2(F(x′_i; θ2)) · log C1(F(x_i; θ1))    (3)
where L_soft(θ1, θ2) is the second classification loss corresponding to the first neural network 1, C2(F(x′_i; θ2)) is the processing result of the first neural network 2 for image sample x′_i, C1(F(x_i; θ1)) is the processing result of the first neural network 1 for image sample x_i, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
Finally, formula (2) and formula (3) are added to obtain the target classification loss of the first neural network 1; based on this target classification loss, a gradient descent method is used to adjust the parameter value of the network parameter θ1 of the first neural network 1. Similarly, the process of adjusting the parameter values of the network parameters of the first neural network 2 is consistent with that of the first neural network 1 and is not repeated.
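The following sketch combines formulas (2) and (3) for first neural network 1, with network 2's prediction treated as supervisory information. Writing the second classification loss as a soft cross-entropy is one plausible reading of formula (3), whose original rendering is not recoverable from the text:
```python
import torch.nn.functional as F_t

def mutual_target_loss(logits_1, logits_2, labels):
    hard = F_t.cross_entropy(logits_1, labels)                  # formula (2)
    peer = F_t.softmax(logits_2.detach(), dim=1)                # C2(F(x'_i; theta2))
    soft = -(peer * F_t.log_softmax(logits_1, dim=1)).sum(dim=1).mean()  # formula (3)
    return hard + soft  # target classification loss, minimized by gradient descent
```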
Furthermore, a domain discriminator can be connected behind each first neural network: the features extracted by the first neural network from the image samples in the joint domain are input into the domain discriminator, which determines the domain probability of each image sample, i.e., the probability that the sample belongs to the source domain or the target domain. A domain classification loss is determined based on the domain probability and combined with the target classification loss to obtain a final classification loss; the final classification loss is processed with a gradient descent method to adjust the parameter value of the network parameter θ1 of the first neural network 1.
Wherein the domain classification loss is shown in formula (4):
L_D(w) = -(1/N_s) ∑_{i=1}^{N_s} log D(F(x_i^s); w) - (1/N_t) ∑_{i=1}^{N_t} log(1 - D(F(x_i^t); w))    (4)
where L_D is the domain classification loss, w is the discrimination parameter of the domain discriminator, x_i^s is the i-th image sample in the source domain, x_i^t is the i-th image sample in the target domain, N_s is the total number of training samples in the source domain, N_t is the total number of training samples in the target domain, and D(F(x_i^s); w) and D(F(x_i^t); w) are the processing results of the domain discriminator for the image samples x_i^s and x_i^t, respectively.
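A hedged sketch of the domain classification loss of formula (4), assuming a discriminator that outputs the probability that a feature comes from the source domain; how its gradient is fed back adversarially to the feature extractor (e.g. via a gradient-reversal layer) is an assumption not fixed by the text:
```python
import torch
import torch.nn.functional as F_t

def domain_classification_loss(discriminator, feats_source, feats_target):
    p_s = discriminator(feats_source)  # probability of "source" for source features
    p_t = discriminator(feats_target)  # probability of "source" for target features
    loss_s = F_t.binary_cross_entropy(p_s, torch.ones_like(p_s))
    loss_t = F_t.binary_cross_entropy(p_t, torch.zeros_like(p_t))
    return loss_s + loss_t
```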
It can be seen that, in this embodiment, compared with formula (1), in which the initial neural network is supervised-trained only with the pseudo-labeling results, the present scheme supervises and trains each first neural network not only with the pseudo-labeling results but also with the processing result of another first neural network taken as additional supervisory information.
In some embodiments, before the image samples in the joint domain are processed with the N first neural networks, the method further comprises: extracting a preset number of image samples from the joint domain, and preprocessing the extracted image samples in N different ways to obtain N kinds of preprocessed image samples. Because the image samples are randomly perturbed, their diversity is guaranteed, which ensures the richness of the image samples used in network optimization, so that the trained target neural network can adapt to complicated and variable input samples.
The N different ways of preprocessing the extracted image samples may be: randomly perturbing the features of each extracted image sample to obtain N kinds of preprocessed image samples. For example, the extracted image samples can be masked N times, with a different region masked each time and the remaining part used as the image sample, yielding N kinds of preprocessed image samples.
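A sketch of the masking example, assuming square masks and torch tensors of shape (C, H, W); the patch size and seeding are illustrative choices, not fixed by the patent:
```python
import torch

def masked_views(image, n, patch=32, seed=0):
    """Produce N differently perturbed copies of one extracted image sample
    by zeroing a random region each time (assumes H and W exceed `patch`)."""
    g = torch.Generator().manual_seed(seed)
    _, h, w = image.shape
    views = []
    for _ in range(n):
        top = torch.randint(0, h - patch, (1,), generator=g).item()
        left = torch.randint(0, w - patch, (1,), generator=g).item()
        v = image.clone()
        v[:, top:top + patch, left:left + patch] = 0.0  # mask a different region each time
        views.append(v)
    return views
```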
In some embodiments, based on the N pre-processed image samples obtained above, the processing of the image samples in the joint domain by using the N first neural networks may be implemented by: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the method further comprises: processing the images in the test image set with the N parameter-adjusted first neural networks, and taking the first neural network with the best processing result as the target neural network. In this embodiment, the first neural network with the best processing result is selected so that recognition accuracy is high when targets are recognized with it.
Referring to fig. 3, fig. 3 is a training method of another training method for a neural network according to an embodiment of the present disclosure, where the training method is applied to a training scenario in which the number of first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training; the method comprises the following steps:
step 301: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
Step 302: and processing the image sample in the joint domain by utilizing the second neural network and the third neural network in each neural network pair, and outputting the processing result of the second neural network and the processing result of the third neural network in each neural network pair.
Step 303: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
Optionally, the network parameter values of the second neural network may be adjusted as follows: determine a first classification loss of the second neural network according to its processing result and the current labeling results of the image samples in the joint domain; determine a second classification loss of the second neural network according to its processing result and the processing result of the third neural network in each neural network pair other than the pair containing this second neural network; sum the first and second classification losses to obtain the target classification loss of the second neural network; and adjust the parameter values of its network parameters based on the target classification loss, i.e., process the target classification loss with a gradient descent method so as to adjust the parameter values of the network parameters of the second neural network.
Referring to fig. 3A, fig. 3A is another training scenario of a neural network provided in the embodiment of the present application, as shown in fig. 3A, the number of the first neural networks is 4, and includes 2 neural network pairs, and an adjustment process for adjusting network parameters of a second neural network in each neural network pair is specifically described below by taking two neural network pairs shown in fig. 3A as an example.
Suppose the second neural network 1 of the first neural network pair (comprising the second neural network 1 and the third neural network 1) has a feature transform function F(·; θ1), and the third neural network 1 of that pair has F(·; E[θ1]); the second neural network 2 of the second neural network pair (comprising the second neural network 2 and the third neural network 2) has F(·; θ2), and the third neural network 2 of that pair has F(·; E[θ2]). The first classification loss corresponding to the second neural network 1 is shown in formula (5), and the second classification loss in formula (6):
L_id(θ1) = (1/N) ∑_{i=1}^{N} L_ce(C1(F(x_i; θ1)), y_i)    (5)
where L_id(θ1) is the first classification loss of the second neural network 1 of the first neural network pair, L_ce is the cross-entropy loss function, x_i is the i-th image sample in the joint domain, θ1 is the network parameter of the second neural network 1 of the first neural network pair, y_i is the labeling result of the i-th image sample, C1(F(x_i; θ1)) is the processing result of the second neural network 1 for the i-th image sample, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
L_mt(θ1, θ2) = -(1/N) ∑_{i=1}^{N} C2(F(x′_i; E[θ2])) · log C1(F(x_i; θ1))    (6)
where L_mt(θ1, θ2) is the second classification loss corresponding to the second neural network 1, C2(F(x′_i; E[θ2])) is the processing result of the third neural network 2 for image sample x′_i, C1(F(x_i; θ1)) is the processing result of the second neural network 1 for image sample x_i, θ1 is the network parameter of the second neural network 1, E[θ2] is the network parameter of the third neural network 2, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
Finally, formula (5) and formula (6) are added to obtain the target classification loss of the second neural network 1 of the first neural network pair; based on this target classification loss, a gradient descent method is used to adjust the parameter value of the network parameter θ1 of the second neural network 1 of the first neural network pair. The process of adjusting the parameter values of the network parameters of the second neural network 2 of the second neural network pair is the same as that for the second neural network of the first neural network pair and is not repeated.
Further, the target classification loss and the domain classification loss of the above formula (4) are combined to obtain a final classification loss corresponding to the second neural network 1 of the first neural network pair, and similarly, the final classification loss is processed by using a gradient descent method to adjust the parameter value of the network parameter of the second neural network 1 of the first neural network pair.
In the above embodiment taking two neural network pairs as an example, the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training, and the specific determination process can be seen in formula (7) and formula (8);
E[θ1]=αE′[θ1]+(1-α)θ1 (7);
where E[θ1] is the network parameter of the third neural network 1, E′[θ1] is the network parameter of the third neural network 1 after each training before the current training, θ1 is the network parameter of the second neural network 1 after the current training adjustment, and α is a preset hyper-parameter.
E[θ2]=αE′[θ2]+(1-α)θ2 (8);
where E[θ2] is the network parameter of the third neural network 2, E′[θ2] is the network parameter of the third neural network 2 after each training before the current training, θ2 is the network parameter of the second neural network 2 after the current training adjustment, and α is a preset hyper-parameter.
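Formulas (7) and (8) are temporal averages; the following is a minimal sketch of the update, where α = 0.999 is only an illustrative value for the preset hyper-parameter:
```python
import torch

@torch.no_grad()
def update_third_network(third, second, alpha=0.999):
    """E[theta] = alpha * E'[theta] + (1 - alpha) * theta, applied to every
    corresponding parameter of a (third, second) neural network pair."""
    for p_third, p_second in zip(third.parameters(), second.parameters()):
        p_third.mul_(alpha).add_(p_second, alpha=1.0 - alpha)
```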
It can be seen that, in this embodiment, compared with formula (1), in which only the pseudo-labeling results are used for supervised training of the initial neural network, the second neural network of one neural network pair is additionally supervised with the processing result of another neural network pair, and that processing result comes from a third neural network. Because the parameter values of the network parameters of the third neural network are not simply updated by gradient descent, its parameter adjustment is more stable; using its processing result as the supervisory information for the second neural network of another pair therefore avoids error amplification. Even if the prediction in a processing result is erroneous, after multiple iterations the prediction error can finally be eliminated, so the trained target neural network is more stable.
In some embodiments, prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises: extracting a preset number of image samples from the joint domain; and preprocessing the extracted image samples in N/2 different ways to obtain N/2 kinds of preprocessed image samples. Processing the image samples in the joint domain with the second and third neural networks in each neural network pair then comprises: processing one kind of preprocessed image sample with the second and third neural networks in each neural network pair, respectively.
In some embodiments, after the extracted image samples are preprocessed in N/2 different modes, processing the image samples in the joint domain with the second neural network and the third neural network in each neural network pair, and outputting the processing result of the second neural network and the processing result of the third neural network in each neural network pair, may be implemented as follows: each neural network pair processes one kind of preprocessed image sample, that is, the second neural network and the third neural network in a given pair each process the same kind of preprocessed image sample, so as to output the processing result of the second neural network and the processing result of the third neural network in each neural network pair.
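To make the N/2 preprocessing modes concrete, a hedged Python sketch for the case N = 4 (two neural network pairs, hence two modes) follows; the specific augmentations and image size are invented for illustration, since the patent does not name them.

    import torch
    from torchvision import transforms

    # Assumed example: N = 4, so N/2 = 2 preprocessing modes.
    preprocess_modes = [
        transforms.Compose([
            transforms.Resize((256, 128)),
            transforms.RandomHorizontalFlip(p=1.0),
            transforms.ToTensor(),
        ]),
        transforms.Compose([
            transforms.Resize((256, 128)),
            transforms.ColorJitter(brightness=0.2, contrast=0.2),
            transforms.ToTensor(),
        ]),
    ]

    def preprocess_batch(pil_images):
        """Return N/2 differently preprocessed versions of the same extracted
        image samples; version k is fed to both the second and the third
        neural network of neural network pair k."""
        return [torch.stack([mode(img) for img in pil_images])
                for mode in preprocess_modes]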
In some embodiments, the method further comprises: processing the images in the test image set with the N/2 parameter-adjusted second neural networks, and taking the second neural network with the optimal processing result as the target neural network. Since the parameter adjustment of the third neural network is relatively stable, a third neural network can generally also be used as the target neural network.
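A trivial sketch of this selection step, assuming a hypothetical scoring function that evaluates a network's processing result on the test image set (higher is better):

    def select_target_network(second_nets, evaluate_on_test_set):
        """Return the parameter-adjusted second neural network whose
        processing result on the test image set is optimal;
        evaluate_on_test_set is a hypothetical scoring function."""
        return max(second_nets, key=evaluate_on_test_set)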
An embodiment of the present application provides a schematic flow diagram of a target identification method; the method comprises the following step:
image processing is performed using a target neural network to identify a target in the image.
The target neural network is trained by using the training method shown in fig. 1, fig. 2 or fig. 3.
It can be seen that, in the embodiment of the present application, the target neural network is a network with more stable performance, so using it to identify targets can improve the accuracy of target identification.
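As a hedged sketch of this identification step, assuming the trained target neural network ends in a classification head (the patent leaves the output form open):

    import torch

    def identify_target(target_net, image_tensor):
        """Run the trained target neural network on one preprocessed image
        tensor of shape (C, H, W) and return the predicted class index."""
        target_net.eval()
        with torch.no_grad():
            logits = target_net(image_tensor.unsqueeze(0))  # add batch dim
        return int(logits.softmax(dim=1).argmax(dim=1))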
Referring to fig. 4, fig. 4 is a schematic structural diagram of a training device 500 according to an embodiment of the present application. As shown in fig. 4, the training device 500 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are different from the one or more application programs, are stored in the memory, and are configured to be executed by the processor, the programs including instructions for performing the following steps:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
when the first neural network is used for processing the image sample in the joint domain and outputting the processing result of the first neural network, the program comprises instructions for executing the following steps: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
when adjusting the parameter values of the network parameters of the first neural network according to the processing result of the first neural network and the current-time labeling result of the image sample in the joint domain, the program includes instructions for executing the following steps: and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for one first neural network, the parameter value of the network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network among the N first neural networks, the program further includes instructions for performing the following steps:
before the image samples in the joint domain are processed by the N first neural networks, extracting a preset number of image samples from the joint domain; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
when processing the image samples in the joint domain using the N first neural networks, the program is specifically configured to execute the following steps:
and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the program is specifically configured to execute the following steps in generating a pseudo-labeling result of the current time of the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
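The patent specifies averaging the features extracted by the N first neural networks but leaves the step from averaged features to pseudo-labeling results open; one reading consistent with the clustering techniques (e.g. k-means) under which this publication is classified is to cluster the averaged features, as in the following assumed Python sketch.

    import numpy as np
    from sklearn.cluster import KMeans

    def generate_pseudo_labels(per_network_features, num_clusters):
        """per_network_features: list of N arrays, each of shape
        (num_samples, dim), holding the features each first neural network
        extracted from the image samples in the target domain.  The features
        are averaged across the N networks and clustered; each sample's
        cluster index serves as its current pseudo-labeling result.  The use
        of k-means and the value of num_clusters are assumptions."""
        mean_features = np.mean(np.stack(per_network_features, axis=0), axis=0)
        return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(mean_features)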
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
when the first neural network is used to process the image sample in the joint domain and output the processing result of the first neural network, the program is specifically used to execute the following steps:
processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
when the parameter values of the network parameters of the first neural network are adjusted according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, the program is specifically used for executing the following steps:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the program further comprises instructions for, prior to processing the image sample in the joint domain using the second and third neural networks in each neural network pair, performing the steps of:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
when processing an image sample in the joint domain using the second neural network and the third neural network of each neural network pair, the program specifically includes instructions for performing the following steps:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, when generating the current pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network, the program is specifically configured to execute the following steps:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the program further includes instructions for performing the following steps:
and processing the images in the test image set by using the N first neural networks after parameter adjustment, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the program further includes instructions for performing the following steps:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus 600 for target recognition according to an embodiment of the present application. As shown in fig. 5, the apparatus 600 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are different from the one or more application programs, are stored in the memory, and are configured to be executed by the processor, the programs including instructions for performing the following step:
image processing is performed using a target neural network to identify a target in the image.
Wherein the target neural network is trained by the method as described in fig. 1, fig. 2 or fig. 3.
Referring to fig. 6, fig. 6 shows a block diagram of possible functional units of the training device 700 involved in the above embodiments. The training device 700 includes: a pseudo-annotation generating unit 710, a processing unit 720, and an adjusting unit 730, wherein:
a pseudo-annotation generating unit 710, configured to generate a current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
the processing unit 720 is configured to process the image sample in the joint domain by using the first neural network, and output a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
the adjusting unit 730 is configured to adjust a parameter value of a network parameter of the first neural network according to a processing result of the first neural network and a current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
the processing unit 720, when the first neural network is used to process the image sample in the joint domain and output a processing result of the first neural network, is specifically configured to: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
the adjusting unit 730, when adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, is specifically configured to:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for one first neural network, the parameter value of the network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the apparatus further includes: a preprocessing unit 740;
a preprocessing unit 740, configured to extract a preset number of image samples from the joint domain before processing the image samples in the joint domain by using the N first neural networks; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
the processing unit 720, when processing the image samples in the joint domain by using the N first neural networks, is specifically configured to: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the pseudo-annotation generating unit 710, when generating the pseudo-annotation result of the current time of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
the processing unit 720, when the first neural network is used to process the image sample in the joint domain and output a processing result of the first neural network, is specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
the adjusting unit 730, when adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, is specifically configured to: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the apparatus further comprises a pre-processing unit 740;
a preprocessing unit 740, configured to extract a preset number of image samples from the joint domain before processing the image samples in the joint domain by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples; the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, the pseudo-annotation generating unit 710, when generating the pseudo-annotation result of the current time of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the apparatus further comprises a test unit 750;
and the test unit 750 is configured to process the images in the test image set by using the N first neural networks after parameter adjustment, and use the first neural network with the optimal processing result as the target neural network.
In some embodiments, the apparatus further comprises a test unit 750;
and the test unit 750 is configured to process the images in the test image set by using the N/2 second neural networks after parameter adjustment, and use the second neural network with the optimal processing result as the target neural network.
The embodiment of the present application further provides a device for target identification, including:
an identification unit configured to perform image processing using a target neural network to identify a target in the image; wherein the target neural network is trained by the method described in FIG. 1, FIG. 2 or FIG. 3.
Embodiments of the present application also provide a computer storage medium, which stores a computer program, where the computer program is executed by a processor to implement part or all of the steps of any one of the neural network training methods or the target recognition method as set forth in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform part or all of the steps of any one of the neural network training methods or the target recognition method as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, and a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (16)

1. A method of training a neural network, comprising:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof, image samples in a target domain and previous pseudo-labeling results thereof;
the number of the first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, specifically comprising: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, specifically comprising: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
2. The method of claim 1, wherein the image samples and their labeling results in the training image set further comprise the image samples and their previous pseudo-labeling results in the target domain.
3. The method of claim 1,
for a first neural network, adjusting parameter values of network parameters of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing result of part or all of the N first neural networks except the first neural network, including:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
4. The method of any one of claims 1-3, wherein prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
processing the image samples in the joint domain with a second neural network and a third neural network in each neural network pair, comprising:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
5. The method of claim 4, wherein generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
6. The method of claim 5, further comprising:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
7. A method of object recognition, comprising:
image processing using a target neural network to identify targets in the image, wherein the target neural network is trained using the method of any one of claims 1-6.
8. An apparatus for training a neural network, comprising:
the pseudo-annotation generating unit is used for generating a current pseudo-annotation result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof, image samples in a target domain and previous pseudo-labeling results thereof;
the number of the first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
a processing unit, configured to process the image sample in the joint domain by using the first neural network, and output a processing result of the first neural network, and specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
an adjusting unit, configured to adjust a parameter value of a network parameter of the first neural network according to a processing result of the first neural network and a current labeling result of the image sample in the joint domain, and specifically configured to: aiming at a second neural network, adjust the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
9. The apparatus of claim 8, wherein the image samples and their labeling results in the training image set further comprise the image samples and their previous pseudo-labeling results in the target domain.
10. The apparatus of claim 8,
an adjusting unit, configured to, when, for one first neural network, according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of some or all of the N first neural networks except the first neural network, adjust a parameter value of a network parameter of the first neural network, specifically: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
11. The apparatus according to any one of claims 8-10, wherein the apparatus further comprises a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
12. The apparatus of claim 11,
the pseudo-annotation generating unit, when generating a current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
13. The apparatus of claim 12, further comprising a test unit;
and the test unit is used for processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
14. An apparatus for object recognition, comprising:
an identification unit for performing image processing using a target neural network to identify a target in an image, wherein the target neural network is trained using the method of any one of claims 1-6.
15. An electronic device comprising a processor, a memory, wherein the memory is configured to store computer readable instructions, and wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1-6 or the method of claim 7.
16. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-6 or the method of claim 7.
CN201910472707.5A 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products Active CN110188829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472707.5A CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910472707.5A CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Publications (2)

Publication Number Publication Date
CN110188829A CN110188829A (en) 2019-08-30
CN110188829B true CN110188829B (en) 2022-01-28

Family

ID=67719724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472707.5A Active CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Country Status (1)

Country Link
CN (1) CN110188829B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647938B (en) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
US11429809B2 (en) 2019-09-24 2022-08-30 Beijing Sensetime Technology Development Co., Ltd Image processing method, image processing device, and storage medium
CN110738146B (en) * 2019-09-27 2020-11-17 华中科技大学 Target re-recognition neural network and construction method and application thereof
CN111598124B (en) * 2020-04-07 2022-11-11 深圳市商汤科技有限公司 Image processing device, image processing apparatus, processor, electronic apparatus, and storage medium
CN111539947B (en) * 2020-04-30 2024-03-29 上海商汤智能科技有限公司 Image detection method, related model training method, related device and equipment
CN112906857B (en) * 2021-01-21 2024-03-19 商汤国际私人有限公司 Network training method and device, electronic equipment and storage medium
CN113221770B (en) * 2021-05-18 2024-06-04 青岛根尖智能科技有限公司 Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning
CN114550215B (en) * 2022-02-25 2022-10-18 北京拙河科技有限公司 Target detection method and system based on transfer learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469109A (en) * 2015-11-19 2016-04-06 中国地质大学(武汉) Transfer learning method based on class centroid alignment and for remote sensing image classification
CN106599922A (en) * 2016-12-16 2017-04-26 中国科学院计算技术研究所 Transfer learning method and transfer learning system for large-scale data calibration
CN106897746A (en) * 2017-02-28 2017-06-27 北京京东尚科信息技术有限公司 Data classification model training method and device
CN107909101A (en) * 2017-11-10 2018-04-13 清华大学 Semi-supervised transfer learning character identifying method and system based on convolutional neural networks
CN108182394A (en) * 2017-12-22 2018-06-19 浙江大华技术股份有限公司 Training method, face identification method and the device of convolutional neural networks
CN108256561A (en) * 2017-12-29 2018-07-06 中山大学 A kind of multi-source domain adaptive migration method and system based on confrontation study
CN108197670A (en) * 2018-01-31 2018-06-22 国信优易数据有限公司 Pseudo label generation model training method, device and pseudo label generation method and device
CN109089133A (en) * 2018-08-07 2018-12-25 北京市商汤科技开发有限公司 Method for processing video frequency and device, electronic equipment and storage medium
CN109583594A (en) * 2018-11-16 2019-04-05 东软集团股份有限公司 Deep learning training method, device, equipment and readable storage medium storing program for executing
CN109685078A (en) * 2018-12-17 2019-04-26 浙江大学 Infrared image recognition based on automatic marking

Also Published As

Publication number Publication date
CN110188829A (en) 2019-08-30

Similar Documents

Publication Publication Date Title
CN110188829B (en) Neural network training method, target recognition method and related products
CN109117777B (en) Method and device for generating information
CN110781784A (en) Face recognition method, device and equipment based on double-path attention mechanism
EP3136292A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN112597984B (en) Image data processing method, image data processing device, computer equipment and storage medium
CN112804558B (en) Video splitting method, device and equipment
CN111291887A (en) Neural network training method, image recognition method, device and electronic equipment
CN114693624A (en) Image detection method, device and equipment and readable storage medium
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN110751191A (en) Image classification method and system
CN112884147A (en) Neural network training method, image processing method, device and electronic equipment
CN111652878B (en) Image detection method, image detection device, computer equipment and storage medium
CN116630727B (en) Model training method, deep pseudo image detection method, device, equipment and medium
CN113704534A (en) Image processing method and device and computer equipment
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
CN111242114A (en) Character recognition method and device
CN115937596A (en) Target detection method, training method and device of model thereof, and storage medium
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
CN115731620A (en) Method for detecting counter attack and method for training counter attack detection model
CN112380369B (en) Training method, device, equipment and storage medium of image retrieval model
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN111401317B (en) Video classification method, device, equipment and storage medium
CN113762042A (en) Video identification method, device, equipment and storage medium
CN113642420B (en) Method, device and equipment for recognizing lip language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant