CN110188829B - Neural network training method, target recognition method and related products


Info

Publication number
CN110188829B
CN110188829B (application CN201910472707.5A)
Authority
CN
China
Prior art keywords
neural network
image
domain
processing
image sample
Prior art date
Legal status
Active
Application number
CN201910472707.5A
Other languages
Chinese (zh)
Other versions
CN110188829A (en)
Inventor
葛艺潇 (Yixiao Ge)
陈大鹏 (Dapeng Chen)
沈岩涛 (Yantao Shen)
王晓刚 (Xiaogang Wang)
李鸿升 (Hongsheng Li)
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910472707.5A
Publication of CN110188829A
Application granted
Publication of CN110188829B

Classifications

    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/231: Clustering, hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G06F18/2321: Clustering, non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Clustering, non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
    • G06F18/24: Classification techniques
    • G06N3/045: Neural networks, combinations of networks
    • G06V20/00: Scenes; scene-specific elements
    • G06V2201/07: Indexing scheme for image or video recognition, target detection


Abstract

The embodiment of the application discloses a neural network training method, a target recognition method and related products. The training method comprises the following steps: generating a current pseudo-labeling result for the image samples in the target domain according to the features extracted from those samples by a first neural network; processing the image samples in the joint domain with the first neural network and outputting the processing result of the first neural network, wherein the image samples in the joint domain and their current labeling results comprise the image samples in the source domain with their labeling results and the image samples in the target domain with their current pseudo-labeling results; and adjusting the parameter values of the network parameters of the first neural network according to the processing result of the first neural network and the current labeling results of the image samples in the joint domain.

Description

Neural network training method, target recognition method and related products
Technical Field
The application relates to the field of computer vision technology, and in particular to a neural network training method, a target recognition method and related products.
Background
In the field of computer vision technology, target recognition has long been a research focus, since it underlies many computer vision applications. Target recognition based on domain adaptation, which addresses the distribution difference between domains, has gradually become a research hotspot in the field. Research on this problem is significant mainly because: if sample data from the target domain is added during training, the reusability of the classifier or detector can be improved, the adaptability of the neural network to new environments is effectively enhanced, and the training process of the neural network becomes largely independent of the application scenario.
In the prior art, domain adaptation is generally implemented by performing style conversion on the images of the source domain to match the target domain, and then fine-tuning the neural network. However, this approach places high requirements on the image quality in the source domain, and the fine-tuned neural network only learns the style characteristics of the images in the target domain; it learns neither the interrelations among the images within the target domain nor the relations between the images in the target domain and those in the source domain.
Disclosure of Invention
The embodiments of the application provide a neural network training method, a target recognition method and related products. The method generates pseudo labels for the image samples in the target domain, so that domain adaptation is achieved during neural network training and the trained target neural network has stable recognition performance and high recognition accuracy.
In a first aspect, an embodiment of the present application provides a method for training a neural network, including:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, wherein the processing result comprises:
processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, including:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for a first neural network, the parameter values of the network parameters of the first neural network are adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the method further includes:
before the image samples in the joint domain are processed by the N first neural networks, extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
processing image samples in the joint domain with the N first neural networks, including:
and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, wherein the processing result comprises:
processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, including:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
processing the image samples in the joint domain with a second neural network and a third neural network in each neural network pair, comprising:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the method further comprises:
and processing the images in the test image set by using the N first neural networks after parameter adjustment, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the method further comprises:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
In a second aspect, an embodiment of the present application provides a target identification method, including:
and performing image processing by using a target neural network to identify a target in the image, wherein the target neural network is trained by adopting the method of the first aspect.
In a third aspect, an embodiment of the present application provides a training apparatus for a neural network, including:
the pseudo-annotation generating unit is used for generating a current pseudo-annotation result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
the processing unit is used for processing the image samples in the joint domain by utilizing the first neural network and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and the adjusting unit is used for adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
the processing unit, when processing the image sample in the joint domain by using the first neural network and outputting a processing result of the first neural network, is specifically configured to: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
an adjusting unit, configured to, when adjusting a parameter value of a network parameter of a first neural network according to a processing result of the first neural network and a current labeling result of an image sample in a joint domain, specifically:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in a case where, for a first neural network, a parameter value of a network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the apparatus further includes: a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by the N first neural networks; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
the processing unit, when processing the image sample in the joint domain by using the N first neural networks, is specifically configured to: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the pseudo-annotation generating unit, when generating the current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
the processing unit, when processing the image sample in the joint domain by using the first neural network and outputting a processing result of the first neural network, is specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
an adjusting unit, configured to, when adjusting a parameter value of a network parameter of a first neural network according to a processing result of the first neural network and a current labeling result of an image sample in a joint domain, specifically: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the apparatus further comprises a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, the pseudo-annotation generating unit, when generating the current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the device further comprises a test unit;
and the test unit is used for processing the images in the test image set by using the N first neural networks after the parameters are adjusted, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the device further comprises a test unit;
and the test unit is used for processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
In a fourth aspect, an embodiment of the present application provides an apparatus for target identification, including:
an identifying unit, configured to perform image processing using a target neural network to identify a target in the image, where the target neural network is trained by the method of the first aspect.
In a fifth aspect, the present application provides an electronic device, including a processor and a memory, where the memory is configured to store computer-readable instructions, and the processor is configured to call the instructions stored in the memory to execute the instructions of the steps in the method according to the first aspect or the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program, which when executed by a processor implements the method according to the first or second aspect.
In a seventh aspect, the present application provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first or second aspect.
The embodiment of the application has the following beneficial effects:
it can be seen that, in the embodiment of the present application, by adding a pseudo-annotation result to an image sample in a target domain, the image sample in the target domain also includes supervisory information, so that the image sample in a joint domain composed of a source domain and the target domain all includes the supervisory information, and thus, the image sample in the joint domain can be utilized to perform optimization training on a neural network; meanwhile, because a pseudo-labeling result is added to the image sample in the target domain, the neural network can learn the relation between the image samples of the target domain and the relation between the image sample of the source domain and the image sample of the target domain when the neural network is trained by using the image sample in the joint domain, so that the trained target neural network can adapt to various recognition scenes; and moreover, the image samples in the target domain are used for the optimization training of the neural network, so that the image samples in the training image set are richer, and the identification precision of the target neural network is higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for training a neural network according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another method for training a neural network according to an embodiment of the present disclosure;
fig. 2A is a schematic diagram of a training scenario of a training method of a neural network according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another method for training a neural network according to an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a training scenario of another training method for a neural network according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an apparatus for object recognition according to an embodiment of the present application;
fig. 6 is a block diagram illustrating functional units of a training apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
When domain adaptation is implemented, neural network training is first performed using the image samples in the source domain (which have corresponding labeling results) to obtain an initial neural network. Pseudo labels (pseudo-labeling results) are then added to the image samples in the target domain (target domain), and the source domain and the pseudo-labeled target domain form a joint domain D, so that every image sample in the joint domain D carries a label (labeling result). At this point, the joint domain can be used to perform supervised training on the initial neural network to obtain a neural network with strong adaptability. To add pseudo labels to the image samples in the target domain, the feature information of those samples can be extracted with the initial neural network and clustered, and pseudo labels are assigned to the image samples in the target domain according to the clustering. Assuming the initial neural network has a feature transform function F(·; θ), when the initial neural network is supervised-trained on the joint domain, the corresponding classification loss function is:
L_id(θ) = (1/N) ∑_{i=1}^{N} L_ce(C(F(x_i; θ)), y_i)    (1)
where L_id is the classification loss function, x_i is the i-th image sample in the joint domain D, y_i is the supervisory information corresponding to image sample x_i, C(F(x_i; θ)) is the classification prediction for image sample x_i, N is the total number of image samples in the joint domain D, and L_ce is the cross-entropy loss function.
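As an illustration of formula (1), the following is a minimal sketch in PyTorch, assuming a backbone playing the role of F(·; θ) and a classifier head playing the role of C(·); all function and variable names are assumptions for illustration, not taken from the patent:
```python
import torch.nn.functional as F_t

def joint_domain_id_loss(backbone, classifier, images, labels):
    """Cross-entropy over a batch of joint-domain samples, where target-domain
    samples carry clustering-derived pseudo labels (sketch of formula (1))."""
    features = backbone(images)        # F(x_i; theta)
    logits = classifier(features)      # C(F(x_i; theta))
    # cross_entropy averages over the batch, i.e. the (1/N) sum of L_ce terms
    return F_t.cross_entropy(logits, labels)
```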
Because the supervisory information of part of the image samples in the joint domain D consists of pseudo labels obtained by clustering, feature learning is insufficient and noise (i.e., erroneous labels) is inevitably introduced, resulting in low recognition accuracy of the optimized neural network. To overcome these defects, the technical solution of the application is proposed to suppress the noise generated when pseudo labels are added to the image samples in the target domain and to implement domain adaptation.
Referring to fig. 1, fig. 1 is a method for training a neural network according to an embodiment of the present application, where the method includes:
step 101: and generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network.
The first neural network is obtained by training on the image samples in a training image set and their labeling results, which comprise the image samples in the source domain and their labeling results; that is, initially the first neural network is trained only on the image samples in the source domain.
Optionally, the current pseudo-labeling result of the image samples in the target domain may be generated as follows: cluster the extracted features of each image sample in the target domain to obtain a clustering result, and generate the pseudo-labeling results of the image samples based on that clustering result. That is, encode each cluster in the clustering result to obtain the pseudo-labeling result corresponding to that cluster, then determine the cluster to which each image sample in the target domain belongs and take the corresponding pseudo-labeling result as that sample's pseudo-labeling result. The pseudo-labeling result of each image sample is therefore only used to distinguish the image samples within the target domain and does not represent the real identity corresponding to the sample. For example, if clustering yields 500 clusters, the pseudo-labeling results of the 500 clusters can be encoded as the numbers "1", "2", …, "500", and the image samples in the target domain can be distinguished by these numbers.
The clustering algorithm used may be, among others, the K-means clustering algorithm, the expectation-maximization (EM) clustering algorithm, or hierarchical agglomerative clustering (HAC).
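The following is a hedged sketch of this pseudo-label generation step, assuming a PyTorch feature extractor and scikit-learn's K-means; the function name, loader interface, and the cluster count of 500 (taken from the example above) are illustrative assumptions:
```python
import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def generate_pseudo_labels(backbone, target_loader, num_clusters=500):
    """Extract features for every target-domain image sample with the first
    neural network, cluster them, and use the cluster index as the current
    pseudo-labeling result."""
    backbone.eval()
    feats = [backbone(images).cpu().numpy() for images in target_loader]
    feats = np.concatenate(feats, axis=0)
    # the cluster id only distinguishes samples within the target domain;
    # it does not represent a real identity
    return KMeans(n_clusters=num_clusters).fit_predict(feats)
```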
The image of the source domain and the image of the target domain may contain the same object; for example, for pedestrian re-identification, the same pedestrian may appear both in an image of the source domain and in an image of the target domain. However, the images of the two domains may differ in time or environment; for example, the source-domain image may be captured in a region in the daytime while the target-domain image is captured at night.
For example, for pedestrian re-identification, the image samples in the source domain and the target domain are independently distributed; the image samples in the source domain carry manual labeling results reflecting the real identity information corresponding to each sample, while the image samples in the target domain carry no labeling results. Therefore, neural network training is first performed using only the image samples in the source domain to obtain a pedestrian recognition network. To optimize the pedestrian recognition network with the image samples in the target domain, labels first need to be added to those samples; the image samples in the target domain and those in the source domain then form a joint domain, and the image samples in the joint domain are used to optimize the pedestrian recognition network, so that it can identify people's identities in different environments. For example, if two cameras capture two images of the same pedestrian at two places with different ambient brightness, the optimized pedestrian recognition network, when recognizing the two images, outputs that they show the same pedestrian.
Step 102: and processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network.
The processing result is a classification result of the first neural network on the image sample, that is, a probability value of the image sample belonging to each category.
The image samples in the joint domain and the current labeling result thereof comprise the image samples in the source domain and the labeling result thereof, and the image samples in the target domain and the current pseudo labeling result thereof.
The image samples in the joint domain consist of the image samples in the source domain and those in the target domain, and the labeling results of the image samples in the joint domain comprise the manual labeling results of the source-domain samples and the pseudo-labeling results added, based on the first neural network, to the target-domain samples.
Step 103: and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
Further, in the process of training the first neural network with the image samples in the joint domain, after all image samples in the joint domain have participated in one round of training (for example, if the joint domain contains 100 image samples and 10 are taken per training step, then after 10 steps all samples have participated once), the trained neural network is used to regenerate pseudo-labeling results for the image samples in the target domain. Therefore, in some embodiments, the image samples and labeling results in the training image set include, in addition to the image samples in the source domain and their labeling results, the image samples in the target domain and their previous pseudo-labeling results, i.e., the pseudo-labeling results added to the image samples in the target domain during the previous round of training.
It can be understood that, after the parameter values of the network parameters of the first neural network are adjusted, the adjusted first neural network is used to extract features from the image samples in the target domain and to generate pseudo-labeling results for them again, forming the labeling results of the image samples in a new joint domain. The image samples in the joint domain are processed again with the first neural network, the processing result of the first neural network is output again, and the parameter values of the network parameters are adjusted using the newly output processing result and the labeling results of the image samples in the new joint domain. After repeating similar operations several times, when the performance of the first neural network is stable and optimal, the training process is stopped.
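A sketch of this alternating loop, under the same assumptions as the snippets above (`make_joint_loader` is a hypothetical helper that pairs source samples with their manual labels and target samples with the freshly generated pseudo labels; the `backbone`/`classifier` attributes are likewise assumed):
```python
def train_with_pseudo_labels(model, source_set, target_set, optimizer,
                             num_rounds=5, epochs_per_round=1):
    for _ in range(num_rounds):
        # regenerate the pseudo-labeling results with the current network
        pseudo = generate_pseudo_labels(model.backbone, target_set.loader)
        joint_loader = make_joint_loader(source_set, target_set, pseudo)  # hypothetical helper
        model.train()
        for _ in range(epochs_per_round):
            for images, labels in joint_loader:
                loss = joint_domain_id_loss(model.backbone, model.classifier,
                                            images, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return model
```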
Referring to fig. 2, fig. 2 is a diagram of another training method for a neural network according to an embodiment of the present application, where the method is applied to a training scenario in which N first neural networks are provided, the N first neural networks have the same network structure, and parameter values of corresponding network parameters of the N first neural networks are different, and the method includes:
step 201: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
Optionally, each of the N first neural networks performs feature extraction on an image sample in the target domain, yielding N features for that sample; the average of the N features is taken as the sample's average feature, and clustering is then performed on the average features of all image samples to obtain each sample's pseudo-labeling result. The clustering method follows the process shown in step 101 and is not described again here.
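A minimal sketch of the feature-averaging step (assumed interface; `backbones` stands for the feature extractors of the N first neural networks):
```python
import torch

@torch.no_grad()
def averaged_features(backbones, images):
    # N x B x d stack of the features each first neural network extracts
    feats = torch.stack([net(images) for net in backbones], dim=0)
    return feats.mean(dim=0)  # per-sample average feature used for clustering
```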
Step 202: and processing the image samples in the joint domain by using the N first neural networks, and outputting the processing results of the N first neural networks.
Step 203: and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
Optionally, the parameter values of the network parameters of the first neural network may be adjusted as follows: determine a first classification loss of the first neural network according to its processing result and the current labeling results of the image samples in the joint domain; determine a second classification loss according to its processing result and the processing results of part or all of the other N-1 first neural networks; sum the first and second classification losses to obtain the target classification loss of the first neural network; and adjust the parameter values of its network parameters based on the target classification loss, e.g., by processing the target classification loss with a gradient descent method.
Referring to fig. 2A, fig. 2A is a training scenario of a training method of a neural network provided in an embodiment of the present application, as shown in fig. 2A, the number of the first neural networks is 2, the two first neural networks are two neural networks for mutual supervised learning, and a process of adjusting parameter values of network parameters is specifically described below by taking the two first neural networks shown in fig. 2A as an example.
Suppose the first neural network 1 has a feature transform function F(·; θ1) and the first neural network 2 has a feature transform function F(·; θ2). The first classification loss corresponding to the first neural network 1 is shown in formula (2), and the second classification loss in formula (3):
L_id(θ1) = (1/N) ∑_{i=1}^{N} L_ce(C1(F(x_i; θ1)), y_i)    (2)
where L_id(θ1) is the first classification loss corresponding to the first neural network 1, L_ce is the cross-entropy loss function, x_i is the i-th image sample in the joint domain, θ1 is the network parameter of the first neural network 1, y_i is the labeling result of the i-th image sample, C1(F(x_i; θ1)) is the processing result of the first neural network 1 for the i-th image sample, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
L_soft(θ1, θ2) = -(1/N) ∑_{i=1}^{N} C2(F(x′_i; θ2)) · log C1(F(x_i; θ1))    (3)
where L_soft(θ1, θ2) is the second classification loss corresponding to the first neural network 1, C2(F(x′_i; θ2)) is the processing result of the first neural network 2 for image sample x′_i, C1(F(x_i; θ1)) is the processing result of the first neural network 1 for image sample x_i, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
Finally, formula (2) and formula (3) are added to obtain the target classification loss of the first neural network 1; based on this target classification loss, a gradient descent method is used to adjust the parameter value of the network parameter θ1 of the first neural network 1. Similarly, the process of adjusting the parameter values of the network parameters of the first neural network 2 is consistent with that of the first neural network 1 and is not repeated.
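The following sketch combines formulas (2) and (3) for first neural network 1, with network 2's prediction treated as supervisory information. Writing the second classification loss as a soft cross-entropy is one plausible reading of formula (3), whose original rendering is not recoverable from the text:
```python
import torch.nn.functional as F_t

def mutual_target_loss(logits_1, logits_2, labels):
    hard = F_t.cross_entropy(logits_1, labels)                  # formula (2)
    peer = F_t.softmax(logits_2.detach(), dim=1)                # C2(F(x'_i; theta2))
    soft = -(peer * F_t.log_softmax(logits_1, dim=1)).sum(dim=1).mean()  # formula (3)
    return hard + soft  # target classification loss, minimized by gradient descent
```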
Furthermore, a domain discriminator can be connected behind each first neural network: the features extracted by the first neural network from the image samples in the joint domain are input into the domain discriminator, which determines the domain probability of each image sample, i.e., the probability that the sample belongs to the source domain or the target domain. A domain classification loss is determined based on the domain probability and combined with the target classification loss to obtain a final classification loss; the final classification loss is processed with a gradient descent method to adjust the parameter value of the network parameter θ1 of the first neural network 1.
Wherein the domain classification loss is shown in formula (4):
L_D(w) = -(1/N_s) ∑_{i=1}^{N_s} log D(F(x_i^s); w) - (1/N_t) ∑_{i=1}^{N_t} log(1 - D(F(x_i^t); w))    (4)
where L_D is the domain classification loss, w is the discrimination parameter of the domain discriminator, x_i^s is the i-th image sample in the source domain, x_i^t is the i-th image sample in the target domain, N_s is the total number of training samples in the source domain, N_t is the total number of training samples in the target domain, and D(F(x_i^s); w) and D(F(x_i^t); w) are the processing results of the domain discriminator for the image samples x_i^s and x_i^t, respectively.
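A hedged sketch of the domain classification loss of formula (4), assuming a discriminator that outputs the probability that a feature comes from the source domain; how its gradient is fed back adversarially to the feature extractor (e.g. via a gradient-reversal layer) is an assumption not fixed by the text:
```python
import torch
import torch.nn.functional as F_t

def domain_classification_loss(discriminator, feats_source, feats_target):
    p_s = discriminator(feats_source)  # probability of "source" for source features
    p_t = discriminator(feats_target)  # probability of "source" for target features
    loss_s = F_t.binary_cross_entropy(p_s, torch.ones_like(p_s))
    loss_t = F_t.binary_cross_entropy(p_t, torch.zeros_like(p_t))
    return loss_s + loss_t
```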
It can be seen that, in this embodiment, compared with formula (1), in which the initial neural network is supervised-trained only with the pseudo-labeling results, the present scheme supervises and trains each first neural network not only with the pseudo-labeling results but also with the processing result of another first neural network taken as additional supervisory information.
In some embodiments, before the image samples in the joint domain are processed with the N first neural networks, the method further comprises: extracting a preset number of image samples from the joint domain, and preprocessing the extracted image samples in N different ways to obtain N kinds of preprocessed image samples. Because the image samples are randomly perturbed, their diversity is guaranteed, which ensures the richness of the image samples used in network optimization, so that the trained target neural network can adapt to complicated and variable input samples.
The N different ways of preprocessing the extracted image samples may be: randomly perturbing the features of each extracted image sample to obtain N kinds of preprocessed image samples. For example, the extracted image samples can be masked N times, with a different region masked each time and the remaining part used as the image sample, yielding N kinds of preprocessed image samples.
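A sketch of the masking example, assuming square masks and torch tensors of shape (C, H, W); the patch size and seeding are illustrative choices, not fixed by the patent:
```python
import torch

def masked_views(image, n, patch=32, seed=0):
    """Produce N differently perturbed copies of one extracted image sample
    by zeroing a random region each time (assumes H and W exceed `patch`)."""
    g = torch.Generator().manual_seed(seed)
    _, h, w = image.shape
    views = []
    for _ in range(n):
        top = torch.randint(0, h - patch, (1,), generator=g).item()
        left = torch.randint(0, w - patch, (1,), generator=g).item()
        v = image.clone()
        v[:, top:top + patch, left:left + patch] = 0.0  # mask a different region each time
        views.append(v)
    return views
```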
In some embodiments, based on the N pre-processed image samples obtained above, the processing of the image samples in the joint domain by using the N first neural networks may be implemented by: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the method further comprises: processing the images in the test image set with the N parameter-adjusted first neural networks, and taking the first neural network with the best processing result as the target neural network. In this embodiment, the first neural network with the best processing result is selected so that recognition accuracy is high when targets are recognized with it.
Referring to fig. 3, fig. 3 is a training method of another training method for a neural network according to an embodiment of the present disclosure, where the training method is applied to a training scenario in which the number of first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training; the method comprises the following steps:
step 301: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
Step 302: and processing the image sample in the joint domain by utilizing the second neural network and the third neural network in each neural network pair, and outputting the processing result of the second neural network and the processing result of the third neural network in each neural network pair.
Step 303: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
Optionally, the network parameter values of the second neural network may be adjusted as follows: determine a first classification loss of the second neural network according to its processing result and the current labeling results of the image samples in the joint domain; determine a second classification loss of the second neural network according to its processing result and the processing result of the third neural network in each neural network pair other than the pair containing this second neural network; sum the first and second classification losses to obtain the target classification loss of the second neural network; and adjust the parameter values of its network parameters based on the target classification loss, i.e., process the target classification loss with a gradient descent method so as to adjust the parameter values of the network parameters of the second neural network.
Referring to fig. 3A, fig. 3A is another training scenario of a neural network provided in the embodiment of the present application, as shown in fig. 3A, the number of the first neural networks is 4, and includes 2 neural network pairs, and an adjustment process for adjusting network parameters of a second neural network in each neural network pair is specifically described below by taking two neural network pairs shown in fig. 3A as an example.
Suppose the second neural network 1 of the first neural network pair (comprising the second neural network 1 and the third neural network 1) has a feature transform function F(·; θ1), and the third neural network 1 of that pair has F(·; E[θ1]); the second neural network 2 of the second neural network pair (comprising the second neural network 2 and the third neural network 2) has F(·; θ2), and the third neural network 2 of that pair has F(·; E[θ2]). The first classification loss corresponding to the second neural network 1 is shown in formula (5), and the second classification loss in formula (6):
L_id(θ1) = (1/N) ∑_{i=1}^{N} L_ce(C1(F(x_i; θ1)), y_i)    (5)
where L_id(θ1) is the first classification loss of the second neural network 1 of the first neural network pair, L_ce is the cross-entropy loss function, x_i is the i-th image sample in the joint domain, θ1 is the network parameter of the second neural network 1 of the first neural network pair, y_i is the labeling result of the i-th image sample, C1(F(x_i; θ1)) is the processing result of the second neural network 1 for the i-th image sample, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
L_mt(θ1, θ2) = -(1/N) ∑_{i=1}^{N} C2(F(x′_i; E[θ2])) · log C1(F(x_i; θ1))    (6)
where L_mt(θ1, θ2) is the second classification loss corresponding to the second neural network 1, C2(F(x′_i; E[θ2])) is the processing result of the third neural network 2 for image sample x′_i, C1(F(x_i; θ1)) is the processing result of the second neural network 1 for image sample x_i, θ1 is the network parameter of the second neural network 1, E[θ2] is the network parameter of the third neural network 2, N is the total number of image samples in the joint domain, and 1 ≤ i ≤ N.
Finally, formula (5) and formula (6) are added to obtain the target classification loss of the second neural network 1 of the first neural network pair; based on this target classification loss, a gradient descent method is used to adjust the parameter value of the network parameter θ1 of the second neural network 1 of the first neural network pair. The process of adjusting the parameter values of the network parameters of the second neural network 2 of the second neural network pair is the same as that for the second neural network of the first neural network pair and is not repeated.
Further, the target classification loss and the domain classification loss of the above formula (4) are combined to obtain a final classification loss corresponding to the second neural network 1 of the first neural network pair, and similarly, the final classification loss is processed by using a gradient descent method to adjust the parameter value of the network parameter of the second neural network 1 of the first neural network pair.
In the above embodiment taking two neural network pairs as an example, the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training, and the specific determination process can be seen in formula (7) and formula (8);
E[θ1]=αE′[θ1]+(1-α)θ1 (7);
where E[θ1] is the network parameter of the third neural network 1, E′[θ1] is the network parameter of the third neural network 1 after each training before the current training, θ1 is the network parameter of the second neural network 1 after the current training adjustment, and α is a preset hyper-parameter.
E[θ2]=αE′[θ2]+(1-α)θ2 (8);
where E[θ2] is the network parameter of the third neural network 2, E′[θ2] is the network parameter of the third neural network 2 after each training before the current training, θ2 is the network parameter of the second neural network 2 after the current training adjustment, and α is a preset hyper-parameter.
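Formulas (7) and (8) are temporal averages; the following is a minimal sketch of the update, where α = 0.999 is only an illustrative value for the preset hyper-parameter:
```python
import torch

@torch.no_grad()
def update_third_network(third, second, alpha=0.999):
    """E[theta] = alpha * E'[theta] + (1 - alpha) * theta, applied to every
    corresponding parameter of a (third, second) neural network pair."""
    for p_third, p_second in zip(third.parameters(), second.parameters()):
        p_third.mul_(alpha).add_(p_second, alpha=1.0 - alpha)
```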
It can be seen that, in this embodiment, compared with formula (1), in which only the pseudo-labeling results are used for supervised training of the initial neural network, the second neural network of one neural network pair is additionally supervised with the processing result of another neural network pair, and that processing result comes from a third neural network. Because the parameter values of the network parameters of the third neural network are not simply updated by gradient descent, its parameter adjustment is more stable; using its processing result as the supervisory information for the second neural network of another pair therefore avoids error amplification. Even if the prediction in a processing result is erroneous, after multiple iterations the prediction error can finally be eliminated, so the trained target neural network is more stable.
In some embodiments, prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises: extracting a preset number of image samples from the joint domain; and preprocessing the extracted image samples in N/2 different ways to obtain N/2 kinds of preprocessed image samples. Processing the image samples in the joint domain with the second and third neural networks in each neural network pair then comprises: processing one kind of preprocessed image sample with the second and third neural networks in each neural network pair, respectively.
In some embodiments, after the extracted image samples are preprocessed in N/2 different modes, processing the image samples in the joint domain with the second neural network and the third neural network in each neural network pair, and outputting the processing result of the second neural network and the processing result of the third neural network in each neural network pair, may be implemented as follows: each neural network pair processes one kind of preprocessed image sample, that is, the second neural network and the third neural network in a given pair each process the same kind of preprocessed image sample, so as to output the processing result of the second neural network and the processing result of the third neural network in each neural network pair.
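To make the N/2 preprocessing modes concrete, a hedged Python sketch for the case N = 4 (two neural network pairs, hence two modes) follows; the specific augmentations and image size are invented for illustration, since the patent does not name them.

    import torch
    from torchvision import transforms

    # Assumed example: N = 4, so N/2 = 2 preprocessing modes.
    preprocess_modes = [
        transforms.Compose([
            transforms.Resize((256, 128)),
            transforms.RandomHorizontalFlip(p=1.0),
            transforms.ToTensor(),
        ]),
        transforms.Compose([
            transforms.Resize((256, 128)),
            transforms.ColorJitter(brightness=0.2, contrast=0.2),
            transforms.ToTensor(),
        ]),
    ]

    def preprocess_batch(pil_images):
        """Return N/2 differently preprocessed versions of the same extracted
        image samples; version k is fed to both the second and the third
        neural network of neural network pair k."""
        return [torch.stack([mode(img) for img in pil_images])
                for mode in preprocess_modes]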
In some embodiments, the method further comprises: processing the images in the test image set with the N/2 parameter-adjusted second neural networks, and taking the second neural network with the optimal processing result as the target neural network. Since the parameter adjustment of the third neural network is relatively stable, a third neural network can generally also be used as the target neural network.
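A trivial sketch of this selection step, assuming a hypothetical scoring function that evaluates a network's processing result on the test image set (higher is better):

    def select_target_network(second_nets, evaluate_on_test_set):
        """Return the parameter-adjusted second neural network whose
        processing result on the test image set is optimal;
        evaluate_on_test_set is a hypothetical scoring function."""
        return max(second_nets, key=evaluate_on_test_set)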
An embodiment of the present application provides a schematic flow diagram of a target identification method; the method comprises the following step:
image processing is performed using a target neural network to identify a target in the image.
The target neural network is trained by using the training method shown in fig. 1, fig. 2 or fig. 3.
It can be seen that, in the embodiment of the present application, the target neural network is a network with more stable performance, so using it to identify targets can improve the accuracy of target identification.
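As a hedged sketch of this identification step, assuming the trained target neural network ends in a classification head (the patent leaves the output form open):

    import torch

    def identify_target(target_net, image_tensor):
        """Run the trained target neural network on one preprocessed image
        tensor of shape (C, H, W) and return the predicted class index."""
        target_net.eval()
        with torch.no_grad():
            logits = target_net(image_tensor.unsqueeze(0))  # add batch dim
        return int(logits.softmax(dim=1).argmax(dim=1))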
Referring to fig. 4, fig. 4 is a schematic structural diagram of a training device 500 according to an embodiment of the present application. As shown in fig. 4, the training device 500 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are different from the one or more application programs, are stored in the memory, and are configured to be executed by the processor, the programs including instructions for performing the following steps:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
and adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
when the first neural network is used for processing the image sample in the joint domain and outputting the processing result of the first neural network, the program comprises instructions for executing the following steps: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
when adjusting the parameter values of the network parameters of the first neural network according to the processing result of the first neural network and the current-time labeling result of the image sample in the joint domain, the program includes instructions for executing the following steps: and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for one first neural network, the parameter value of the network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network among the N first neural networks, the program further includes instructions for performing the following steps:
before the image samples in the joint domain are processed by the N first neural networks, extracting a preset number of image samples from the joint domain; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
when processing the image samples in the joint domain using the N first neural networks, the program is specifically configured to execute the following steps:
and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the program is specifically configured to execute the following steps in generating a pseudo-labeling result of the current time of the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
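The patent specifies averaging the features extracted by the N first neural networks but leaves the step from averaged features to pseudo-labeling results open; one reading consistent with the clustering techniques (e.g. k-means) under which this publication is classified is to cluster the averaged features, as in the following assumed Python sketch.

    import numpy as np
    from sklearn.cluster import KMeans

    def generate_pseudo_labels(per_network_features, num_clusters):
        """per_network_features: list of N arrays, each of shape
        (num_samples, dim), holding the features each first neural network
        extracted from the image samples in the target domain.  The features
        are averaged across the N networks and clustered; each sample's
        cluster index serves as its current pseudo-labeling result.  The use
        of k-means and the value of num_clusters are assumptions."""
        mean_features = np.mean(np.stack(per_network_features, axis=0), axis=0)
        return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(mean_features)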
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
when the first neural network is used to process the image sample in the joint domain and output the processing result of the first neural network, the program is specifically used to execute the following steps:
processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
when the parameter values of the network parameters of the first neural network are adjusted according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, the program is specifically used for executing the following steps:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the program further comprises instructions for, prior to processing the image sample in the joint domain using the second and third neural networks in each neural network pair, performing the steps of:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
when processing an image sample in the joint domain using the second neural network and the third neural network of each neural network pair, the program specifically includes instructions for performing the following steps:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, when generating the current pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network, the program is specifically configured to execute the following steps:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the program further includes instructions for performing the following steps:
and processing the images in the test image set by using the N first neural networks after parameter adjustment, and taking the first neural network with the optimal processing result as a target neural network.
In some embodiments, the program further includes instructions for performing the following steps:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus 600 for target recognition according to an embodiment of the present application. As shown in fig. 5, the apparatus 600 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are different from the one or more application programs, are stored in the memory, and are configured to be executed by the processor, the programs including instructions for performing the following step:
image processing is performed using a target neural network to identify a target in the image.
Wherein the target neural network is trained by the method as described in fig. 1, fig. 2 or fig. 3.
Referring to fig. 6, fig. 6 shows a block diagram of possible functional units of the training device 700 involved in the above embodiments. The training device 700 includes: a pseudo-annotation generating unit 710, a processing unit 720, and an adjusting unit 730, wherein:
a pseudo-annotation generating unit 710, configured to generate a current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof;
the processing unit 720 is configured to process the image sample in the joint domain by using the first neural network, and output a processing result of the first neural network; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
the adjusting unit 730 is configured to adjust a parameter value of a network parameter of the first neural network according to a processing result of the first neural network and a current labeling result of the image sample in the joint domain.
In some embodiments, the image samples and their labeling results in the training image set further include image samples and their previous pseudo-labeling results in the target domain.
In some embodiments, the number of the first neural networks is N, where N is an integer greater than 1, the network structures of the N first neural networks are the same, and the parameter values of the corresponding network parameters of the N first neural networks are different;
the processing unit 720, when the first neural network is used to process the image sample in the joint domain and output a processing result of the first neural network, is specifically configured to: processing the image samples in the joint domain by using the N first neural networks, and outputting processing results of the N first neural networks;
the adjusting unit 730, when adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, is specifically configured to:
and aiming at one first neural network, adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain and the processing result of part or all of the N first neural networks except the first neural network.
In some embodiments, in the case where, for one first neural network, the parameter value of the network parameter of the first neural network is adjusted according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of the N-1 first neural networks other than the first neural network, the apparatus further includes: a preprocessing unit 740;
a preprocessing unit 740, configured to extract a preset number of image samples from the joint domain before processing the image samples in the joint domain by using the N first neural networks; preprocessing the extracted image samples in N different modes to obtain N preprocessed image samples;
the processing unit 720, when processing the image samples in the joint domain by using the N first neural networks, is specifically configured to: and processing one preprocessed image sample by using each first neural network to output a processing result of each first neural network.
In some embodiments, the pseudo-annotation generating unit 710, when generating the pseudo-annotation result of the current time of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N first neural networks.
In some embodiments, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
the processing unit 720, when the first neural network is used to process the image sample in the joint domain and output a processing result of the first neural network, is specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair;
the adjusting unit 730, when adjusting the parameter value of the network parameter of the first neural network according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, is specifically configured to: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
In some embodiments, the apparatus further comprises a pre-processing unit 740;
a preprocessing unit 740, configured to extract a preset number of image samples from the joint domain before processing the image samples in the joint domain by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples; the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
In some embodiments, the pseudo-annotation generating unit 710, when generating the pseudo-annotation result of the current time of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
In some embodiments, the apparatus further comprises a test unit 750;
and the test unit 750 is configured to process the images in the test image set by using the N first neural networks after parameter adjustment, and use the first neural network with the optimal processing result as the target neural network.
In some embodiments, the apparatus further comprises a test unit 750;
and the test unit 750 is configured to process the images in the test image set by using the N/2 second neural networks after parameter adjustment, and use the second neural network with the optimal processing result as the target neural network.
The embodiment of the present application further provides a device for target identification, including:
an identification unit configured to perform image processing using a target neural network to identify a target in the image; wherein the target neural network is trained by the method described in FIG. 1, FIG. 2 or FIG. 3.
Embodiments of the present application also provide a computer storage medium, which stores a computer program, where the computer program is executed by a processor to implement part or all of the steps of any one of the neural network training methods or the target recognition method as set forth in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform part or all of the steps of any one of the neural network training methods or the target recognition method as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, and a magnetic or optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (16)

1. A method of training a neural network, comprising:
generating a current pseudo-labeling result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof, image samples in a target domain and previous pseudo-labeling results thereof;
the number of the first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
processing the image sample in the joint domain by using the first neural network, and outputting a processing result of the first neural network, specifically comprising: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
according to the processing result of the first neural network and the current labeling result of the image sample in the joint domain, adjusting the parameter value of the network parameter of the first neural network, specifically comprising: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
2. The method of claim 1, wherein the image samples and their labeling results in the training image set further comprise the image samples and their previous pseudo-labeling results in the target domain.
3. The method of claim 1,
for a first neural network, adjusting parameter values of network parameters of the first neural network according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing result of part or all of the N first neural networks except the first neural network, including:
and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
4. The method of any one of claims 1-3, wherein prior to processing the image samples in the joint domain with the second and third neural networks in each neural network pair, the method further comprises:
extracting a preset number of image samples from the joint domain;
preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
processing the image samples in the joint domain with a second neural network and a third neural network in each neural network pair, comprising:
and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
5. The method of claim 4, wherein generating a current-time pseudo-annotation result for the image sample in the target domain based on the features extracted from the image sample in the target domain by the first neural network comprises:
and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
6. The method of claim 5, further comprising:
and processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
7. A method of object recognition, comprising:
image processing using a target neural network to identify targets in the image, wherein the target neural network is trained using the method of any one of claims 1-6.
8. An apparatus for training a neural network, comprising:
the pseudo-annotation generating unit is used for generating a current pseudo-annotation result of the image sample in the target domain according to the features extracted from the image sample in the target domain by the first neural network;
the first neural network is obtained by training image samples in a training image set and labeling results thereof, wherein the image samples in the training image set and the labeling results thereof comprise image samples in a source domain and labeling results thereof, image samples in a target domain and previous pseudo-labeling results thereof;
the number of the first neural networks is N, N is an integer greater than or equal to 4, and N is an even number; the N first neural networks are divided into N/2 second neural networks and N/2 third neural networks; a second neural network and a third neural network form a neural network pair, and the current parameter value of one network parameter of the third neural network in each neural network pair is determined according to the current parameter value of the corresponding network parameter of the second neural network in the neural network pair and the parameter value in each previous training;
a processing unit, configured to process the image sample in the joint domain by using the first neural network, and output a processing result of the first neural network, and specifically configured to: processing the image sample in the joint domain by using a second neural network and a third neural network in each neural network pair, and outputting a processing result of the second neural network and a processing result of the third neural network in each neural network pair; the image samples in the joint domain and the current labeling results thereof comprise the image samples in the source domain and the labeling results thereof, and the image samples in the target domain and the current pseudo-labeling results thereof;
an adjusting unit, configured to adjust a parameter value of a network parameter of the first neural network according to a processing result of the first neural network and a current labeling result of the image sample in the joint domain, and specifically configured to: aiming at a second neural network, adjust the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
9. The apparatus of claim 8, wherein the image samples and their labeling results in the training image set further comprise the image samples and their previous pseudo-labeling results in the target domain.
10. The apparatus of claim 8,
an adjusting unit, configured to, when, for one first neural network, according to the processing result of the first neural network, the current labeling result of the image sample in the joint domain, and the processing results of some or all of the N first neural networks except the first neural network, adjust a parameter value of a network parameter of the first neural network, specifically: and aiming at a second neural network, adjusting the network parameter value of the second neural network according to the processing result of the second neural network, the current labeling result of the image sample in the joint domain and the processing result of a third neural network in each neural network pair except the neural network pair containing the second neural network.
11. The apparatus according to any one of claims 8-10, wherein the apparatus further comprises a pre-processing unit;
the preprocessing unit is used for extracting a preset number of image samples from the joint domain before the image samples in the joint domain are processed by using the second neural network and the third neural network in each neural network pair; preprocessing the extracted image samples in N/2 different modes to obtain N/2 preprocessed image samples;
the processing unit, when processing the image sample in the joint domain using the second neural network and the third neural network in each neural network pair, is specifically configured to: and respectively processing a preprocessed image sample by utilizing a second neural network and a third neural network in each neural network pair.
12. The apparatus of claim 11,
the pseudo-annotation generating unit, when generating a current pseudo-annotation result of the image sample in the target domain according to the feature extracted from the image sample in the target domain by the first neural network, is specifically configured to: and generating a current pseudo-labeling result of the image sample in the target domain according to the average value of the features extracted from the image sample in the target domain by the N/2 second neural networks.
13. The apparatus of claim 12, further comprising a test unit;
and the test unit is used for processing the images in the test image set by using the N/2 second neural networks after parameter adjustment, and taking the second neural network with the optimal processing result as a target neural network.
14. An apparatus for object recognition, comprising:
an identification unit for performing image processing using a target neural network to identify a target in an image, wherein the target neural network is trained using the method of any one of claims 1-6.
15. An electronic device comprising a processor, a memory, wherein the memory is configured to store computer readable instructions, and wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1-6 or the method of claim 7.
16. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-6 or the method of claim 7.
CN201910472707.5A 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products Active CN110188829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472707.5A CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910472707.5A CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Publications (2)

Publication Number Publication Date
CN110188829A CN110188829A (en) 2019-08-30
CN110188829B true CN110188829B (en) 2022-01-28

Family

ID=67719724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472707.5A Active CN110188829B (en) 2019-05-31 2019-05-31 Neural network training method, target recognition method and related products

Country Status (1)

Country Link
CN (1) CN110188829B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647938B (en) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
US11429809B2 (en) 2019-09-24 2022-08-30 Beijing Sensetime Technology Development Co., Ltd Image processing method, image processing device, and storage medium
CN110738146B (en) * 2019-09-27 2020-11-17 华中科技大学 Target re-recognition neural network and construction method and application thereof
CN111598124B (en) * 2020-04-07 2022-11-11 深圳市商汤科技有限公司 Image processing device, image processing apparatus, processor, electronic apparatus, and storage medium
CN111539947B (en) * 2020-04-30 2024-03-29 上海商汤智能科技有限公司 Image detection method, related model training method, related device and equipment
CN112906857B (en) * 2021-01-21 2024-03-19 商汤国际私人有限公司 Network training method and device, electronic equipment and storage medium
CN113221770B (en) * 2021-05-18 2024-06-04 青岛根尖智能科技有限公司 Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning
CN114550215B (en) * 2022-02-25 2022-10-18 北京拙河科技有限公司 Target detection method and system based on transfer learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469109A (en) * 2015-11-19 2016-04-06 中国地质大学(武汉) Transfer learning method based on class centroid alignment and for remote sensing image classification
CN106599922A (en) * 2016-12-16 2017-04-26 中国科学院计算技术研究所 Transfer learning method and transfer learning system for large-scale data calibration
CN106897746A (en) * 2017-02-28 2017-06-27 北京京东尚科信息技术有限公司 Data classification model training method and device
CN107909101A (en) * 2017-11-10 2018-04-13 清华大学 Semi-supervised transfer learning character identifying method and system based on convolutional neural networks
CN108182394A (en) * 2017-12-22 2018-06-19 浙江大华技术股份有限公司 Training method, face identification method and the device of convolutional neural networks
CN108256561A (en) * 2017-12-29 2018-07-06 中山大学 A kind of multi-source domain adaptive migration method and system based on confrontation study
CN108197670A (en) * 2018-01-31 2018-06-22 国信优易数据有限公司 Pseudo label generation model training method, device and pseudo label generation method and device
CN109089133A (en) * 2018-08-07 2018-12-25 北京市商汤科技开发有限公司 Method for processing video frequency and device, electronic equipment and storage medium
CN109583594A (en) * 2018-11-16 2019-04-05 东软集团股份有限公司 Deep learning training method, device, equipment and readable storage medium storing program for executing
CN109685078A (en) * 2018-12-17 2019-04-26 浙江大学 Infrared image recognition based on automatic marking

Also Published As

Publication number Publication date
CN110188829A (en) 2019-08-30

Similar Documents

Publication Publication Date Title
CN110188829B (en) Neural network training method, target recognition method and related products
CN109117777B (en) Method and device for generating information
CN110781784A (en) Face recognition method, device and equipment based on double-path attention mechanism
EP3136292A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN112597984B (en) Image data processing method, image data processing device, computer equipment and storage medium
CN112804558B (en) Video splitting method, device and equipment
CN111291887A (en) Neural network training method, image recognition method, device and electronic equipment
CN114693624A (en) Image detection method, device and equipment and readable storage medium
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN110751191A (en) Image classification method and system
CN112884147A (en) Neural network training method, image processing method, device and electronic equipment
CN111652878B (en) Image detection method, image detection device, computer equipment and storage medium
CN116630727B (en) Model training method, deep pseudo image detection method, device, equipment and medium
CN113704534A (en) Image processing method and device and computer equipment
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
CN111242114A (en) Character recognition method and device
CN115937596A (en) Target detection method, training method and device of model thereof, and storage medium
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
CN115731620A (en) Method for detecting counter attack and method for training counter attack detection model
CN112380369B (en) Training method, device, equipment and storage medium of image retrieval model
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN111401317B (en) Video classification method, device, equipment and storage medium
CN113762042A (en) Video identification method, device, equipment and storage medium
CN113642420B (en) Method, device and equipment for recognizing lip language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant