CN117636086A - Passive domain adaptive target detection method and device - Google Patents

Passive domain adaptive target detection method and device

Info

Publication number
CN117636086A
Authority
CN
China
Prior art keywords
target
targets
feature
image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311332829.7A
Other languages
Chinese (zh)
Inventor
张璐
张思琦
刘智勇
乔红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202311332829.7A
Publication of CN117636086A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a passive domain adaptive target detection method and device. The method comprises the following steps: constructing a plurality of feature prototypes for each class of targets based on first instance features of the targets extracted by a teacher model from a subset of images of a target domain dataset; correcting, according to the plurality of feature prototypes of each class of targets, the target detection results obtained by the teacher model for each image in the target domain dataset, to obtain a pseudo label for each image; training a student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label, and detecting targets in an image to be detected using the trained student model; wherein the teacher model and the student model are obtained by training a target detection model on a source domain dataset in advance. By using the multiple feature prototypes of each class of targets in the target domain as guidance, the invention generates more accurate pseudo labels as supervision for model training, thereby improving the accuracy of target detection.

Description

Passive domain adaptive target detection method and device
Technical Field
The invention relates to the technical field of computer vision, and in particular to a passive domain adaptive target detection method and device.
Background
When a target detection model trained on a training set is used to detect targets in a new environment, the data distribution of the training set (source domain) is inconsistent with that of the test set (target domain), so the trained model performs poorly in the new environment. Collecting and annotating a sufficiently large dataset for each new environment is time-consuming, labor-intensive and costly.
To solve this problem, passive (source-free) domain adaptive target detection has been proposed. It aims to transfer the knowledge in a target detection model trained on the source domain to the target domain, improving the model's detection performance on the target domain without annotating the target domain data and without accessing the source domain data, which greatly reduces annotation cost.
Existing passive domain adaptive target detection methods typically adopt a pseudo label generation strategy: a target detection model pre-trained on the source domain generates pseudo labels for the target domain dataset, and these pseudo labels serve as supervision on the target domain to fine-tune the target detector. In such methods, the quality of the pseudo labels affects the performance of the adapted target detection model. For example, class imbalance can introduce noisy pseudo labels, because the model attends more to common classes, which degrades detection performance on rare classes. Consequently, inaccurate pseudo labels and the sensitivity of the target detection model to domain shift lead to low target detection accuracy.
Disclosure of Invention
The invention provides a passive domain adaptive target detection method and device to overcome the defect in the prior art that the pseudo labels generated by a target detection model for a target domain dataset are of poor quality, which impairs target detection accuracy. The invention improves the quality of the pseudo labels and thereby the accuracy of target detection.
The invention provides a passive domain adaptive target detection method, which comprises the following steps:
constructing a plurality of feature prototypes for each class of targets based on first instance features of the targets extracted by a teacher model from a subset of images of a target domain dataset;
correcting, according to the plurality of feature prototypes of each class of targets, the target detection results obtained by the teacher model for each image in the target domain dataset, to obtain a pseudo label for each image;
training a student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label, and detecting targets in an image to be detected using the trained student model;
wherein the teacher model and the student model are obtained by training a target detection model on a source domain dataset in advance.
According to the passive domain adaptive target detection method provided by the invention, constructing a plurality of feature prototypes for each class of targets based on the first instance features of the targets extracted by the teacher model from a subset of images of the target domain dataset comprises:
randomly extracting a subset of images from the target domain dataset;
detecting a first class label and a first detection frame of each target in the subset of images based on the teacher model, and taking the image area within the first detection frame of the target as the first instance feature;
determining, according to the first class labels of the targets, the targets belonging to the same class in the subset of images;
and constructing a plurality of feature prototypes for each class of targets according to the first instance features of that class of targets.
According to the passive domain adaptive target detection method provided by the invention, constructing a plurality of feature prototypes for each class of targets according to the first instance features of that class of targets comprises:
clustering the first instance features of each class of targets multiple times, wherein the number of clusters differs in each clustering;
and determining the silhouette score of each clustering, and taking the cluster centers of the clusters in the clustering with the maximum silhouette score as the feature prototypes of that class of targets.
According to the passive domain adaptive target detection method provided by the invention, correcting the target detection results obtained by the teacher model for each image in the target domain dataset according to the plurality of feature prototypes of each class of targets, to obtain the pseudo label of each image, comprises:
detecting, based on the teacher model, a second class label, a second detection frame and a confidence score that each target belongs to the second class label from each image of the target domain dataset;
taking the image area within the second detection frame of each target as the second instance feature of that target;
correcting the second class label of each target and the confidence score that the target belongs to the second class label according to the similarity between the second instance feature of the target and the plurality of feature prototypes of each class of targets;
and determining the pseudo label of each target according to the second detection frame, the corrected second class label and the corrected confidence score of the target.
According to the passive domain adaptive target detection method provided by the invention, correcting the second class label of each target and the confidence score that the target belongs to the second class label according to the similarity between the second instance feature of the target and the plurality of feature prototypes of each class of targets comprises:
determining, for each class of targets, a first maximum value among the similarities between the second instance feature of each target and the plurality of feature prototypes of that class;
taking the class label corresponding to the largest of the first maximum values over all classes of targets as the corrected second class label of each target;
and taking the largest of the first maximum values over all classes of targets as the corrected confidence score of each target.
According to the passive domain adaptive target detection method provided by the invention, taking the image area within the second detection frame of each target as the second instance feature of that target comprises:
comparing the confidence score that each target belongs to the second class label with a preset threshold, wherein the preset threshold is determined according to the number of preset detection classes of the target detection model;
and taking the image area within the second detection frame of a target as the second instance feature of that target when its confidence score is smaller than or equal to the preset threshold.
According to the passive domain adaptive target detection method provided by the invention, training the student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label comprises:
detecting, based on the student model, a third class label, a third detection frame and a confidence score that each target belongs to the third class label from each image of the target domain dataset;
taking the image area within the third detection frame of each target as the third instance feature of that target;
determining a first loss of each target according to the loss between the third class label and the corrected second class label of the target in each image of the target domain dataset, and the loss between the third detection frame and the second detection frame;
determining a second loss of each target according to the similarities between the second and third instance features of the target and the plurality of feature prototypes of each class of targets;
determining a third loss of each target according to the confidence score that the target belongs to the third class label and the corrected confidence score;
training the student model based on the first loss, the second loss and the third loss.
According to the passive domain adaptive target detection method provided by the invention, determining the second loss of each target according to the similarities between the second and third instance features of the target and the plurality of feature prototypes of each class of targets comprises:
determining a first maximum value among the similarities between the second instance feature of each target and the plurality of feature prototypes of each class of targets;
determining a second maximum value among the similarities between the third instance feature of each target and the plurality of feature prototypes of each class of targets;
and determining the second loss of each target according to the first maximum value and the second maximum value.
According to the passive domain adaptive target detection method provided by the invention, training the student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label further comprises:
every preset number of iterations during training, determining a new cluster center for each class of targets according to the second instance features of that class newly generated in those iterations;
selecting, from the plurality of feature prototypes of each class of targets, the feature prototype most similar to the new cluster center of that class;
and updating the selected feature prototype of each class of targets using the new cluster center of that class.
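The prototype update in the steps above can be illustrated with a minimal sketch. It rests on two assumptions that the text does not fix: similarity is measured by dot product, and "updating" means replacing the selected prototype with the new cluster center outright. The function names `dot` and `update_prototypes` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def update_prototypes(prototypes, new_center):
    """Return a copy of `prototypes` in which the prototype most similar to
    `new_center` has been replaced by `new_center`.

    Direct replacement is an assumption of this sketch; a blended update
    (for example a moving average) would fit the description equally well.
    """
    idx = max(range(len(prototypes)), key=lambda j: dot(prototypes[j], new_center))
    updated = [list(p) for p in prototypes]
    updated[idx] = list(new_center)
    return updated
```

Calling `update_prototypes([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])` replaces only the first prototype, because it has the larger dot product with the new center.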
The invention also provides a passive domain adaptive target detection device, which comprises:
a construction module, configured to construct a plurality of feature prototypes for each class of targets based on first instance features of the targets extracted by the teacher model from a subset of images of the target domain dataset;
a generating module, configured to correct, according to the plurality of feature prototypes of each class of targets, the target detection results obtained by the teacher model for each image in the target domain dataset, to obtain a pseudo label for each image;
and a training module, configured to train the student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label, and to detect targets in an image to be detected using the trained student model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a passive domain adaptive target detection method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a passive domain adaptive target detection method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a passive domain adaptive target detection method as described in any one of the above.
According to the passive domain adaptive target detection method and device provided by the invention, the teacher model extracts first instance features of each class of targets from a subset of images of the target domain dataset, and a plurality of feature prototypes is constructed for each class, providing more representative class information for the target domain. The target detection results predicted by the teacher model for each image in the target domain dataset are then corrected using these feature prototypes, yielding more accurate pseudo labels that serve as supervision for training the student model and thereby improve the accuracy of target detection.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a passive domain adaptive target detection method provided by the invention;
FIG. 2 is a schematic diagram of a passive domain adaptive target detection method framework provided by the invention;
FIG. 3 is a schematic diagram of a passive domain adaptive target detection device according to the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A passive domain adaptive target detection method of the present invention is described below with reference to FIG. 1. The method includes:
Step 101, constructing a plurality of feature prototypes for each class of targets based on first instance features of the targets extracted by a teacher model from a subset of images of a target domain dataset;
The target domain dataset comprises $N_t$ images, and the images in the target domain dataset are unlabeled. A subset of images is extracted from the target domain dataset.
The extracted images are input into the teacher model $\theta_{tea}$ for target detection, yielding the first instance feature $f_t$ of each target in the images, i.e. the image area within the detection frame of the target, together with the first class label of the target.
The targets in the subset of images are grouped by class according to their first class labels. The first instance features of each class of targets are analyzed to obtain a plurality of feature prototypes for that class. The feature prototypes of a class represent the features of the targets of that class.
Cluster centers adaptively generated from the distribution of the first instance features of each class of targets may serve as the plurality of feature prototypes of that class, thereby providing more representative class information for the target domain. Building class-specific multi-feature prototypes guides the transfer of knowledge from the source domain to the target domain. This embodiment does not limit the manner in which the feature prototypes are constructed.
Step 102, correcting, according to the plurality of feature prototypes of each class of targets, the target detection results obtained by the teacher model for each image in the target domain dataset, to obtain a pseudo label for each image;
Each image in the target domain dataset is input into the teacher model to obtain the target detection result of that image.
Each image in the target domain dataset may be weakly augmented before being input into the teacher model; the weak augmentation includes random horizontal flipping. That is, the teacher model takes weakly augmented images as input.
Because the teacher model is trained on the source domain dataset, the pseudo labels it predicts for the images are inaccurate. A plurality of feature prototypes specific to each class of targets in the target domain dataset is therefore introduced to correct the target detection results predicted by the teacher model, so that more accurate pseudo labels are assigned to the targets in each image even in the presence of intra-class variation.
More accurate pseudo labels can be obtained based on the distances between the second instance features of the targets, extracted by the teacher model from each image in the target domain dataset, and the plurality of feature prototypes of each class of targets.
Step 103, training a student model using each image of the target domain dataset as a sample and the pseudo label of each image as its label, and detecting targets in an image to be detected using the trained student model; the target detection model trained on the source domain dataset is used as both the teacher model and the student model.
The source domain dataset comprises a plurality of images, and the targets in each image are annotated with class labels and detection frames.
The target detection model may be, but is not limited to, the two-stage target detector Faster R-CNN (Faster Region-based Convolutional Neural Network).
The target detection model is trained on the source domain dataset to obtain an initial target detection model $\theta$. The initial target detection model $\theta$ is used as both the initial teacher model $\theta_{tea}$ and the initial student model $\theta_{stu}$. After the target detection model has been trained on the source domain dataset, the source domain dataset is no longer used.
Each image of the target domain dataset is input as a sample into the student model $\theta_{stu}$ to obtain the target detection result predicted by the student model. The parameters of $\theta_{stu}$ are adjusted according to the difference between the predicted target detection result and the corresponding pseudo label, until this difference is smaller than a set value.
The parameters of the teacher model are updated from the parameters of the student model using an exponential moving average, according to the formula:
$$\theta_{tea} = \eta \cdot \theta_{tea} + (1 - \eta) \cdot \theta_{stu}$$
where $\eta$ is the coefficient of the exponential moving average, which may be set to 0.99.
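The exponential moving average update can be illustrated with a minimal sketch. It assumes, purely for illustration, that model parameters are stored as lists of floats keyed by name; the function name `ema_update` and the parameter names `w` and `b` are hypothetical and do not come from the patent.

```python
def ema_update(teacher, student, eta=0.99):
    """Return the teacher parameters after one EMA step:
    theta_tea = eta * theta_tea + (1 - eta) * theta_stu."""
    return {
        name: [eta * t + (1.0 - eta) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# Hypothetical toy parameters, for illustration only.
teacher = {"w": [1.0, 2.0], "b": [0.0]}
student = {"w": [0.0, 4.0], "b": [1.0]}
teacher = ema_update(teacher, student, eta=0.99)
```

With $\eta = 0.99$ the teacher moves only 1% of the way toward the student at each step, so it changes slowly and smooths out noise in the student's updates.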
The updated teacher model continues to perform forward inference, and the trained student model can detect targets in the new environment.
The images in the target domain dataset may be strongly augmented before being input into the student model; the strong augmentation includes one or more of random grayscale conversion, Gaussian blur and color jitter. That is, the student model takes strongly augmented images as input. The framework of the passive domain adaptive target detection method provided in this embodiment is shown in FIG. 2.
Experiments were performed on multiple domain adaptive target detection benchmarks, including the Cityscapes, Foggy Cityscapes, KITTI, Sim10k, BDD100k, PASCAL VOC, and waters datasets. The experimental results show that the method provided by this embodiment outperforms other passive domain adaptive target detection methods.
In this embodiment, the teacher model extracts first instance features of each class of targets from a subset of images of the target domain dataset, and a plurality of feature prototypes is constructed for each class, providing more representative class information for the target domain. The target detection results predicted by the teacher model for each image in the target domain dataset are then corrected using these feature prototypes, yielding more accurate pseudo labels that serve as supervision for training the student model and thereby improve the accuracy of target detection.
On the basis of the above embodiment, in this embodiment, constructing a plurality of feature prototypes for each class of targets based on the first instance features of the targets extracted by the teacher model from a subset of images of the target domain dataset includes:
randomly extracting a subset of images from the target domain dataset;
detecting a first class label and a first detection frame of each target in the subset of images based on the teacher model, and taking the image area within the first detection frame of the target as the first instance feature;
determining, according to the first class labels of the targets, the targets belonging to the same class in the subset of images;
constructing a plurality of feature prototypes for each class of targets based on the first instance features of that class of targets.
A subset of images, for example 500 images, is randomly sampled from the target domain dataset.
The teacher model detects the targets in each of the sampled images, yielding for each target a first class label, a first detection frame and a confidence score that the target belongs to the first class label.
To determine the first class label of a target, the teacher model predicts a confidence score for each preset class label and takes the preset class label with the highest confidence score as the first class label of the target.
For each class of targets in the sampled images, the instance features of the targets with the highest confidence scores in that class may be retained as the class-representative first instance features.
A plurality of feature prototypes is constructed for each class of targets from the first instance features of the targets that share the same first class label in the sampled images.
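The grouping and retention steps above can be sketched as follows. The sketch assumes each detection is a dictionary with `label`, `score` and `feature` keys and that a fixed number `top_k` of the most confident instances is kept per class; the data layout, the function name and the value of `top_k` are hypothetical, since the text does not state how many instances are retained.

```python
from collections import defaultdict


def select_representative_features(detections, top_k=2):
    """Group detections by predicted class label and keep the features of the
    `top_k` most confident detections in each class."""
    by_class = defaultdict(list)
    for det in detections:
        by_class[det["label"]].append(det)

    selected = {}
    for label, dets in by_class.items():
        # Highest confidence first; sorted() leaves the caller's list untouched.
        ranked = sorted(dets, key=lambda d: d["score"], reverse=True)
        selected[label] = [d["feature"] for d in ranked[:top_k]]
    return selected
```

The returned dictionary maps each class label to its retained first instance features, which are then the input to prototype construction.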
On the basis of the above embodiment, in this embodiment, constructing a plurality of feature prototypes for each class of targets according to the first instance features of that class of targets includes:
clustering the first instance features of each class of targets multiple times, wherein the number of clusters differs in each clustering;
and determining the silhouette score of each clustering, and taking the cluster centers of the clusters in the clustering with the maximum silhouette score as the feature prototypes of that class of targets.
The first instance features of the i-th class of targets are clustered multiple times. Each clustering divides the first instance features of the i-th class of targets into several groups such that the differences between first instance features within a group are small and the differences between first instance features of different groups are large. The number of groups into which the first instance features are divided is the number of clusters.
The silhouette score of each clustering is calculated; it characterizes how compact the clusters of that clustering are, and a larger value is better. The average of the silhouette scores of the clusters in a clustering may be taken as the silhouette score S of that clustering. The clustering with the largest silhouette score is retained, and its number of clusters n is taken as the number of feature prototypes of the i-th class of targets.
The cluster center of each cluster in the retained clustering is taken as a feature prototype of the i-th class of targets, where the j-th feature prototype of the i-th class of targets corresponds to the j-th cluster $G_{ij}$ of that class, and $|G_{ij}|$ is the number of first instance features belonging to $G_{ij}$.
For example, the first instance features of the i-th class of targets are clustered three times, dividing them into 2, 3 and 4 clusters respectively. If the silhouette score of the first clustering is 0.82, that of the second is 0.92 and that of the third is 0.88, then the number of feature prototypes of the i-th class of targets is 3, the number of clusters in the second clustering. The feature prototypes of the i-th class of targets are the cluster centers of the three clusters in the second clustering.
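The selection of the number of prototypes by silhouette score can be sketched in plain Python. Several choices here are assumptions of the sketch and not stated in the text: k-means is used as the clustering algorithm, it is initialised deterministically by farthest-point selection, Euclidean distance is used, and singleton clusters receive a silhouette of 0. All function names are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return tuple(sum(p[d] for p in points) / len(points)
                 for d in range(len(points[0])))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with farthest-point initialisation (assumed)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


def clustering_silhouette(points, labels):
    """Average, over clusters, of each cluster's mean silhouette value.
    Requires at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    cluster_scores = []
    for label, own in clusters.items():
        scores = []
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # singleton convention (assumed)
                continue
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            b = min(sum(dist(p, q) for q in other) / len(other)
                    for m, other in clusters.items() if m != label)
            scores.append((b - a) / max(a, b))
        cluster_scores.append(sum(scores) / len(scores))
    return sum(cluster_scores) / len(cluster_scores)


def build_prototypes(features, candidate_ks=(2, 3, 4)):
    """Cluster once per candidate k and keep the centers of the best clustering."""
    best = None
    for k in candidate_ks:
        labels, centers = kmeans(features, k)
        score = clustering_silhouette(features, labels)
        if best is None or score > best[0]:
            best = (score, k, centers)
    return best[1], best[2]
```

`build_prototypes` returns the chosen number of clusters and the corresponding cluster centers, which play the role of the feature prototypes for one class. In practice the features would be high-dimensional vectors and a library clustering routine would normally replace the hand-written k-means.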
On the basis of the above embodiment, in this embodiment, correcting the target detection results obtained by the teacher model for each image in the target domain dataset according to the plurality of feature prototypes of each class of targets, to obtain the pseudo label of each image, includes:
detecting, based on the teacher model, a second class label, a second detection frame and a confidence score that each target belongs to the second class label from each image of the target domain dataset;
taking the image area within the second detection frame of each target as the second instance feature of that target;
correcting the second class label of each target and the confidence score that the target belongs to the second class label according to the similarity between the second instance feature of the target and the plurality of feature prototypes of each class of targets;
and determining the pseudo label of each target according to the second detection frame, the corrected second class label and the corrected confidence score of the target.
Image i in target domain data setInputting a teacher model to obtain a target detection result predicted by the teacher model:
wherein,is a second class label of the object in the i-th image in the predicted object domain dataset,/for the object in the i-th image in the predicted object domain dataset>Is a second detection frame of the object in the i-th image in the predicted object domain dataset,/for the object in the i-th image in the predicted object domain dataset>The confidence score that the target output by the classification layer, e.g., softmax, belongs to the second class label, and F is the teacher model.
Because of the domain shift between the source domain and the target domain, the target detection result contains noisy samples, such as misclassified positive samples.
The target detection result is therefore corrected using the similarity between the second instance feature of each target and the plurality of feature prototypes of each class of targets, yielding the pseudo label of each target.
Based on the above embodiment, in this embodiment, correcting the second class label of each target and the confidence score that each target belongs to its second class label, according to the similarity between the second instance feature of each target and the plurality of feature prototypes of each class of targets, includes:
determining, for each class of targets, a first maximum value among the similarities between the second instance feature of each target and the plurality of feature prototypes of that class;
taking the class label corresponding to the largest of the first maximum values over all classes as the corrected second class label of each target;
and taking the largest of the first maximum values over all classes as the corrected confidence score of each target.
Based on the similarity between the second example feature of each object and the feature prototypes of each object, the formula for correcting and filtering the object detection result predicted by the teacher model can be expressed as follows:
where $f_t'$ is the second instance feature corresponding to the second detection frame $B_t$; $p_i^n$ is the n-th feature prototype of the i-th class of targets; the similarity between $f_t'$ and $p_i^n$ can be expressed by their dot product; $N_c$ is the total number of categories of interest; $p_j^n$ is the n-th feature prototype of the j-th class of targets; $\tilde{y}_t$ is the corrected second class label; exp denotes the exponential function; $\tilde{s}_t$ is the corrected confidence score; $\tau$ is a set constant; and $\tilde{Y}_t$ is the final pseudo label.
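The correction step can be sketched as below. This is a hedged illustration under stated assumptions: the function name `correct_prediction`, the value of `tau`, and the use of a temperature-scaled softmax over the per-class maxima to obtain the corrected confidence score are assumptions suggested by the mention of exp and τ; the dot product as similarity and the per-class maximum over prototypes follow the text.

```python
import numpy as np

def correct_prediction(feature, prototypes, tau=0.1):
    """Return (corrected_label, corrected_score) for one detected target.

    feature:    (d,) instance feature of the detected target.
    prototypes: list indexed by class; entry i has shape (n_i, d) and holds
                the feature prototypes of class i.
    tau:        temperature constant (hypothetical default).
    """
    # First maximum value per class: best dot-product similarity to any prototype.
    class_sims = np.array([np.max(p @ feature) for p in prototypes])
    # Assumed normalisation: softmax over classes with temperature tau.
    logits = class_sims / tau
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    label = int(class_sims.argmax())
    return label, float(probs[label])
```

For instance, with two prototypes for class 0 and one for class 1, a feature that is closest to the second class-0 prototype is relabelled as class 0 even if the teacher predicted class 1.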
On the basis of the above-described embodiments, in this embodiment, an image area within a second detection frame of each object is taken as a second example feature of each object, including:
Comparing the confidence scores of the targets belonging to the second class labels with a preset threshold, wherein the preset threshold is determined according to the number of preset detection classes corresponding to the target detection model;
and taking the image area in the second detection frame of each target as a second example characteristic of each target under the condition that the confidence score is smaller than or equal to a preset threshold value.
Because of the domain shift between the source domain and the target domain, the target detection result also contains noisy samples such as negative samples. To filter out the negative samples, the detections are filtered by comparing their confidence scores with a preset threshold determined by C, where C is the number of preset detection categories of the target detection model. When the target detection model detects a target, it outputs the confidence score of the target for each preset detection category and takes the preset detection category with the highest confidence score as the class label of the target.
The second instance features of the targets are determined from the target detection result with the negative samples filtered out, and this filtered result is then further corrected according to the second instance features of the targets.
Based on the above embodiment, in this embodiment, training a student model using each image of the target domain data set as a sample and using a pseudo tag of each image as a tag includes:
Detecting a third class label, a third detection frame and confidence scores of the targets belonging to the third class label from each image of the target domain data set based on the student model;
taking the image area in the third detection frame of each target as a third example characteristic of each target;
determining a first loss of each target according to the loss between the third class label and the corrected second class label of each target in each image of the target domain data set and the loss between the third detection frame and the second detection frame;
determining a second loss of each target according to the similarity between the second example feature and the third example feature of each target and a plurality of feature prototypes of each target;
determining a third loss of each target according to the confidence score of each target belonging to the third category label and the corrected confidence score;
the student model is trained based on the first loss, the second loss, and the third loss.
With consistency learning strategies for target detection, the target detection model reduces its sensitivity to domain changes by maintaining consistency between the original image and a style-perturbed image. Existing work either applies consistency regularization on a single prediction layer, which yields limited performance gains, or introduces auxiliary branches to construct the consistency regularization, which increases the number of model parameters.
This embodiment designs a multi-level consistency regularization, comprising prototype-based consistency regularization and prediction consistency regularization, which further improves the ability of consistency learning to alleviate the domain shift problem while introducing few additional model parameters.
The pseudo labels of the images in the target domain data set are used as supervision for the student model to construct a self-training loss, namely the first loss. The self-training loss function is constructed from two terms, the classification loss and the detection frame regression loss of the targets in each image. The classification loss is obtained from the third class label of a target predicted by the student model and the corrected second class label in the pseudo label. The detection frame regression loss is obtained from the third detection frame of the target predicted by the student model and the second detection frame in the pseudo label.
Using the more accurate pseudo labels generated with the multiple prototypes as supervisory signals for the student model on the unlabeled target domain helps improve the detection performance of the detector on the target domain.
Prototype-based consistency regularization can be achieved by computing the class probability distribution of the instance features using the multiple prototypes of each class. Specifically, the class probability distribution of the second instance feature can be determined from the similarities between the second instance feature extracted by the teacher model and the plurality of feature prototypes of each class of targets. The class probability distribution of the third instance feature can be determined from the similarities between the third instance feature extracted by the student model and the plurality of feature prototypes of each class of targets. A prototype-based consistency loss, namely the second loss, is then constructed from the class probability distribution of the second instance feature and that of the third instance feature.
Training with the second loss minimizes the difference between the class probability distributions of the instance features extracted by the teacher model and by the student model, which effectively reduces the sensitivity of the target detection model to changes in the data domain.
A prediction consistency regularization loss, namely the third loss, is constructed from the confidence scores predicted by the teacher model and the student model. For each image in the target domain data set, keeping the predictions for the weakly augmented image and the strongly augmented image consistent helps the target detection model learn more transferable features.
The confidence scores predicted by the teacher model and the student model are denoted $P_{tea}$ and $P_{stu}$, respectively, and the prediction consistency regularization loss is defined in terms of these two scores.
The constructed first, second, and third losses are used to supervise the training of the student model. The parameters $\theta_{stu}$ of the student model are updated through an objective function that combines the self-training loss of the student model supervised by the pseudo labels, the prototype-based consistency regularization loss, and the prediction consistency regularization loss.
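How the three losses might be combined can be sketched as follows. This is only an illustration under loudly stated assumptions: the text does not give the form of the prediction consistency loss, so a mean squared difference between teacher and student confidence scores is assumed here, and the weights `w_proto` and `w_pred` (defaulting to an unweighted sum) are hypothetical.

```python
import numpy as np

def prediction_consistency_loss(p_tea, p_stu):
    """Assumed form: mean squared difference between the teacher's and the
    student's confidence scores for the same targets."""
    p_tea = np.asarray(p_tea, dtype=float)
    p_stu = np.asarray(p_stu, dtype=float)
    return float(np.mean((p_tea - p_stu) ** 2))

def total_loss(l_self_training, l_prototype, l_prediction, w_proto=1.0, w_pred=1.0):
    """Combine the first, second, and third losses into one training objective.
    The weights are hypothetical; with the defaults this is a plain sum."""
    return l_self_training + w_proto * l_prototype + w_pred * l_prediction
```

In a real training loop the three terms would be differentiable tensors from a deep learning framework and the student parameters would be updated by gradient descent on the combined value.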
On the basis of the above embodiment, in this embodiment, determining the second loss of each object according to the similarity between the second example feature and the third example feature of each object and the feature prototypes of each object includes:
Determining a first maximum value in similarity between the second example feature of each object and a plurality of feature prototypes of each object;
determining a second maximum value in the similarity between the third instance feature of each object and the feature prototypes of each object;
and determining a second loss of each target according to the first maximum value and the second maximum value.
The first maximum value among the similarities between the second instance feature $f_t$ extracted by the teacher model and the plurality of feature prototypes of each class of targets may be taken as the similarity between the second instance feature and that class. The sum of the first maximum values over all classes is then determined, and the ratio of the first maximum value of each class to this sum is taken as the class probability distribution $Z_{tea}$ of the second instance feature.
Likewise, the second maximum value among the similarities between the third instance feature $f_s$ extracted by the student model and the plurality of feature prototypes of each class of targets may be taken as the similarity between the third instance feature and that class. The sum of the second maximum values over all classes is determined, and the ratio of the second maximum value of each class to this sum is taken as the class probability distribution $Z_{stu}$ of the third instance feature. The formula is as follows:
$$Z_{tea}(i) = \frac{\max_n \mathrm{sim}(f_t, p_i^n)}{\sum_{j=1}^{N_c} \max_n \mathrm{sim}(f_t, p_j^n)}, \qquad Z_{stu}(i) = \frac{\max_n \mathrm{sim}(f_s, p_i^n)}{\sum_{j=1}^{N_c} \max_n \mathrm{sim}(f_s, p_j^n)}$$
where sim is a similarity function, which may be the cosine similarity, $p_i^n$ is the n-th feature prototype of the i-th class of targets, and $N_c$ is the total number of classes.
Consistency between $Z_{stu}$ and $Z_{tea}$ is enforced by minimizing the prototype-based consistency regularization loss, namely the second loss, which is the Kullback-Leibler divergence between the two class probability distributions. The Kullback-Leibler divergence measures the degree of difference between two probability distributions.
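The prototype-based consistency loss can be sketched as below. This is a minimal sketch under assumptions: the function names are hypothetical, cosine similarity is used as the similarity function, the per-class maxima are clipped to a small positive value so that the ratios form a valid distribution, and the divergence is taken in the direction KL(teacher || student), which the text does not specify.

```python
import numpy as np

def cosine_sim(feature, protos):
    """Cosine similarity between one feature (d,) and each prototype row (n, d)."""
    f = feature / np.linalg.norm(feature)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return p @ f

def class_distribution(feature, prototypes, eps=1e-8):
    """Per-class maximum similarity, normalised by the sum over all classes."""
    maxima = np.array([cosine_sim(feature, p).max() for p in prototypes])
    maxima = np.clip(maxima, eps, None)  # assumption: keep every ratio positive
    return maxima / maxima.sum()

def prototype_consistency_loss(f_tea, f_stu, prototypes):
    """KL divergence between teacher and student class distributions
    (direction teacher -> student is an assumption)."""
    z_tea = class_distribution(f_tea, prototypes)
    z_stu = class_distribution(f_stu, prototypes)
    return float(np.sum(z_tea * np.log(z_tea / z_stu)))
```

The loss is zero when the teacher and student features induce the same class distribution and grows as the two distributions diverge.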
On the basis of the above embodiment, in this embodiment, when training a student model by taking each image of the target domain data set as a sample and taking the pseudo tag of each image as a tag, the method further includes:
determining new clustering centers of various targets according to the second example characteristics of the targets newly generated in each preset number of iterations in the training process;
selecting a feature prototype most similar to a new cluster center of each type of target from feature prototypes of each type of target;
and updating the feature prototype of each selected target by using a new cluster center of each target.
The feature prototypes of each class of targets are dynamically updated according to the second instance features extracted by the teacher model. A memory bank may be used to store the second instance features extracted by the teacher model.
During training of the student model, the teacher model extracts the second instance features of one image at each iteration in order to generate its pseudo label. After every preset number of iterations, for example 100, the teacher model has thus newly generated the second instance features of the targets in 100 images. The mean of the newly generated second instance features of each class of targets is taken as the new cluster center $v_i$ of that class, $i = 1, \dots, C$, where C is the number of target categories, and the memory bank is then cleared.
Among the feature prototypes of the same class, the prototype $p_i^{*}$ most similar to the new cluster center $v_i$ is selected and updated as:
$$p_i^{*} \leftarrow \alpha \, p_i^{*} + (1 - \alpha) \, v_i$$
where $\alpha$ is a momentum coefficient, which can be set to 0.99.
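The periodic prototype update can be sketched as follows. This is a hedged sketch, not the patented implementation: the function name `update_prototypes`, the use of cosine similarity to pick the closest prototype, and the exponential-moving-average form of the momentum update are assumptions consistent with a momentum coefficient of 0.99.

```python
import numpy as np

def update_prototypes(prototypes, new_features, alpha=0.99):
    """Momentum-update the prototype closest to the new cluster center.

    prototypes:   (n, d) float array, feature prototypes of one class.
    new_features: (m, d) array, instance features of that class collected in
                  the memory bank since the last update.
    Returns the updated prototypes and the index of the prototype that changed.
    """
    center = new_features.mean(axis=0)  # new cluster center of the class
    sims = prototypes @ center / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(center)
    )
    k = int(sims.argmax())  # most similar prototype
    updated = prototypes.copy()
    updated[k] = alpha * prototypes[k] + (1.0 - alpha) * center
    return updated, k
```

After the update the caller would clear the memory bank for that class, so that the next update only sees features generated in the following iterations.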
The passive domain adaptive target detection device provided by the invention is described below; the device described below and the passive domain adaptive target detection method described above may be referred to in correspondence with each other.
As shown in fig. 3, the apparatus includes a construction module 301, a generation module 302, and a training module 303, wherein:
the construction module 301 is configured to construct a plurality of feature prototypes of various targets based on first example features of various targets extracted from a partial image of a target domain data set by a teacher model;
the generating module 302 is configured to correct, according to a plurality of feature prototypes of various targets, a target detection result of each image in the target domain data set acquired by the teacher model, so as to obtain a pseudo tag of each image;
the training module 303 is configured to train the student model with each image of the target domain dataset as a sample and with the pseudo tag of each image as a tag, and detect a target in the image to be detected using the trained student model;
The teacher model and the student model are obtained by training the target detection model by using the source domain data set in advance.
According to the embodiment, the first example features of various targets are extracted from the partial images of the target domain data set by using the teacher model, a plurality of feature prototypes of the various targets are constructed, and more representative category information is provided for the target domain, so that after the target detection results of the images in the target domain data set predicted by the teacher model are corrected by the plurality of feature prototypes of the various targets, more accurate pseudo labels are obtained and serve as supervision information for training of the student model, and the accuracy of target detection is improved.
Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include: processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a passive domain adaptive target detection method comprising: constructing a plurality of feature prototypes of various targets based on first instance features of various targets extracted by a teacher model from partial images of a target domain data set; correcting target detection results of all images in a target domain data set acquired by a teacher model according to a plurality of feature prototypes of all types of targets to obtain pseudo labels of all the images; training a student model by taking each image of the target domain data set as a sample and taking a pseudo tag of each image as a tag, and detecting a target in the image to be detected by using the trained student model; the teacher model and the student model are obtained by training the target detection model by using the source domain data set in advance.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can perform a passive domain adaptive target detection method provided by the above methods, where the method includes: constructing a plurality of feature prototypes of various targets based on first instance features of various targets extracted by a teacher model from partial images of a target domain data set; correcting target detection results of all images in a target domain data set acquired by a teacher model according to a plurality of feature prototypes of all types of targets to obtain pseudo labels of all the images; training a student model by taking each image of the target domain data set as a sample and taking a pseudo tag of each image as a tag, and detecting a target in the image to be detected by using the trained student model; the teacher model and the student model are obtained by training the target detection model by using the source domain data set in advance.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the passive domain adaptive target detection method provided by the above methods, the method comprising: constructing a plurality of feature prototypes of various targets based on first instance features of various targets extracted by a teacher model from partial images of a target domain data set; correcting target detection results of all images in a target domain data set acquired by a teacher model according to a plurality of feature prototypes of all types of targets to obtain pseudo labels of all the images; training a student model by taking each image of the target domain data set as a sample and taking a pseudo tag of each image as a tag, and detecting a target in the image to be detected by using the trained student model; the teacher model and the student model are obtained by training the target detection model by using the source domain data set in advance.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the various embodiments or methods of some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A passive domain adaptive target detection method, comprising:
constructing a plurality of feature prototypes of various targets based on first instance features of the various targets extracted by a teacher model from partial images of a target domain data set;
correcting target detection results of all images in the target domain data set acquired by the teacher model according to a plurality of feature prototypes of all the targets to obtain pseudo labels of all the images;
training a student model by taking each image of the target domain data set as a sample and taking a pseudo tag of each image as a tag, and detecting a target in an image to be detected by using the trained student model;
the teacher model and the student model are obtained by training a target detection model by using a source domain data set in advance.
2. The passive domain adaptive object detection method according to claim 1, wherein the constructing a plurality of feature prototypes of each type of object based on the first instance features of each type of object extracted by the teacher model from the partial image of the object domain data set comprises:
randomly extracting a partial image from the target domain dataset;
Detecting a first class label and a first detection frame of each target in the partial image based on the teacher model, and taking an image area in the first detection frame of the target as the first example characteristic;
according to the first class labels of the targets, determining targets belonging to the same class in the partial image;
and constructing a plurality of feature prototypes of each type of target according to the first example features of each type of target.
3. The passive domain adaptive object detection method of claim 2, wherein said constructing a plurality of feature prototypes for each of said objects from first instance features of said each of said objects comprises:
clustering the first example features of the targets for multiple times, wherein the number of clusters in each cluster is different;
and determining the silhouette score of each clustering, and taking the cluster centers of the clusters in the clustering with the largest silhouette score as the feature prototypes of each type of target.
4. The passive domain adaptive target detection method according to claim 2, wherein correcting the target detection result of each image in the target domain data set obtained by the teacher model according to the feature prototypes of the various targets to obtain the pseudo tag of each image comprises:
Detecting a second class label, a second detection frame and confidence scores of the targets belonging to the second class label from each image of the target domain data set based on the teacher model;
taking the image area in the second detection frame of each target as a second example characteristic of each target;
correcting the second class labels of the targets and confidence scores of the targets belonging to the second class labels according to the similarity between the second example features of the targets and the feature prototypes of the targets;
and determining the pseudo tags of the targets according to the second detection frames of the targets, the corrected second class tags and the corrected confidence scores.
5. The passive domain adaptive object detection method according to claim 4, wherein correcting the second class labels of the objects and the confidence scores of the objects belonging to the second class labels according to the similarity between the second instance features of the objects and the feature prototypes of the objects comprises:
determining, for each class of targets, a first maximum value among the similarities between the second instance feature of each target and the plurality of feature prototypes of that class;
taking the class label corresponding to the largest of the first maximum values over all classes as the corrected second class label of each target;
and taking the largest of the first maximum values over all classes as the corrected confidence score of each target.
6. The passive domain adaptive object detection method of claim 4, wherein said taking the image area within the second detection frame of each target as the second instance feature of each target comprises:
comparing the confidence scores of the targets belonging to the second class labels with a preset threshold, wherein the preset threshold is determined according to the number of preset detection classes corresponding to the target detection model;
and taking the image area in the second detection frame of each target as a second example characteristic of each target under the condition that the confidence score is smaller than or equal to the preset threshold value.
7. The passive domain adaptive object detection method of claim 4, wherein training the student model using each image of the object domain dataset as a sample and a pseudo tag of each image as a tag comprises:
Detecting a third class label, a third detection frame and confidence scores of the targets belonging to the third class label from each image of the target domain dataset based on the student model;
taking the image area in the third detection frame of each target as a third example characteristic of each target;
determining a first loss of each target according to the loss between the third class label of each target in each image of the target domain data set and the corrected second class label and the loss between the third detection frame and the second detection frame;
determining a second loss of each target according to the similarity between the second example feature and the third example feature of each target and the feature prototypes of each target;
determining a third loss of each target according to the confidence score of each target belonging to the third category label and the corrected confidence score;
training the student model based on the first loss, the second loss, and the third loss.
8. The passive domain adaptive object detection method of claim 7, wherein said determining the second loss of each object based on the similarity between the second instance feature, the third instance feature, and the plurality of feature prototypes of each object comprises:
Determining a first maximum value in similarity between the second example feature of each object and a plurality of feature prototypes of each object;
determining a second maximum value in similarity between the third instance feature of each object and a plurality of feature prototypes of each object;
and determining a second loss of each target according to the first maximum value and the second maximum value.
9. The passive domain adaptive object detection method according to claim 4, wherein training the student model using the images of the object domain dataset as samples and the pseudo tags of the images as tags, further comprises:
determining a new clustering center of each target according to the second example characteristic of each target newly generated in each preset number of iterations in the training process;
selecting a feature prototype most similar to a new cluster center of each type of target from the feature prototypes of each type of target;
and updating the feature prototype of the selected various targets by using the new clustering centers of the various targets.
10. A passive domain adaptive target detection apparatus, comprising:
The construction module is used for constructing a plurality of feature prototypes of various targets based on first example features of the various targets extracted from partial images of the target domain data set by the teacher model;
the generating module is used for correcting target detection results of all images in the target domain data set acquired by the teacher model according to a plurality of feature prototypes of all the targets to obtain pseudo labels of all the images;
the training module is used for taking each image of the target domain data set as a sample, taking the pseudo tag of each image as a tag to train a student model, and detecting a target in an image to be detected by using the trained student model;
the teacher model and the student model are obtained by training a target detection model by using a source domain data set in advance.
CN202311332829.7A 2023-10-13 2023-10-13 Passive domain adaptive target detection method and device Pending CN117636086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311332829.7A CN117636086A (en) 2023-10-13 2023-10-13 Passive domain adaptive target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311332829.7A CN117636086A (en) 2023-10-13 2023-10-13 Passive domain adaptive target detection method and device

Publications (1)

Publication Number Publication Date
CN117636086A true CN117636086A (en) 2024-03-01

Family

ID=90036630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311332829.7A Pending CN117636086A (en) 2023-10-13 2023-10-13 Passive domain adaptive target detection method and device

Country Status (1)

Country Link
CN (1) CN117636086A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100373B1 (en) * 2020-11-02 2021-08-24 DOCBOT, Inc. Autonomous and continuously self-improving learning system
CN112861616A (en) * 2020-12-31 2021-05-28 电子科技大学 Passive field self-adaptive target detection method
CN113221903A (en) * 2021-05-11 2021-08-06 中国科学院自动化研究所 Cross-domain self-adaptive semantic segmentation method and system
WO2022242352A1 (en) * 2021-05-21 2022-11-24 北京沃东天骏信息技术有限公司 Methods and apparatuses for building image semantic segmentation model and image processing, electronic device, and medium
US20230154167A1 (en) * 2021-11-15 2023-05-18 Nec Laboratories America, Inc. Source-free cross domain detection method with strong data augmentation and self-trained mean teacher modeling
CN116524289A (en) * 2022-01-21 2023-08-01 华为技术有限公司 Model training method and related system
CN114581350A (en) * 2022-02-23 2022-06-03 清华大学 Semi-supervised learning method suitable for monocular 3D target detection task
CN114943965A (en) * 2022-05-31 2022-08-26 西北工业大学宁波研究院 Unsupervised domain self-adaptive remote sensing image semantic segmentation method based on curriculum learning
CN115082757A (en) * 2022-07-13 2022-09-20 北京百度网讯科技有限公司 Pseudo label generation method, target detection model training method and device
CN116433909A (en) * 2023-04-16 2023-07-14 广西大学 Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN116310655A (en) * 2023-04-23 2023-06-23 中国人民解放军国防科技大学 Infrared dim target detection method and device based on semi-supervised mixed domain adaptation
CN116645512A (en) * 2023-06-01 2023-08-25 航天科工深圳(集团)有限公司 Self-adaptive semantic segmentation method and device under severe conditions

Non-Patent Citations (2)

Title
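Fu Yuxiang; Qin Yongbin; Shen Guowei: "Research on a multi-source data privacy protection method based on transfer learning", Computer Engineering & Science (计算机工程与科学), no. 04, 15 April 2019 (2019-04-15) *
Yao Minghai; Huang Zhancong: "Research on a semi-supervised domain adaptation method based on active learning", Chinese High Technology Letters (高技术通讯), no. 08, 15 August 2020 (2020-08-15) *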

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
US11562591B2 (en) Computer vision systems and methods for information extraction from text images using evidence grounding techniques
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN109086654B (en) Handwriting model training method, text recognition method, device, equipment and medium
CN111652332B (en) Deep learning handwritten Chinese character recognition method and system based on two classifications
WO2019232843A1 (en) Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
CN107526785A (en) File classification method and device
CN110866530A (en) Character image recognition method and device and electronic equipment
CN106446896A (en) Character segmentation method and device and electronic equipment
CN112307919B (en) Improved YOLOv 3-based digital information area identification method in document image
CN114139676A (en) Training method of domain adaptive neural network
CN116192500A (en) Malicious flow detection device and method for resisting tag noise
CN114818963B (en) Small sample detection method based on cross-image feature fusion
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN115424093A (en) Method and device for identifying cells in fundus image
CN117153268A (en) Cell category determining method and system
CN116738330A (en) Semi-supervision domain self-adaptive electroencephalogram signal classification method
Sarraf French word recognition through a quick survey on recurrent neural networks using long-short term memory RNN-LSTM
CN109101984B (en) Image identification method and device based on convolutional neural network
CN113408418A (en) Calligraphy font and character content synchronous identification method and system
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN117636086A (en) Passive domain adaptive target detection method and device
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN110378405A (en) The Hyperspectral Remote Sensing Imagery Classification method of Adaboost algorithm based on transfer learning
CN115310491A (en) Class-imbalance magnetic resonance whole brain data classification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination