CN117830616A - Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag - Google Patents

Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag

Info

Publication number
CN117830616A
CN117830616A (application CN202311807723.8A)
Authority
CN
China
Prior art keywords
image
domain
network
pseudo tag
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311807723.8A
Other languages
Chinese (zh)
Inventor
耿杰
齐浩
陈文会
蒋雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202311807723.8A priority Critical patent/CN117830616A/en
Publication of CN117830616A publication Critical patent/CN117830616A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/088 — Non-supervised learning, e.g. competitive learning
    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/10 — Terrestrial scenes
    • Y02T 10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a remote sensing image unsupervised cross-domain target detection method based on a progressive pseudo tag. It mainly addresses two problems: most unsupervised domain-adaptive target detection methods are built on the two-stage Faster R-CNN, which is less efficient than single-stage detectors, and existing unsupervised cross-domain target detection methods achieve poor detection accuracy when applied directly to multi-platform remote sensing imagery. When model training starts, the method filters out low-quality pseudo tags with a deliberately high threshold, owing to the domain gap between the source domain and the target domain. As training proceeds, the student network adaptively learns domain-invariant features between the two domains through multi-scale image-level contrastive domain adaptation and instance-level contrastive domain adaptation, which improves the quality of the pseudo tags generated by the teacher network. In the invention, a nonlinear function is used as the weight of the threshold to help the teacher network generate suitable pseudo tags, thereby improving the performance of the cross-domain target detection model.

Description

Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag
Technical Field
The invention relates to the field of computer vision, in particular to an image unsupervised cross-domain target detection method.
Background
Performance degradation in cross-domain detection of multi-platform remote sensing images stems from domain distribution shifts between different platforms. Such shifts manifest as differences in viewing angle, illumination, resolution, and so on, making it difficult to generalize a model to a new platform in the object detection task. Research on unsupervised domain-adaptive target detection aims to solve this domain distribution shift problem. The three main approaches are adversarial-learning-based methods, pseudo-tag-based self-training methods, and image-translation-based methods. Adversarial-learning-based domain-adaptive target detection addresses the distribution gap between the source domain and the target domain by introducing a domain discriminator to assist in training the detection model. The task of the domain discriminator is to determine whether the input features come from the source domain or the target domain, while the detector is trained to generate features that fool the domain discriminator, so that it cannot reliably distinguish source-domain features from target-domain features. When the detector and the domain discriminator reach a dynamic equilibrium, the detector produces domain-invariant features, thereby enhancing performance on the target domain. Pseudo-tag-based self-training is an iterative training strategy: an object detection model is first trained on the source-domain data; this model then predicts on the target-domain data, and its high-confidence predictions are kept as pseudo tags; the target-domain data and their corresponding pseudo tags are then used to train the target-domain model. Because the pseudo tags become increasingly accurate, this iterative training progressively improves the performance of the target-domain model.
The image-to-image translation approach converts the target-domain images into the style of the source-domain images, or the source-domain images into the style of the target-domain images, by introducing an additional translation model; this helps reduce the visual distribution gap between the source domain and the target domain. For example, translation from the target domain to the source domain can be achieved with a Generative Adversarial Network (GAN), so that the target-domain images are distributed more closely to the source-domain images; in this way the performance of the detector on the target domain can be improved. However, most unsupervised domain-adaptive target detection methods are based on the two-stage Faster R-CNN, and two-stage detectors are inefficient compared with single-stage detectors. Existing unsupervised contrastive domain adaptation methods from general computer vision give poor detection accuracy when applied directly to multi-platform remote sensing imagery, as do existing methods that perform unsupervised domain adaptation with the mean teacher network from semi-supervised learning.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a remote sensing image unsupervised cross-domain target detection method based on a progressive pseudo tag.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: generating a pseudo tag on the target domain data by utilizing a pre-trained average teacher network;
D_S and D_T are image training datasets, where D_S is the source-domain image set, which carries labels, and D_T is the target-domain image set, which carries no labels; both the teacher network and the student network in the average teacher network use the single-stage detector YOLOv5; the student network differs from the teacher network in that its 4th, 5th and 6th convolution layers are each followed by a gradient reversal layer and a domain discriminator; the average teacher network uses CSPDarknet53 for feature extraction;
Step 1-1: apply a weak augmentation to D_T; the weak augmentation takes D_T as input and applies random horizontal flipping and cropping in sequence;
Step 1-2: input the weakly augmented result of step 1-1 into the teacher network to obtain predictions on D_T, namely the predicted bounding-box coordinates and classification results; the teacher network's predictions on D_T serve as the pseudo tags;
step 1-3: setting a pseudo tag dynamic optimization strategy to filter the pseudo tag;
step 2: training a student network using the pseudo tag;
Step 2-1: apply a strong augmentation to D_T; the strong augmentation applies random color jitter, grayscale conversion, and Gaussian blur in sequence;
Step 2-2: use the predicted bounding-box coordinates and classification results retained after the pseudo tag filtering of step 1-3 as the pseudo tags of the strongly augmented D_T obtained in step 2-1;
Step 2-3: input D_S together with the D_T processed in step 2-2 into the student network;
Step 2-4: D_S and D_T are passed through the student network: the 4th convolution layer outputs shallow features f_S1 and f_T1, the 5th convolution layer outputs mid-level features f_S2 and f_T2, and the 6th convolution layer outputs high-level features f_S3 and f_T3; the 4th, 5th and 6th convolution layers are each followed by a gradient reversal layer and a domain discriminator; f_S1, f_S2, f_S3 and f_T1, f_T2, f_T3 are fed into the corresponding gradient reversal layers and domain discriminators, whose outputs are f′_S1, f′_S2, f′_S3 and f′_T1, f′_T2, f′_T3; the domain discriminator distinguishes the domain labels, and the image-level adversarial domain adaptation loss is as follows:
L_img^(s) = -Σ_n Σ_(h,w) [ (1 - D_n)·log p_n^(h,w) + D_n·log(1 - p_n^(h,w)) ],  s = 1, 2, 3
where L_img^(1) denotes the image-level adversarial domain adaptation loss on the shallow features, L_img^(2) that on the mid-level features, and L_img^(3) that on the high-level features; D_n is the domain label of the nth training image, 0 if the input comes from D_S and 1 if it comes from D_T; and p_n^(h,w) is the predicted probability that the point at position (h, w) of the nth training image comes from D_S;
step 3: realizing domain self-adaption in a student network by utilizing contrast learning;
Step 3-1: the outputs f′_S1, f′_S2, f′_S3 and f′_T1, f′_T2, f′_T3 of step 2-4 are passed through an FPN network to obtain the fused image feature f_S of D_S and the fused image feature f_T of D_T;
Step 3-2: f_S is partitioned into classes according to the true labels of D_S, and f_T according to the pseudo tags generated in step 1-3; among the features f_S and f_T of step 3-1, image features of the same class are taken as positive samples and image features of different classes as negative samples; f_ijk denotes the jth image feature vector of the ith image, belonging to class k; the image-level contrast loss L_InfoNCE is then constructed;
Step 4: training an average teacher network using the total loss function; the total loss L is the detector loss plus weighted image level contrast loss plus image level contrast domain adaptation at three scalesLambda super parameter takes value of 0.1, L det As a loss function of yolov5 detector, L InfoNCE Loss for image level contrast; the student network updates the parameters of the teacher network through the index moving average EMA;
θ_t ← α·θ_t + (1 - α)·θ_s
where θ_t and θ_s are the parameters of the teacher network and the student network respectively, and α is the EMA decay rate, set to 0.999.
Further, in step 1-3, the pseudo tag dynamic optimization strategy δ is set as follows: δ decreases nonlinearly from 1 to 0.5 with the training epoch, where e_t is the current training epoch, E is the total number of training epochs, and ε is a hyper-parameter;
δ is used in the teacher network as a weight on the confidence threshold for filtering low-quality pseudo tags.
Furthermore, in the pseudo tag dynamic optimization strategy δ, ε takes the value 0.5, the confidence threshold at the start of training is 0.8, the confidence threshold of the current training epoch is 0.8·δ, and this weighted threshold decreases to 0.4 as training progresses; the teacher network outputs as pseudo tags only the predictions whose confidence exceeds 0.8·δ.
Further, in step 3-2, the image-level contrast loss L_InfoNCE is constructed as follows: K is the number of categories; G_S^i denotes the set of image features of the ith source-domain image, and f_ijk^S, the jth feature vector of the ith source-domain image, corresponds to category k; the positive sample set consists of image features of the same category k whose predicted probability value is greater than the positive-sample threshold δ_S, with δ_S taking the value 0.5; the negative sample set has two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_S.
G_T^i denotes the set of image features of the ith target-domain image, and f_ijk^T, the jth feature vector of the ith target-domain image, corresponds to category k; since the target-domain images carry no label information, the pseudo tags generated by the teacher network in step 1-3 are used as labels: image features of category k whose predicted probability value is greater than the positive-sample threshold δ_T = 0.5 constitute the positive samples, and the negative sample set again has two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_T.
The invention has the beneficial effects that: the common class, automobiles, of the VisDrone dataset and the DIOR dataset is used as the detection target, and the performance evaluation index is mAP, the precision measure commonly used in target detection tasks; compared with the mAPs of four other cross-domain target detection methods, the method of the invention achieves the best detection effect in cross-domain target detection from the VisDrone dataset to the DIOR dataset;
TABLE 1

Method                                 VisDrone→DIOR mAP (%)
The method adopted by the invention    55.9
ConfMIX                                49.6
SSDA                                   46.3
AcroFOD                                46.8
MS-DA                                  52.7
The mAP of MS-DA, the most accurate existing algorithm, is 52.7%, and the method of the invention raises this to 55.9%, an improvement of 3.2 percentage points. This demonstrates the good cross-domain target detection performance of the model.
Drawings
FIG. 1 is a schematic diagram of a detection model of the present invention;
fig. 2 is a cross-domain detection flow diagram.
Detailed Description
The invention will be further described with reference to the drawings and examples.
Step 1: generating pseudo tags on target domain data using a pretrained average teacher network
D_S and D_T are image training datasets, where D_S is the source-domain image set, which carries labels, and D_T is the target-domain image set, which carries no labels; the backbone of both the teacher network and the student network in the average teacher network is the single-stage detector YOLOv5; the student network differs from the teacher network in that its 4th, 5th and 6th convolution layers are each followed by a gradient reversal layer and a domain discriminator; the average teacher network uses CSPDarknet53 as the backbone feature extractor;
Step 1-1: apply a weak augmentation to D_T; the weak augmentation takes D_T as input and applies random horizontal flipping and cropping;
Step 1-2: inputting the result obtained after the weak enhancement operation in the step 1-1 into a teacher network to obtain a result in D T Upper prediction result, D T The predicted result is the predicted boundary frame coordinates and the predicted classification result; network teacher in D T The predicted result is used as a pseudo tag;
step 1-3: setting a dynamic optimization strategy of the pseudo tag, and improving the quality of the pseudo tag;
the method for setting the dynamic optimization strategy of the pseudo tag comprises the following steps:
wherein δ decreases nonlinearly from 1 to 0.5 with the training epoch; e_t is the current training epoch, E is the total number of training epochs, and ε is a hyper-parameter;
δ is used in the teacher network as a weight on the confidence threshold for filtering low-quality pseudo tags;
ε takes the value 0.5, the confidence threshold at the start of training is 0.8, the confidence threshold of the current training epoch is 0.8·δ, and this weighted threshold decreases to 0.4 as training progresses; the teacher network outputs as pseudo tags only the predictions whose confidence exceeds 0.8·δ.
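The threshold schedule above can be sketched in code. Because the patent text reproduces the nonlinear function only by its behaviour (δ falls from 1 to 0.5, so the threshold 0.8·δ falls from 0.8 to 0.4), the decay form below, δ = 1 − 0.5·(e_t/E)^ε, is a hypothetical choice that merely matches those stated endpoints; the function names are likewise illustrative.

```python
def delta(e_t, total_epochs, eps=0.5):
    # Hypothetical nonlinear decay with the stated endpoints:
    # delta(0) = 1.0 and delta(total_epochs) = 0.5. The patent's exact
    # formula is given only as an image, so this is one schedule
    # consistent with the description, not the patented formula itself.
    return 1.0 - 0.5 * (e_t / total_epochs) ** eps

def confidence_threshold(e_t, total_epochs, base=0.8, eps=0.5):
    # The teacher keeps a prediction as a pseudo tag only if its confidence
    # exceeds base * delta: 0.8 at the start of training, 0.4 at the end.
    return base * delta(e_t, total_epochs, eps)
```

Early in training only very confident pseudo tags survive; as the student learns domain-invariant features and the teacher improves, the threshold relaxes and more pseudo tags are admitted.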
Step 2: training a student network using the pseudo tag;
the student network adopts a yolov5 single-stage target detection algorithm commonly used in target detection, and uses CSPDarknet53 as the feature extraction of a backbone network;
CSPDarknet53 is a deep convolutional neural network composed of a series of convolution layers, pooling layers and residual connections; these layers form a feature extraction structure that progressively extracts feature information from the input image. YOLOv5 extracts shallow, mid-level and high-level image features through CSPDarknet53; information from the different layers is fused by an FPN network, and the prediction result is finally obtained at the YOLOv5 output head;
gradient Reversal Layer gradient inversion layer allows the gradient direction to be automatically inverted during counter-propagation; the domain arbiter uses the input features to determine whether the sample is from a source domain or a target domain;
Step 2-1: apply a strong augmentation to D_T; the strong augmentation applies random color jitter, grayscale conversion, and Gaussian blur in sequence;
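The weak augmentation of step 1-1 and the strong augmentation above can be sketched on toy pixel lists; this shows one weak operation (random horizontal flip) and one strong operation (grayscale conversion). A real pipeline would also implement cropping, color jitter and Gaussian blur through an image library; the function names and the luminance weights are illustrative assumptions.

```python
import random

def weak_augment(img, rng):
    # Weak augmentation feeding the teacher: random horizontal flip
    # (the random-crop step is omitted here for brevity).
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    return img

def to_grayscale(img):
    # One of the strong augmentations feeding the student:
    # RGB triples -> single luminance values (ITU-R BT.601 weights).
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in img]
```

The asymmetry is deliberate: the teacher sees a mildly perturbed image and produces stable pseudo tags, while the student must match them under much harsher perturbations.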
Step 2-2: taking predicted boundary frame coordinates and predicted classification results obtained after pseudo tag filtering in the step 1-3 as D after the step 2-1 strong enhancement operation T Is a pseudo tag of (2);
Step 2-3: input D_S together with the D_T processed in step 2-2 into the student network;
Step 2-4: the 4th convolution layer of the student network outputs shallow features f_1, the 5th convolution layer outputs mid-level features f_2, and the 6th convolution layer outputs high-level features f_3; the three convolution layers are each followed by a gradient reversal layer and a domain discriminator; f_1, f_2, f_3 are fed into the corresponding gradient reversal layers, and the domain discriminators output f′_1, f′_2, f′_3; the domain discriminator distinguishes the domain labels, and the image-level adversarial domain adaptation loss is as follows:
L_img^(s) = -Σ_n Σ_(h,w) [ (1 - D_n)·log p_n^(h,w) + D_n·log(1 - p_n^(h,w)) ],  s = 1, 2, 3
where L_img^(1) denotes the image-level adversarial domain adaptation loss on the shallow features, L_img^(2) that on the mid-level features, and L_img^(3) that on the high-level features; D_n is the domain label of the nth training image, 0 if the input comes from D_S and 1 if it comes from D_T; and p_n^(h,w) is the predicted probability that the point at position (h, w) of the nth training image comes from D_S;
in a classical average teacher network, since only a source domain image has tag information, the learned characteristics of the teacher network and a student network are easily biased to the characteristics of the source domain image; the multi-scale resistance learning is introduced into a student network in an average teacher network, and domain invariant features of a source domain image and a target domain image are learned; by the method, domain offset phenomenon can be effectively relieved, and the performance of cross-domain target detection is improved;
step 3: realizing domain self-adaption in a student network by utilizing contrast learning;
step 3-1: f output in step 2-4 S ' 1 ,f S ' 2 ,f S ' 3 And f T ' 1 ,f T ' 2 ,f T ' 3 Respectively obtaining D after passing through FPN network S Is a fused image feature f S And D T Is a fused image feature f T ;;
Step 3-2: f_S is partitioned into classes according to the true labels of D_S, and f_T according to the pseudo tags generated in step 1-3; among the features f_S and f_T of step 3-1, image features of the same class are taken as positive samples and image features of different classes as negative samples; f_ijk denotes the jth image feature vector of the ith image, belonging to class k; from these, the image-level contrastive domain adaptation loss is constructed.
The image-level contrast loss L_InfoNCE is defined as follows:
K is the number of categories; G_S^i denotes the set of image features of the ith source-domain image, and f_ijk^S, the jth feature vector of the ith source-domain image, corresponds to category k; the positive sample set consists of image features of the same category k whose predicted probability value is greater than the positive-sample threshold δ_S, with δ_S taking the value 0.5; the negative sample set has two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_S.
G_T^i denotes the set of image features of the ith target-domain image, and f_ijk^T, the jth feature vector of the ith target-domain image, corresponds to category k; since the target-domain images carry no label information, the pseudo tags generated by the teacher network in step 1-3 are used as labels: image features of category k whose predicted probability value is greater than the positive-sample threshold δ_T = 0.5 constitute the positive samples, and the negative sample set again has two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_T.
To further address the image-level domain shift problem in cross-domain target detection, contrastive learning at the image-feature level pulls features of the same category from the source-domain and target-domain images closer together, reducing the intra-class difference between source and target image features and improving the performance of cross-domain target detection;
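An InfoNCE-style term over the positive and negative sets defined above can be sketched for a single anchor feature. The patent text specifies only how the sets are built, so the dot-product similarity and the temperature tau below are assumed details, and the function name is illustrative:

```python
import math

def info_nce(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style contrastive loss for one anchor feature vector.

    positives: features of the same class as the anchor (prediction above
    the positive-sample threshold); negatives: all other features.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = 0.0
    for pos in positives:
        num = math.exp(dot(anchor, pos) / tau)
        den = num + sum(math.exp(dot(anchor, neg) / tau) for neg in negatives)
        # Small when the anchor is much closer to its positives than to
        # any negative; large otherwise.
        loss -= math.log(num / den)
    return loss / len(positives)
```

Summing this over every source and target anchor (with labels on the source side and pseudo tags on the target side) yields the image-level contrast loss L_InfoNCE of the total objective.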
Step 4: train the average teacher network with the total loss function; the total loss L is the detector loss plus the weighted image-level contrast loss plus the image-level adversarial domain adaptation losses at the three scales: L = L_det + λ·L_InfoNCE + L_img^(1) + L_img^(2) + L_img^(3), where the hyper-parameter λ takes the value 0.1 and L_det is the loss function of the YOLOv5 detector; the student network updates the parameters of the teacher network through an exponential moving average (EMA);
θ_t ← α·θ_t + (1 - α)·θ_s
where θ_t and θ_s are the parameters of the teacher network and the student network respectively, and α is the EMA decay rate, set to 0.999.
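The EMA update is a one-line element-wise rule; sketched over flat parameter lists (a framework implementation would iterate over parameter tensors instead):

```python
def ema_update(theta_teacher, theta_student, alpha=0.999):
    # theta_t <- alpha * theta_t + (1 - alpha) * theta_s, element-wise.
    # With alpha = 0.999 the teacher is a slowly moving average of the
    # student, which stabilizes the pseudo tags it generates.
    return [alpha * t + (1 - alpha) * s
            for t, s in zip(theta_teacher, theta_student)]
```

Only the student receives gradients; the teacher is never trained directly, it just trails the student through this update.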

Claims (4)

1. A remote sensing image non-supervision cross-domain target detection method based on progressive pseudo tags is characterized by comprising the following steps of: the method comprises the following steps:
step 1: generating a pseudo tag on the target domain data by utilizing a pre-trained average teacher network;
D_S and D_T are image training datasets, where D_S is the source-domain image set, which carries labels, and D_T is the target-domain image set, which carries no labels; both the teacher network and the student network in the average teacher network use the single-stage detector YOLOv5; the student network differs from the teacher network in that its 4th, 5th and 6th convolution layers are each followed by a gradient reversal layer and a domain discriminator; the average teacher network uses CSPDarknet53 for feature extraction;
Step 1-1: apply a weak augmentation to D_T; the weak augmentation takes D_T as input and applies random horizontal flipping and cropping in sequence;
Step 1-2: input the weakly augmented result of step 1-1 into the teacher network to obtain predictions on D_T, namely the predicted bounding-box coordinates and classification results; the teacher network's predictions on D_T serve as the pseudo tags;
step 1-3: setting a pseudo tag dynamic optimization strategy to filter the pseudo tag;
step 2: training a student network using the pseudo tag;
Step 2-1: apply a strong augmentation to D_T; the strong augmentation applies random color jitter, grayscale conversion, and Gaussian blur in sequence;
Step 2-2: use the predicted bounding-box coordinates and classification results retained after the pseudo tag filtering of step 1-3 as the pseudo tags of the strongly augmented D_T obtained in step 2-1;
Step 2-3: input D_S together with the D_T processed in step 2-2 into the student network;
Step 2-4: D_S and D_T are passed through the student network: the 4th convolution layer outputs shallow features f_S1 and f_T1, the 5th convolution layer outputs mid-level features f_S2 and f_T2, and the 6th convolution layer outputs high-level features f_S3 and f_T3; the 4th, 5th and 6th convolution layers are each followed by a gradient reversal layer and a domain discriminator; f_S1, f_S2, f_S3 and f_T1, f_T2, f_T3 are fed into the corresponding gradient reversal layers and domain discriminators, whose outputs are f′_S1, f′_S2, f′_S3 and f′_T1, f′_T2, f′_T3; the domain discriminator distinguishes the domain labels, and the image-level adversarial domain adaptation loss is as follows:
L_img^(s) = -Σ_n Σ_(h,w) [ (1 - D_n)·log p_n^(h,w) + D_n·log(1 - p_n^(h,w)) ],  s = 1, 2, 3
where L_img^(1) denotes the image-level adversarial domain adaptation loss on the shallow features, L_img^(2) that on the mid-level features, and L_img^(3) that on the high-level features; D_n is the domain label of the nth training image, 0 if the input comes from D_S and 1 if it comes from D_T; and p_n^(h,w) is the predicted probability that the point at position (h, w) of the nth training image comes from D_S;
step 3: realizing domain self-adaption in a student network by utilizing contrast learning;
step 3-1: f 'output in step 2-4' S1 ,f′ S2 ,f′ S3 And f' T1 ,f′ T2 ,f′ T3 Respectively obtaining D after passing through FPN network S Is a fused image feature f S And D T Is a fused graph of (1)Image feature f T
Step 3-2: f_S is partitioned into classes according to the true labels of D_S, and f_T according to the pseudo tags generated in step 1-3; among the features f_S and f_T of step 3-1, image features of the same class are taken as positive samples and image features of different classes as negative samples; f_ijk denotes the jth feature vector of the ith image, belonging to class k; the image-level contrast loss L_InfoNCE is then constructed;
Step 4: training an average teacher network using the total loss function; the total loss L is the detector loss plus weighted image level contrast loss plus image level contrast domain adaptation at three scalesLambda super parameter takes value of 0.1, L det As a loss function of yolov5 detector, L InfoNCE Loss for image level contrast; the student network updates the parameters of the teacher network through the index moving average EMA;
θ_t ← α·θ_t + (1 - α)·θ_s
where θ_t and θ_s are the parameters of the teacher network and the student network respectively, and α is the EMA decay rate, set to 0.999.
2. The method for detecting the remote sensing image unsupervised cross-domain target based on the progressive pseudo tag according to claim 1, wherein the method comprises the following steps: in the step 1-3, the method for setting the dynamic optimization strategy delta of the pseudo tag is as follows:
wherein δ is a function of the training progress e, decreasing nonlinearly from 1 to 0.5; e = e_t / E, where e_t is the current training period and E is the total number of training periods; ε is a hyperparameter;
δ is used in the teacher network as a weight on the confidence threshold for filtering out low-quality pseudo tags.
3. The remote sensing image unsupervised cross-domain target detection method based on the progressive pseudo tag according to claim 2, wherein: in the setting of the pseudo tag dynamic optimization strategy δ, ε takes the value 0.5; the confidence threshold at the start of training is 0.8, the confidence threshold of the current training period is 0.8·δ, and it decreases to 0.4 as the training period increases; the outputs of the teacher network whose confidence exceeds 0.8·δ are taken as pseudo tags.
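A minimal sketch of the progressive filtering described in claims 2 and 3. The decay formula below is a hypothetical stand-in: the claims state only that δ falls nonlinearly from 1 to 0.5 with ε = 0.5, not its exact expression, so only the endpoints and the 0.8·δ filtering rule follow the text:

```python
def delta_schedule(e_t, total_epochs, eps=0.5):
    # Hypothetical nonlinear decay from 1.0 (start of training) to 0.5
    # (end of training); the claims do not give the exact formula.
    progress = e_t / total_epochs
    return 0.5 + 0.5 * (1.0 - progress) ** (1.0 / eps)

def filter_pseudo_labels(detections, delta, base_thresh=0.8):
    # Keep teacher outputs whose confidence exceeds 0.8 * delta; the
    # effective threshold relaxes from 0.8 toward 0.4 as delta decays.
    return [d for d in detections if d["conf"] > base_thresh * delta]
```

Early in training only very confident detections survive; later, the relaxed threshold admits more pseudo tags as the teacher improves.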
4. The remote sensing image unsupervised cross-domain target detection method based on the progressive pseudo tag according to claim 1, wherein:
in step 3-2, the image-level contrastive loss L_InfoNCE is constructed as follows:
K is the number of categories; the image features of the ith source-domain image form a feature set, and f_{ijk}^S denotes the jth feature vector of the ith source-domain image belonging to category k; the positive sample set consists of the image features of the same category k whose predicted probability value is greater than the positive-sample threshold δ_S, where δ_S takes the value 0.5; the negative sample set consists of two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_S;
likewise, the image features of the ith target-domain image form a feature set, and f_{ijk}^T denotes the jth feature vector of the ith target-domain image belonging to category k; since the target-domain images have no label information, the pseudo tags generated by the teacher network in step 1-3 are used as labels; the image features of category k whose predicted probability value is greater than the positive-sample threshold δ_T = 0.5 form the positive sample set; the negative sample set again consists of two parts: image features whose category is not k, and image features of category k whose predicted probability is less than the positive-sample threshold δ_T.
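The positive/negative partition and a per-anchor InfoNCE term of claim 4 can be sketched as follows. This is pure Python with cosine similarity and a temperature tau = 0.07 chosen for illustration; the patent's exact loss expression is not reproduced:

```python
import math

def split_samples(anchor_class, feats, thresh=0.5):
    # feats: list of (feature, class_id, pred_prob). Positives share the
    # anchor's category k AND exceed the probability threshold (0.5 in
    # the claims); every other feature is a negative.
    pos = [f for f, c, p in feats if c == anchor_class and p > thresh]
    neg = [f for f, c, p in feats if c != anchor_class or p <= thresh]
    return pos, neg

def info_nce(anchor, positives, negatives, tau=0.07):
    # Per-anchor InfoNCE: -log(pos_mass / (pos_mass + neg_mass)), where
    # each mass is a sum of exp(cos_sim / tau) over the sample set.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    pos = sum(math.exp(cos(anchor, p) / tau) for p in positives)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Minimizing this loss pulls same-category features of the two domains together while pushing apart different categories and low-confidence same-category features.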
CN202311807723.8A 2023-12-26 2023-12-26 Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag Pending CN117830616A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311807723.8A CN117830616A (en) 2023-12-26 2023-12-26 Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311807723.8A CN117830616A (en) 2023-12-26 2023-12-26 Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag

Publications (1)

Publication Number Publication Date
CN117830616A true CN117830616A (en) 2024-04-05

Family

ID=90520441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311807723.8A Pending CN117830616A (en) 2023-12-26 2023-12-26 Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag

Country Status (1)

Country Link
CN (1) CN117830616A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097759A (en) * 2024-04-23 2024-05-28 齐鲁工业大学(山东省科学院) Cross-domain face counterfeiting detection method based on double-branch collaborative learning


Similar Documents

Publication Publication Date Title
CN110555475A (en) few-sample target detection method based on semantic information fusion
CN104462494B (en) A kind of remote sensing image retrieval method and system based on unsupervised feature learning
CN106446896A (en) Character segmentation method and device and electronic equipment
CN104866810A (en) Face recognition method of deep convolutional neural network
CN113486764B (en) Pothole detection method based on improved YOLOv3
CN110991257B (en) Polarized SAR oil spill detection method based on feature fusion and SVM
CN111783841A (en) Garbage classification method, system and medium based on transfer learning and model fusion
CN117830616A (en) Remote sensing image unsupervised cross-domain target detection method based on progressive pseudo tag
CN112052817A (en) Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning
CN109543585A (en) Underwater optics object detection and recognition method based on convolutional neural networks
CN113343989B (en) Target detection method and system based on self-adaption of foreground selection domain
CN116977710A (en) Remote sensing image long tail distribution target semi-supervised detection method
CN110852358A (en) Vehicle type distinguishing method based on deep learning
CN114842343A (en) ViT-based aerial image identification method
CN117152503A (en) Remote sensing image cross-domain small sample classification method based on false tag uncertainty perception
CN116342942A (en) Cross-domain target detection method based on multistage domain adaptation weak supervision learning
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN113052215A (en) Sonar image automatic target identification method based on neural network visualization
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN116912568A (en) Noise-containing label image recognition method based on self-adaptive class equalization
CN117152606A (en) Confidence dynamic learning-based remote sensing image cross-domain small sample classification method
CN113808123B (en) Dynamic detection method for liquid medicine bag based on machine vision
CN114549909A (en) Pseudo label remote sensing image scene classification method based on self-adaptive threshold
CN113505712B (en) Sea surface oil spill detection method of convolutional neural network based on quasi-balance loss function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination