CN117789039A - Remote sensing image target detection method based on context information distinguishing and utilizing - Google Patents


Publication number
CN117789039A
Authority
CN
China
Prior art keywords
target
similarity
context
detection
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410213682.8A
Other languages
Chinese (zh)
Other versions
CN117789039B (en)
Inventor
王永成
张玉溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Original Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Institute of Optics Fine Mechanics and Physics of CAS filed Critical Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority to CN202410213682.8A priority Critical patent/CN117789039B/en
Publication of CN117789039A publication Critical patent/CN117789039A/en
Application granted granted Critical
Publication of CN117789039B publication Critical patent/CN117789039B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing image target detection method based on context information distinguishing and utilizing. The method comprises the following steps: S1: acquiring an input image and feeding it sequentially through a backbone network and a neck network to obtain a multi-scale feature map; S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, the network comprising a first-stage detection network and a second-stage detection network; S3: constructing an overall loss function and training the two-stage target detection network with it to obtain a trained two-stage target detection network; S4: inputting the image to be detected into the trained two-stage target detection network to obtain a final detection result. The invention improves the detection accuracy for ground object targets in remote sensing images.

Description

Remote sensing image target detection method based on context information distinguishing and utilizing
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image target detection method based on context information distinguishing and utilizing.
Background
The ground-surface background of a remote sensing image is broad and complex and carries a large amount of information. Ground object targets within this broad background carry little information relative to their surroundings and have weak feature expression. Remote sensing images are also easily disturbed by environmental factors such as illumination intensity and weather, so image quality varies widely and ground object targets are difficult to detect. To increase the effective information available for detecting ground object targets, reduce their uncertainty and improve detection precision, many scholars have studied the contribution of context information to feature expression and target detection. Visual objects often appear in particular environments, sometimes together with other related objects; that is, context information can complement target information. However, the complex spatial patterns formed where ground object targets and the ground-surface background intersect can cause a target to be submerged in the background, so not all context information helps detection. Context information that is too similar to the target region introduces information noise and weakens the feature expression of the target. How to use the context information in the complex background of a remote sensing image fully and reasonably to assist ground object target detection is therefore a key problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a remote sensing image target detection method based on context information distinguishing and utilizing, in order to overcome the defect that the prior art cannot reasonably use the context information in the complex background of a remote sensing image, so that the context information cannot effectively assist the detection of ground object targets.
The invention provides a remote sensing image target detection method based on context information distinguishing and utilizing, which specifically comprises the following steps:
S1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map;
S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplementation and a detection module based on context information suppression;
S3: constructing an overall loss function, and training the two-stage target detection network with the overall loss function to obtain a trained two-stage target detection network;
S4: inputting the image to be detected into the trained two-stage target detection network for detection to obtain a final detection result.
Preferably, the backbone network adopts ConvNeXt network, the neck network adopts FPN network, and the first stage detection network adopts RPN network.
Preferably, the step S3 specifically includes the following steps:
S31: inputting the multi-scale feature map into the trained first-stage detection network for convolution to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region;
S32: creating an overall similarity evaluation formula, evaluating the similarity between the target suggestion region and the context region with the overall similarity evaluation formula, and constructing low-similarity target labeling frames and high-similarity target labeling frames on the multi-scale feature map according to the similarity evaluation result;
S33: taking the image area marked by a low-similarity target labeling frame on the multi-scale feature map as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map with the detection module based on context information supplementation to obtain a first detection value;
S34: taking the image area marked by a high-similarity target labeling frame on the multi-scale feature map as a high-similarity target feature map, and suppressing the context information of the high-similarity target feature map with the detection module based on context information suppression to obtain a second detection value;
S35: performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result.
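The context region of step S31 can be illustrated with a short sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, reads "expanding the length and width by one time" as doubling both about the box center, and adds optional clipping to the image bounds; the function name `expand_box` and the clipping behavior are illustrative assumptions, not part of the patent text.

```python
def expand_box(box, factor=2.0, image_size=None):
    """Scale an axis-aligned box (x1, y1, x2, y2) about its center.

    factor=2.0 doubles both width and height, mirroring the step that
    enlarges a target suggestion region into its context region.
    image_size=(width, height) optionally clips the result to the image;
    clipping is an assumption of this sketch.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * factor / 2.0
    half_h = (y2 - y1) * factor / 2.0
    nx1, ny1, nx2, ny2 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if image_size is not None:
        width, height = image_size
        nx1, ny1 = max(0.0, nx1), max(0.0, ny1)
        nx2, ny2 = min(float(width), nx2), min(float(height), ny2)
    return (nx1, ny1, nx2, ny2)


proposal = (40.0, 60.0, 80.0, 100.0)                    # 40 x 40 proposal
context = expand_box(proposal)                          # 80 x 80 context region
clipped = expand_box(proposal, image_size=(100, 110))   # clipped at the image border
```

Keeping the center fixed means the context region always contains the target suggestion region, so the similarity evaluation compares a target with its immediate surroundings.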
Preferably, the step S32 specifically includes the following steps:
S321: calculating the average gray level of the target suggestion region and of the context region, and calculating the brightness similarity L of the target suggestion region and the context region by the following formula:
(1);
wherein the two mean symbols denote the gray-level average of the target suggestion region and the gray-level average of the context region, respectively, and σ is a very small value that prevents the denominator from being 0;
S322: calculating the contrast similarity D of the target suggestion region and the context region by the following formulas:
(2);
(3);
(4);
wherein the symbols denote, in order, the contrast of the target suggestion region, the contrast of the context region, the total number of pixels in the target suggestion region and the total number of pixels in the context region; x is the value of a pixel in the target suggestion region, i and j are respectively the abscissa and the ordinate of the pixel, and y is the value of a pixel in the context region;
S323: calculating the smoothness similarity P of the target suggestion region and the context region by the following formulas:
(5);
(6);
(7);
wherein the two symbols denote the contrast of the target suggestion region and the contrast of the context region, respectively;
S324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formulas:
(8);
(9);
wherein X is the LBP feature histogram of the target suggestion region, Y is the LBP feature histogram of the context region, and the remaining symbol denotes the chi-square distance between the LBP feature histograms of the target suggestion region and the context region;
S325: calculating, based on the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, the overall similarity S of the target suggestion region and the context region by the following formula:
(10);
S326: calculating the probability density distributions of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, respectively, and correspondingly obtaining the median of each;
S327: taking the product of the medians of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity as the threshold of the overall similarity;
S328: constructing a high-similarity target labeling frame in each image area whose overall similarity is higher than the threshold, and constructing a low-similarity target labeling frame in each image area whose overall similarity is lower than the threshold.
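Steps S326 to S328 can be sketched as follows. The sketch assumes that the four component similarities have already been computed for every proposal and that the overall similarity is their product; the product form is only suggested by the product-of-medians threshold and is not confirmed by the extracted text. The function name `split_by_similarity` and the handling of ties (sent to the low-similarity group) are likewise assumptions.

```python
from statistics import median


def split_by_similarity(components):
    """components: list of (L, D, P, T) tuples, one per proposal.

    Returns (threshold, high_idx, low_idx). The overall score is taken as
    the product L*D*P*T, which is an assumption of this sketch.
    """
    # S326: median of each component similarity over all proposals.
    medians = [median(c[k] for c in components) for k in range(4)]
    # S327: the threshold is the product of the four medians.
    threshold = 1.0
    for m in medians:
        threshold *= m
    high_idx, low_idx = [], []
    for idx, (l, d, p, t) in enumerate(components):
        score = l * d * p * t
        # S328: above the threshold -> high similarity, otherwise low.
        (high_idx if score > threshold else low_idx).append(idx)
    return threshold, high_idx, low_idx


components = [
    (0.9, 0.6, 0.9, 0.7),
    (0.5, 0.4, 0.6, 0.3),
    (0.7, 0.9, 0.8, 0.5),
]
threshold, high_idx, low_idx = split_by_similarity(components)
```

Proposals listed in `high_idx` would be routed to the context-suppression module and those in `low_idx` to the context-supplementation module.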
Preferably, step S33 specifically includes the steps of:
S331: resampling the low-similarity target feature map to obtain target to-be-identified regions, and doubling the length and width of each target to-be-identified region to obtain a context supplementing region;
S332: inputting the target to-be-identified region and the context supplementing region into a first fully connected layer and a second fully connected layer, respectively, to correspondingly obtain a first feature vector and a second feature vector;
S333: adding the first feature vector and the second feature vector, and processing the sum through a third fully connected layer to obtain a third feature vector;
S334: inputting the third feature vector into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and the position of the target labeling frame, thereby obtaining a first detection value.
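The data flow of steps S332 to S334 can be sketched with plain fully connected layers. Everything concrete here is an assumption for illustration: the feature sizes, the random weights, the absence of activation functions, and the helper names `make_linear`, `linear` and `context_supplement_head`. Only the branch, add, fuse and two-head structure comes from the text.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a fully connected layer with random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply y = W x + b to a flat feature vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def context_supplement_head(target_feat, context_feat, layers):
    """S332-S334: two branch FC layers, element-wise sum, fusing FC, two heads."""
    v1 = linear(target_feat, layers["fc1"])      # first feature vector
    v2 = linear(context_feat, layers["fc2"])     # second feature vector
    fused = [a + b for a, b in zip(v1, v2)]      # S333: element-wise addition
    v3 = linear(fused, layers["fc3"])            # third feature vector
    class_scores = linear(v3, layers["cls"])     # classification FC layer
    box_deltas = linear(v3, layers["reg"])       # regression FC layer
    return class_scores, box_deltas


rng = random.Random(0)
FEAT, HIDDEN, NUM_CLASSES, BOX_DIMS = 16, 8, 15, 5   # illustrative sizes only
layers = {
    "fc1": make_linear(FEAT, HIDDEN, rng),
    "fc2": make_linear(FEAT, HIDDEN, rng),
    "fc3": make_linear(HIDDEN, HIDDEN, rng),
    "cls": make_linear(HIDDEN, NUM_CLASSES, rng),
    "reg": make_linear(HIDDEN, BOX_DIMS, rng),
}
target_feat = [rng.random() for _ in range(FEAT)]    # flattened target region features
context_feat = [rng.random() for _ in range(FEAT)]   # flattened context region features
class_scores, box_deltas = context_supplement_head(target_feat, context_feat, layers)
```

Because the two branches are summed rather than concatenated, the first and second feature vectors must have the same length, which is why both branch layers share one output size in the sketch.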
Preferably, the step S34 specifically includes the following steps:
S341: extracting feature map A1 and feature map B1 from the high-similarity target feature map, wherein the size of feature map A1 is 1/4 of the size of the input image and the size of feature map B1 is 1/8 of the size of the input image;
S342: inputting feature map A1 and feature map B1 into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing, to correspondingly obtain feature map A2 and feature map B2;
S343: performing an up-sampling operation on feature map A2 to obtain feature map A3 of the same size as feature map B1, adding feature map A3 and feature map B2, and processing the sum with a softmax function to obtain a saliency mask map;
S344: multiplying the saliency mask map by feature map B1 to obtain a saliency feature map, and resampling the saliency feature map to obtain feature map C;
S345: inputting feature map C into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and the position of the target labeling frame, thereby obtaining a second detection value.
Preferably, the first convolution sub-module and the second convolution sub-module each consist of a 3×3 convolution layer and a 1×1 convolution layer connected in series.
Preferably, the calculation formula of the saliency mask map is:
$M = \sigma\big(U(f_1(A_1)) + f_2(B_1)\big)$ (11);
where $M$ is the saliency mask map, $f_1$ is the convolution operation of the first convolution sub-module, $f_2$ is the convolution operation of the second convolution sub-module, $A_1$ is feature map A1, $B_1$ is feature map B1, $U$ is the up-sampling operation, and $\sigma$ is the softmax function.
Preferably, the overall loss function is:
(12);
(13);
(14);
wherein the first symbol denotes the overall loss function; λ1, λ2 and λ3 are loss balance coefficients, all set to 1; and the remaining symbols denote, in order: the classification loss function; N, the number of positive samples; the classification prediction value of the i-th sample; the class label of the i-th sample; the regression loss function; the Iverson bracket, whose value is 1 when the i-th sample is a positive sample (its class label is greater than 0) and 0 otherwise; the position prediction value of the i-th sample; and the position label of the i-th sample. The next three symbols denote, respectively, the first-stage detection loss, the second-stage detection loss and the detection loss, and the final symbol denotes the saliency loss.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides an overall similarity evaluation formula, which evaluates the overall similarity degree of a target suggestion region and a context region by comprehensively considering the brightness, the contrast, the smoothness and the texture characteristics of the target suggestion region and the context region, thereby obtaining a target with low similarity to the context region and a target with high similarity to the context region.
(2) The invention provides a two-stage target detection network based on context information distinguishing and utilizing. In the second-stage detection network, context information is supplemented to the low-similarity target feature map by the detection module based on context information supplementation, and context information is suppressed in the high-similarity target feature map by the detection module based on context information suppression, so that the context information is fully utilized.
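The texture term of the similarity evaluation in (1) compares LBP feature histograms of the two regions by chi-square distance. The sketch below assumes the basic 3×3, 8-neighbor LBP operator with a 256-bin normalized histogram and the chi-square form Σ(x − y)² / (x + y); the patent text names LBP and the chi-square distance but the extracted text does not show which variants it uses, nor how the distance is mapped to the similarity T, so that mapping is left out.

```python
def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    gray is a list of rows of intensities; border pixels are skipped.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(gray), len(gray[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbor at least as bright as the center sets its bit.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


def chi_square_distance(hist_x, hist_y, eps=1e-12):
    """Chi-square distance sum((x - y)^2 / (x + y)); eps guards empty bins."""
    return sum((x - y) ** 2 / (x + y + eps) for x, y in zip(hist_x, hist_y))


flat = [[10] * 5 for _ in range(5)]                                  # uniform patch
checker = [[(r + c) % 2 * 255 for c in range(5)] for r in range(5)]  # checkerboard patch
dist_same = chi_square_distance(lbp_histogram(flat), lbp_histogram(flat))
dist_diff = chi_square_distance(lbp_histogram(flat), lbp_histogram(checker))
```

A distance of zero means identical texture statistics, so a target whose texture matches its surroundings yields a small distance and would be expected to score as highly similar to its context.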
Drawings
Fig. 1 is a schematic flow chart of a remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;
Fig. 2 is a network structure diagram of a two-stage target detection network based on context information distinguishing and utilizing according to an embodiment of the present invention;
Fig. 3 is a network structure diagram of a detection module based on context information supplementation according to an embodiment of the present invention;
Fig. 4 is a network structure diagram of a detection module based on context information suppression according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the detection results on the DOTA dataset of a remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the detection results on the DIOR-R dataset of a remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the detection results on the UCAS-AOD dataset of a remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
The invention designs an overall similarity evaluation formula that jointly considers brightness, contrast, smoothness and texture features. A target region is expanded to generate a context region, the overall similarity evaluation formula is then used to evaluate the similarity between the target region and the context region, and low-similarity labeling frames and high-similarity labeling frames are obtained according to the evaluation result. The invention also provides a two-stage target detection network based on context information distinguishing and utilizing: a detection module based on context information supplementation supplements context information to the low-similarity feature map, and a detection module based on context information suppression suppresses context information in the high-similarity feature map, so that the context information is fully utilized.
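As a rough illustration of the brightness, contrast and smoothness terms, the sketch below uses stand-in definitions, because the patent's own formulas are not legible in the extracted text. It assumes the sample standard deviation as contrast, the textbook smoothness descriptor 1 − 1/(1 + variance), and an SSIM-style ratio (2ab + σ) / (a² + b² + σ) for each similarity, where σ is a small constant that keeps the denominator away from 0. All three choices, and the names `region_stats` and `ratio_similarity`, are assumptions.

```python
from math import sqrt


def region_stats(pixels):
    """Mean, sample standard deviation (used as contrast) and smoothness of a region."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / (n - 1) if n > 1 else 0.0
    contrast = sqrt(variance)
    smoothness = 1.0 - 1.0 / (1.0 + variance)   # textbook smoothness descriptor (assumed)
    return mean, contrast, smoothness


def ratio_similarity(a, b, sigma=1e-6):
    """SSIM-style ratio (2ab + sigma) / (a^2 + b^2 + sigma); equals 1.0 when a == b."""
    return (2.0 * a * b + sigma) / (a * a + b * b + sigma)


target_pixels = [100, 102, 98, 101, 99]      # flattened gray values of the target region
context_pixels = [100, 140, 60, 120, 80]     # flattened gray values of the context region
t_mean, t_contrast, t_smooth = region_stats(target_pixels)
c_mean, c_contrast, c_smooth = region_stats(context_pixels)
L = ratio_similarity(t_mean, c_mean)            # brightness similarity (assumed form)
D = ratio_similarity(t_contrast, c_contrast)    # contrast similarity (assumed form)
P = ratio_similarity(t_smooth, c_smooth)        # smoothness similarity (assumed form)
```

In this example the two regions share the same mean gray level, so the brightness term is 1, while the much larger spread of the context pixels pulls the contrast term well below 1.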
Fig. 1 illustrates a flow of a remote sensing image object detection method based on context information discrimination and utilization according to an embodiment of the present invention, fig. 2 illustrates a network structure of a two-stage object detection network based on context information discrimination and utilization according to an embodiment of the present invention, fig. 3 illustrates a network structure of a detection module based on context information supplementation according to an embodiment of the present invention, and fig. 4 illustrates a network structure of a detection module based on context information suppression according to an embodiment of the present invention.
As shown in fig. 1 to fig. 4, the remote sensing image target detection method based on the context information distinguishing and utilizing provided by the embodiment of the invention specifically includes the following steps:
S1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map.
The backbone network adopts a ConvNeXt network, the neck network adopts an FPN network, and the first-stage detection network adopts an RPN network.
S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplementation and a detection module based on context information suppression.
S3: constructing an overall loss function, and training the two-stage target detection network with the overall loss function to obtain a trained two-stage target detection network.
The step S3 specifically comprises the following steps:
S31: inputting the multi-scale feature map into the trained first-stage detection network for convolution to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region.
S32: creating an overall similarity evaluation formula, evaluating the similarity between the target suggestion region and the context region with the overall similarity evaluation formula, and constructing low-similarity target labeling frames and high-similarity target labeling frames on the multi-scale feature map according to the similarity evaluation result.
The step S32 specifically includes the following steps:
S321: calculating the average gray level of the target suggestion region and of the context region, and calculating the brightness similarity L of the target suggestion region and the context region by the following formula:
(1);
wherein the two mean symbols denote the gray-level average of the target suggestion region and the gray-level average of the context region, respectively, and σ is a very small value that prevents the denominator from being 0.
S322: calculating the contrast similarity D of the target suggestion region and the context region by the following formulas:
(2);
(3);
(4);
wherein the symbols denote, in order, the contrast of the target suggestion region, the contrast of the context region, the total number of pixels in the target suggestion region and the total number of pixels in the context region; x is the value of a pixel in the target suggestion region, i and j are respectively the abscissa and the ordinate of the pixel, and y is the value of a pixel in the context region.
S323: calculating the smoothness similarity P of the target suggestion region and the context region by the following formulas:
(5);
(6);
(7);
wherein the two symbols denote the contrast of the target suggestion region and the contrast of the context region, respectively.
S324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formulas:
(8);
(9);
wherein X is the LBP feature histogram of the target suggestion region, Y is the LBP feature histogram of the context region, and the remaining symbol denotes the chi-square distance between the LBP feature histograms of the target suggestion region and the context region.
S325: calculating, based on the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, the overall similarity S of the target suggestion region and the context region by the following formula:
(10);
S326: calculating the probability density distributions of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, respectively, and correspondingly obtaining the median of each.
S327: taking the product of the medians of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity as the threshold of the overall similarity.
S328: constructing a high-similarity target labeling frame in each image area whose overall similarity is higher than the threshold, and constructing a low-similarity target labeling frame in each image area whose overall similarity is lower than the threshold.
S33: taking the image area marked by a low-similarity target labeling frame on the multi-scale feature map as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map with the detection module based on context information supplementation to obtain a first detection value.
The step S33 specifically includes the following steps:
S331: resampling the low-similarity target feature map to obtain target to-be-identified regions, and doubling the length and width of each target to-be-identified region to obtain a context supplementing region.
S332: inputting the target to-be-identified region and the context supplementing region into a first fully connected layer and a second fully connected layer, respectively, to correspondingly obtain a first feature vector and a second feature vector.
S333: adding the first feature vector and the second feature vector, and processing the sum through a third fully connected layer to obtain a third feature vector.
S334: inputting the third feature vector into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and the position of the target labeling frame, thereby obtaining a first detection value.
S34: taking the image area marked by a high-similarity target labeling frame on the multi-scale feature map as a high-similarity target feature map, and suppressing the context information of the high-similarity target feature map with the detection module based on context information suppression to obtain a second detection value.
The step S34 specifically includes the following steps:
S341: extracting feature map A1 and feature map B1 from the high-similarity target feature map, wherein the size of feature map A1 is 1/4 of the size of the input image and the size of feature map B1 is 1/8 of the size of the input image.
S342: inputting feature map A1 and feature map B1 into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing, to correspondingly obtain feature map A2 and feature map B2.
The first convolution sub-module and the second convolution sub-module each consist of a 3×3 convolution layer and a 1×1 convolution layer connected in series.
S343: performing an up-sampling operation on feature map A2 to obtain feature map A3 of the same size as feature map B1, adding feature map A3 and feature map B2, and processing the sum with a softmax function to obtain a saliency mask map.
S344: multiplying the saliency mask map by feature map B1 to obtain a saliency feature map, and resampling the saliency feature map to obtain feature map C.
S345: inputting feature map C into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and the position of the target labeling frame, thereby obtaining a second detection value.
S35: performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result.
S4: inputting the image to be detected into the trained two-stage target detection network for detection to obtain a final detection result.
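The merging step S35 pools the detections of both second-stage modules and applies non-maximum suppression. The sketch below is a generic greedy, per-class NMS on axis-aligned boxes; the embodiments report rotated boxes, so the axis-aligned IoU, the 0.5 threshold and the `(box, score, label)` tuple layout are simplifying assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_with_nms(first, second, iou_threshold=0.5):
    """Pool detections from both branches and keep the best non-overlapping ones.

    Each detection is (box, score, label); suppression is applied per label.
    """
    pooled = sorted(first + second, key=lambda det: det[1], reverse=True)
    kept = []
    for box, score, label in pooled:
        # Keep the detection unless a higher-scoring box of the same label overlaps it.
        if all(k[2] != label or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept


first = [((10, 10, 50, 50), 0.90, "ship"), ((200, 200, 240, 240), 0.60, "plane")]
second = [((12, 12, 52, 52), 0.75, "ship"), ((100, 100, 140, 140), 0.80, "ship")]
final = merge_with_nms(first, second)
```

Here the second-branch ship at `(12, 12, 52, 52)` overlaps the higher-scoring first-branch ship and is dropped, while the other three detections survive.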
The calculation formula of the saliency mask map is as follows:
$M = \sigma\big(U(f_1(A_1)) + f_2(B_1)\big)$ (11);
where $M$ is the saliency mask map, $f_1$ is the convolution operation of the first convolution sub-module, $f_2$ is the convolution operation of the second convolution sub-module, $A_1$ is feature map A1, $B_1$ is feature map B1, $U$ is the up-sampling operation, and $\sigma$ is the softmax function.
Adding feature map A3 and feature map B2 and processing the sum with a softmax function suppresses the context information through a pixel-level mask: the target region is highlighted and the influence of surrounding environment information that is easily confused with the target is weakened, which improves the detection effect.
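The mask computation of steps S342 to S344 can be sketched on single-channel maps. Several points are assumptions: the convolution sub-modules are replaced by caller-supplied functions (an identity placeholder here), the resampling operation U is modeled as nearest-neighbor resizing onto the grid of feature map B1, and the softmax is taken over all spatial positions. The helper names are illustrative.

```python
from math import exp


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resampling of a 2-D map (list of rows) to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def spatial_softmax(fmap):
    """Softmax over all positions of a 2-D map, so the mask sums to 1."""
    peak = max(max(row) for row in fmap)
    exps = [[exp(v - peak) for v in row] for row in fmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def saliency_mask(a1, b1, f1, f2):
    """M = softmax(U(f1(A1)) + f2(B1)); f1 and f2 stand in for the conv sub-modules."""
    a2, b2 = f1(a1), f2(b1)
    a3 = resize_nearest(a2, len(b1), len(b1[0]))      # U: match the grid of B1
    summed = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a3, b2)]
    return spatial_softmax(summed)


def apply_mask(mask, b1):
    """S344: element-wise product of the saliency mask and feature map B1."""
    return [[m * v for m, v in zip(rm, rv)] for rm, rv in zip(mask, b1)]


def identity(fmap):
    """Placeholder for a trained convolution sub-module."""
    return fmap


a1 = [[0.0, 0.0, 0.0, 0.0],
      [0.0, 2.0, 2.0, 0.0],
      [0.0, 2.0, 2.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]       # single-channel stand-in for A1
b1 = [[0.0, 1.0],
      [1.0, 3.0]]                 # single-channel stand-in for B1
mask = saliency_mask(a1, b1, identity, identity)
salient = apply_mask(mask, b1)
```

The position where both branches respond most strongly receives the largest mask weight, so multiplying by the mask keeps the target response and attenuates the surrounding context.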
The overall loss function proposed by the embodiment of the invention consists of a detection loss and a pixel-level saliency loss.
The two-stage target detection network based on context information distinguishing and utilizing is trained with the overall loss function until a preset number of iterations is reached or the network converges.
The overall loss function is:
(12);
(13);
(14);
wherein the first symbol denotes the overall loss function; λ1, λ2 and λ3 are loss balance coefficients, all set to 1; and the remaining symbols denote, in order: the classification loss function; N, the number of positive samples; the classification prediction value of the i-th sample; the class label of the i-th sample; the regression loss function; the Iverson bracket, whose value is 1 when the i-th sample is a positive sample (its class label is greater than 0) and 0 otherwise; the position prediction value of the i-th sample; and the position label of the i-th sample. The next three symbols denote, respectively, the first-stage detection loss, the second-stage detection loss and the detection loss, and the final symbol denotes the saliency loss.
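The structure of the loss can be sketched under explicit assumptions, since the formulas themselves are not legible in the extracted text. The sketch assumes cross-entropy for classification, smooth L1 for regression, normalization of both terms by the number of positive samples N, and a plain weighted sum for the overall loss; which loss terms the three coefficients multiply is not recoverable, so `overall_loss` accepts any list of terms. Only the Iverson-bracket gating of the regression term and the unit coefficients come from the text.

```python
from math import log


def smooth_l1(pred, target):
    """Smooth L1 distance summed over box coordinates (assumed regression loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(probs, labels, boxes, targets):
    """Classification term plus regression term gated by the Iverson bracket.

    probs[i][k] is the predicted probability of class k for sample i;
    label 0 is background, labels > 0 are positive samples.
    """
    n_pos = max(1, sum(1 for y in labels if y > 0))      # N, number of positives
    cls = -sum(log(p[y]) for p, y in zip(probs, labels)) / n_pos
    # The condition y > 0 plays the role of the Iverson bracket [label > 0].
    reg = sum(smooth_l1(b, t) for b, t, y in zip(boxes, targets, labels) if y > 0) / n_pos
    return cls + reg


def overall_loss(terms, lambdas=None):
    """Weighted sum of loss terms; every balance coefficient defaults to 1."""
    if lambdas is None:
        lambdas = [1.0] * len(terms)
    return sum(lam * term for lam, term in zip(lambdas, terms))


probs = [[0.1, 0.9], [0.8, 0.2]]
labels = [1, 0]                                     # one positive, one background
boxes = [[0.5, 0.5, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
targets = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
stage_loss = detection_loss(probs, labels, boxes, targets)
total = overall_loss([1.0, stage_loss, 0.5])        # illustrative term values
```

The background sample contributes to the classification term but not to the regression term, which is the effect of the Iverson bracket: its badly regressed box is ignored.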
Fig. 5 shows a result of detecting a DOTA dataset by using a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.
As shown in Fig. 5, the remote sensing image target detection method based on context information distinguishing and utilizing provided by the embodiment of the present invention is applied to detect the following 15 categories of targets: airplane (PL), baseball field, bridge, ground track field, small vehicle, large vehicle, ship, tennis court, basketball court, storage tank, soccer field, roundabout, harbor, swimming pool and helicopter. In the DOTA dataset, the backgrounds of the remote sensing images are broad and complex, the targets differ greatly in scale, and the targets are arbitrarily oriented. Even so, the method provided by the embodiment of the invention accurately marks the position of each target with a rotated rectangular box in most scenes, and the visual results are satisfactory.
Fig. 6 shows a result of detecting a DIOR-R dataset by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.
As shown in Fig. 6, the remote sensing image target detection method based on context information distinguishing and utilizing provided by the embodiment of the present invention is applied to detect 20 categories of targets: airplane, airport, baseball field, basketball court, bridge, chimney, dam, highway service area, highway toll gate, golf course, ground track field, harbor, overpass (OP), ship, stadium, storage tank, tennis court, train station, vehicle and windmill. In the DIOR-R dataset, the target categories are numerous, the intra-category differences are large, the backgrounds are complex and detection is difficult. Even so, the method completes rotated-box target detection with high quality, and the visual results are as desired.
Fig. 7 shows a result of detecting UCAS-AOD datasets by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.
As shown in fig. 7, the remote sensing image target detection method based on the context information discrimination and utilization provided by the embodiment of the invention is used for detecting the automobile and the airplane in different scenes, and ideal visual results are obtained.
It should be appreciated that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. The remote sensing image target detection method based on the context information distinguishing and utilizing is characterized by comprising the following steps:
s1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map;
s2: constructing a two-stage target detection network based on the context information distinguishing utilization, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplement and a detection module based on context information suppression;
s3: constructing an overall loss function, and training the two-stage target detection network by utilizing the overall loss function to obtain a trained two-stage target detection network;
s4: inputting the image to be detected into a trained two-stage target detection network for detection, and obtaining a final detection result.
2. The method for detecting the target of the remote sensing image based on the context information distinguishing and utilizing according to claim 1, wherein the backbone network adopts a ConvNeXt network, the neck network adopts an FPN network, and the first-stage detection network adopts an RPN network.
3. The method for detecting the target of the remote sensing image based on the distinguishing and utilizing of the context information according to claim 1, wherein the step S3 specifically comprises the following steps:
s31: inputting the multi-scale feature map to a trained first-stage detection network for convolution operation to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region;
s32: creating an overall similarity evaluation formula, performing similarity evaluation on the target suggestion region and the context region by using the overall similarity evaluation formula, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to a similarity evaluation result;
s33: taking an image area marked by the low-similarity target marking frame on the multi-scale feature image as a low-similarity target feature image, and supplementing context information to the low-similarity target feature image by utilizing the context information supplementing-based detection module to obtain a first detection value;
s34: taking an image area marked by the high-similarity target marking frame on the multi-scale feature image as a high-similarity target feature image, and performing context information suppression on the high-similarity target feature image by using the context information suppression-based detection module to obtain a second detection value;
s35: and performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result.
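Steps s31 and s35 can be illustrated with a minimal sketch. It assumes axis-aligned boxes for brevity (the embodiment reports rotated-box results), reads the expansion in s31 as doubling the width and height around the same center, and uses a class-wise greedy non-maximum suppression with an IoU threshold of 0.5. The function names `expand_to_context` and `merge_and_nms`, the box layouts and the threshold value are all hypothetical and are not taken from the claims.

```python
from typing import List, Tuple

# Hypothetical layouts: proposals are (cx, cy, w, h);
# detections are (x1, y1, x2, y2, score, label).
Box = Tuple[float, float, float, float]
Det = Tuple[float, float, float, float, float, str]


def expand_to_context(box: Box, img_w: float, img_h: float) -> Box:
    """Double the width and height of a proposal around its center,
    clipping the enlarged box to the image bounds."""
    cx, cy, w, h = box
    x1, y1 = max(0.0, cx - w), max(0.0, cy - h)      # half of 2w is w
    x2, y2 = min(img_w, cx + w), min(img_h, cy + h)
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def iou(a, b) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_and_nms(first: List[Det], second: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Pool the first and second detection values, then keep the highest-scoring
    box among same-class boxes that overlap by more than iou_thr."""
    kept: List[Det] = []
    for d in sorted(first + second, key=lambda det: det[4], reverse=True):
        if all(d[5] != k[5] or iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept
```

A rotated-box variant would replace `iou` with a polygon-overlap computation; the merging logic stays the same.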
4. The method for detecting a target in a remote sensing image based on the differentiated use of context information according to claim 3, wherein the step S32 specifically comprises the steps of:
s321: calculating a gray average value of the target suggested area and the context area, and calculating a brightness similarity L of the target suggested area and the context area by the following formula:
(1);
wherein the two mean terms are the gray average of the target suggested region and the gray average of the context region, and σ is a very small constant that prevents the denominator from being 0;
s322: calculating the contrast similarity D of the target suggested region and the context region by the following formula:
(2);
(3);
(4);
wherein the remaining terms are, respectively, the contrast of the target suggested region, the contrast of the context region, the number of all pixels of the target suggested region and the number of all pixels of the context region; x is the value of a pixel point within the target suggested region, i and j are respectively the abscissa and the ordinate of the pixel point, and y is the value of a pixel point within the context region;
s323: calculating the smoothness similarity P of the target suggested region and the context region by the following formula:
(5);
(6);
(7);
wherein the two terms are the contrast of the target suggested region and the contrast of the context region;
s324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formula:
(8);
(9);
wherein X is the LBP feature histogram of the target suggested region, Y is the LBP feature histogram of the context region, and the remaining term is the chi-square distance between the LBP feature histograms of the target suggested region and the context region;
s325: calculating the overall similarity S of the target suggested region and the context region according to the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity by the following formula:
(10);
s326: respectively calculating probability density distribution of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, and correspondingly obtaining a median value of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity;
s327: taking the product of the luminance similarity, the contrast similarity, the smoothness similarity and the median of the texture feature similarity as a threshold value of the overall similarity;
s328: and constructing a high-similarity target labeling frame in the image area with the overall similarity higher than the threshold value, and constructing a low-similarity target labeling frame in the image area with the overall similarity lower than the threshold value.
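The thresholding in steps s325 to s328 can be sketched as follows. Because formulas (8) to (10) are not reproduced in this text, two points are assumptions rather than the claimed formulas: the chi-square distance uses one common definition, and the overall similarity S is taken as the product of the four similarities, chosen only because the threshold in s327 is a product of medians. The names `chi_square_distance` and `split_by_similarity` are hypothetical, and proposals whose similarity equals the threshold are routed to the low-similarity group by assumption.

```python
from statistics import median
from typing import List, Sequence, Tuple


def chi_square_distance(X: Sequence[float], Y: Sequence[float], eps: float = 1e-12) -> float:
    """Chi-square distance between two histograms (one common definition);
    eps guards against empty bins in both histograms."""
    return sum((x - y) ** 2 / (x + y + eps) for x, y in zip(X, Y))


def split_by_similarity(
    L: List[float], D: List[float], P: List[float], T: List[float]
) -> Tuple[List[float], float, List[int], List[int]]:
    """Route proposals by overall similarity.

    L, D, P, T hold the brightness, contrast, smoothness and texture
    similarities of each proposal with its context region.
    """
    # ASSUMPTION: overall similarity as a product of the four terms.
    S = [l * d * p * t for l, d, p, t in zip(L, D, P, T)]
    # s326/s327: threshold is the product of the four medians.
    threshold = median(L) * median(D) * median(P) * median(T)
    # s328: above the threshold is high similarity, otherwise low.
    high = [i for i, s in enumerate(S) if s > threshold]
    low = [i for i, s in enumerate(S) if s <= threshold]
    return S, threshold, high, low
```

Proposals in `low` would go to the context information supplement module and proposals in `high` to the context information suppression module.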
5. The method for detecting a target in a remote sensing image based on the differentiated use of context information according to claim 3, wherein the step S33 specifically comprises the steps of:
s331: resampling the low-similarity target feature map to obtain a target region to be identified, and doubling the length and width of the target region to be identified to obtain a context supplement region;
s332: inputting the target region to be identified and the context supplement region into a first full-connection layer and a second full-connection layer respectively, and correspondingly obtaining a first feature vector and a second feature vector;
s333: adding the first feature vector and the second feature vector, and then processing the added first feature vector and the added second feature vector through a third full-connection layer to obtain a third feature vector;
s334: and respectively inputting the third feature vector to a classification full-connection layer and a regression full-connection layer to identify the category and the position of the target labeling frame, so as to obtain a first detection value.
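The data flow of steps s332 to s334 can be sketched with NumPy. This is an illustration under stated assumptions, not the trained module: weights are random, no activation functions are applied because the claim does not name any, both regions are assumed to be resampled and flattened to the same length `in_dim`, and the regression output is assumed to have five values for a rotated box. The class name `ContextSupplementHead` and all dimensions are hypothetical.

```python
import numpy as np


class ContextSupplementHead:
    """Two parallel fully connected layers, summed, followed by a third
    fully connected layer and separate classification/regression layers."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int,
                 box_dim: int = 5, seed: int = 0):
        rng = np.random.default_rng(seed)

        def fc(n_in: int, n_out: int):
            # Small random weights and zero bias, for illustration only.
            return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

        self.fc1 = fc(in_dim, hidden_dim)       # target region branch
        self.fc2 = fc(in_dim, hidden_dim)       # context supplement branch
        self.fc3 = fc(hidden_dim, hidden_dim)   # fusion layer
        self.cls = fc(hidden_dim, num_classes)  # classification layer
        self.reg = fc(hidden_dim, box_dim)      # regression layer

    @staticmethod
    def _linear(x: np.ndarray, layer) -> np.ndarray:
        weight, bias = layer
        return x @ weight + bias

    def forward(self, target_feat: np.ndarray, context_feat: np.ndarray):
        v1 = self._linear(target_feat, self.fc1)   # first feature vector
        v2 = self._linear(context_feat, self.fc2)  # second feature vector
        v3 = self._linear(v1 + v2, self.fc3)       # third feature vector
        return self._linear(v3, self.cls), self._linear(v3, self.reg)
```

Summing the two branches before the third layer lets the context vector shift the target representation without changing its dimensionality.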
6. The method for detecting a target in a remote sensing image based on the distinguishing and utilizing of context information according to claim 3, wherein the step S34 specifically comprises the steps of:
s341: extracting a feature map A1 and a feature map B1 from the high-similarity target feature map, wherein the size of the feature map A1 is 1/4 of the size of the input image, and the size of the feature map B1 is 1/8 of the size of the input image;
s342: inputting the feature map A1 and the feature map B1 into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing, and correspondingly obtaining a feature map A2 and a feature map B2;
s343: performing an up-sampling operation on the feature map A2 to obtain a feature map A3 of the same size as the feature map B1, adding the feature map A3 and the feature map B2, and processing the sum with a softmax function to obtain a saliency mask map;
s344: multiplying the saliency mask map by the feature map B1 to obtain a saliency feature map, and resampling the saliency feature map to obtain a feature map C;
s345: and respectively inputting the feature map C to a classification full-connection layer and a regression full-connection layer to identify the category and the position of the target labeling frame, and obtaining a second detection value.
7. The method of claim 6, wherein the first convolution sub-module and the second convolution sub-module are each comprised of a concatenated 3*3 convolution layer and 1*1 convolution layer.
8. The method for detecting a target of a remote sensing image based on context information discrimination and utilization according to claim 6, wherein a calculation formula of the saliency mask map is:
$$M = \sigma\left(U\left(f_1(A_1)\right) + f_2(B_1)\right) \tag{11}$$
wherein M is the saliency mask map, f_1 is the convolution operation of the first convolution sub-module, f_2 is the convolution operation of the second convolution sub-module, A_1 is the feature map A1, B_1 is the feature map B1, U is the up-sampling operation, and σ is the softmax function.
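The saliency mask of claim 8 and the masking step of claim 6 can be sketched with NumPy under several assumptions. The convolution sub-modules are passed in as callables `f1` and `f2` (identity functions in the usage below) and are assumed to preserve the channel count; the resampling U is written as a generic nearest-neighbor resize so that it works whichever of the two maps is spatially larger; and the softmax is taken over the spatial positions of each channel, since the claims do not state the axis. All function names are hypothetical.

```python
import numpy as np


def resize_nearest(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of a (C, H, W) feature map; stands in for U."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]


def spatial_softmax(x: np.ndarray) -> np.ndarray:
    """Softmax over the H*W positions of each channel (axis is an assumption)."""
    c, h, w = x.shape
    flat = x.reshape(c, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(c, h, w)


def saliency_mask(A1: np.ndarray, B1: np.ndarray, f1, f2) -> np.ndarray:
    """Mask M: softmax of the resized f1(A1) plus f2(B1)."""
    A2, B2 = f1(A1), f2(B1)
    A3 = resize_nearest(A2, B2.shape[1], B2.shape[2])
    return spatial_softmax(A3 + B2)


def suppress_context(A1: np.ndarray, B1: np.ndarray, f1, f2) -> np.ndarray:
    """Saliency feature map: the mask applied element-wise to B1."""
    return saliency_mask(A1, B1, f1, f2) * B1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A1 = rng.standard_normal((4, 8, 8))
    B1 = rng.standard_normal((4, 4, 4))
    identity = lambda t: t
    print(suppress_context(A1, B1, identity, identity).shape)
```

Because the mask sums to 1 over each channel, positions with weak responses in both branches are scaled toward zero, which is the suppression effect the module is named for.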
9. The method for detecting a target of a remote sensing image based on the distinguishing and utilizing of context information according to claim 1, wherein the overall loss function is:
(12);
(13);
(14);
wherein the quantities appearing in formulas (12) to (14) are as follows: the overall loss function; the loss balance coefficients λ1, λ2 and λ3, all of which are set to 1; the classification loss function, in which N is the number of positive samples and the remaining terms are the classification prediction value and the class label of the i-th sample; the regression loss function, in which an Iverson bracket acts as an indicator whose value is 1 when the i-th sample is a positive sample, that is, when the bracketed term is greater than 0, and 0 otherwise, and in which the remaining terms are the position prediction value and the position label of the i-th sample; and the three component losses, which are the first-stage detection loss, the second-stage detection loss and the saliency loss, respectively.
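The structure of the loss in claim 9 can be sketched as follows. Formulas (12) to (14) are not reproduced in this text, so the concrete choices here are assumptions: cross-entropy for classification, smooth L1 for regression, class label 0 for background (so that the Iverson bracket becomes the test `label > 0`), and an overall loss formed as the weighted sum of the three component losses. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class (assumed classification loss)."""
    return -math.log(max(probs[label], 1e-12))


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 distance summed over box coordinates (assumed regression loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def detection_loss(
    cls_probs: List[Sequence[float]],
    labels: List[int],
    box_preds: List[Sequence[float]],
    box_targets: List[Sequence[float]],
) -> Tuple[float, float]:
    """Classification and regression losses, both normalized by the number
    of positive samples N; only positive samples contribute to regression."""
    n_pos = max(1, sum(1 for t in labels if t > 0))
    l_cls = sum(cross_entropy(p, t) for p, t in zip(cls_probs, labels)) / n_pos
    l_reg = sum(
        smooth_l1(b, g)
        for b, g, t in zip(box_preds, box_targets, labels)
        if t > 0  # Iverson bracket: 1 for positive samples, 0 otherwise
    ) / n_pos
    return l_cls, l_reg


def overall_loss(l_first: float, l_second: float, l_saliency: float,
                 lam: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three component losses; all weights are 1 per claim 9."""
    return lam[0] * l_first + lam[1] * l_second + lam[2] * l_saliency
```

With all three balance coefficients set to 1, the overall loss reduces to a plain sum of the component losses.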
CN202410213682.8A 2024-02-27 2024-02-27 Remote sensing image target detection method based on context information distinguishing and utilizing Active CN117789039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410213682.8A CN117789039B (en) 2024-02-27 2024-02-27 Remote sensing image target detection method based on context information distinguishing and utilizing

Publications (2)

Publication Number Publication Date
CN117789039A true CN117789039A (en) 2024-03-29
CN117789039B CN117789039B (en) 2024-05-28

Family

ID=90396727

Country Status (1)

Country Link
CN (1) CN117789039B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766108A (en) * 2021-01-08 2021-05-07 西安电子科技大学 SAR image target detection method based on context information
CN116188983A (en) * 2023-02-27 2023-05-30 中国科学院长春光学精密机械与物理研究所 Target detection method, device, equipment and storage medium based on remote sensing image
US20230184927A1 (en) * 2021-12-15 2023-06-15 Anhui University Contextual visual-based sar target detection method and apparatus, and storage medium
CN117079139A (en) * 2023-10-11 2023-11-17 耕宇牧星(北京)空间科技有限公司 Remote sensing image target detection method and system based on multi-scale semantic features

Also Published As

Publication number Publication date
CN117789039B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant