CN117789039A

CN117789039A - Remote sensing image target detection method based on context information distinguishing and utilizing

Info

Publication number: CN117789039A
Application number: CN202410213682.8A
Authority: CN
Inventors: 王永成; 张玉溪
Original assignee: Changchun Institute of Optics Fine Mechanics and Physics of CAS
Current assignee: Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date: 2024-02-27
Filing date: 2024-02-27
Publication date: 2024-03-29
Anticipated expiration: 2044-02-27
Also published as: CN117789039B

Abstract

The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing image target detection method based on context information distinguishing and utilizing. The method comprises the following steps: S1: acquiring an input image and passing it sequentially through a backbone network and a neck network to obtain a multi-scale feature map; S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, the network comprising a first-stage detection network and a second-stage detection network; S3: constructing an overall loss function and training the two-stage target detection network with it to obtain a trained two-stage target detection network; S4: inputting the image to be detected into the trained two-stage target detection network to obtain the final detection result. The invention improves the detection accuracy of ground object targets in remote sensing images.

Description

Remote sensing image target detection method based on context information distinguishing and utilizing

Technical Field

The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image target detection method based on context information distinguishing and utilizing.

Background

The ground surface background of a remote sensing image is wide and complex and contains a large amount of information. Ground object targets within this wide background carry little information relative to their surroundings, and their features are weakly expressed. Remote sensing images are also easily disturbed by environmental factors such as illumination intensity and weather, so image quality varies widely and ground object targets are difficult to detect. To increase the effective information available for detecting ground object targets, reduce their uncertainty, and improve detection precision, many scholars have studied the contribution of context information to feature expression and target detection. Visual objects often appear in a particular environment, sometimes together with other related objects; that is, context information can complement object information. However, the complex spatial pattern formed where ground object targets intersect with the ground background can leave a target submerged in the background, so not all context information helps detection. Context information that is too similar to the target area introduces noise and weakens the feature expression of the target. How to use context information fully and reasonably in the complex background of remote sensing images to assist ground object target detection is therefore a critical problem to be solved.

Disclosure of Invention

The invention provides a remote sensing image target detection method based on context information distinguishing and utilizing. It aims to overcome the defect that the prior art cannot reasonably use the context information in the complex background of a remote sensing image, and therefore cannot use that context information to effectively assist the detection of ground object targets.

The invention provides a remote sensing image target detection method based on context information distinguishing and utilizing, which specifically comprises the following steps:

S1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map;

S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplementation and a detection module based on context information suppression;

S3: constructing an overall loss function, and training the two-stage target detection network with the overall loss function to obtain a trained two-stage target detection network;

S4: inputting the image to be detected into the trained two-stage target detection network for detection to obtain a final detection result.

Preferably, the backbone network adopts ConvNeXt network, the neck network adopts FPN network, and the first stage detection network adopts RPN network.

Preferably, the step S3 specifically includes the following steps:

S31: inputting the multi-scale feature map into the trained first-stage detection network for a convolution operation to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region;

S32: creating an overall similarity evaluation formula, evaluating the similarity between the target suggestion region and the context region with it, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to the evaluation result;

S33: taking the area of the multi-scale feature map marked by the low-similarity target labeling frame as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map with the detection module based on context information supplementation to obtain a first detection value;

S34: taking the area of the multi-scale feature map marked by the high-similarity target labeling frame as a high-similarity target feature map, and suppressing the context information of the high-similarity target feature map with the detection module based on context information suppression to obtain a second detection value;

S35: performing non-maximum suppression on the first detection value and the second detection value to obtain a final detection result.
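The merging step S35 can be illustrated with a minimal non-maximum suppression sketch. It is only an illustration under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples (the detections shown in the figures use rotated frames, which would need a rotated IoU), suppression is class-agnostic, and the IoU threshold of 0.5 and the function names are hypothetical rather than taken from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), axis-aligned (assumption)
Detection = Tuple[Box, float]             # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_and_nms(first: List[Detection], second: List[Detection],
                  iou_thr: float = 0.5) -> List[Detection]:
    """Pool the first and second detection values, then keep the best-scoring
    box of every group of overlapping boxes (greedy NMS)."""
    candidates = sorted(first + second, key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept
```

In practice the suppression would normally be run per class; the sketch omits that to stay short.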

Preferably, the step S32 specifically includes the following steps:

S321: calculating the gray mean of the target suggestion region and of the context region, and calculating the brightness similarity L of the two regions by the following formula:

（1）；

where the two mean terms in formula (1) are the gray mean of the target suggestion region and the gray mean of the context region, and σ is a small constant that prevents the denominator from being 0;

S322: calculating the contrast similarity D of the target suggestion region and the context region by the following formulas:

（2）；

（3）；

（4）；

where the terms of formulas (2) to (4) denote, in order, the contrast of the target suggestion region, the contrast of the context region, the number of pixels in the target suggestion region, and the number of pixels in the context region; x is the value of a pixel in the target suggestion region, y is the value of a pixel in the context region, and i and j are the horizontal and vertical coordinates of the pixel;

S323: calculating the smoothness similarity P of the target suggestion region and the context region by the following formulas:

（5）；

（6）；

（7）；

where the two terms are the contrast of the target suggestion region and the contrast of the context region;

S324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formulas:

（8）；

（9）；

where X is the LBP feature histogram of the target suggestion region, Y is the LBP feature histogram of the context region, and the remaining term is the chi-square distance between the two LBP feature histograms;

S325: calculating the overall similarity S of the target suggestion region and the context region from the brightness similarity, contrast similarity, smoothness similarity, and texture feature similarity by the following formula:

（10）；

S326: calculating the probability density distribution of each of the brightness similarity, contrast similarity, smoothness similarity, and texture feature similarity, and obtaining the median of each;

S327: taking the product of the medians of the brightness similarity, contrast similarity, smoothness similarity, and texture feature similarity as the threshold of the overall similarity;

S328: constructing a high-similarity target labeling frame in image areas whose overall similarity is higher than the threshold, and a low-similarity target labeling frame in image areas whose overall similarity is lower than the threshold.
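Steps S321 to S325 can be sketched as follows. Because formulas (1) to (10) are not reproduced in this text, every formula in the sketch is an assumption chosen only to match the surrounding descriptions: an SSIM-style ratio with a small constant for each comparison, standard deviation as contrast, the variance-based measure 1 − 1/(1 + variance) as smoothness, a basic 8-neighbour LBP histogram, 1/(1 + chi-square distance) as texture similarity, and the product of the four terms as S. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # stands in for the small constant sigma described for formula (1)


def ratio_similarity(a, b, eps=EPS):
    """Assumed SSIM-style form: equals 1 when a == b and shrinks as they diverge."""
    return float((2.0 * a * b + eps) / (a * a + b * b + eps))


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def chi_square(x, y, eps=EPS):
    """Chi-square distance between two normalized histograms."""
    return float(0.5 * np.sum((x - y) ** 2 / (x + y + eps)))


def similarities(target, context):
    """target, context: 2-D grayscale arrays with values in [0, 255]."""
    t = np.asarray(target, dtype=np.float64)
    c = np.asarray(context, dtype=np.float64)
    # Smoothness from normalized variance (assumption, not the patent's formula).
    sm_t = 1.0 - 1.0 / (1.0 + t.var() / 255.0 ** 2)
    sm_c = 1.0 - 1.0 / (1.0 + c.var() / 255.0 ** 2)
    L = ratio_similarity(t.mean(), c.mean())   # brightness: gray means
    D = ratio_similarity(t.std(), c.std())     # contrast: standard deviations
    P = ratio_similarity(sm_t, sm_c)           # smoothness
    T = 1.0 / (1.0 + chi_square(lbp_histogram(t), lbp_histogram(c)))  # texture
    return {"L": L, "D": D, "P": P, "T": T, "S": L * D * P * T}
```

With these assumed forms each term lies in (0, 1] and equals 1 for identical regions, so a target that blends into its context scores close to 1 and a target that stands out scores lower.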

Preferably, the step S33 specifically includes the following steps:

S331: resampling the low-similarity target feature map to obtain target regions to be identified, and doubling the length and width of each target region to be identified to obtain a context supplement region;

S332: inputting the target region to be identified and the context supplement region into a first fully connected layer and a second fully connected layer, respectively, to obtain a first feature vector and a second feature vector;

S333: adding the first feature vector and the second feature vector, and passing the sum through a third fully connected layer to obtain a third feature vector;

S334: inputting the third feature vector into a classification fully connected layer and a regression fully connected layer to identify the category and position of the target labeling frame, thereby obtaining the first detection value.
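The data flow of steps S332 to S334 can be sketched with NumPy as below. This is a shape-level illustration only: the weights are random rather than trained, and the hidden width, the ReLU activations, the class count (15 plus background), the 5-parameter rotated-box output, and the assumption that both regions are resampled to the same feature size are all hypothetical choices not stated in the text.

```python
import numpy as np


class ContextSupplementHead:
    """Sketch of steps S332-S334 with untrained random weights."""

    def __init__(self, feat_dim, hidden=64, num_classes=15, box_dim=5, seed=0):
        rng = np.random.default_rng(seed)

        def fc(n_in, n_out):
            # (weights, bias) of one fully connected layer
            return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

        self.fc1 = fc(feat_dim, hidden)          # first FC: target region
        self.fc2 = fc(feat_dim, hidden)          # second FC: context supplement region
        self.fc3 = fc(hidden, hidden)            # third FC: fused vector
        self.cls_fc = fc(hidden, num_classes + 1)
        self.reg_fc = fc(hidden, box_dim)

    @staticmethod
    def _apply(x, layer, relu=True):
        w, b = layer
        y = x @ w + b
        return np.maximum(y, 0.0) if relu else y

    def __call__(self, target_feat, context_feat):
        n = target_feat.shape[0]
        v1 = self._apply(target_feat.reshape(n, -1), self.fc1)    # first feature vector
        v2 = self._apply(context_feat.reshape(n, -1), self.fc2)   # second feature vector
        v3 = self._apply(v1 + v2, self.fc3)                       # third feature vector
        scores = self._apply(v3, self.cls_fc, relu=False)         # category logits
        boxes = self._apply(v3, self.reg_fc, relu=False)          # position regression
        return scores, boxes
```

Adding the two vectors, rather than concatenating them, keeps the fused vector the same width as each input, which is what step S333 describes.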

Preferably, the step S34 specifically includes the following steps:

S341: extracting feature map A₁ and feature map B₁ from the high-similarity target feature map, where feature map A₁ is 1/4 the size of the input image and feature map B₁ is 1/8 the size of the input image;

S342: inputting feature map A₁ and feature map B₁ into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing to obtain feature map A₂ and feature map B₂;

S343: up-sampling feature map A₂ to obtain feature map A₃, which has the same size as feature map B₁, then adding feature map A₃ and feature map B₂ and applying a softmax function to obtain a saliency mask map;

S344: multiplying the saliency mask map by feature map B₁ to obtain a saliency feature map, and resampling the saliency feature map to obtain feature map C;

S345: inputting feature map C into a classification fully connected layer and a regression fully connected layer to identify the category and position of the target labeling frame, thereby obtaining the second detection value.

Preferably, the first convolution sub-module and the second convolution sub-module each consist of a 3×3 convolution layer followed by a 1×1 convolution layer.

Preferably, the calculation formula of the saliency mask map is:

$$M = \sigma\big(U(f_1(A_1)) + f_2(B_1)\big) \qquad (11)$$

where M is the saliency mask map, f₁ is the convolution operation of the first convolution sub-module, f₂ is the convolution operation of the second convolution sub-module, A₁ is feature map A₁, B₁ is feature map B₁, U is the up-sampling operation, and σ is the softmax function.
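The saliency mask computation of steps S342 to S344 can be sketched in NumPy as follows. Several details are assumptions made only for illustration: the softmax is taken over the spatial positions of each channel, both sub-modules keep the channel count of B₁ so the mask can be multiplied elementwise, weights are passed in by the caller, and the resampling helper uses nearest-neighbour indexing to map A₂ onto the grid of B₁ (the text calls this operation up-sampling; the helper resizes in whichever direction is needed). All function names are hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); zero padding keeps H and W."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def conv1x1(x, w):
    """x: (C_in, H, W); w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def resize_nearest(x, size):
    """Nearest-neighbour resampling of (C, H, W) to spatial size (h, w)."""
    h, w = size
    rows = np.arange(h) * x.shape[1] // h
    cols = np.arange(w) * x.shape[2] // w
    return x[:, rows][:, :, cols]


def spatial_softmax(x):
    """Softmax over all spatial positions of each channel (assumed axis)."""
    flat = x.reshape(x.shape[0], -1)
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)


def saliency_branch(a1, b1, f1_weights, f2_weights):
    """Returns (saliency mask M, saliency feature map M * B1)."""
    a2 = conv1x1(conv3x3(a1, f1_weights[0]), f1_weights[1])   # first sub-module
    b2 = conv1x1(conv3x3(b1, f2_weights[0]), f2_weights[1])   # second sub-module
    a3 = resize_nearest(a2, b1.shape[1:])                     # match B1's size
    mask = spatial_softmax(a3 + b2)
    return mask, mask * b1
```

Each weight pair is `(w3, w1)` with shapes `(C, C, 3, 3)` and `(C, C)`, where `C` is the channel count of the two feature maps.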

Preferably, the overall loss function is:

（12）；

（13）；

（14）；

where the terms of formulas (12) to (14) denote, in order: the overall loss function; the loss balance coefficients λ1, λ2, and λ3, all of which are set to 1; the classification loss function; N, the number of positive samples; the classification prediction for the i-th sample; the class label of the i-th sample; the regression loss function; an Iverson bracket, which takes the value 1 when the i-th sample is a positive sample (the bracketed quantity is greater than 0) and the value 0 otherwise; the position prediction for the i-th sample; and the position label of the i-th sample. The remaining loss terms are the first-stage detection loss, the second-stage detection loss, the detection loss, and the saliency loss.
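The structure described for the overall loss can be sketched as below. Formulas (12) to (14) are not reproduced in this text, so the sketch rests on assumptions: the overall loss is taken to be a weighted sum of a first-stage detection loss, a second-stage detection loss, and a saliency loss; cross-entropy and smooth L1 stand in for the unspecified classification and regression losses; and both sums are normalized by the number of positive samples N. The function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Classification loss for one sample (assumed form)."""
    return -math.log(max(probs[label], 1e-12))


def smooth_l1(pred, target):
    """Regression loss for one sample (assumed form), summed over coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(cls_probs, cls_labels, box_preds, box_labels):
    """Classification term plus regression term; the Iverson bracket
    [label > 0] restricts the regression term to positive samples."""
    n_pos = max(1, sum(1 for y in cls_labels if y > 0))
    l_cls = sum(cross_entropy(p, y) for p, y in zip(cls_probs, cls_labels)) / n_pos
    l_reg = sum(int(y > 0) * smooth_l1(t, t_star)
                for y, t, t_star in zip(cls_labels, box_preds, box_labels)) / n_pos
    return l_cls + l_reg


def overall_loss(l_stage1, l_stage2, l_saliency, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum with all balance coefficients set to 1, as stated in the text."""
    return lambdas[0] * l_stage1 + lambdas[1] * l_stage2 + lambdas[2] * l_saliency
```

Label 0 is treated as background here, so a background sample contributes to the classification term but never to the regression term.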

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention provides an overall similarity evaluation formula, which evaluates the overall similarity degree of a target suggestion region and a context region by comprehensively considering the brightness, the contrast, the smoothness and the texture characteristics of the target suggestion region and the context region, thereby obtaining a target with low similarity to the context region and a target with high similarity to the context region.

(2) The invention provides a two-stage target detection network based on the differentiated utilization of context information. In the second-stage detection network, the detection module based on context information supplementation adds context information to the low-similarity target feature map, and the detection module based on context information suppression suppresses context information in the high-similarity target feature map, so that context information is fully utilized.

Drawings

FIG. 1 is a schematic flow chart of a remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;

FIG. 2 is a network structure diagram of a two-stage target detection network based on context information distinguishing and utilizing according to an embodiment of the present invention;

FIG. 3 is a network structure diagram of a detection module based on context information supplementation according to an embodiment of the present invention;

FIG. 4 is a network structure diagram of a detection module based on context information suppression according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of detection results on the DOTA dataset obtained with the remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of detection results on the DIOR-R dataset obtained with the remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of detection results on the UCAS-AOD dataset obtained with the remote sensing image target detection method based on context information distinguishing and utilizing according to an embodiment of the present invention.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.

The invention designs an overall similarity evaluation formula that jointly considers brightness, contrast, smoothness, and texture features. A target area is expanded to generate a context area, the overall similarity evaluation formula is used to evaluate the similarity between the target area and the context area, and a low-similarity labeling frame and a high-similarity labeling frame are obtained from the evaluation result. The invention also provides a two-stage target detection network based on context information distinguishing and utilizing: the detection module based on context information supplementation adds context information to the low-similarity feature map, and the detection module based on context information suppression suppresses context information in the high-similarity feature map, so that context information is fully utilized.
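The expansion of a target area into a context area (doubling its length and width, per step S31) can be sketched as follows. The sketch assumes axis-aligned boxes in `(x1, y1, x2, y2)` form, keeps the box centre fixed, and clips the result to the image; the centre-preserving choice, the clipping, and the function name are assumptions for illustration.

```python
def expand_box(box, image_size, scale=2.0):
    """Scale a box about its centre and clip it to the image.

    box: (x1, y1, x2, y2); image_size: (width, height).
    scale=2.0 doubles both the length and the width.
    """
    x1, y1, x2, y2 = box
    img_w, img_h = image_size
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))
```

For a box well inside the image, the context area has four times the area of the target area and contains it entirely; near the border the clipping makes it smaller.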

Fig. 1 illustrates a flow of a remote sensing image object detection method based on context information discrimination and utilization according to an embodiment of the present invention, fig. 2 illustrates a network structure of a two-stage object detection network based on context information discrimination and utilization according to an embodiment of the present invention, fig. 3 illustrates a network structure of a detection module based on context information supplementation according to an embodiment of the present invention, and fig. 4 illustrates a network structure of a detection module based on context information suppression according to an embodiment of the present invention.

As shown in fig. 1 to fig. 4, the remote sensing image target detection method based on the context information distinguishing and utilizing provided by the embodiment of the invention specifically includes the following steps:

S1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map.

The backbone network is a ConvNeXt network, the neck network is an FPN network, and the first-stage detection network is an RPN network.

S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplementation and a detection module based on context information suppression.

S3: constructing an overall loss function, and training the two-stage target detection network with the overall loss function to obtain a trained two-stage target detection network.

The step S3 specifically comprises the following steps:

S31: inputting the multi-scale feature map into the trained first-stage detection network for a convolution operation to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region.

S32: creating an overall similarity evaluation formula, evaluating the similarity between the target suggestion region and the context region with it, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to the evaluation result.

The step S32 specifically includes the following steps:

（1）；

where the two mean terms in formula (1) are the gray mean of the target suggestion region and the gray mean of the context region, and σ is a small constant that prevents the denominator from being 0.

（2）；

（3）；

（4）；

where the terms of formulas (2) to (4) denote, in order, the contrast of the target suggestion region, the contrast of the context region, the number of pixels in the target suggestion region, and the number of pixels in the context region; x is the value of a pixel in the target suggestion region, y is the value of a pixel in the context region, and i and j are the horizontal and vertical coordinates of the pixel.

（5）；

（6）；

（7）；

where the two terms are the contrast of the target suggestion region and the contrast of the context region.

（8）；

（9）；

where X is the LBP feature histogram of the target suggestion region, Y is the LBP feature histogram of the context region, and the remaining term is the chi-square distance between the two LBP feature histograms.

（10）；

S326: calculating the probability density distribution of each of the brightness similarity, contrast similarity, smoothness similarity, and texture feature similarity, and obtaining the median of each.

S327: taking the product of the medians of the brightness similarity, contrast similarity, smoothness similarity, and texture feature similarity as the threshold of the overall similarity.
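The thresholding in steps S326 and S327, and the split it drives, can be sketched in a few lines. The sketch takes the sample median of each similarity directly instead of first estimating a probability density, computes the medians over whatever collection of regions is passed in, and assigns ties to the low-similarity group; these choices, the record layout, and the function name are assumptions for illustration.

```python
from statistics import median


def split_by_similarity(records):
    """records: list of dicts with keys 'L', 'D', 'P', 'T' and overall 'S'.

    Returns (threshold, low_similarity_records, high_similarity_records).
    """
    threshold = 1.0
    for key in ("L", "D", "P", "T"):
        # Threshold = product of the medians of the four similarity terms.
        threshold *= median(r[key] for r in records)
    high = [r for r in records if r["S"] > threshold]
    low = [r for r in records if r["S"] <= threshold]   # ties go to "low" (assumption)
    return threshold, low, high
```

Regions in the low group would then go to the detection module based on context information supplementation, and regions in the high group to the detection module based on context information suppression.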

S33: taking the area of the multi-scale feature map marked by the low-similarity target labeling frame as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map with the detection module based on context information supplementation to obtain a first detection value.

The step S33 specifically includes the following steps:

S331: resampling the low-similarity target feature map to obtain target regions to be identified, and doubling the length and width of each target region to be identified to obtain a context supplement region.

S332: inputting the target region to be identified and the context supplement region into a first fully connected layer and a second fully connected layer, respectively, to obtain a first feature vector and a second feature vector.

S333: adding the first feature vector and the second feature vector, and passing the sum through a third fully connected layer to obtain a third feature vector.

S34: taking the area of the multi-scale feature map marked by the high-similarity target labeling frame as a high-similarity target feature map, and suppressing the context information of the high-similarity target feature map with the detection module based on context information suppression to obtain a second detection value.

The step S34 specifically includes the following steps:

S341: extracting feature map A₁ and feature map B₁ from the high-similarity target feature map, where feature map A₁ is 1/4 the size of the input image and feature map B₁ is 1/8 the size of the input image.

S342: inputting feature map A₁ and feature map B₁ into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing to obtain feature map A₂ and feature map B₂.

The first convolution sub-module and the second convolution sub-module each consist of a 3×3 convolution layer followed by a 1×1 convolution layer.

S343: up-sampling feature map A₂ to obtain feature map A₃, which has the same size as feature map B₁, then adding feature map A₃ and feature map B₂ and applying a softmax function to obtain a saliency mask map.

S344: multiplying the saliency mask map by feature map B₁ to obtain a saliency feature map, and resampling the saliency feature map to obtain feature map C.

The calculation formula of the saliency mask map is as follows:

$$M = \sigma\big(U(f_1(A_1)) + f_2(B_1)\big) \qquad (11)$$

where M is the saliency mask map, f₁ and f₂ are the convolution operations of the first and second convolution sub-modules, U is the up-sampling operation, and σ is the softmax function.

Adding feature map A₃ and feature map B₂ and applying the softmax function suppresses context information through a pixel-level mask. This highlights the target area and weakens the influence of surrounding environment information that is easily confused with the target, which improves the detection result.

The overall loss function proposed by the embodiment of the invention consists of a detection loss and a pixel-level saliency loss.

The two-stage target detection network based on context information distinguishing and utilizing is trained with the overall loss function until a preset number of iterations is reached or the network converges.

The overall loss function is:

（12）；

（13）；

（14）；

Fig. 5 shows a result of detecting a DOTA dataset by using a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in FIG. 5, the remote sensing image target detection method based on context information distinguishing and utilizing provided by the embodiment of the present invention detects the following 15 categories of targets: plane, baseball diamond, bridge, ground track field, small vehicle, large vehicle, ship, tennis court, basketball court, storage tank, soccer ball field, roundabout, harbor, swimming pool, and helicopter. In the DOTA dataset, the background of the remote sensing images is wide and complex, target scales differ greatly, and targets are arbitrarily oriented. Even so, the method accurately marks the position of each target with a rotated rectangular frame in most scenes, and the visual results are satisfactory.

Fig. 6 shows a result of detecting a DIOR-R dataset by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in FIG. 6, the remote sensing image target detection method based on context information distinguishing and utilizing provided by the embodiment of the present invention detects 20 categories of targets: airplane, airport, baseball field, basketball court, bridge, chimney, dam, expressway service area, expressway toll station, golf field, ground track field, harbor, overpass, ship, stadium, storage tank, tennis court, train station, vehicle, and windmill. In the DIOR-R dataset, target categories are numerous, intra-class differences are large, backgrounds are complex, and detection is difficult. Even so, the method completes rotated-frame target detection with high quality, and the visual results are satisfactory.

Fig. 7 shows a result of detecting UCAS-AOD datasets by a remote sensing image target detection method based on context information discrimination and utilization according to an embodiment of the present invention.

As shown in FIG. 7, the remote sensing image target detection method based on context information distinguishing and utilizing provided by the embodiment of the invention detects cars and airplanes in different scenes and obtains satisfactory visual results.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. The remote sensing image target detection method based on the context information distinguishing and utilizing is characterized by comprising the following steps:

S1: acquiring an input image, and sequentially inputting the input image into a backbone network and a neck network for processing to obtain a multi-scale feature map;

S2: constructing a two-stage target detection network based on context information distinguishing and utilizing, wherein the two-stage target detection network comprises a first-stage detection network and a second-stage detection network, and the second-stage detection network comprises a detection module based on context information supplementation and a detection module based on context information suppression;

S3: constructing an overall loss function, and training the two-stage target detection network with the overall loss function to obtain a trained two-stage target detection network;

2. The method for detecting the target of the remote sensing image based on the context information distinguishing and utilizing according to claim 1, wherein the backbone network adopts a ConvNeXt network, the neck network adopts an FPN network, and the first-stage detection network adopts an RPN network.

3. The method for detecting the target of the remote sensing image based on the distinguishing and utilizing of the context information according to claim 1, wherein the step S3 specifically comprises the following steps:

S31: inputting the multi-scale feature map into the trained first-stage detection network for a convolution operation to obtain a target suggestion region, and doubling the length and width of the target suggestion region to obtain a context region;

S32: creating an overall similarity evaluation formula, evaluating the similarity between the target suggestion region and the context region with the overall similarity evaluation formula, and constructing a low-similarity target labeling frame and a high-similarity target labeling frame on the multi-scale feature map according to the evaluation result;

S33: taking the area of the multi-scale feature map marked by the low-similarity target labeling frame as a low-similarity target feature map, and supplementing context information to the low-similarity target feature map with the detection module based on context information supplementation to obtain a first detection value;

S34: taking the area of the multi-scale feature map marked by the high-similarity target labeling frame as a high-similarity target feature map, and suppressing the context information of the high-similarity target feature map with the detection module based on context information suppression to obtain a second detection value;

4. The method for detecting a target in a remote sensing image based on the differentiated use of context information according to claim 3, wherein the step S32 specifically comprises the steps of:

S321: calculating the gray mean of the target suggestion region and of the context region, and calculating the brightness similarity L of the target suggestion region and the context region by the following formula:

（1）；

where the two mean terms in formula (1) are the gray mean of the target suggestion region and the gray mean of the context region, and σ is a small constant that prevents the denominator from being 0;

S322: calculating the contrast similarity D of the target suggestion region and the context region by the following formulas:

（2）；

（3）；

（4）；

where the terms of formulas (2) to (4) denote, in order, the contrast of the target suggestion region, the contrast of the context region, the number of pixels in the target suggestion region, and the number of pixels in the context region; x is the value of a pixel in the target suggestion region, y is the value of a pixel in the context region, and i and j are the horizontal and vertical coordinates of the pixel;

S323: calculating the smoothness similarity P of the target suggestion region and the context region by the following formulas:

（5）；

（6）；

（7）；

wherein the two contrast terms are the contrast of the target suggestion region and the contrast of the context region, respectively;

S324: calculating the texture feature similarity T of the target suggestion region and the context region by the following formulas:

（8）；

（9）；

wherein X is the LBP (local binary pattern) feature histogram of the target suggestion region, Y is the LBP feature histogram of the context region, and the distance term is the chi-square distance between the LBP feature histograms of the target suggestion region and the context region;

S325: calculating the overall similarity S of the target suggestion region and the context region from the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity by the following formula:

（10）；

S326: calculating the probability density distribution of each of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity, and correspondingly obtaining the median of each of the four similarities;

S327: taking the product of the medians of the brightness similarity, the contrast similarity, the smoothness similarity and the texture feature similarity as the threshold of the overall similarity;

S328: constructing a high-similarity target labeling frame in each image area whose overall similarity is higher than the threshold, and constructing a low-similarity target labeling frame in each image area whose overall similarity is lower than the threshold.
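Formulas (1) to (10) are not reproduced in the text above, so the following sketch only illustrates the flow of S321 to S328 under stated assumptions. The SSIM-style ratio used for L, D and P, the smoothness descriptor `1 - 1/(1 + std²)`, the mapping `1/(1 + χ²)` for T, the population standard deviation as the contrast, and the product form of S are all assumed here and may differ from the claimed formulas. Only the threshold rule (product of the four medians, S327) and the split in S328 follow the claim wording directly. LBP histograms are taken as precomputed inputs, and all function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev

EPS = 1e-6  # small constant standing in for the claim's σ (avoids a zero denominator)


def ratio_similarity(a, b, eps=EPS):
    """Assumed SSIM-style similarity of two non-negative scalars; 1.0 when a == b."""
    return (2.0 * a * b + eps) / (a * a + b * b + eps)


def smoothness(std):
    """Assumed smoothness descriptor: 0 for a flat region, approaching 1 as variance grows."""
    return 1.0 - 1.0 / (1.0 + std * std)


def chi_square(hist_x, hist_y, eps=EPS):
    """Chi-square distance between two histograms of equal length."""
    return sum((p - q) ** 2 / (p + q + eps) for p, q in zip(hist_x, hist_y))


def similarities(target_pixels, context_pixels, target_hist, context_hist):
    """Return (L, D, P, T) from flat lists of gray values and precomputed LBP histograms."""
    mu_x, mu_y = mean(target_pixels), mean(context_pixels)
    sd_x, sd_y = pstdev(target_pixels), pstdev(context_pixels)  # contrast as population std (assumed)
    L = ratio_similarity(mu_x, mu_y)
    D = ratio_similarity(sd_x, sd_y)
    P = ratio_similarity(smoothness(sd_x), smoothness(sd_y))
    T = 1.0 / (1.0 + chi_square(target_hist, context_hist))
    return L, D, P, T


def split_by_similarity(per_region):
    """per_region: list of (L, D, P, T) tuples. Returns (high_indices, low_indices)."""
    # S327: threshold = product of the medians of the four similarities.
    threshold = math.prod(median(column) for column in zip(*per_region))
    # Overall similarity taken as the product of the four terms (assumed form of S).
    overall = [math.prod(scores) for scores in per_region]
    high = [i for i, s in enumerate(overall) if s > threshold]
    # Regions exactly at the threshold are placed in the low group (a choice of this sketch).
    low = [i for i, s in enumerate(overall) if s <= threshold]
    return high, low


scores = [(0.9, 0.8, 0.9, 0.9), (0.5, 0.5, 0.5, 0.5), (0.7, 0.9, 0.7, 0.7)]
high, low = split_by_similarity(scores)
print(high, low)
```

Regions in `high` would be routed to the context-suppression branch (S34) and regions in `low` to the context-supplement branch (S33).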

5. The method for detecting a target in a remote sensing image based on the differentiated use of context information according to claim 3, wherein the step S33 specifically comprises the steps of:

S331: resampling the low-similarity target feature map to obtain a target region to be identified, and doubling the length and width of the target region to be identified to obtain a context supplement region;

S332: inputting the target region to be identified and the context supplement region into a first fully connected layer and a second fully connected layer, respectively, to correspondingly obtain a first feature vector and a second feature vector;

S333: adding the first feature vector and the second feature vector, and processing the sum through a third fully connected layer to obtain a third feature vector;

S334: inputting the third feature vector into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and position of the target labeling frame, thereby obtaining a first detection value.
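The data flow of S332 to S334 can be sketched as follows. This is a minimal illustration with plain Python lists rather than a deep-learning framework; the names `linear` and `context_supplement_head`, the `params` dictionary layout, and the absence of activation functions (the claim names none) are assumptions of the example. The weights are toy values chosen only to make the arithmetic easy to follow.

```python
def linear(x, weight, bias):
    """Fully connected layer: `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def context_supplement_head(target_feat, context_feat, params):
    """Fuse target and context features, then predict class scores and box offsets."""
    v1 = linear(target_feat, *params["fc1"])    # first feature vector (S332)
    v2 = linear(context_feat, *params["fc2"])   # second feature vector (S332)
    fused = [a + b for a, b in zip(v1, v2)]     # element-wise addition (S333)
    v3 = linear(fused, *params["fc3"])          # third feature vector (S333)
    class_scores = linear(v3, *params["cls"])   # classification layer (S334)
    box_offsets = linear(v3, *params["reg"])    # regression layer (S334)
    return class_scores, box_offsets


identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
params = {
    "fc1": identity,
    "fc2": identity,
    "fc3": identity,
    "cls": ([[1.0, 1.0]], [0.0]),
    "reg": ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0, 0.0]),
}
scores, offsets = context_supplement_head([1.0, 2.0], [3.0, 4.0], params)
print(scores, offsets)
```

Because the two branches are added before the third layer, the first and second fully connected layers must produce vectors of the same length.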

6. The method for detecting a target in a remote sensing image based on the distinguishing and utilizing of context information according to claim 3, wherein the step S34 specifically comprises the steps of:

S341: extracting a feature map A₁ and a feature map B₁ from the high-similarity target feature map, wherein the size of the feature map A₁ is 1/4 of the size of the input image and the size of the feature map B₁ is 1/8 of the size of the input image;

S342: inputting the feature map A₁ and the feature map B₁ into a first convolution sub-module and a second convolution sub-module, respectively, for convolution processing to correspondingly obtain a feature map A₂ and a feature map B₂;

S343: up-sampling the feature map A₂ to obtain a feature map A₃ of the same size as the feature map B₁, adding the feature map A₃ and the feature map B₂, and processing the sum with a softmax function to obtain a saliency mask map;

S344: multiplying the saliency mask map by the feature map B₁ to obtain a saliency feature map, and resampling the saliency feature map to obtain a feature map C;

S345: inputting the feature map C into a classification fully connected layer and a regression fully connected layer, respectively, to identify the category and position of the target labeling frame, thereby obtaining a second detection value.

7. The method of claim 6, wherein the first convolution sub-module and the second convolution sub-module each consist of a 3×3 convolution layer and a 1×1 convolution layer connected in series.

8. The method for detecting a target of a remote sensing image based on context information discrimination and utilization according to claim 6, wherein a calculation formula of the saliency mask map is:

$M = \sigma\big(U(f_1(A_1)) + f_2(B_1)\big)$ （11）；

where M is the saliency mask map, $f_1$ is the convolution operation of the first convolution sub-module, $f_2$ is the convolution operation of the second convolution sub-module, $A_1$ is the feature map A₁, $B_1$ is the feature map B₁, U is the up-sampling operation, and σ is the softmax function.
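The saliency mask of S342 to S344 can be illustrated on single-channel maps stored as nested lists. In this sketch the convolution sub-modules are passed in as callables `f1` and `f2` (identity functions in the usage lines), a nearest-neighbour resize stands in for the resampling operator U and simply matches the size of B₁, and the softmax is taken over all spatial positions. Each of these choices, and all function names, are assumptions of the example; the claim does not fix the interpolation method or the softmax axis.

```python
import math


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D map to (out_h, out_w)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def spatial_softmax(fmap):
    """Softmax over every position of a 2-D map (numerically stabilised)."""
    peak = max(max(row) for row in fmap)
    exps = [[math.exp(v - peak) for v in row] for row in fmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def saliency_mask(a1, b1, f1, f2):
    """Resize f1(A1) to B1's size, add f2(B1), and apply softmax to the sum."""
    a3 = resize_nearest(f1(a1), len(b1), len(b1[0]))
    b2 = f2(b1)
    summed = [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a3, b2)]
    return spatial_softmax(summed)


def apply_mask(mask, b1):
    """Element-wise product of the mask and B1 (S344), giving the saliency feature map."""
    return [[m * v for m, v in zip(row_m, row_v)] for row_m, row_v in zip(mask, b1)]


def same(fmap):
    """Identity stand-in for a convolution sub-module."""
    return fmap


a1 = [[0.0] * 4 for _ in range(4)]   # toy map at the finer resolution
b1 = [[1.0, 2.0], [3.0, 4.0]]        # toy map at the coarser resolution
mask = saliency_mask(a1, b1, same, same)
salient = apply_mask(mask, b1)
print(mask)
```

The mask sums to 1 over the map, so positions with larger combined responses receive proportionally more weight when B₁ is multiplied by it.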

9. The method for detecting a target of a remote sensing image based on the distinguishing and utilizing of context information according to claim 1, wherein the overall loss function is:

（12）；

（13）；

（14）；

wherein the quantities in formulas (12) to (14) are as follows: the overall loss function; the loss balance coefficients λ1, λ2 and λ3, all of which are set to 1; the classification loss function, in which N is the number of positive samples and the remaining terms are the classification prediction value and the class label of the i-th sample; the regression loss function, in which the remaining terms are the position prediction value and the position label of the i-th sample, together with an Iverson bracket that takes the value 1 when the i-th sample is a positive sample, for which the bracketed quantity is greater than 0, and takes the value 0 otherwise; and the first-stage detection loss, the second-stage detection loss and the saliency loss.
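Formulas (12) to (14) are not reproduced in the text above, so the sketch below only illustrates one common way such a loss could be assembled. The cross-entropy form of the classification loss, the smooth-L1 form of the regression loss, the convention that label 0 denotes background, and the weighted sum of a first-stage term, a second-stage term and a saliency term are all assumptions of this example. Only the unit balance coefficients, the normalisation by the number of positive samples N, and the positive-sample indicator come from the claim wording. All function names are hypothetical.

```python
import math


def classification_loss(probs, labels, num_pos):
    """Cross-entropy over per-class probabilities, normalised by N (assumed form)."""
    total = -sum(math.log(p[t]) for p, t in zip(probs, labels))
    return total / max(num_pos, 1)


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 distance summed over box coordinates (assumed regression penalty)."""
    loss = 0.0
    for p, g in zip(pred, target):
        d = abs(p - g)
        loss += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return loss


def regression_loss(pred_boxes, gt_boxes, labels, num_pos):
    """Regression loss counted only for positive samples (Iverson bracket [label > 0])."""
    total = sum(smooth_l1(p, g)
                for p, g, t in zip(pred_boxes, gt_boxes, labels) if t > 0)
    return total / max(num_pos, 1)


def overall_loss(stage1, stage2, saliency, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; all coefficients default to 1 as in the claim."""
    l1, l2, l3 = lambdas
    return l1 * stage1 + l2 * stage2 + l3 * saliency


cls = classification_loss([[0.5, 0.5]], [1], num_pos=1)
reg = regression_loss([[0.0, 0.0, 0.0, 0.0]], [[2.0, 0.0, 0.0, 0.0]], [1], num_pos=1)
print(overall_loss(cls + reg, 0.0, 0.0))
```

In the usage lines the single positive sample contributes both a classification and a regression term to the first-stage loss, while background samples (label 0) would be skipped by the regression term.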