CN117893561B - Infrared tiny target detection algorithm based on local contrast computing method - Google Patents


Info

Publication number: CN117893561B
Application number: CN202410288616.7A
Authority: CN (China)
Other versions: CN117893561A (original language: Chinese)
Prior art keywords: feature, module, layer, convolution, edge
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Inventors: 刘晋源, 陈子航, 仲维, 姜智颖, 刘日升
Original and current assignee: Dalian University of Technology
Prosecution history: application filed by Dalian University of Technology; publication of application CN117893561A; application granted; publication of grant CN117893561B

Landscapes

  • Image Processing (AREA)

Abstract

The invention belongs to the field of image processing and computer vision, and relates to an infrared tiny target detection algorithm based on a local contrast computing method. The invention improves the existing local contrast method, proposes a novel local contrast calculation, and designs an efficient infrared small target detection method that can be deployed on an edge platform. By studying the characteristics of local contrast, it provides a deformable attention module that separates salient features from representative features and reduces computational complexity. Furthermore, to aggregate global and local dependencies, the invention proposes a cross-aggregation scheme that combines global processing modules and convolution modules under supervision of the target edges. The invention realizes infrared tiny target detection efficiently and rapidly, combining the advantages of the traditional local contrast method with deep-learning feature processing.

Description

Infrared tiny target detection algorithm based on local contrast computing method
Technical Field
The invention belongs to the field of image processing and computer vision, and relates to an infrared tiny target detection algorithm based on a local contrast computing method.
Background
With the continued advancement of computer vision technology, vast amounts of visual information are acquired, transmitted, and analyzed. One current focus of research is how to make computers process these visual data efficiently. Since the last century, infrared imaging systems have been widely used in the military field. Compared with radar systems, an infrared imaging system adopts a passive detection mode, so it can detect targets while remaining concealed. Compared with visible-light systems, infrared detection technology has strong penetrating power, long imaging distance and excellent anti-interference performance, and plays an important role in early warning, reconnaissance and the like. With time, infrared systems have also become widespread in non-military fields such as medical imaging, traffic management and maritime search and rescue. In infrared imaging systems, the accuracy of target detection and tracking has always been a key factor in system evaluation. Infrared tiny target detection, one of the key technologies in this field, can detect an abnormally small target at an early stage, which helps to warn of potential danger and take corresponding countermeasures. Therefore, accurate detection of tiny targets in infrared images is a pressing open problem in computer vision and image processing.
Compared with the targets handled by general object detection models, infrared tiny targets have a series of distinctive characteristics. Infrared images contain considerable background noise and clutter, causing low contrast and low signal-to-noise ratio; the target is easily submerged in a complex background, and a general object detection model struggles to detect it effectively. Because the infrared imaging system is far from the target, an infrared target usually occupies no more than a dozen pixels, sometimes even only one or two pixels, so a general object detection model can extract little shape, texture or spatial-structure information to assist detection. The types, shapes and sizes of targets vary greatly across scenes and conditions, so a general object detection model easily produces a high false alarm rate and degraded detection performance. Moreover, because infrared imaging systems are typically applied in scenarios requiring real-time detection, the inference speed of an infrared tiny target detection model must be high. However, current infrared tiny target detection models mainly pursue high accuracy while neglecting real-time performance.
Thus, according to the above analysis, infrared tiny target detection faces many challenges. On the one hand, the design philosophy of general object detection models based on convolutional neural networks and deep learning is difficult to transfer to infrared tiny target detection; on the other hand, existing infrared tiny target models still have many deficiencies in detection accuracy and inference speed. Therefore, it is important to develop a real-time, efficient infrared small target detection method that can be deployed on an edge platform.
Disclosure of Invention
The invention provides an infrared tiny target detection algorithm based on a local contrast computing method. The model is deployed on an edge computing platform; the edge computing platform acquires single-frame or batch infrared images from an infrared imaging device or from remotely uploaded images, detects the infrared images through the model to obtain infrared tiny target detection images, post-processes the detection images, and finally outputs them to the corresponding display device or returns them to a remote platform.
The technical scheme of the invention is as follows:
An infrared tiny target detection algorithm based on a local contrast computing method comprises the following steps:
1) The edge computing platform acquires single-frame or batch infrared images from an infrared imaging device or from remotely uploaded images, and preprocesses the infrared images.
2) First, the model computes an edge feature map of the infrared image; the infrared image and the edge feature map are then fed into the main branch and the edge branch of the model, respectively. In the main branch, the infrared image features pass sequentially through the encoder part and the decoder part to obtain the main branch feature map.
The encoder part comprises a basic downsampling processing module, several convolution processing modules based on the local contrast computing method, and a downsampled global information processing module. The infrared image first passes through the basic downsampling processing module to generate infrared image representation features, which are then processed by the convolution processing modules: the first-layer convolution processing module downsamples and enhances the representation features with the local contrast computing method to obtain low-level features, and each convolution processing module from the second layer onward processes the previous layer's low-level features to obtain the low-level features of its own layer. Meanwhile, the representation features are downsampled and processed by the global information processing module, and the result is combined with the image features from the convolution processing modules to obtain the global image features.
The decoder part comprises several decoding modules and a feature fusion module. The first-layer decoding module upsamples the global features with a deconvolution module to obtain high-level features, and each decoding module from the second layer onward processes the previous layer's high-level features to obtain the high-level features of its own layer; the feature fusion module then fuses the low-level features from the same-layer convolution processing module in the encoder with the current high-level features to obtain refined high-level features. Finally, the main branch feature map is obtained through this series of decoding modules.
3) In the edge branch, the image edge features pass through an embedded representation processing module and several edge processing modules to obtain the edge branch feature map. The specific process is as follows: the edge feature map first passes through the embedded representation processing module to obtain edge representation features, which are then processed by the edge processing modules; in each layer, the edge representation features (or the refined edge features from the previous layer) are processed by a convolution layer, combined with the same-layer high-level features from the main branch, and passed through a gating module to obtain the refined edge features. Finally, the edge branch feature map is output from the refined edge features.
4) Finally, the model combines the feature maps output by the main branch and the edge branch, and the detection module generates the final infrared tiny target detection image. The detection image is post-processed, and the processed single or batch detection images are output to the corresponding display device or returned to the remote platform.
The invention has the following beneficial effects: the invention designs an infrared small target detection method based on a local contrast calculation method, which determines the background area from the spatial feature information of the current image so that background and target are separated more accurately. The model thereby overcomes the difficulties of dim infrared small targets and low signal-to-noise ratio, strengthens the effect of local contrast, and improves detection performance, while keeping the amount of computation small and striking a balance between computational cost and model performance. The model is deployed on an edge platform, realizing infrared small target detection in real edge scenarios.
Drawings
FIG. 1 is a basic flow chart of an infrared fine target detection algorithm deployed on an edge platform.
Fig. 2 is a detailed flow chart of the present invention.
Fig. 3 is a local contrast plot.
Fig. 4 is an explanatory diagram of the local contrast calculation.
FIG. 5 is a graph of the detection results on the IRSTD-1k dataset.
Detailed Description
The embodiments of the present invention are described further below with reference to the drawings and the technical scheme.
The invention provides an infrared small target detection method based on a local contrast computing method; the basic flow is shown in Fig. 1, and the method specifically comprises the following steps:
1) Overall flow of the model: first, the model uses a Sobel operator to compute the image edge feature x14 of the infrared image x1; then the infrared image x1 and the edge feature x14 are fed into the main branch and the edge branch respectively, the feature maps output by the two branches are obtained, and the final infrared tiny target detection map is generated from these two feature maps. The detailed flow is shown in Fig. 2.
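The edge-branch input described above is a standard Sobel gradient magnitude. A minimal NumPy sketch follows; the function name and the zero-padding choice are illustrative, not the patent's exact implementation:

```python
import numpy as np

def sobel_edges(img):
    """Approximate gradient magnitude with 3x3 Sobel kernels (zero padding)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    H, W = img.shape
    p = np.pad(img, 1)
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)
```

In the full model this map would be computed once per input image and fed to the embedded representation processing module of the edge branch.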
2) The main flow of the encoder is as follows: the infrared image feature x1 passes through a basic downsampling processing module, several convolution processing modules based on the local contrast computing method, and a downsampled global information processing module to obtain the global image features. Specifically:
2-1) For the basic downsampling processing module, the image is downsampled by three convolution layers and one max-pooling layer. The output feature x2 is defined as:
x2 = F_max(Conv_stem(x1))
where Conv_stem(·) and F_max(·) denote the three convolution layers and the max-pooling layer, respectively. The output feature x2 then undergoes target feature enhancement and nonlinear transformation by the local-contrast-based computing modules, yielding features x3, x4 and x5 with less noise and clutter.
2-2) Convolution processing based on the local contrast computing method proceeds as follows: an image feature x ∈ R^(C×H×W), where C is the number of feature channels and H and W are the feature height and width, is processed twice by a two-layer convolution processing module. Each convolution processing module comprises a convolution layer, a batch normalization layer, a ReLU layer and residual processing; the second-layer convolution processing module additionally contains a channel attention layer and a spatial attention layer based on the local contrast computing method, defined as:
y=RELU(BN(Conv(x)))+x
z=LC(CA(RELU(BN(Conv(y)))))+y
where Conv(·), BN(·) and RELU(·) represent the convolution layer, the batch normalization layer and the ReLU layer respectively; CA(·) and LC(·) are the channel attention layer and the spatial attention layer based on the local contrast computing method; MLP(·) is a nonlinear mapping; σ(·) is the Sigmoid function; ⊗ denotes element-wise multiplication; P_max(·) and P_avg(·) are the max-pooling layer and the average-pooling layer; and y is the intermediate feature output by the first of the two convolution processing modules. LC(·) works as follows: it performs the local contrast calculation through a deformable convolution module with predetermined convolution kernel parameters and a 3×3 kernel size, producing a local contrast attention map D. This map then passes through two convolution layers with a nonlinear processing step, and the local contrast is finally added to the feature F through a Sigmoid function to strengthen the target feature, namely:
D=LCDCN(F)
where LC_DCN(·) is the deformable convolution module with the predetermined convolution kernel parameters, D is the intermediate result obtained after the deformable convolution, and D_i is the feature in the i-th channel of D; min(·) takes the minimum at each spatial position. Through this processing, the feature F_LC enhanced by local contrast is finally obtained. The convolution kernel parameters are set so as to compute the local contrast; the main flow is shown in Fig. 4, and the calculation proceeds as follows: first, the module obtains a score feature S from the original feature F, by computing the mean and the maximum of the input feature F along the channel direction and combining them in a weighted sum whose coefficients are learnable parameters; S is then processed by a specific deformable convolution module, calculated as follows:
where DCN_LC(·) is the specific deformable convolution module, configured as follows: it has 4 output channels and a 3×3 convolution kernel, and each of the 4 output channels is assigned one direction, with the weights of the opposing pair in that direction set to 1 and all other weights set to 0. This ensures that the local contrast is calculated over the four direction pairs Ω = {((+,0),(−,0)), ((+,+),(−,−)), ((0,+),(0,−)), ((−,+),(+,−))} and the center position of the 3×3 two-dimensional window, as shown in Fig. 3. Meanwhile, so that DCN_LC(S) and DCN_LC(S²) sample the same positions, the learnable offsets and modulation amounts of the module are computed from the original feature F only.
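To make the directional computation concrete, here is a fixed-offset NumPy sketch of a local contrast map over the four direction pairs of Ω. The learned deformable offsets, the modulation amounts, and the weighted mean/max score feature S are all omitted, and the contrast rule "center minus the mean of the opposing pair" is one plausible reading of the fixed kernels, not the patent's exact formula:

```python
import numpy as np

# Four opposing-neighbor pairs in a 3x3 window: horizontal, vertical,
# main diagonal, anti-diagonal (row/col offsets relative to the center).
PAIRS = [((0, -1), (0, 1)),
         ((-1, 0), (1, 0)),
         ((-1, -1), (1, 1)),
         ((-1, 1), (1, -1))]

def local_contrast_map(s):
    """Per-pixel local contrast: min over the four directional pairs of
    (center - mean of the two opposing neighbors). Positive values mark
    pixels brighter than every surrounding direction (small-target-like);
    pixels on an edge or line score <= 0 in at least one direction."""
    H, W = s.shape
    p = np.pad(s, 1, mode="edge")
    out = np.full((H, W), np.inf)
    for (di1, dj1), (di2, dj2) in PAIRS:
        n1 = p[1 + di1:1 + di1 + H, 1 + dj1:1 + dj1 + W]
        n2 = p[1 + di2:1 + di2 + H, 1 + dj2:1 + dj2 + W]
        out = np.minimum(out, s - 0.5 * (n1 + n2))
    return out
```

Taking the minimum over directions is what suppresses line-like clutter: a pixel on a bright line has at least one direction pair as bright as itself, while an isolated point target does not.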
2-3) Regarding the global information processing module: the feature x2 is first downsampled to obtain the feature x6, which then passes through the global information processing module to obtain the feature x7. The global information processing module consists of a multi-head self-attention module and a multi-layer perceptron module, and its main process is as follows: assuming the module is built from L layers in total, for the current l-th layer, l = 1…L, the vector sequence z_l of the l-th layer is:
z′l=MSA(LN(zl-1))+zl-1
zl=MLP(LN(z′l))+z′l
where z′_l is the intermediate calculation result, z_(l−1) is the vector sequence of layer l−1, LN(·) is the layer normalization module, MSA(·) is the multi-head attention module, and MLP(·) is the multi-layer perceptron module. For the multi-head attention module, the detailed calculation is:
Y=[Y1,…,YH]A
where X is the input feature and Y is the output feature of the multi-head attention module; H is the number of attention heads; W_q^h, W_k^h and W_v^h are the feature transformation matrices of the h-th head; Y_h is the output feature of the h-th attention head; and d_h is the dimension length after feature transformation. [Y_1, …, Y_H] is the concatenation of the outputs of the H attention heads, and A is the output transformation matrix. For the multi-layer perceptron module, the calculation is:
Y=(GELU(XWin))Wout
Where W in is the transform matrix of the hidden layer, W out is the transform matrix of the output layer, and GELU (·) is a gaussian error linear unit.
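The two residual equations above (pre-norm attention, then pre-norm MLP) can be sketched end to end in NumPy. A single attention head is used for brevity (the patent uses H heads), and all matrix shapes here are illustrative:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def block(z, Wq, Wk, Wv, A, Win, Wout):
    """One pre-LN layer: z' = MSA(LN(z)) + z ; z_out = MLP(LN(z')) + z'."""
    h = layer_norm(z)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    dh = q.shape[-1]
    y = softmax(q @ k.T / np.sqrt(dh)) @ v @ A   # scaled dot-product attention + output matrix A
    zp = y + z                                    # first residual connection
    return gelu(layer_norm(zp) @ Win) @ Wout + zp # MLP (GELU) + second residual
```

The hidden matrix Win expands the channel dimension and Wout projects it back, matching the Y = (GELU(X·Win))·Wout formula.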
2-4) Finally, the encoder concatenates the feature x5 from the last convolution processing module with the feature x7 from the global information processing module, and obtains the final global feature x9 through a convolution linear transformation module, namely:
x9=Conv1×1(Cat(x5,x7))
where Cat(·) is the feature concatenation operation and Conv_1×1(·) is the convolution linear transformation module, specifically two convolution layers combined with a ReLU layer.
3) For the decoder part, the model first deconvolves the global feature x9, doubling the feature size to obtain the high-level feature x10. The feature fusion module fuses the high-level feature x10 with the low-level feature x4 of the same size, and the refined feature x11 is finally obtained through one convolution module. The specific flow is:
y=Fuse(DConv(xhigh),xlow)
z=RELU(BN(Conv(y)))+y
where DConv(·) is the deconvolution, Fuse(·) is the feature fusion module, and x_high and x_low are the high-level and low-level features, respectively. The feature fusion module works as follows:
xrow=σ(Fr(Fb(xlow)))Fb(xhigh)+Fb(xhigh)
xcolumn=σ(Fc(Fb(xlow)))Fb(xhigh)+Fb(xhigh)
y=xrow+xcolumn
where F_b(·) denotes a bottleneck module consisting of two convolution modules with 1×1 kernels, used to filter high-frequency noise; x_row and x_column are the row-wise and column-wise features; and F_r(·) and F_c(·) denote the row-wise and column-wise attention calculations, implemented with deformable convolution modules with 1×3 and 3×1 kernels, respectively. Similarly, the low-level feature x3 and the high-level feature x12 are processed by the feature fusion module to obtain the feature x13, which is the main branch feature map.
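The three fusion equations above can be sketched with plain arrays. As a hedged simplification, the 1×1 bottleneck F_b is taken as identity and the 1×3 / 3×1 deformable convolutions F_r / F_c are stood in by fixed 3-tap moving averages; only the gating structure itself is faithful to the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(x_low, x_high):
    """x_row = sigma(F_r(x_low)) * x_high + x_high, likewise for columns,
    y = x_row + x_column. F_b is identity here; F_r / F_c are 3-tap
    row/column averages standing in for the 1x3 / 3x1 deformable convs."""
    pad_r = np.pad(x_low, ((0, 0), (1, 1)), mode="edge")
    f_r = (pad_r[:, :-2] + pad_r[:, 1:-1] + pad_r[:, 2:]) / 3.0   # 1x3 row filter
    pad_c = np.pad(x_low, ((1, 1), (0, 0)), mode="edge")
    f_c = (pad_c[:-2] + pad_c[1:-1] + pad_c[2:]) / 3.0            # 3x1 column filter
    x_row = sigmoid(f_r) * x_high + x_high    # low-level features gate the high-level ones
    x_col = sigmoid(f_c) * x_high + x_high
    return x_row + x_col
```

Because each path is a residual gate, the high-level feature is never suppressed below itself; the low-level feature only decides how much extra weight each row/column position receives.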
4) The image edge feature x14 is first processed by the embedded representation processing module, then passes through an edge processing module together with the high-level feature x9 from the main branch encoder to extract the edge feature x15. Similarly, the edge feature x15 and the high-level feature x11 pass through an edge processing module to extract the edge feature x16, and the edge feature x16 and the main branch feature map x13 pass through an edge processing module to extract the edge branch feature map x17. Here the embedded representation processing module is a convolution layer with a 3×3 kernel.
The edge processing module mainly comprises two convolution processing modules with a spatial attention layer and a channel attention layer, and computes the final result using a Taylor-finite-difference-based formulation, calculated as follows:
b=ugate+a-3(a′-a)
a′=SA(CA(RELU(BN(Conv(a)))))+a
ugate=Convgate(xhigh)
where SA(·) is the spatial attention layer, Conv_s(·) is the convolution operation within the spatial attention, Conv_gate(·) is the gated convolution operation, a is the input feature, a′ is the intermediate feature after the first convolution processing module, and b is the output feature.
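The update rule b = u_gate + a − 3(a′ − a) is simple arithmetic once a′ is available. In this sketch the conv/BN/ReLU + attention stack producing a′ is a placeholder callable, and u_gate = Conv_gate(x_high) is passed in precomputed; both are stand-ins, not the patent's layers:

```python
import numpy as np

def edge_refine(a, u_gate, conv_block):
    """Edge processing module output: b = u_gate + a - 3 * (a' - a),
    with a' = conv_block(a) + a (the residual first convolution module).
    The -3(a'-a) term extrapolates against the residual change, which
    sharpens the response where the block altered the feature."""
    a_prime = conv_block(a) + a
    return u_gate + a - 3.0 * (a_prime - a)
```

For instance, with a zero gate and a block that adds 10% of the input, every value is pulled down by three times that 10% change.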
5) Regarding the final detection module: the model computes the edge-feature-refined output feature x18 from the main branch feature map x13 and the edge branch feature map x17.
The output feature x18 is passed through a segmentation head module to produce the final target prediction map. Meanwhile, the model also performs target contour detection on the edge branch feature map x17 to enhance model performance. The detection results of the model on the IRSTD-1k dataset are shown in FIG. 5; the tiny infrared targets are accurately detected.

Claims (4)

1. An infrared tiny target detection algorithm based on a local contrast calculation method, characterized by comprising the following steps:
1) The edge computing platform acquires single-frame or batch infrared images from an infrared imaging device or from remotely uploaded images, and preprocesses the infrared images;
2) First, the model computes an edge feature map of the infrared image; the infrared image and the edge feature map are then fed into the main branch and the edge branch of the model, respectively; in the main branch, the infrared image features pass sequentially through an encoder part and a decoder part to obtain the main branch feature map;
the encoder part comprises a downsampling processing module, several convolution processing modules based on the local contrast computing method, and a downsampled global information processing module; the infrared image first passes through the downsampling processing module to generate infrared image representation features, which are then processed by the convolution processing modules: the first-layer convolution processing module downsamples and enhances the representation features with the local contrast computing method to obtain low-level features, and each convolution processing module from the second layer onward processes the previous layer's low-level features to obtain the low-level features of its own layer; meanwhile, the representation features are downsampled, and the global image features are obtained by combining them with the image features from the convolution processing modules;
the decoder part comprises several decoding modules and a feature fusion module; the first-layer decoding module upsamples the global features with a deconvolution module to obtain high-level features, and each decoding module from the second layer onward processes the previous layer's high-level features to obtain the high-level features of its own layer; the feature fusion module then fuses the low-level features from the same-layer convolution processing module in the encoder with the current high-level features to obtain refined high-level features; finally, the main branch feature map is obtained through this series of decoding modules;
3) in the edge branch, the image edge features pass through an embedded representation processing module and several edge processing modules to obtain the edge branch feature map, as follows: the edge feature map first passes through the embedded representation processing module to obtain edge representation features, which are then processed by the edge processing modules; in each layer, the edge representation features (or the refined edge features from the previous layer) are processed by a convolution layer, combined with the same-layer high-level features from the main branch, and passed through a gating module to obtain the refined edge features; finally, the edge branch feature map is output from the refined edge features;
4) the model finally combines the feature maps output by the main branch and the edge branch, the detection module generates the final infrared tiny target detection image, the detection image is post-processed, and the processed single or batch detection images are output to the corresponding display device or returned to the remote platform;
in step 2), the main flow of the encoder is as follows: the image feature x1 first passes through a downsampling processing module, several convolution processing modules based on the local contrast computing method, and a downsampled global information processing module to obtain the global image features; specifically:
2-1) for the downsampling processing module, the image is downsampled by three convolution layers and one max-pooling layer; the output feature x2 is defined as:
x2 = F_max(Conv_stem(x1))
where Conv_stem(·) and F_max(·) denote the three convolution layers and the max-pooling layer, respectively; the output feature x2 then undergoes target feature enhancement and nonlinear transformation by the local-contrast-based computing modules, yielding features x3, x4 and x5 with less noise and clutter;
2-2) convolution processing based on the local contrast computing method proceeds as follows: an image feature x ∈ R^(C×H×W), where C is the number of feature channels and H and W are the feature height and width, is processed twice by a two-layer convolution processing module; each convolution processing module comprises a convolution layer, a batch normalization layer, a ReLU layer and residual processing; the second-layer convolution processing module additionally contains a channel attention layer and a spatial attention layer based on the local contrast computing method, defined as:
y=RELU(BN(Conv(x)))+x
z=LC(CA(RELU(BN(Conv(y)))))+y
where Conv(·), BN(·) and RELU(·) represent the convolution layer, the batch normalization layer and the ReLU layer respectively; CA(·) and LC(·) are the channel attention layer and the spatial attention layer based on the local contrast computing method; MLP(·) is a nonlinear mapping; σ(·) is the Sigmoid function; ⊗ denotes element-wise multiplication; P_max(·) and P_avg(·) are the max-pooling layer and the average-pooling layer; and y is the intermediate feature output by the first of the two convolution processing modules; LC(·) works as follows: it performs the local contrast calculation through a deformable convolution module with predetermined convolution kernel parameters and a 3×3 kernel size, generating a local contrast attention map D; this map then passes through two convolution layers with a nonlinear processing step, and the local contrast is finally added to the feature F through a Sigmoid function to strengthen the target feature, namely:
D=LCDCN(F)
where LC_DCN(·) is the deformable convolution module with the predetermined convolution kernel parameters, D is the intermediate result obtained after the deformable convolution, and D_i is the feature in the i-th channel of D; min(·) takes the minimum at each spatial position; through this processing, the feature F_LC enhanced by local contrast is finally obtained; the convolution kernel parameters are set so as to compute the local contrast, and the calculation proceeds as follows: first, a score feature S is obtained from the original feature F, by computing the mean and the maximum of the input feature F along the channel direction and combining them in a weighted sum whose coefficients are learnable parameters; S is then processed by a deformable convolution module, calculated as follows:
where DCN_LC(·) is the deformable convolution module, configured as follows: it has 4 output channels and a 3×3 convolution kernel, and each of the 4 output channels is assigned one direction, with the weights of the opposing pair in that direction set to 1 and all other weights set to 0; this ensures that the local contrast is calculated over the four direction pairs Ω = {((+,0),(−,0)), ((+,+),(−,−)), ((0,+),(0,−)), ((−,+),(+,−))} and the center position of the 3×3 two-dimensional window; meanwhile, so that DCN_LC(S) and DCN_LC(S²) sample the same positions, the learnable offsets and modulation amounts of the module are computed from the original feature F only;
2-3) regarding the global information processing module: the feature x2 is first downsampled to obtain the feature x6, which then passes through the global information processing module to obtain the global feature x7; the global information processing module consists of a multi-head self-attention module and a multi-layer perceptron module, and its main process is as follows: assuming the module is built from L layers in total, for the current l-th layer, l = 1…L, the vector sequence z_l of the l-th layer is:
z′l=MSA(LN(zl-1))+zl-1
zl=MLP(LN(z′l))+z′l
where z′_l is the intermediate calculation result, z_(l−1) is the vector sequence of layer l−1, LN(·) is the layer normalization module, MSA(·) is the multi-head attention module, and MLP(·) is the multi-layer perceptron module; for the multi-head attention module, the calculation is:
Y=[Y1,…,YH]A
where X is the input feature and Y is the output feature of the multi-head attention module; H is the number of attention heads; W_q^h, W_k^h and W_v^h are the feature transformation matrices of the h-th head; Y_h is the output feature of the h-th attention head; and d_h is the dimension length after feature transformation; [Y_1, …, Y_H] is the concatenation of the outputs of the H attention heads, and A is the output transformation matrix; for the multi-layer perceptron module, the calculation is:
Y=(GELU(XWin))Wout
Wherein Win is the conversion matrix of the hidden layer, Wout is the conversion matrix of the output layer, and GELU(·) is the Gaussian error linear unit;
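The transformer layer above (pre-norm residual multi-head self-attention followed by a residual MLP) can be sketched with NumPy and random, untrained weights; the sizes (N tokens, width d, H heads) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, H_heads = 6, 16, 4      # tokens, model width, attention heads
d_h = d // H_heads            # per-head width after feature conversion

def layer_norm(z, eps=1e-5):
    return (z - z.mean(-1, keepdims=True)) / np.sqrt(z.var(-1, keepdims=True) + eps)

def softmax(s):
    e = np.exp(s - s.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def msa(X, Wq, Wk, Wv, A):
    """Multi-head self-attention: Y = [Y_1, ..., Y_H] A."""
    heads = []
    for h in range(H_heads):
        Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]
        heads.append(softmax(Q @ K.T / np.sqrt(d_h)) @ V)  # Y_h
    return np.concatenate(heads, axis=-1) @ A

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(X, W_in, W_out):
    return gelu(X @ W_in) @ W_out

# random (untrained) parameters, just to exercise the shapes
Wq, Wk, Wv = (rng.standard_normal((H_heads, d, d_h)) * 0.1 for _ in range(3))
A = rng.standard_normal((H_heads * d_h, d)) * 0.1
W_in = rng.standard_normal((d, 4 * d)) * 0.1
W_out = rng.standard_normal((4 * d, d)) * 0.1

z = rng.standard_normal((N, d))
z_mid = msa(layer_norm(z), Wq, Wk, Wv, A) + z        # z'_l
z_out = mlp(layer_norm(z_mid), W_in, W_out) + z_mid  # z_l
print(z_out.shape)
```

Note that both sub-blocks keep the (N, d) token shape, which is what lets the residual connections z′l = MSA(LN(zl−1)) + zl−1 and zl = MLP(LN(z′l)) + z′l be plain additions.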
2-4) Finally, the encoder connects the feature x5 from the last convolution processing module with the feature x7 from the global information processing module, and obtains the final global feature x9 through a convolution linear transformation module, namely:
x9=Conv1×1(Cat(x5,x7))
Wherein Cat(·) is a feature concatenation operation, and Conv1×1(·) is a convolution linear transformation module that combines two convolution layers with a ReLU layer.
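A minimal sketch of the concatenate-then-transform step: a 1×1 convolution is simply a per-pixel linear map over channels. The patent's module stacks two convolution layers with a ReLU; for brevity this sketch applies a single linear map plus ReLU, and all channel counts and sizes are assumed:

```python
import numpy as np

def conv1x1(x, W):
    """A 1x1 convolution as a per-pixel linear map over channels:
    x is (C_in, H, W), W is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', W, x)

rng = np.random.default_rng(1)
x5 = rng.standard_normal((8, 4, 4))     # convolution-branch feature (assumed shape)
x7 = rng.standard_normal((8, 4, 4))     # global-branch feature (assumed shape)
cat = np.concatenate([x5, x7], axis=0)  # Cat(x5, x7): 16 channels
W = rng.standard_normal((8, 16)) * 0.1
x9 = np.maximum(conv1x1(cat, W), 0)     # linear transform + ReLU (single layer, sketch)
print(x9.shape)
```

The 1×1 mapping halves the channel count back to that of either branch, so x9 can feed the decoder with the same layout as x5 or x7.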
2. The method for detecting an infrared tiny target according to claim 1, wherein in step 2), for the decoder, the model first processes the global feature x9 through a deconvolution layer to double the feature size, thereby obtaining the high-level feature x10; the feature fusion module fuses the high-level feature x10 with the low-level feature x4 of the same size, and the refined high-level feature x11 is finally obtained through processing by a primary convolution module; the specific flow is as follows:
y=Fuse(DConv(xhigh),xlow)
z=RELU(BN(Conv(y)))+y
Wherein DConv(·) is a deconvolution layer, Fuse(·) is a feature fusion module, and xhigh and xlow are the high-level and low-level features, respectively; regarding the feature fusion module, the process is described as follows:
xrow=σ(Fr(Fb(xlow)))Fb(xhigh)+Fb(xhigh)
xcolumn=σ(Fc(Fb(xlow)))Fb(xhigh)+Fb(xhigh)
y=xrow+xcolumn
Wherein Fb(·) denotes a bottleneck module composed of two convolution modules with 1×1 convolution kernels, used for filtering high-frequency noise; xrow and xcolumn are the transverse and vertical features, and Fr(·) and Fc(·) denote the transverse and vertical attention calculations, respectively, implemented with deformable convolution modules whose convolution kernels are 1×3 and 3×1; the low-level feature x3 and the high-level feature x12 are processed by the feature fusion module to obtain the feature x13, which is the main branch feature map.
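The row/column attention fusion can be sketched as follows. The deformable 1×3 and 3×1 convolutions Fr and Fc are replaced here by plain directional averaging, and the bottleneck Fb is omitted, so the sketch only illustrates the gating structure xrow = σ(Fr(xlow))·xhigh + xhigh and its column counterpart:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def smooth_rows(x):
    """Stand-in for F_r: a 1x3 filter along each row (the patent uses a
    deformable 1x3 convolution; plain averaging is used here for brevity)."""
    p = np.pad(x, ((0, 0), (1, 1)), mode='edge')
    return (p[:, :-2] + p[:, 1:-1] + p[:, 2:]) / 3

def smooth_cols(x):
    """Stand-in for F_c: a 3x1 filter along each column."""
    return smooth_rows(x.T).T

def fuse(x_low, x_high):
    """Gated fusion: the low-level feature produces row-wise and column-wise
    sigmoid attention maps that modulate the high-level feature, with a
    residual connection in each branch, and the two branches are summed."""
    x_row = sigmoid(smooth_rows(x_low)) * x_high + x_high
    x_col = sigmoid(smooth_cols(x_low)) * x_high + x_high
    return x_row + x_col

rng = np.random.default_rng(2)
x_low = rng.standard_normal((6, 6))
x_high = rng.standard_normal((6, 6))
y = fuse(x_low, x_high)
print(y.shape)
```

Because the sigmoid gate lies in (0, 1) and each branch adds the residual, the fused output always preserves the sign of the high-level feature while the low-level feature only reweights it.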
3. The method for detecting an infrared tiny target according to claim 1, wherein in step 3), the image edge feature x14 is processed by the embedded representation processing module and then passes, together with the high-level feature x9 from the main branch encoder, through the edge processing module to extract the edge feature x15; similarly, the edge feature x15 and the high-level feature x11 pass through the edge processing module to extract the edge feature x16, and the edge feature x16 and the main branch feature map x13 pass through the edge processing module to extract the edge branch feature map x17; wherein the embedded representation processing module is a convolution layer with a kernel size of 3×3;
The edge processing module mainly comprises two convolution processing modules, a spatial attention layer and a channel attention layer, and calculates the final result in a Taylor finite-difference manner; the specific calculation method is as follows:
b=ugate+a-3(a′-a)
a′=SA(CA(ReLU(BN(Conv(a)))))+a
ugate=Convgate(xhigh)
Wherein SA(·) is the spatial attention layer, CA(·) is the channel attention layer, Convs(·) is the convolution operation within the spatial attention, Convgate(·) is a gated convolution operation, a is the input feature, a′ is the intermediate feature processed by the first convolution processing module, and b is the output feature.
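A toy sketch of the edge-module update b = ugate + a − 3(a′ − a): the Conv-BN-ReLU plus channel/spatial attention stack producing a′ is replaced by a box smoothing with a residual, and the gated convolution is reduced to a scalar gate, so only the finite-difference combination itself is faithful to the text:

```python
import numpy as np

def refine(a):
    """Placeholder for the Conv-BN-ReLU-CA-SA stack: a 3x3 box smoothing
    plus the residual connection, mimicking a' = SA(CA(...)) + a."""
    H, W = a.shape
    p = np.pad(a, 1, mode='edge')
    sm = sum(p[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9
    return sm + a

def edge_module(x_high, a, w_gate=1.0):
    """b = u_gate + a - 3(a' - a): the finite-difference-style update,
    with Conv_gate(x_high) reduced to a scalar gain w_gate for this sketch."""
    a_prime = refine(a)
    u_gate = w_gate * x_high
    return u_gate + a - 3 * (a_prime - a)

rng = np.random.default_rng(3)
a = rng.standard_normal((5, 5))       # input feature (assumed size)
x_high = rng.standard_normal((5, 5))  # high-level feature feeding the gate
b = edge_module(x_high, a)
print(b.shape)
```

The −3(a′ − a) term amplifies the difference between the refined and raw features, which is how the module sharpens responses around target edges.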
4. The method for detecting an infrared tiny target according to claim 1, wherein in step 4), regarding the detection module, the main branch feature map x13 and the edge branch feature map x17 are first jointly modeled to calculate the output feature x18 with refined edge features; the specific calculation method is as follows:
Finally, the output feature x18 passes through a segmentation head module to output the final target prediction map; meanwhile, the model also detects the target contour from the edge branch feature map x17 so as to enhance the performance of the model.
CN202410288616.7A 2024-03-14 2024-03-14 Infrared tiny target detection algorithm based on local contrast computing method Active CN117893561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410288616.7A CN117893561B (en) 2024-03-14 2024-03-14 Infrared tiny target detection algorithm based on local contrast computing method

Publications (2)

Publication Number Publication Date
CN117893561A CN117893561A (en) 2024-04-16
CN117893561B true CN117893561B (en) 2024-06-07

Family

ID=90643032


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800932A (en) * 2021-01-25 2021-05-14 上海海事大学 Method for detecting obvious ship target in marine background and electronic equipment
CN113343789A (en) * 2021-05-20 2021-09-03 武汉大学 High-resolution remote sensing image land cover classification method based on local detail enhancement and edge constraint
CN114862844A (en) * 2022-06-13 2022-08-05 合肥工业大学 Infrared small target detection method based on feature fusion
CN114998736A (en) * 2022-06-07 2022-09-02 中国人民解放军国防科技大学 Infrared weak and small target detection method and device, computer equipment and storage medium
CN115187856A (en) * 2022-06-10 2022-10-14 电子科技大学 SAR image ship detection method based on human eye vision attention mechanism
CN115527098A (en) * 2022-11-09 2022-12-27 电子科技大学 Infrared small target detection method based on global mean contrast space attention
CN115861380A (en) * 2023-02-16 2023-03-28 深圳市瓴鹰智能科技有限公司 End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene
CN116129289A (en) * 2023-03-06 2023-05-16 江西理工大学 Attention edge interaction optical remote sensing image saliency target detection method
CN116468980A (en) * 2023-03-31 2023-07-21 中国人民解放军国防科技大学 Infrared small target detection method and device for deep fusion of edge details and deep features
CN116524312A (en) * 2023-04-28 2023-08-01 中国人民解放军国防科技大学 Infrared small target detection method based on attention fusion characteristic pyramid network
CN116645696A (en) * 2023-05-31 2023-08-25 长春理工大学重庆研究院 Contour information guiding feature detection method for multi-mode pedestrian detection
CN116863305A (en) * 2023-07-13 2023-10-10 天津大学 Infrared dim target detection method based on space-time feature fusion network
CN116994000A (en) * 2023-07-28 2023-11-03 五邑大学 Part edge feature extraction method and device, electronic equipment and storage medium
CN117115575A (en) * 2023-09-15 2023-11-24 中国科学院光电技术研究所 Improved RPCA infrared small target detection method based on scale space theory


Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection; Di Wang, et al.; Information Fusion; 2023-10-31; Vol. 98; full text *
Enhancing Infrared Small Target Detection Robustness with Bi-Level Adversarial Framework; Zhu Liu, et al.; arXiv; 2023-09-03; full text *
Research on object detection algorithms for autonomous driving scenes based on YOLOv5; Zhao Yang; China Master's Theses Electronic Journals; 2024-02-15 (No. 02); full text *
Infrared small target detection based on low-rank sparse decomposition and attention mechanisms; Dai Yimian; China Doctoral Dissertations Electronic Journals; 2023-02-15 (No. 2); full text *
An infrared small target detection algorithm based on the dual-tree complex wavelet transform; Wang He; Xin Yunhong; Laser & Infrared; 2020-09-20 (09); full text *
Research on salient object detection in optical remote sensing images based on two-stream convolutional neural networks; Wang Chao; Wanfang Data; 2022-10-08; full text *
Salient object detection based on a deep center-neighborhood pyramid structure; Chen Qin; Zhu Lei; Hou Yunlong; Deng Huiping; Wu Jin; Pattern Recognition and Artificial Intelligence; 2020-06-15 (06); full text *
Research on camouflaged object detection and military camouflage detection algorithms based on deep learning; Wang Xinchao; China Master's Theses Electronic Journals; 2024-02-15 (No. 2); full text *
Contour detection algorithms based on deep learning: a survey; Lin Chuan; Cao Yijuan; Journal of Guangxi University of Science and Technology; 2019-04-15 (02); full text *
An improved local-contrast infrared dim and small target detection algorithm based on edge extraction; Wang Shuai; Cheng Hongwei; Mo Shaowen; Zeng Jian; Jiang Dan; Digital Technology and Application; 2016-08-15 (No. 08); full text *
Saliency detection combining spatial attention with multi-layer feature fusion; Chen Kai; Wang Yongxiong; Journal of Image and Graphics; 2020-06-16 (06); full text *
A multi-attention fusion U-shaped network method for ground-object classification of remote sensing images; Li Daoji; Guo Haitao; Lu Jun; Zhao Chuan; Lin Yuzhun; Yu Donghang; Acta Geodaetica et Cartographica Sinica; 2020-08-15 (08); full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant