CN115272814B - Long-distance space self-adaptive multi-scale small target detection method

Info

Publication number: CN115272814B (granted); application number CN202211188231.0A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN115272814A (application publication)
Inventors: 甘胜丰, 胡磊, 刘世超, 李露, 闵高, 雷维新, 张仁, 周蓓, 徐朝玉
Assignees: Huazhong Agricultural University; Nanchang Institute of Technology
Application filed by Huazhong Agricultural University and Nanchang Institute of Technology
Priority to CN202211188231.0A
Publication of CN115272814A, application granted, publication of CN115272814B
Legal status: Active (granted)

Classifications

    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06N3/08 Learning methods (neural networks; computing arrangements based on biological models)
    • G06T7/11 Region-based segmentation
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/765 Recognition using classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Recognition using pattern recognition or machine learning, using neural networks
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06V2201/07 Target detection


Abstract

The invention discloses a remote space self-adaptive multi-scale small target detection method comprising two stages: a multi-scale target detection model determination stage and a multi-scale target detection model prediction stage. In the determination stage, the data sets of different target detection tasks are analyzed to obtain the multi-scale target detection model structure suited to each task type. In the prediction stage, the structure corresponding to a known target detection task type is called directly; when the detection task type is unknown, the multi-scale target detection model structure for the detection task is obtained through the Otsu (OTSU) algorithm and a decision tree, and prediction is completed. The beneficial effects of the invention are that various kinds of targets can be detected in real time and adaptively, the universality of target detection is improved, and the precision of target detection is guaranteed.

Description

Long-distance space self-adaptive multi-scale small target detection method
Technical Field
The invention relates to the field of image target detection, in particular to a remote space self-adaptive multi-scale small target detection method.
Background
The target detection is one of important tasks of computer vision, and under the drive of deep learning, a target detection model gradually becomes mature and stable, and is successfully applied to the fields of national defense safety, intelligent transportation, industrial automation and the like. At present, a general target detection model is optimized on a public data set, and the quality of the model is judged by using detection indexes of the public data set. However, in an actual application scenario, the difference between the scene data set and the common data set is large, and it is often necessary to make the model more efficient by adjusting the model.
For example, in the detection of large workpieces, the targets to be identified have the characteristics of large area and small quantity, and a very good detection effect can be obtained with a universal model.
For another example, aerial remote sensing images and unmanned aerial vehicle high-altitude images are generally shot from a height of hundreds of meters to nearly ten thousand meters, many targets in the images are small targets (dozens of or even several pixels), so that the target information amount is not large, the image view field is large (usually, the coverage range of several square kilometers) and the view field may contain various backgrounds, strong interference is generated on target detection, and the target is difficult to distinguish from the background or similar targets.
At present, common high-performing target detection models include YOLOX, YOLOv5, Faster R-CNN, CenterNet and the like. There is a significant gap between their detection performance on small targets and on large targets: small-target performance is usually only about half that on large targets, making these models difficult to apply to target detection in the remote sensing field.
Disclosure of Invention
Aiming at the technical problems of poor universality, low precision and low efficiency of small target detection in aerial remote sensing images, the invention provides a remote space self-adaptive multi-scale small target detection method, which adjusts the feature fusion structure and the multi-scale detection heads according to the characteristics of the data sets of different detection targets, greatly optimizing model efficiency.
The method comprises two stages, which are respectively:
a multi-scale target detection model determining stage and a multi-scale target detection model predicting stage;
the multi-scale target detection model determining stage comprises the following processes:
s1, constructing a multi-scale target detection model, wherein the multi-scale target detection model comprises three parts, namely a two-layer feature fusion structure, a three-layer feature fusion structure and a four-layer feature fusion structure; the two-layer feature fusion structure, the three-layer feature fusion structure and the four-layer feature fusion structure are trained in advance;
S2, acquiring the type of a target detection task and a corresponding training set, and labeling the targets to be detected in the training set with target bounding boxes, obtaining for each target the upper-left corner coordinate (x1, y1) and the lower-right corner coordinate (x2, y2);
S3, calculating the ratio of the area of the target bounding box to the area of the image: (x2 - x1)*(y2 - y1)/(W*H), where W and H are respectively the width and the height of the images in the training set;
S4: when the square root of the ratio of the target bounding box area to the image area is less than a preset first threshold a1, the target is a small target; when the square root of the ratio is greater than a preset second threshold a2, the target is a large target; when the square root of the ratio lies between a1 and a2, the target is a common target;
s5, determining a multi-scale target detection model structure by adopting a decision tree method, which specifically comprises the following steps:
calculating the proportions of large targets, small targets and common targets, respectively C1, C2 and C3, and judging the adaptive structure of the multi-scale target detection model according to these proportions and a set proportion threshold, specifically: when the proportion of small targets exceeds a preset percentage p of the whole data set, the multi-scale target detection model is adjusted to the four-layer feature fusion structure; when the proportion of large targets exceeds the preset percentage p, the multi-scale target detection model is adjusted to the two-layer feature fusion structure; otherwise, the multi-scale target detection model is adjusted to the three-layer feature fusion structure;
a multi-scale target detection model prediction stage:
s6: acquiring target data to be predicted;
s7: if the target data to be predicted belong to the target detection task type of the multi-scale target detection model determining stage, calling a target detection model structure correspondingly determined by the multi-scale target detection model determining stage to directly predict to obtain a target prediction result;
S8: if the target data to be predicted does not belong to a target detection task type handled in the multi-scale target detection model determination stage, the target data to be predicted is processed with the Otsu (OTSU) threshold segmentation method: the image is divided into background and foreground according to its gray-level characteristics, the type of each target to be predicted is determined from the ratio of the foreground target pixel count to the total image pixel count, the proportions of the various predicted target types are counted, the structure of the multi-scale target detection model is determined again according to the method in step S5, and the corresponding structure is called to complete target detection.
The beneficial effects provided by the invention are as follows:
For large target detection, the method switches to two-layer scale prediction, which greatly reduces the number of parameters and enables real-time detection on edge devices; a novel feature fusion structure is also provided for small target detection, which greatly improves detection precision at the cost of only a small amount of extra time, and therefore has very high application value in actual industrial scenes;
in addition, the method can be applied to the dynamic detection process, for example, along with the rise of the aerial photographing height of the unmanned aerial vehicle, the initial small target is a bicycle, and along with the increase of the aerial photographing height, the small target gradually changes into a house, namely, the small target in the method is a dynamic or relative concept;
finally, the method and the device can carry out various target detection in real time and in a self-adaptive mode, improve the universality of target detection and ensure the precision of target detection.
Drawings
FIG. 1 is a simple flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a two-layer feature fusion architecture;
FIG. 3 is a schematic diagram of a four-layer feature fusion architecture;
FIG. 4 is a diagram of context hopping connection feature fusion;
FIG. 5 is a schematic structural diagram of an SSHF receptive field superposition module;
FIG. 6 is a schematic diagram of classifying a data set into one of the categories after it passes through the decision tree;
FIG. 7 is a schematic diagram of a decision result;
FIG. 8 is a diagram illustrating the effect of the Otsu (OTSU) threshold segmentation algorithm;
FIG. 9 is a schematic diagram of a detailed process of the method of the present invention;
fig. 10 is a schematic diagram of the small object detection effect obtained by using a conventional object detection three-layer network structure;
fig. 11 is a schematic diagram of the small target detection effect obtained by using the improved four-layer feature fusion structure of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a simplified flow chart of the method of the present invention.
The invention provides a remote space self-adaptive multi-scale small target detection method, which comprises the following two stages:
a multi-scale target detection model determining stage and a multi-scale target detection model predicting stage;
the multi-scale target detection model determining stage comprises the following processes:
s1, constructing a multi-scale target detection model, wherein the multi-scale target detection model comprises three parts, namely a two-layer feature fusion structure, a three-layer feature fusion structure and a four-layer feature fusion structure; the two-layer feature fusion structure, the three-layer feature fusion structure and the four-layer feature fusion structure are trained in advance;
it should be noted that the multi-scale target detection model in the present application includes 3 parts, which are respectively a two-layer feature fusion structure, a three-layer feature fusion structure, and a four-layer feature fusion structure. The three different feature fusion structures are respectively used for large target detection, common target detection and small target detection;
three different configurations are set forth in sequence below.
Referring to fig. 2, fig. 2 is a schematic diagram of a two-layer feature fusion structure;
the two-layer feature fusion structure comprises: the system comprises a backbone network, a CA attention mechanism module, a two-layer feature fusion module and a decoupling output module;
the method comprises the steps that an input image is subjected to downsampling feature extraction through a backbone network, and two downsampling feature layers with different scales from shallow to deep are obtained and are respectively a first feature layer and a second feature layer;
the first characteristic layer and the second characteristic layer respectively pass through a CA attention mechanism module to obtain a first enhancement characteristic and a second enhancement characteristic;
the first enhanced feature and the second enhanced feature are subjected to feature fusion through a feature fusion module to obtain a fusion feature;
the fusion characteristics are processed by a decoupling output module to obtain a large target detection result.
Specifically, the two-layer feature fusion module includes: a convolution unit, a transposed convolution unit, two Concat + CSPLayer structures and a down-sampling unit; the specific feature fusion process of the feature fusion module is as follows:
after the second enhanced feature passes sequentially through the convolution unit and the transposed convolution unit, the obtained first convolution result is fused with the first enhanced feature to obtain a first fusion result;
the first fusion result passes through a Concat + CSPLayer structure and is divided into two branches: one branch is decoupled directly by the decoupling output module to obtain first decoupling information; the other branch is down-sampled by the down-sampling unit to obtain a down-sampled feature of the first fusion result;
the down-sampled feature of the first fusion result is fused with a second convolution result, obtained after the second enhanced feature passes through a convolution unit, to obtain a second fusion result;
after the second fusion result passes through the other Concat + CSPLayer structure, it is processed directly by the decoupling output module to obtain second decoupling information;
and superposing the first decoupling information and the second decoupling information to obtain a large target detection result.
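As an illustration of this two-layer fusion path, the following PyTorch-style sketch shows the data flow; the channel widths and the plain Conv-BN-SiLU block standing in for the Concat + CSPLayer structure are assumptions made for brevity, not the exact modules of the patent.

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, s=1):
    # basic Conv-BN-SiLU block used throughout this sketch
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class TwoLayerFusion(nn.Module):
    """Sketch of the two-layer fusion path: feat1 is the 40x40 (16x downsampled)
    enhanced feature, feat2 the 20x20 (32x downsampled) enhanced feature."""
    def __init__(self, c1=512, c2=1024, c_mid=512):
        super().__init__()
        self.lateral = conv_bn_act(c2, c_mid, k=1)            # "convolution unit": channel reduction
        self.up = nn.ConvTranspose2d(c_mid, c_mid, 2, 2)       # "transposed convolution unit": 2x upsampling
        self.fuse1 = conv_bn_act(c_mid + c1, c_mid, k=3)       # stand-in for the first Concat + CSPLayer
        self.down = conv_bn_act(c_mid, c_mid, k=3, s=2)        # "down-sampling unit"
        self.fuse2 = conv_bn_act(c_mid + c_mid, c_mid, k=3)    # stand-in for the second Concat + CSPLayer

    def forward(self, feat1, feat2):
        lat = self.lateral(feat2)
        f1 = self.fuse1(torch.cat([self.up(lat), feat1], dim=1))   # first fusion result -> decoupled head 1
        f2 = self.fuse2(torch.cat([self.down(f1), lat], dim=1))    # second fusion result -> decoupled head 2
        return f1, f2
```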
When the large targets in a data set satisfy the decision-tree condition, the two-layer network structure is enabled. The highest layer in the common three-layer scale mainly serves the detection of smaller targets, and experiments show that the three-layer output contains a large amount of computational redundancy on large target detection tasks; using the two-layer scale structure removes this redundancy and thereby accelerates inference.
As an example of the two-layer feature fusion structure, CSPDarknet53 is used as the backbone network for feature extraction and the input image size is 640 × 640. In fig. 2, feat1 denotes the feature layer (first feature layer) obtained by 16× downsampling of the input image through the backbone network, with size 40 × 40, and feat2 denotes the feature layer (second feature layer) obtained by 32× downsampling, with size 20 × 20.
In fig. 2, the convolution unit is used for reducing the number of channels; the transposed convolution unit is used for expanding the width and height of the feature layer, performing the up-sampling operation by convolution, which facilitates the splicing of upper-layer and lower-layer features;
the Contact + CSPLAyer structure divides the original input into two branches, respectively carries out convolution operation to reduce the number of channels by half, then carries out a plurality of residual error structure operations on one branch, and then splices the two branches to enable the model to learn more characteristics;
the down-sampling unit is used for compressing the width and height of the characteristic layer by down-sampling operation;
the decoupling output module is used for outputting a decoupling head, and different convolution operations are used for respectively obtaining the output of the category, the confidence coefficient and the coordinate predicted value.
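A minimal sketch of such a decoupled output head is shown below; the single shared convolution and the branch depths are simplifications, since an actual decoupled head usually stacks several convolutions per branch.

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Separate convolution branches produce class scores, confidence and box coordinates."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.stem = nn.Conv2d(channels, channels, 1)
        self.cls_branch = nn.Conv2d(channels, num_classes, 1)  # category output
        self.obj_branch = nn.Conv2d(channels, 1, 1)            # confidence output
        self.reg_branch = nn.Conv2d(channels, 4, 1)            # coordinate prediction output

    def forward(self, x):
        x = self.stem(x)
        return self.cls_branch(x), self.obj_branch(x), self.reg_branch(x)
```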
A lightweight CA (coordinate attention) mechanism is added behind feat1 and feat2. The CA attention module aims to enhance the expressive power of the features learned by a mobile network; it can transform any intermediate feature tensor in the network and output a tensor of the same size.
The CA attention mechanism enables the lightweight network to pay attention in a larger area by embedding position information into channel attention, avoids generating a large amount of calculation overhead, can capture not only cross-channel information, but also information of direction perception and position perception, enables the model to pay more attention to the area of a target, reduces background interference, improves model precision, and does not change the width and height of a feature layer and the number of channels.
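The following is a simplified PyTorch-style sketch of a coordinate-attention (CA) style module with these properties; the reduction ratio, activation function and layer layout are assumptions for illustration rather than the exact module used in the patent.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Simplified coordinate-attention block: embeds H/W position information into
    channel attention without changing the feature layer's width, height or channels."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # average over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                              # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                         # attention along height
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))     # attention along width
        return x * a_h * a_w                              # same shape as the input tensor
```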
After the two feature layers feat1 and feat2 pass through the CA attention module, the feature fusion operation shown in fig. 2 is performed, and the two feature layers obtained after feature fusion are then input into the YoloHead decoupled output heads to obtain the prediction information.
The three-layer feature fusion structure adopts a PAFPN structure in a yolo network series;
in the present application, a three-layer feature fusion structure is used to detect objects of common size; as an example, the three-layer feature fusion structure in the present application is a PAFPN structure in yolo network series, which is a general structure of yolo series, and therefore, it is not described in too much detail here, and those skilled in the art can select other types of general structures according to the actual situation.
Referring to fig. 3, fig. 3 is a schematic diagram of a four-layer feature fusion structure;
the four-layer feature fusion structure comprises: the system comprises a backbone network, a CA attention mechanism module, a four-layer feature fusion module, an SSHF receptive field superposition module and a decoupling output module;
the method comprises the steps that an input image is subjected to downsampling feature extraction through a backbone network, and four downsampling feature layers with different scales from shallow to deep are obtained and are respectively a first feature layer, a second feature layer, a third feature layer and a fourth feature layer;
the first characteristic layer and the second characteristic layer respectively pass through a CA attention mechanism module to obtain a first enhancement characteristic and a second enhancement characteristic;
after feature fusion is carried out on the first enhanced feature, the second enhanced feature, the third feature layer and the fourth feature layer through a four-layer feature fusion module, a fusion feature is obtained;
the fusion characteristics are processed by an SSHF receptive field superposition module and a decoupling output module in sequence to obtain a small target detection result.
Specifically, the four-layer feature fusion module includes: three convolution units, three transposed convolution units, six Concat + CSPLayer structures and three down-sampling units; the four-layer feature fusion module adopts a context jump connection mechanism, and the specific fusion process is as follows:
the fourth feature layer passes through a first convolution unit to obtain a first convolution result; the first convolution result passes through a first transposed convolution unit to obtain a first transposed convolution result; the first transposed convolution result and the third feature layer pass through a first Concat + CSPLayer structure to obtain a first fusion result;
the first fusion result is divided into two branches: in the first branch, the first fusion result passes through a second convolution unit to obtain a second convolution result; in the second branch, the first fusion result and the first convolution result are input together into a third Concat + CSPLayer structure;
the second convolution result passes through a second transposed convolution unit to obtain a second transposed convolution result; the second transposed convolution result, the second enhanced feature and the first convolution result pass through a second Concat + CSPLayer structure to obtain a second fusion result;
the second fusion result is divided into two branches: in the first branch, the second fusion result passes sequentially through a third convolution unit and a third transposed convolution unit to obtain a third transposed convolution result; in the other branch, the second fusion result is input into a fourth Concat + CSPLayer structure;
the first enhanced feature, the third transposed convolution result, the first fusion result and the first convolution result pass through the third Concat + CSPLayer structure to obtain a third fusion result;
the third fusion result is divided into two branches: in the first branch, the third fusion result passes sequentially through the SSHF receptive field superposition module and a first decoupling output module to obtain first decoupling information; in the other branch, the third fusion result undergoes a first down-sampling operation through a first down-sampling unit, and the resulting first down-sampling result is input into the fourth Concat + CSPLayer structure;
the second fusion result and the first down-sampling result pass through the fourth Concat + CSPLayer structure to obtain a fourth fusion result; the fourth fusion result is divided into two branches: in the first branch, the fourth fusion result passes sequentially through the SSHF receptive field superposition module and a second decoupling output module to obtain second decoupling information; in the other branch, the fourth fusion result undergoes a second down-sampling operation through a second down-sampling unit, and the resulting second down-sampling result is input into a fifth Concat + CSPLayer structure;
the second convolution result and the second down-sampling result pass through the fifth Concat + CSPLayer structure to obtain a fifth fusion result; the fifth fusion result is divided into two branches: in the first branch, the fifth fusion result passes through a third decoupling output module to obtain third decoupling information; in the other branch, the fifth fusion result undergoes a third down-sampling operation through a third down-sampling unit, and the resulting third down-sampling result is input into a sixth Concat + CSPLayer structure;
the first convolution result and the third down-sampling result pass through the sixth Concat + CSPLayer structure to obtain a sixth fusion result; the sixth fusion result passes through a fourth decoupling output module to obtain fourth decoupling information;
and superposing the first decoupling information, the second decoupling information, the third decoupling information and the fourth decoupling information to obtain a small target detection result.
As an embodiment of the four-layer feature fusion structure, the four-layer feature fusion structure capable of being used universally is specially designed for small target detection tasks, and the test effect on a data set of high-altitude images of the unmanned aerial vehicle is superior to that of an original structure.
CSPDarknet53 is used for backbone feature extraction. During backbone feature extraction an additional feature layer with a smaller downsampling multiple is extracted; compared with the other three feature layers its spatial information is richer, making it suitable for detecting tiny targets. The four feature layers are denoted, from shallow to deep, feat1 (first feature layer), feat2 (second feature layer), feat3 (third feature layer) and feat4 (fourth feature layer). That is, feat1-feat4 are the four feature layers of different scales obtained by feature extraction through the backbone network CSPDarknet53;
for the two shallow feature layers feat1 and feat2, a CA attention module is added to reduce background interference.
In fig. 3, the convolution unit is a general convolution module for reducing the number of channels; the transposition convolution unit is used for expanding the width and the height of the characteristic layer, and the up-sampling operation is carried out by utilizing a convolution mode, so that the splicing of the characteristics of the upper layer and the lower layer is facilitated;
the Concat + CSPLayer structure divides the original input into two branches, performs a convolution operation on each to halve the number of channels, then applies several residual structures to one of the branches, and finally splices the two branches so that the model can learn more features;
the down-sampling unit is used for compressing the width and height of the characteristic layer and performing down-sampling operation;
the SSHF receptive field superposition module is a characteristic strengthening module and is used for fusing the characteristics of receptive fields with different sizes;
the decoupling output module is used for outputting a decoupling head, and different convolution operations are used for respectively obtaining the output of the category, the confidence coefficient and the coordinate predicted value.
It should be noted that, in order to efficiently improve the precision of the tiny target and the small target, the present application focuses on the improvement of feat1 and feat2 in the feature fusion part.
Because the semantic information of the shallow feature layer is weak, the feature information of the shallow feature layer is extended by adopting a cross-layer fusion mode of context information, please refer to fig. 4, where fig. 4 is a schematic diagram of context jump connection feature fusion. Wherein the emphasized part in fig. 4 is the jumper connection of the right half. The left half of fig. 4 is divided into 4 feature layers with different scales, wherein feat1-feat2 are not connected to the right Concat + CSPlayer via the CA attention mechanism module, which does not mean that fig. 4 is inconsistent with fig. 3, and fig. 4 is only a schematic illustration, so the CA attention mechanism module is omitted.
The context jump join feature fusion is specifically described as follows:
firstly, transpose convolution is used for replacing common up-sampling operation, so that the feature loss of artificial feature engineering to small targets is reduced, besides layer-by-layer fusion of similar feature pyramids, feat4 is subjected to up-sampling for three times and then fused with feat1, feat3 is subjected to up-sampling for two times and then fused with feat2, and therefore spatial information and semantic information of feat1 and feat2 are rich. The multi-scale features of different network depths are extracted in the training process by using the up-sampling and jumping connection, the high-resolution images are processed by using the shallow network, and the low-resolution images are processed by using the deep network, so that more semantic information is extracted while the position information of the small target is kept as much as possible, and the detection performance of the small target is improved under the condition of reducing the calculation cost.
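A compact sketch of these context jump connections is given below (channel widths are assumptions; in the full structure of fig. 3 the concatenated results feed the Concat + CSPLayer units rather than being returned directly):

```python
import torch
import torch.nn as nn

def upsample2x(c):
    # transposed convolution used instead of plain interpolation (2x per step)
    return nn.ConvTranspose2d(c, c, kernel_size=2, stride=2)

class ContextSkipFusion(nn.Module):
    """Sketch of the context jump (skip) connections: deep layers are upsampled
    several times and concatenated directly with the shallow layers."""
    def __init__(self, c1=128, c2=256, c3=512, c4=1024):
        super().__init__()
        self.up4_to_1 = nn.Sequential(upsample2x(c4), upsample2x(c4), upsample2x(c4))  # 3 upsamplings: feat4 -> feat1 scale
        self.up3_to_2 = nn.Sequential(upsample2x(c3), upsample2x(c3))                   # 2 upsamplings: feat3 -> feat2 scale

    def forward(self, feat1, feat2, feat3, feat4):
        skip1 = torch.cat([feat1, self.up4_to_1(feat4)], dim=1)  # enrich feat1 with deep semantics
        skip2 = torch.cat([feat2, self.up3_to_2(feat3)], dim=1)  # enrich feat2 with deep semantics
        return skip1, skip2
```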
In addition, it should be noted that, in order to further enhance the detection performance of the small target by performing feature enhancement on the shallow feature layer, an SSHF receptive field superposition module is added in the present application.
The module uses convolution kernel serial structures of different sizes to stack, strengthens the receptive field on the characteristic layer, and makes the characteristic information of two shallow characteristic layers richer.
Please refer to fig. 5, fig. 5 is a schematic structural diagram of the SSHF receptive field stacking module.
The traditional SSHF receptive-field superposition module uses convolution kernels of 3x3, 5x5 and 7x7 to extract features. This kind of context modeling enlarges the receptive field of the corresponding layer in proportion to the stride of that layer, which in turn is proportional to the target scale handled by each detection module.
In the application, in order to reduce the number of model parameters and improve the inference speed, compared with the conventional SSHF receptive field superposition module, the application adopts a plurality of 3x3 convolution kernels stacked in series to replace 5x5 and 7x7 convolution kernels.
After feature fusion is completed, four feature layers of different sizes are obtained. Each fused feature layer feat is input into the SSHF receptive-field superposition module, and the feature layers extracted with different receptive fields are spliced (concat) to obtain a feature-enhanced output feature layer. The output feature layer then passes through the decoupled head of the decoupling output module to output the target prediction result.
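A sketch of such a receptive-field superposition block, using serially stacked 3x3 convolutions in place of 5x5 and 7x7 kernels, could look as follows; the branch layout and channel split are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SSHFBlock(nn.Module):
    """Sketch of the SSHF receptive-field superposition module: serially stacked 3x3
    convolutions emulate 5x5 and 7x7 receptive fields, and the branches are concatenated."""
    def __init__(self, channels):
        super().__init__()
        c = channels // 2
        self.branch3 = nn.Conv2d(channels, c, 3, padding=1)           # ~3x3 receptive field
        self.branch5 = nn.Sequential(                                  # two 3x3 convs ~ 5x5
            nn.Conv2d(channels, c, 3, padding=1), nn.Conv2d(c, c, 3, padding=1))
        self.branch7 = nn.Sequential(                                  # three 3x3 convs ~ 7x7
            nn.Conv2d(channels, c, 3, padding=1), nn.Conv2d(c, c, 3, padding=1),
            nn.Conv2d(c, c, 3, padding=1))
        self.fuse = nn.Conv2d(3 * c, channels, 1)                      # restore channel count after concat

    def forward(self, feat):
        out = torch.cat([self.branch3(feat), self.branch5(feat), self.branch7(feat)], dim=1)
        return self.fuse(out)   # feature-enhanced layer passed on to the decoupled head
```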
The above part explains three different structures and structural principles of the multi-scale target detection model of the present application; three different structures can be trained in advance and used;
the following further introduces an analysis process of the data set, and the purpose of analyzing the data set is to obtain which structure of the multi-scale target detection model the data set of the type is suitable for, so that the structure can be directly adopted for prediction in a prediction stage for the data of the same type;
S2, acquiring the type of a target detection task and a corresponding training set, and labeling the targets to be detected in the training set with target bounding boxes, obtaining for each target the upper-left corner coordinate (x1, y1) and the lower-right corner coordinate (x2, y2);
Regarding the target detection task type, the present application is explained as follows:
firstly, a high-altitude unmanned aerial vehicle is used for shooting a target image in an actual scene, and different detection targets under the visual angle of the unmanned aerial vehicle are respectively collected for distinguishing detection tasks.
When urban greening construction is set as a task (namely, the type of the target detection task is urban greening construction), target images such as trees, green lands and the like need to be acquired at high altitude through an unmanned aerial vehicle;
when an urban traffic intelligent planning task is used (namely, the type of the target detection task is urban traffic intelligent planning), a high-altitude unmanned aerial vehicle is required to be used for acquiring road conditions of various roads, and vehicles and pedestrians are used as detection targets.
Aiming at different target detection task types, the structures of the multi-scale target detection models adopted by the method are different, and the method relates to the process of analyzing a data set, namely steps S2-S5;
through the steps S2-S5, the structure of what multi-scale target detection model is adopted by the urban greening construction task can be determined, or the structure of what multi-scale target detection model is adopted by the urban traffic intelligent planning task can be determined; in the subsequent prediction stage, if the prediction task to be carried out is known in advance or is a task established by urban greening or an urban traffic intelligent planning task, the corresponding multi-scale target detection model structure is directly called.
In different detection task types, the method and the device distinguish according to different ratios of the area of the target boundary frame to the area of the image, analyze the corresponding data set, and accordingly divide the detection tasks into a large target detection task, a common target detection task and a small target detection task. The specific classification process is shown in the following steps S3-S5;
in addition, the image is preprocessed before the image is labeled. In an actual detection scene, illumination change is a very common factor which can affect the identification performance, so that data enhancement operation of brightness and contrast change needs to be performed on collected image data, the diversity of the image data is enhanced, the actual environment of a target is simulated, and the robustness of a model is improved.
The means employed for image enhancement in this application are as follows:
Generally, the contrast and brightness of an image are changed pixel by pixel. Let the input original image be f(x) and the image after contrast and brightness adjustment be g(x); the adjustment formula is g(x) = α·f(x) + β, where α adjusts the contrast and β adjusts the brightness. The brightness and contrast of the acquired images are changed by adjusting the values of α and β, completing the image enhancement operation according to the actual situation.
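A minimal sketch of this pixel-wise adjustment, assuming 8-bit images and clipping to the valid range:

```python
import numpy as np

def adjust_brightness_contrast(image, alpha=1.2, beta=10):
    """g(x) = alpha * f(x) + beta, applied pixel by pixel.
    alpha adjusts the contrast, beta adjusts the brightness."""
    g = alpha * image.astype(np.float32) + beta
    return np.clip(g, 0, 255).astype(np.uint8)

# example: a darker, lower-contrast copy to simulate a change in illumination
# augmented = adjust_brightness_contrast(original, alpha=0.8, beta=-15)
```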
S3, calculating the ratio of the area of the target bounding box to the area of the image: (x2 - x1)*(y2 - y1)/(W*H), where W and H are respectively the width and the height of the images in the training set;
S4: when the square root of the ratio of the target bounding box area to the image area is less than a preset first threshold a1, the target is a small target; when the square root of the ratio is greater than a preset second threshold a2, the target is a large target; when the square root of the ratio lies between a1 and a2, the target is a common target. In the embodiment of the present application, a1 is 0.03 and a2 is 0.2; of course, in some other embodiments, the thresholds on the ratio of the target bounding box area to the image area may be set according to the actual situation or the particular detection task, or may be set and adjusted automatically by an adaptive algorithm. This is only schematically illustrated in the present application.
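A small sketch of this classification rule, using the thresholds of this embodiment (a1 = 0.03, a2 = 0.2):

```python
import math

A1, A2 = 0.03, 0.2   # thresholds of this embodiment; adjustable per detection task

def classify_target(x1, y1, x2, y2, W, H):
    """Classify one annotated bounding box by the square root of the ratio
    of box area to image area."""
    r = math.sqrt((x2 - x1) * (y2 - y1) / (W * H))
    if r < A1:
        return "small"
    if r > A2:
        return "large"
    return "common"
```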
It should be noted that, in the steps S3 to S4, the number of various types of targets in the data set is determined, but it is not determined which target detection model the data set is suitable for; determining which detection model is set forth in step S5;
s5, determining a multi-scale target detection model structure by adopting a decision tree method, which specifically comprises the following steps:
calculating the proportions of large targets, small targets and common targets, respectively C1, C2 and C3, and judging the adaptive structure of the multi-scale target detection model according to these proportions and a set proportion threshold, specifically: when the proportion of small targets exceeds a preset percentage p of the whole data set, the multi-scale target detection model is adjusted to the four-layer feature fusion structure; when the proportion of large targets exceeds the preset percentage p, the multi-scale target detection model is adjusted to the two-layer feature fusion structure; otherwise it is adjusted to the three-layer feature fusion structure. In the embodiment of the present application, p is taken as 33.33%;
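The resulting decision rule can be sketched as follows (the common-target proportion is carried only for completeness, since the rule is driven by the small- and large-target proportions):

```python
def select_model_structure(c_small, c_large, c_common, p=1/3):
    """Choose a feature-fusion structure from the class proportions
    (small, large, common) and the threshold p."""
    if c_small > p:
        return "four-layer feature fusion"   # small-target dominated data set
    if c_large > p:
        return "two-layer feature fusion"    # large-target dominated data set
    return "three-layer feature fusion"      # default PAFPN-style structure

# example: 45% small, 20% large, 35% common targets -> four-layer structure
print(select_model_structure(c_small=0.45, c_large=0.20, c_common=0.35))
```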
referring to FIGS. 6-7, FIG. 6 is a schematic diagram of a data set that should be categorized into one type after being subjected to a decision tree;
Simply put, the inputs of the decision tree algorithm are the three target proportions of a detection task: the small target proportion, the common target proportion and the large target proportion;
the output of the decision tree algorithm is one of the three multi-scale target detection model structures: the two-layer structure model, the three-layer structure model or the four-layer structure model.
Regarding the decision tree, the decision tree is a basic classification and regression method, and is in a tree structure, and is composed of a root node, non-leaf nodes, branches and leaf nodes, wherein the root node represents a first selection point, the non-leaf nodes represent tests on a feature attribute, each branch represents the output of the feature attribute on a value range, and each leaf node stores a category to represent the final decision result.
Firstly, a decision tree is built for a past detection task data set, and a decision tree model is built according to a given training data set, so that the decision tree model can correctly classify examples; the ratio of the number of various targets of the detection task in the past period is a characteristic value, and the number of model layers is a label value.
In the present application, the decision tree algorithm uses ID3 to learn the decision tree, i.e. using information gain as a judgment condition.
Information gain = information entropy - conditional entropy
The entropy is usually used to describe the average value of the information amount brought by the whole random distribution, and has more statistical properties. The specific learning process is as follows:
set of hypothetical samplesDTo middleiThe proportion of the class sample isp i (i =1,2, \ 8943;, N), whereinNIs the total number of sample classes, then the sample setDThe information entropy of (a) is:
Figure 492463DEST_PATH_IMAGE004
assume that samples are assembledDBy attributeaPartition is made, assuming attributesaIs provided withvThe possible values are thenaWith split-out attributesvA subset (i.e. in a tree)vA branch) thereinVRepresenting the total number of subsets (branches), each possible set of values beingD v (|D v L and LD| represents the number of elements in the set), thenaThe method for calculating the conditional entropy of the attribute comprises the following steps:
Figure 95483DEST_PATH_IMAGE006
the information gain expression is then:
Figure 4533DEST_PATH_IMAGE008
and selecting the attribute with the maximum information gain as a classification attribute by using an information entropy principle, recursively expanding branches of the decision tree, and completing the construction of the decision tree.
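A small sketch of these quantities, computing the entropy, conditional entropy and information gain for one candidate attribute from lists of labels and attribute values:

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(D) = -sum_i p_i * log2(p_i), over the class proportions in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(D, a) = Ent(D) - sum_v (|D^v| / |D|) * Ent(D^v),
    where D^v groups the samples taking the v-th value of attribute a."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    conditional = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - conditional
```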
FIG. 7 is a schematic diagram of a decision result; generally speaking, the target number ratio is used as a root node, the hierarchical structure is used as a leaf node, a decision is made according to the target number ratio, and the decision result is as follows:
When the proportion of small targets exceeds 1/3 of the whole data set, the model is adjusted to the four-layer feature fusion structure; when the proportion of large targets exceeds 1/3, it is adjusted to the two-layer feature fusion structure; otherwise it is adjusted to the three-layer feature fusion structure. Of course, in some other embodiments the proportion threshold may be adjusted, and it is not intended to be limiting.
A multi-scale target detection model prediction stage:
s6: acquiring target data to be predicted;
s7: if the target data to be predicted belong to the target detection task type of the multi-scale target detection model determining stage, calling a target detection model structure correspondingly determined by the multi-scale target detection model determining stage to directly predict to obtain a target prediction result;
s8: if the target data to be predicted does not belong to the target detection task type in the multi-scale target detection model determination stage, the target data to be predicted is processed by adopting an Otsu threshold segmentation method, the image is divided into a background part and a foreground part according to the gray characteristic of the image, the type of the target to be predicted is determined according to the ratio of the foreground target pixel value to the whole image pixel value, the occupation ratio of various predicted targets is counted, the structure of the multi-scale target detection model is determined again according to the method in the step S5, and the corresponding structure is called to complete target detection.
Firstly, as an embodiment, for example, for field type detection in an unmanned aerial vehicle image, the task type is a task type corresponding to a multi-scale target detection model determining stage, and in the determining stage, the task for field type detection is determined and classified as a large target detection task, so that in a predicting stage, a two-layer feature fusion structure is directly called for prediction. The experimental results are shown in table 1 below.
TABLE 1 field test results
For a known target detection task type, for example the field type detection task described above, the decision tree classifies the task as a large target detection task and decides to use the two-layer model structure for its training and prediction. During training, after the feature layers are extracted by the backbone network, the data enter the feature fusion part; two-layer feature fusion is performed on the two feature layers extracted by the backbone network, which are then decoupled and output to obtain the predicted value pre. The two-layer model structure has 2000 prediction boxes in total: 400 boxes (20 × 20) with a corresponding anchor box size of 32 × 32 and 1600 boxes (40 × 40) with a corresponding anchor box size of 16 × 16. With the information of the 2000 prediction boxes and the labeled target boxes of each picture, the loss between the predicted value pre and the label value target can be calculated, giving the classification loss cls_loss and the regression loss reg_loss, with total loss Loss = cls_loss + reg_loss. This is fed into the optimizer for back-propagation and the model parameters are updated, completing the training process.
When predicting targets of this task, the weights of the two-layer model structure are loaded and the model is used for inference to obtain the target information.
Similarly, the three-layer network model structure has 8400 prediction boxes in total: the top branch has 400 boxes (20 × 20) with a corresponding anchor box size of 32 × 32, the middle branch has 1600 boxes (40 × 40) with anchor boxes of 16 × 16, and the lowest branch has 6400 boxes (80 × 80) with an anchor box size of 8 × 8;
the four-layer network model structure has 34000 prediction frames, wherein the first layer has 25600 frames (160 x 160), the corresponding anchor frame size is 4 x 4, and the other 8400 prediction frames have the same structure as the three-layer model structure.
For an unknown target detection task type, the input image to be detected is pre-processed and analyzed with the Otsu threshold segmentation algorithm, and the image is divided into a background part and a foreground part according to its gray-level characteristics; referring to fig. 8, fig. 8 is a schematic diagram of the effect of the Otsu threshold segmentation algorithm. In fig. 8 the preliminary foreground and background are distinguished; whether a foreground target is a large target or a small target is judged from the ratio of the foreground target pixel count to the image pixel count, the number proportions of the various targets are then counted, yielding the same parameters as in the data set analysis, these parameters are input into the decision tree for a decision, and the image is inferred with the model structure output by the decision tree.
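A minimal sketch of this pre-processing step using OpenCV's Otsu thresholding; treating the smaller of the two segments as the foreground is an assumption made for illustration:

```python
import cv2
import numpy as np

def foreground_ratio(image_path):
    """Split an image into foreground/background with Otsu thresholding and
    return the fraction of foreground pixels, used to judge target size."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # assume the foreground (targets) is the smaller of the two segments
    fg = min(np.count_nonzero(mask), np.count_nonzero(mask == 0))
    return fg / mask.size
```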
Referring to fig. 9 for a summary of the whole process, fig. 9 is a schematic diagram illustrating the detailed process of the method of the present invention;
For example, for the existing task types A, B, ..., X, each type has a corresponding data set, and the target detection model structures corresponding to the A, B, ..., X tasks are obtained through the foregoing steps S2 to S5 of the present application;
In the prediction stage, if the task type to be predicted is known to be one of A, B, ..., X, the correspondingly determined target detection model structure is called directly for prediction. If the type of the target detection task is unknown in the prediction stage, the proportions of large, medium and small targets in the data to be predicted are obtained with the Otsu threshold segmentation algorithm, and the target detection model structure used for prediction is then obtained through decision tree analysis.
Finally, please refer to fig. 10 and fig. 11 as an example; FIG. 10 is a schematic diagram of the small target detection effect obtained using a conventional target detection three-layer network architecture; FIG. 11 is a schematic diagram of the small target detection effect obtained by using the improved four-layer feature fusion structure of the present application; the method has the advantages that the prediction is carried out in the image shot by the unmanned aerial vehicle at the same height, when the target in the image is compact and the area is small, more targets can be detected by using the four-layer model structure than the original general detection model, and the accuracy is higher.
In general, the key technical points of the application are as follows:
1. adaptive multiscale structure adjustment method (adaptive target detection task in this application)
Current target detection models are general-purpose models optimized on public standard data sets, and they are well suited to the various everyday target detection tasks.
However, in practical engineering application, a target data set to be detected often has corresponding characteristics, and the efficiency of using a general model is often limited, so that the model efficiency can be greatly improved by designing a corresponding structure according to the target characteristics of the data set.
Before a data set enters the model, the data annotation information is read and the bounding box area of every target is calculated. Large, medium and small targets are distinguished by the ratio of the target bounding box area to the image area: usually, a target whose square root of this ratio is smaller than 0.03 is defined as a small target, and one whose square root is larger than 0.2 is defined as a large target. The numbers of all small and large targets are statistically analyzed, a corresponding decision tree structure is designed, and the scale structure of the model is decided and adjusted automatically.
2. Micro target detection model (four-layer feature fusion structure in the application)
In deep learning, the shallow layers of a convolutional neural network have a small receptive field, higher spatial resolution and accurate target positions, which makes them suitable for detecting small targets, but their semantic representation ability is weak and the recall rate is low. The deep layers, with a large receptive field, extract increasingly rich semantic information, but because small targets occupy few pixels, the feature map shrinks after several downsampling steps and the effective regions of small targets can no longer be distinguished. Therefore a feature map fusion mechanism focused on small target detection is designed, fusing high-resolution shallow feature maps with semantically rich deep feature maps to improve small target detection precision.
3. Cross-layer fusion method for context information of four-layer feature layer
When the backbone network features are extracted, an additional feature layer with a smaller downsampling multiple is extracted; compared with the other three feature layers its spatial information is richer, making it suitable for detecting tiny targets. The four feature layers are denoted, from shallow to deep, feat1, feat2, feat3 and feat4. feat4 is up-sampled three times and then fused with feat1, and feat3 is up-sampled twice and then fused with feat2, so that the spatial information and semantic information of feat1 and feat2 are both rich.
4. SSHF receptive field superposition mechanism
To reduce the number of model parameters and increase inference speed, a series stack of several 3×3 convolution kernels is used instead of 5×5 and 7×7 convolution kernels, as illustrated in the sketch below.
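The following sketch shows only the generic stacked-3×3 pattern behind this idea, not the actual SSHF module wiring; the channel width and the BatchNorm/SiLU choices are assumptions.

```python
import torch.nn as nn

def conv3x3(channels):
    # One 3x3 convolution followed by BatchNorm and SiLU activation.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.SiLU(inplace=True),
    )

class StackedReceptiveField(nn.Module):
    """Series of n 3x3 convolutions, giving a (2n + 1) x (2n + 1) receptive field."""
    def __init__(self, channels, n=3):
        super().__init__()
        self.blocks = nn.Sequential(*[conv3x3(channels) for _ in range(n)])

    def forward(self, x):
        return self.blocks(x)

# Parameter comparison for 256 channels (ignoring BatchNorm):
# one 7x7 convolution:   256 * 256 * 49 ≈ 3.2M weights
# three 3x3 convolutions: 3 * 256 * 256 * 9 ≈ 1.8M weights
```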
The beneficial effects of the invention are: the method can perform various kinds of target detection in real time and in a self-adaptive manner, which improves the universality of target detection while ensuring detection accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A remote space self-adaptive multi-scale small target detection method, characterized in that the method comprises two stages, respectively:
a multi-scale target detection model determining stage and a multi-scale target detection model predicting stage;
the multi-scale target detection model determining stage comprises the following processes:
S1, constructing a multi-scale target detection model, wherein the multi-scale target detection model comprises three parts, namely a two-layer feature fusion structure, a three-layer feature fusion structure and a four-layer feature fusion structure; the two-layer feature fusion structure, the three-layer feature fusion structure and the four-layer feature fusion structure are trained in advance;
S2, acquiring the type of a target detection task and a corresponding training set, and labeling the target to be detected in the training set with a target bounding box to obtain the target information: upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2);
S3, calculating the ratio of the area of the target bounding box to the area of the image: (x2 - x1) * (y2 - y1) / (W * H), where W and H are respectively the width and the height of the images in the training set;
S4: when the square root of the ratio of the target bounding box area to the image area is less than a preset first threshold a1, the target is a small target; when the square root of the ratio of the target bounding box area to the image area is greater than a preset second threshold a2, the target is a large target; when the square root of the ratio of the target bounding box area to the image area lies between a1 and a2, the target is a common target;
S5, determining a multi-scale target detection model structure by adopting a decision tree method, which specifically comprises the following steps:
calculating the proportions C1, C2 and C3 of large targets, small targets and common targets respectively, and judging the self-adaptive structure of the multi-scale target detection model according to the proportion of each kind of target and a set proportion threshold, which specifically comprises the following steps: when the proportion of small targets in the whole data exceeds a preset percentage p, the multi-scale target detection model is adjusted to the four-layer feature fusion structure; when the proportion of large targets exceeds the preset percentage p, the multi-scale target detection model is adjusted to the two-layer feature fusion structure; otherwise, the multi-scale target detection model is adjusted to the three-layer feature fusion structure;
a multi-scale target detection model prediction stage:
S6: acquiring target data to be predicted;
S7: if the target data to be predicted belongs to the target detection task type of the multi-scale target detection model determining stage, calling the target detection model structure determined in the multi-scale target detection model determining stage to predict directly and obtain a target prediction result;
S8: if the target data to be predicted does not belong to the target detection task type of the multi-scale target detection model determining stage, processing the target data to be predicted with the Otsu threshold segmentation method: the image is divided into a background part and a foreground part according to its gray-scale characteristics, the type of each target to be predicted is determined from the ratio of foreground target pixels to the total pixels of the image, the proportions of the various predicted target types are counted, the structure of the multi-scale target detection model is determined again according to the method of step S5, and the corresponding structure is called to complete target detection.
2. A method for remote spatially adaptive multi-scale small object detection as claimed in claim 1, wherein: the two-layer feature fusion structure comprises: a backbone network, a CA attention mechanism module, a two-layer feature fusion module and a decoupling output module;
the three-layer feature fusion structure adopts the PAFPN structure of the YOLO network series;
the four-layer feature fusion structure comprises: a backbone network, a CA attention mechanism module, a four-layer feature fusion module, an SSHF receptive field superposition module and a decoupling output module; the SSHF receptive field superposition module is formed by three 3×3 convolution kernels connected in sequence.
3. A method for remote spatially adaptive multi-scale small object detection as claimed in claim 2, wherein: the two-layer feature fusion structure is as follows:
the method comprises the steps that an input image is subjected to downsampling feature extraction through a backbone network, and two downsampling feature layers with different scales from shallow to deep are obtained and are respectively a first feature layer and a second feature layer;
the first characteristic layer and the second characteristic layer respectively pass through a CA attention mechanism module to obtain a first enhancement characteristic and a second enhancement characteristic;
the first enhanced feature and the second enhanced feature are subjected to feature fusion through a feature fusion module to obtain a fusion feature;
the fusion characteristics are processed by a decoupling output module to obtain a large target detection result.
4. A method for remote spatially adaptive multi-scale small object detection as claimed in claim 3, wherein: the two-layer feature fusion module comprises: a convolution unit, a transposed convolution unit, two Concat + CSPLayer structures and a downsampling unit; the specific process of feature fusion by the feature fusion module is as follows:
after the second enhanced feature passes through the convolution unit and the transposed convolution unit in sequence, the obtained first convolution result is fused with the first enhanced feature to obtain a first fusion result;
the first fusion result passes through one Concat + CSPLayer structure and is divided into two branches: one branch is decoupled directly by the decoupling output module to obtain first decoupling information; the other branch is downsampled by the downsampling unit to obtain a downsampling feature of the first fusion result;
the downsampling feature of the first fusion result and a second convolution result, obtained after the second enhanced feature passes through a convolution unit, are fused to obtain a second fusion result;
after passing through the other Concat + CSPLayer structure, the second fusion result is processed directly by the decoupling output module to obtain second decoupling information;
and superposing the first decoupling information and the second decoupling information to obtain a large target detection result.
5. A method for remote spatially adaptive multi-scale small object detection as claimed in claim 2, wherein: the four-layer feature fusion structure is as follows:
the method comprises the steps that an input image is subjected to downsampling feature extraction through a backbone network, and four downsampling feature layers with different scales from shallow to deep are obtained and are respectively a first feature layer, a second feature layer, a third feature layer and a fourth feature layer;
the first characteristic layer and the second characteristic layer respectively pass through a CA attention mechanism module to obtain a first enhancement characteristic and a second enhancement characteristic;
after feature fusion is carried out on the first enhanced feature, the second enhanced feature, the third feature layer and the fourth feature layer through a four-layer feature fusion module, a fusion feature is obtained;
the fusion characteristics are processed by an SSHF receptive field superposition module and a decoupling output module in sequence to obtain a small target detection result.
6. A method for remote spatially adaptive multi-scale small object detection as claimed in claim 5, wherein: the four-layer feature fusion module comprises: three convolution units, three transposed convolution units, six Concat + CSPLayer structures and three downsampling units; the four-layer feature fusion module adopts a context jump connection mechanism, and the specific fusion process is as follows:
the fourth feature layer passes through a first convolution unit to obtain a first convolution result; the first convolution result passes through a first transposed convolution unit to obtain a first transposed convolution result; the first transposed convolution result and the third feature layer pass through a first Concat + CSPLayer structure to obtain a first fusion result;
the first fusion result is divided into two branches: in the first branch, the first fusion result passes through a second convolution unit to obtain a second convolution result; in the second branch, the first fusion result and the first convolution result are input together into a third Concat + CSPLayer structure;
the second convolution result passes through a second transposed convolution unit to obtain a second transposed convolution result; the second transposed convolution result, the second enhanced feature and the first convolution result pass through a second Concat + CSPLayer structure to obtain a second fusion result;
the second fusion result is divided into two branches: in the first branch, the second fusion result passes through a third convolution unit and a third transposed convolution unit in sequence to obtain a third transposed convolution result; in the other branch, the second fusion result is input into a fourth Concat + CSPLayer structure;
the first enhanced feature, the third transposed convolution result, the first fusion result and the first convolution result pass through the third Concat + CSPLayer structure to obtain a third fusion result;
the third fusion result is divided into two branches: in the first branch, the third fusion result passes through the SSHF receptive field superposition module and a first decoupling output module in sequence to obtain first decoupling information; in the other branch, the third fusion result undergoes a first downsampling operation through a first downsampling unit, and the obtained first downsampling result is input into the fourth Concat + CSPLayer structure;
the second fusion result and the first downsampling result pass through the fourth Concat + CSPLayer structure to obtain a fourth fusion result; the fourth fusion result is divided into two branches: in the first branch, the fourth fusion result passes through the SSHF receptive field superposition module and a second decoupling output module in sequence to obtain second decoupling information; in the other branch, the fourth fusion result undergoes a second downsampling operation through a second downsampling unit, and the obtained second downsampling result is input into a fifth Concat + CSPLayer structure;
the second convolution result and the second downsampling result pass through the fifth Concat + CSPLayer structure to obtain a fifth fusion result; the fifth fusion result is divided into two branches: in the first branch, the fifth fusion result passes through a third decoupling output module to obtain third decoupling information; in the other branch, the fifth fusion result undergoes a third downsampling operation through a third downsampling unit, and the obtained third downsampling result is input into a sixth Concat + CSPLayer structure;
the first convolution result and the third downsampling result pass through the sixth Concat + CSPLayer structure to obtain a sixth fusion result; the sixth fusion result passes through a fourth decoupling output module to obtain fourth decoupling information;
and superposing the first decoupling information, the second decoupling information, the third decoupling information and the fourth decoupling information to obtain a small target detection result.
CN202211188231.0A 2022-09-28 2022-09-28 Long-distance space self-adaptive multi-scale small target detection method Active CN115272814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211188231.0A CN115272814B (en) 2022-09-28 2022-09-28 Long-distance space self-adaptive multi-scale small target detection method

Publications (2)

Publication Number Publication Date
CN115272814A CN115272814A (en) 2022-11-01
CN115272814B true CN115272814B (en) 2022-12-27

Family

ID=83755864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211188231.0A Active CN115272814B (en) 2022-09-28 2022-09-28 Long-distance space self-adaptive multi-scale small target detection method

Country Status (1)

Country Link
CN (1) CN115272814B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016512A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image small target detection method based on feedback type multi-scale training
CN113313118A (en) * 2021-06-25 2021-08-27 哈尔滨工程大学 Self-adaptive variable-proportion target detection method based on multi-scale feature fusion
WO2021203863A1 (en) * 2020-04-10 2021-10-14 腾讯科技(深圳)有限公司 Artificial intelligence-based object detection method and apparatus, device, and storage medium
CN113673621A (en) * 2021-08-30 2021-11-19 哈尔滨工业大学(威海) Quasi-circular target detection method based on convolutional neural network and MAML algorithm
CN113762209A (en) * 2021-09-22 2021-12-07 重庆邮电大学 Multi-scale parallel feature fusion road sign detection method based on YOLO
US11205098B1 (en) * 2021-02-23 2021-12-21 Institute Of Automation, Chinese Academy Of Sciences Single-stage small-sample-object detection method based on decoupled metric
CN113989616A (en) * 2021-10-26 2022-01-28 北京锐安科技有限公司 Target detection method, device, equipment and storage medium
CN114494893A (en) * 2022-04-18 2022-05-13 成都理工大学 Remote sensing image feature extraction method based on semantic reuse context feature pyramid
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network
WO2022134624A1 (en) * 2020-12-22 2022-06-30 亿咖通(湖北)技术有限公司 Pedestrian target detection method, electronic device and storage medium
CN114782311A (en) * 2022-03-14 2022-07-22 华南理工大学 Improved multi-scale defect target detection method and system based on CenterNet
CN115063573A (en) * 2022-06-14 2022-09-16 湖北工业大学 Multi-scale target detection method based on attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863301A (en) * 2022-05-07 2022-08-05 西南科技大学 Small target detection method for aerial image of unmanned aerial vehicle

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Multi-Scale Object Detection Using Feature Fusion Recalibration Network; Ziyuan Guo et al; 《IEEE》; 20200313; full text *
Multi-Scale Object Detection with Feature Fusion and Region Objectness Network; Wenjie Guan et al; 《2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》; 20180913; full text *
Research on object detection algorithms with multi-scale and convolutional feature fusion under a single-stage model; Han Tongqiang; 《China Master's Theses Full-text Database》; 20220315; full text *
Metro pedestrian object detection algorithm based on a multi-scale weighted feature fusion network; Dong Xiaowei et al; 《Journal of Electronics & Information Technology》; 20211231; full text *
Adaptive UAV object detection based on multi-scale feature fusion; Liu Fang et al; 《Acta Optica Sinica》; 20200525 (No. 10); full text *

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN111784685A (en) Power transmission line defect image identification method based on cloud edge cooperative detection
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110569814B (en) Video category identification method, device, computer equipment and computer storage medium
CN105550701A (en) Real-time image extraction and recognition method and device
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN105574550A (en) Vehicle identification method and device
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN111368775A (en) Complex scene dense target detection method based on local context sensing
CN117456480B (en) Light vehicle re-identification method based on multi-source information fusion
CN115311508A (en) Single-frame image infrared dim target detection method based on depth U-type network
CN113763364B (en) Image defect detection method based on convolutional neural network
CN109508639B (en) Road scene semantic segmentation method based on multi-scale porous convolutional neural network
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN114550023A (en) Traffic target static information extraction device
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism
CN117058386A (en) Asphalt road crack detection method based on improved deep Labv3+ network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant