CN116612382A - Urban remote sensing image target detection method and device - Google Patents


Info

Publication number
CN116612382A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
bounding box
obtaining
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310405274.8A
Other languages
Chinese (zh)
Inventor
蓝金辉
张铖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN202310405274.8A
Publication of CN116612382A
Legal status: Pending

Classifications

    • G06V 20/176 Urban or other man-made structures
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/20 Image preprocessing
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/764 Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/766 Recognition or understanding using pattern recognition or machine learning, using regression, e.g. by projecting features on hyperplanes
    • G06V 10/82 Recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06V 2201/07 Target detection
    • Y02A 30/60 Planning or developing urban green infrastructure

Abstract

The application discloses a method and a device for detecting targets in urban remote sensing images, comprising the following steps: obtaining an urban remote sensing image, and preprocessing the image to obtain sub-images; inputting the sub-images into a mixed attention backbone network for feature extraction to obtain a feature map; constructing a dual detection network, and processing the feature map to obtain a predicted rotated bounding box; obtaining the deviation between the predicted rotated bounding box and the ground truth using a smooth-z loss function, and obtaining a new bounding box by iteratively optimizing the loss value; retaining the optimal bounding box, and outputting the final detection result. For satellite or airborne urban remote sensing images, the method and device can accurately detect urban remote sensing targets oriented in different directions under a top-down view, and they have general applicability.

Description

Urban remote sensing image target detection method and device
Technical Field
The application relates to the technical field of target detection in computer vision, and in particular to a method and a device for detecting targets in urban remote sensing images.
Background
At present, detecting remote sensing targets oriented in different directions remains a difficult problem in the field. Existing remote sensing target detection techniques fall mainly into methods based on traditional machine learning and methods based on deep learning. Traditional machine learning methods search a given image with sliding windows and classify the contents, and they typically require hand-designed features. Deep learning methods generally comprise four steps: image feature extraction, image feature fusion, object classification and regression, and back-propagation.
For traditional remote sensing target detection tasks, most algorithms obtain candidate regions with a sliding-window method and then classify and identify targets of interest within those regions. This approach requires features to be designed by hand in advance, and the designed features sometimes fail to capture the image information effectively; moreover, remote sensing images contain targets at different scales and in different angular orientations, which further hinders the application of traditional detection methods to remote sensing imagery. With the continuous development of deep learning, more and more deep learning algorithms have been applied to the remote sensing field. Deep learning's big-data transfer-learning approach has further advanced remote sensing information extraction: low-level traditional features of remote sensing images, such as texture and shape, are exploited more fully, and extracting semantic features makes the classification and identification of remote sensing targets faster and more accurate, greatly improving detection accuracy. However, most recent remote sensing target detection algorithms still use generic horizontal-box methods. Although these cope to some extent with the multi-scale, multi-target nature of remote sensing images, they handle the multi-angle nature of remote sensing targets poorly, in particular the background-redundancy interference introduced around oriented objects and the resulting inaccurate bounding box localization. When objects are densely arranged, the shortcomings of generic horizontal boxes cause many bounding boxes to overlap heavily and introduce useless information such as excessive background, greatly degrading detection performance on remote sensing images. Research on rotated-box target detection algorithms is therefore an important direction in the remote sensing field.
Disclosure of Invention
The application provides a method and a device for detecting targets in urban remote sensing images, which are used to solve the technical problem that prior-art methods struggle to achieve good detection results on urban remote sensing targets whose orientations vary.
In order to solve the above technical problems, the application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for detecting targets in urban remote sensing images, including:
obtaining an urban remote sensing image, and preprocessing the image to obtain sub-images;
inputting the sub-images into a mixed attention backbone network for feature extraction to obtain a feature map;
constructing a dual detection network, and processing the feature map to obtain a predicted rotated bounding box;
obtaining the deviation between the predicted rotated bounding box and the ground truth using a smooth-z loss function, and obtaining a new bounding box by iteratively optimizing the loss value;
and retaining the optimal bounding box, and outputting the final detection result.
Further, the urban remote sensing image is a visible light image captured by a satellite-borne or airborne sensor;
the preprocessing cuts the original picture into several small pictures that are input into the mixed attention backbone network, and the sub-image prediction results are stitched and post-processed back into the full picture.
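As a minimal sketch of this tiling step (tile size, overlap, and the function name are illustrative assumptions, not taken from the patent), a large scene can be cut into overlapping sub-images whose detections are later shifted back into full-image coordinates by the recorded offsets:

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 1024, overlap: int = 200):
    """Cut a large remote sensing scene into overlapping sub-images.

    Returns (sub_image, (x_offset, y_offset)) pairs; the offsets let
    sub-image detections be shifted back into full-image coordinates
    when the predictions are stitched together.
    """
    h, w = image.shape[:2]
    stride = tile - overlap
    tiles = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            tiles.append((image[y:y + tile, x:x + tile], (x, y)))
    return tiles
```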
Further, the feature extraction includes:
the urban remote sensing image I_a is input into a deep convolutional neural network model, features are extracted from the global and local information of targets in the image through mixed self-attention, and an information-integrated feature map I_b is finally output.
Further, a detection decoupling network in the dual detection network is constructed, and the category information and the position-angle information of the target are predicted separately through split classification and regression operations, including:
performing classification on the input feature map I_b to obtain the category information C of the target, and performing a regression operation to obtain the position and angle information (x, y, w, h, θ) of the target, where x and y are respectively the abscissa and ordinate of the bounding box center, h and w are respectively the length and width of the bounding box, and θ is the rotation angle of the bounding box.
Further, constructing an angle correction network in the dual detection network and obtaining corrected angle information of the target through an angle regression operation, yielding a corrected predicted rotated bounding box, includes:
the feature map I_b is likewise input into the angle correction network, and a regression operation yields the corrected angle information θ′; the L1 norm of the difference between θ and θ′ gives the deviation Δθ; if Δθ is larger than a preset threshold x, θ′ is assigned to θ to correct the rotation angle, otherwise θ is kept unchanged; the obtained position and angle information is fused, and the predicted rotated bounding box is finally output.
Further, a smooth-z loss function is provided to obtain the deviation between the predicted rotated bounding box and the ground truth, and a new bounding box is obtained by iteratively optimizing the loss value, including:
based on the obtained predicted rotated bounding box, an initial loss value between it and the ground truth is computed with the loss function; feature points of the urban remote sensing image I_a are then re-extracted, and the process is iterated a preset number of times to obtain the loss value set {L_a^(1), L_a^(2), …, L_a^(N)} of I_a under real-label supervision; the minimum element of this set is selected as the loss value L_a to update the position-angle information, where L_a^(n) denotes the loss value obtained at the n-th iteration and N denotes the preset number of iterations. The loss function finally converges, yielding a new bounding box.
Further, retaining the optimal bounding box and outputting the final detection result includes:
regenerating a new bounding box list from all the obtained bounding boxes, and then ranking the bounding boxes by formula-based calculation to obtain the coordinates and confidence score of the optimal bounding box.
In a second aspect, an embodiment of the present application further provides an urban remote sensing image target detection device for implementing the method of any embodiment of the first aspect of the present application, including: an acquisition module, a detection module and a selection module.
The acquisition module is used for acquiring an urban remote sensing image and preprocessing the image to obtain sub-images.
The detection module comprises a mixed attention unit, a dual detection network unit and an optimization unit. The mixed attention unit is used for extracting features from the image to obtain a feature map. The dual detection network unit processes the feature map to obtain a predicted rotated bounding box. The optimization unit obtains the deviation between the predicted rotated bounding box and the ground truth through the smooth-z loss function, and obtains a new bounding box by iteratively optimizing the loss value.
The selection module is used for retaining the optimal bounding box and outputting the final detection result.
The embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the embodiments of the first aspect of the present application.
The embodiment of the application also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any embodiment of the first aspect of the application when executing the computer program.
The technical solution provided by the application has at least the following beneficial effects:
According to the urban remote sensing image target detection method and device, an image is first preprocessed into sub-images; features of the targets are extracted through the mixed attention backbone network; the feature map then undergoes classification, regression and angle correction through the dual detection network; the deviation between the predicted value and the ground truth is optimized through the smooth-z loss function; and finally the bounding boxes are screened for the optimal one, achieving accurate target detection in urban remote sensing images. The detection method provided by the application can accurately detect rotated targets in urban remote sensing imagery.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments are briefly introduced below. It is apparent that the drawings described below cover only some embodiments of the present application, and other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for detecting an urban remote sensing image target according to an embodiment of the application;
FIG. 2 is a schematic diagram of a mixed attention network for obtaining feature maps according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a dual detection network for obtaining a prediction rotation bounding box according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of optimizing and updating a new bounding box with the smooth-z loss function according to an embodiment of the present application;
FIG. 5 is a flowchart of an optimal bounding box screening method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of the urban remote sensing image target detection device of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The embodiment of the application provides an urban remote sensing image target detection method, as shown in fig. 1, comprising the following steps:
Step 110, acquiring an urban remote sensing image, and preprocessing the image to obtain sub-images;
Step 120, inputting the sub-images into the mixed attention backbone network for feature extraction to obtain a feature map;
Step 130, constructing a dual detection network, and processing the feature map to obtain a predicted rotated bounding box;
Step 140, obtaining the deviation between the predicted rotated bounding box and the ground truth using the smooth-z loss function, and obtaining a new bounding box by iteratively optimizing the loss value;
Step 150, retaining the optimal bounding box and outputting the final detection result.
Aiming at the problem of remote sensing image target detection, this embodiment provides a novel urban remote sensing image target detection method that can be implemented by an electronic device. The method inputs urban remote sensing images into a computer, extracts image features based on mixed attention, performs position-angle regression and angle correction on the feature maps with the dual detection network, computes the rotated-box deviation with the smooth-z loss function, screens bounding boxes with the optimal bounding box screening method, and outputs the final rotated detection box.
Further, as shown in fig. 2, implementing step 120, in which the sub-images are input into the mixed attention backbone network for feature extraction to obtain a feature map, specifically includes:
extracting features from the input urban remote sensing sub-images using a mixed attention mechanism.
The main process of the mixed attention mechanism is as follows: the input feature map is compressed by a convolution operation; the compressed feature map is fed into a multi-head self-attention module to extract spatial features of the key regions of the image; the feature map is then expanded back to its original size by an up-sampling operation and spliced with the input feature map; and the fused feature map undergoes this operation again to obtain local features. The self-attention formula is:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V
where Q denotes the query vectors, K denotes the key vectors, V denotes the value (weight) vectors, d_k denotes the dimension of the key vectors k_i, and T denotes the matrix transpose.
Simultaneously, maximum pooling and average pooling are applied to the feature map; the pooled feature maps are combined along the channel dimension; and the combined map is then convolved and activated to obtain the global features of the image. Finally, the two new feature maps are spliced into one feature map, completing the mixed attention process.
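A rough PyTorch sketch of one such mixed attention block follows (the module layout, kernel sizes, and names are assumptions for illustration; the patent publishes no code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedAttention(nn.Module):
    """Illustrative mixed attention block: a compressed self-attention branch
    for local key-region features plus a pooling-based global branch."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        # local branch: compress spatially, then multi-head self-attention
        # (channels must be divisible by heads)
        self.compress = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        # global branch: channel-wise max/avg pooled maps -> conv -> activation
        self.global_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # splice the concatenated local map (2C) and global map (C) back to C
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = self.compress(x)                               # feature compression
        hz, wz = z.shape[2:]
        seq = z.flatten(2).transpose(1, 2)                 # (B, H'W', C) tokens
        seq, _ = self.attn(seq, seq, seq)                  # spatial self-attention
        z = seq.transpose(1, 2).reshape(b, c, hz, wz)
        z = F.interpolate(z, size=(h, w), mode="bilinear", align_corners=False)
        local = torch.cat([z, x], dim=1)                   # splice with input map
        mx, _ = x.max(dim=1, keepdim=True)                 # maximum pooling
        av = x.mean(dim=1, keepdim=True)                   # average pooling
        g = torch.sigmoid(self.global_conv(torch.cat([mx, av], dim=1)))
        glob = x * g                                       # re-weighted global features
        return self.fuse(torch.cat([local, glob], dim=1))  # one fused feature map
```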
Further, as shown in fig. 3, implementing step 130, in which a dual detection network is constructed and the feature map is processed to obtain a predicted rotated bounding box, specifically includes:
constructing the detection decoupling network in the dual detection network, and predicting the category information and the position-angle information of the target separately through split classification and regression operations.
The feature map is classified as follows: by treating the label class of a target as a discrete value, target categorization can be treated as a classification problem. The detection head of the network uses a classifier whose number of outputs matches the number of target categories to predict, each output corresponding to the prediction score that a positive sample belongs to a given category. Suppose the urban remote sensing training set is {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where x_i is an output feature vector of the detection head, y_i is the ground-truth value of the sample (a preset true label category), and n is the number of training samples. After forward propagation through the network, the output of the classifier can be expressed as
h_w(x_i) = [exp(w_1^T x_i), …, exp(w_k^T x_i)]^T / Σ_{j=1}^{k} exp(w_j^T x_i)
where T denotes the matrix transpose and w_i is the weight parameter connecting the detection-head neuron to the i-th output neuron of the softmax classifier; h_w(x_i) is a probability vector whose entries sum to 1, each entry giving the probability that the sample belongs to the corresponding category, and the category with the highest probability is taken as the classification result.
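As a toy illustration of this softmax mapping (not the patent's code; the names follow the notation above):

```python
import numpy as np

def softmax_probs(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Class probabilities h_w(x) for one detection-head feature vector x.

    W holds one weight column per category; the output entries sum to 1
    and the argmax gives the predicted category."""
    logits = W.T @ x            # w_j^T x for each category j
    logits -= logits.max()      # subtract the max for numerical stability
    e = np.exp(logits)
    return e / e.sum()
```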
The feature map is regressed as follows: the translation, size, and angle changes of the target box are first modeled; least-squares linear regression with L2-norm regularization is then applied to avoid overfitting high-dimensional features; and the finished target box is finally output. The purpose of bounding box regression is to learn a mapping from a region candidate box (Region Proposal box) to the correct label box (ground truth). Let P = (P_x, P_y, P_w, P_h, P_θ), where P denotes the region candidate box and x, y, w, h, θ denote, respectively, the center coordinates of the rectangular box in the image, its length and width, and its rotation angle. Let G = (G_x, G_y, G_w, G_h, G_θ), where G denotes the correct label box. Five learnable functions S_x(P), S_y(P), S_w(P), S_h(P), S_θ(P) convert the region candidate box P toward the correct label box G; because of errors, the converted bounding box G′ does not normally coincide exactly with G. The transformation from P to G′ comprises a translation, a scale change, and an angle change of the bounding box. S_x(P) and S_y(P) correspond to the following bounding box translation:
G′_x = P_w · S_x(P) + P_x,  G′_y = P_h · S_y(P) + P_y;
S_w(P) and S_h(P) correspond to the following bounding box scaling:
G′_w = P_w · exp(S_w(P)),  G′_h = P_h · exp(S_h(P));
S_θ(P) corresponds to the following angle transformation:
G′_θ = P_θ + S_θ(P) + kπ.
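These transforms can be read as a small decode function (a sketch under the parameterization above; the kπ periodicity of the angle is left implicit):

```python
import math

def decode_rotated_box(P, S):
    """Apply the learned transforms S = (Sx, Sy, Sw, Sh, Stheta) to a region
    candidate box P = (Px, Py, Pw, Ph, Ptheta), yielding the adjusted box G'."""
    px, py, pw, ph, pt = P
    sx, sy, sw, sh, st = S
    gx = pw * sx + px          # translation, scaled by the box size
    gy = ph * sy + py
    gw = pw * math.exp(sw)     # log-space width/height change
    gh = ph * math.exp(sh)
    gt = pt + st               # angle offset (plus k*pi periodicity)
    return gx, gy, gw, gh, gt
```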
The angle correction network in the dual detection network is constructed, and the corrected angle information of the target is obtained through an angle regression operation, giving the corrected predicted rotated bounding box.
Specifically: the feature map I_b is likewise regressed in the correction network to obtain the corrected angle information θ′; the deviations of θ and θ′ from the ground truth are obtained via the L1 norm; if the deviation of θ′ is smaller, θ′ is assigned to θ to correct the rotation angle, otherwise θ is kept unchanged. The angle-correction transformation can be expressed as Δθ = min((P_θ′ − G_θ), (P_θ − G_θ)), and Δθ replaces S_θ(P) to obtain the corrected angle G′_θ.
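A one-line sketch of this selection rule, in its training-time form where the supervising angle G_θ is available (names are illustrative):

```python
def correct_angle(theta: float, theta_prime: float, g_theta: float) -> float:
    """Keep whichever predicted angle lies closer (L1 distance) to the
    supervising angle g_theta, mirroring the angle-correction transform."""
    if abs(theta_prime - g_theta) < abs(theta - g_theta):
        return theta_prime   # correction branch wins: overwrite theta
    return theta
```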
Further, as shown in fig. 4, implementing step 140, in which the deviation between the predicted rotated bounding box and the ground truth is obtained with the smooth-z loss function and a new bounding box is obtained by iteratively optimizing the loss value, specifically includes the following.
The design process of the smooth-z loss function is as follows:
when the rotating frame boundary is calculated in the training process, the correlation of the universal detection head and the horizontal boundary frame is weak in the classifying and regression process of the universal detection head, so that the classification score and the regression positioning cannot be effectively correlated together, and the obtained result is not reliable enough. Therefore, the association degree of the matching degree and the measurement is adopted in the anchor frame allocation, the regression loss is promoted to further converge, and the regression parameters are as follows:
wherein x, y, w, h and θ respectively represent the center coordinates, width, height and angle of the real frame; x is x a ,y a ,w a ,h a ,θ a Respectively representing the center coordinates, width, height and angle of the anchor frame; x ', y ', w ', h ', θ ' represent the center coordinates, width, height and angle of the prediction bounding box, respectively; l (L) x Representing the x deviation of the anchor frame from the true value, l y ,l w ,l y Lθ is the same; l's' x Representing the x-deviation of the prediction bounding box from the anchor box, l' y ,l′ w ,l′ h ,l′ θ And the same is true.
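Under this parameterization, both offset sets can come from the same helper; the following sketch (plain Python, names assumed, using the standard rotated-anchor normalization) computes l from the ground-truth box and l′ from the predicted box relative to an anchor:

```python
import math

def regression_targets(box, anchor):
    """Normalized offsets (lx, ly, lw, lh, ltheta) of a rotated box
    (x, y, w, h, theta) relative to an anchor (xa, ya, wa, ha, ta)."""
    x, y, w, h, t = box
    xa, ya, wa, ha, ta = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha), t - ta)

# regression_targets(gt_box, anchor) gives l;
# regression_targets(pred_box, anchor) gives l'.
```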
On this basis, the iterative optimization in this embodiment is executed as follows:
(1) compute the state and activation value of each layer up to the last layer;
(2) compute the error of each layer, proceeding backward from the last layer;
(3) compute the gradient of each neuron connection weight;
(4) update the parameters according to the gradient descent rule.
The above steps are iterated until the stopping criterion is met.
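A schematic PyTorch-style loop tying these four steps to the min-loss selection described earlier (all names are placeholders; the patent specifies no code):

```python
def iterate_and_keep_min_loss(model, image, labels, loss_fn, optimizer, N):
    """Run N forward/backward iterations (steps 1-4 above) and keep the
    smallest supervised loss, mirroring the minimum selection over the
    set {L_a^(1), ..., L_a^(N)} used by the smooth-z optimization."""
    losses = []
    for _ in range(N):
        loss = loss_fn(model(image), labels)  # step 1: forward pass and loss
        optimizer.zero_grad()
        loss.backward()                       # steps 2-3: errors and gradients, backward from the last layer
        optimizer.step()                      # step 4: gradient-descent update
        losses.append(loss.item())
    return min(losses)                        # L_a = min over the loss set
```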
Further, as shown in fig. 5, to implement step 150, an optimal bounding box screening method is proposed to retain the optimal bounding box and output the final detection result, which specifically includes:
first, a new bounding box list is regenerated for each target bounding box, and then the coordinates and confidence scores of the optimal bounding box are obtained through formula calculation. Wherein the confidence of the optimal bounding box is set to be the average confidence of all the boxes forming it, the coordinates of the optimal bounding box are a weighted sum of the coordinates of the boxes constituting it, wherein the weights are the confidence scores of the corresponding boxes, and the calculation formula is as follows
Wherein C is the confidence of the optimal bounding box, C i For the confidence of the ith detection frame in the list, A is the optimal selection coefficient, (x, y) is the coordinates of the fusion frame in the updated list, and N represents the number of boundary frames. Thus, a box with a higher confidence may contribute more to the fused box coordinates than a box with a lower confidence.
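A small numpy sketch of this fusion step for one cluster of overlapping boxes (function name assumed; the optimal-selection coefficient A is taken as 1 here):

```python
import numpy as np

def fuse_boxes(centers: np.ndarray, scores: np.ndarray):
    """Fuse one cluster of overlapping boxes (centers: N x 2 array) into the
    optimal box: coordinates are a confidence-weighted sum, confidence is
    the plain average, as described above (A assumed to be 1)."""
    weights = scores / scores.sum()                   # confidence weights
    coords = (centers * weights[:, None]).sum(axis=0) # weighted (x, y)
    return coords, float(scores.mean())               # fused coordinates and C
```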
In order to implement the method according to any one of the embodiments of the first aspect of the present application, an embodiment of the present application further provides an urban remote sensing image target detection device, as shown in fig. 6, including:
the acquisition module 610, which acquires an urban remote sensing image and preprocesses the image to obtain sub-images, as in step 110;
the detection module 600, which further comprises a mixed attention unit 620, a dual detection network unit 630 and an optimization unit 640; the mixed attention unit is configured to extract features from the image to obtain a feature map, as in step 120;
the dual detection network unit processes the feature map to obtain a predicted rotated bounding box, as in step 130;
the optimization unit obtains the deviation between the predicted rotated bounding box and the ground truth through the smooth-z loss function, and obtains a new bounding box by iteratively optimizing the loss value, as in step 140;
the selection module 650 is configured to retain the optimal bounding box and output the final detection result, as in step 150.
Further, the modules of the device of the present application are used to implement the further optimized embodiments of steps 110 to 150; see fig. 2 to fig. 6 and the related description, which are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal device comprising that element.
Finally, it should be noted that the above describes preferred embodiments of the present application. Although preferred embodiments have been described, those skilled in the art, once aware of the basic inventive concepts, can make several modifications and adaptations without departing from the principles of the application, and such modifications and adaptations are intended to fall within its scope. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all alterations and modifications that fall within the scope of the embodiments of the application.

Claims (10)

1. An urban remote sensing image target detection method, characterized by comprising the following steps:
obtaining an urban remote sensing image, and preprocessing the image to obtain sub-images;
inputting the sub-images into a mixed attention backbone network for feature extraction to obtain a feature map;
constructing a dual detection network, and processing the feature map to obtain a predicted rotated bounding box;
obtaining the deviation between the predicted rotated bounding box and the ground truth using a smooth-z loss function, and obtaining a new bounding box by iteratively optimizing the loss value;
and retaining the optimal bounding box, and outputting the final detection result.
2. The urban remote sensing image target detection method according to claim 1, wherein the urban remote sensing image is a visible light image captured by a satellite-borne or airborne sensor;
the preprocessing cuts the original picture into several small pictures that are input into the mixed attention backbone network, and the sub-image prediction results are stitched and post-processed back into the full picture.
3. The method for detecting an urban remote sensing image target according to claim 2, wherein the feature extraction comprises:
urban remote sensing image I a Inputting the image data into a deep convolutional neural network model, extracting features of global and local information of a target in the image through mixed self-attention, and finally outputting an information integration feature map I b
4. The urban remote sensing image target detection method according to claim 1, wherein constructing a detection decoupling network in the dual detection network and predicting the category information and the position-angle information of the target separately through split classification and regression operations comprises:
performing classification on the input feature map I_b to obtain the category information C of the target, and performing a regression operation to obtain the position and angle information (x, y, w, h, θ) of the target, wherein x and y are respectively the abscissa and ordinate of the bounding box center, h and w are respectively the length and width of the bounding box, and θ is the rotation angle of the bounding box.
5. The urban remote sensing image target detection method according to claim 1, wherein constructing an angle correction network in the dual detection network and obtaining corrected angle information of the target through an angle regression operation, yielding a corrected predicted rotated bounding box, comprises:
the feature map I_b is likewise input into the angle correction network, and a regression operation yields the corrected angle information θ′; the L1 norm of the difference between θ and θ′ gives the deviation Δθ; if Δθ is larger than a preset threshold x, θ′ is assigned to θ to correct the rotation angle, otherwise θ is kept unchanged; the obtained position and angle information is fused, and the predicted rotated bounding box is finally output.
6. The urban remote sensing image target detection method according to claim 5, wherein obtaining the deviation between the predicted rotated bounding box and the ground truth using the smooth-z loss function and obtaining a new bounding box by iteratively optimizing the loss value comprises:
based on the obtained predicted rotated bounding box, an initial loss value between it and the ground truth is computed with the loss function; feature points of the urban remote sensing image I_a are then re-extracted, and the process is iterated a preset number of times to obtain the loss value set {L_a^(1), L_a^(2), …, L_a^(N)} of I_a under real-label supervision; the minimum element of this set is selected as the loss value L_a to update the position-angle information, wherein L_a^(n) denotes the loss value obtained at the n-th iteration and N denotes the preset number of iterations; the loss function finally converges, yielding a new bounding box.
7. The method of claim 1, wherein the retaining the optimal bounding box and outputting the final detection result comprises:
and regenerating a new bounding box list for all the obtained bounding boxes, and then sequencing the bounding boxes through formula calculation to obtain the coordinates and confidence scores of the optimal bounding boxes.
8. An urban remote sensing image target detection device for implementing the method of any one of claims 1 to 7, comprising:
the acquisition module is used for acquiring an urban remote sensing image, and preprocessing the image to obtain sub-images;
the detection module comprises a mixed attention unit, a dual detection network unit and an optimization unit;
the mixed attention unit is used for extracting the characteristics of the image and acquiring a characteristic diagram.
And the dual detection network unit obtains a prediction rotation boundary box by processing the feature map.
And the optimizing unit acquires the deviation between the predicted rotation bounding box and the true value through a smooth-z loss function, and acquires a new bounding box through iterative optimization of the loss value.
And the selection module is used for reserving an optimal boundary box and outputting a final detection result.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
CN202310405274.8A 2023-04-17 2023-04-17 Urban remote sensing image target detection method and device Pending CN116612382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310405274.8A CN116612382A (en) 2023-04-17 2023-04-17 Urban remote sensing image target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310405274.8A CN116612382A (en) 2023-04-17 2023-04-17 Urban remote sensing image target detection method and device

Publications (1)

Publication Number Publication Date
CN116612382A (en) 2023-08-18

Family

ID=87684388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310405274.8A Pending CN116612382A (en) 2023-04-17 2023-04-17 Urban remote sensing image target detection method and device

Country Status (1)

Country Link
CN (1) CN116612382A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636078A (en) * 2024-01-25 2024-03-01 华南理工大学 Target detection method, target detection system, computer equipment and storage medium
CN117636078B (en) * 2024-01-25 2024-04-19 华南理工大学 Target detection method, target detection system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination