CN113033315A - Rare earth mining high-resolution image identification and positioning method - Google Patents


Info

Publication number
CN113033315A
CN113033315A
Authority
CN
China
Prior art keywords
image
rare earth
prediction
earth mining
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110219415.8A
Other languages
Chinese (zh)
Inventor
李恒凯
肖松松
王利娟
武镇邦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Original Assignee
Jiangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Science and Technology
Priority to CN202110219415.8A
Publication of CN113033315A
Legal status: Pending

Classifications

    • G06V20/13 Satellite images
    • G06F18/23213 Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06F18/24323 Tree-organised classifiers
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/80 Geometric correction
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V20/176 Urban or other man-made structures
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/20132 Image cropping
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30181 Earth observation
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of rare earth mining, in particular to a rare earth mining high-resolution image identification and positioning method, which comprises the following steps. Step S1: acquiring and preprocessing remote sensing image data. Step S2: building a YOLOv3 model. Step S3: adjusting the YOLOv3 algorithm. Step S4: the model outputs a result, and the pixel position information of the prediction bounding box, namely the pixel coordinate information relative to the upper left corner of the image, is marked on the remote sensing image in point form. With this method, the YOLOv3 target detection algorithm is improved: an attention mechanism is embedded into the feature extraction network so that gradients carrying the attention effect can flow into deeper layers of the network, improving the extraction of key features without affecting detection speed, and the improved loss function makes the model converge quickly and stably.

Description

Rare earth mining high-resolution image identification and positioning method
Technical Field
The invention relates to the technical field of rare earth mining, in particular to a rare earth mining high-resolution image identification and positioning method.
Background
The southern ion-adsorption rare earth mining areas are among the most important areas for mining rare earth resources in China. These mining areas cover a wide range and are located in remote mountainous regions, so common monitoring methods are inefficient and poorly timed.
Field survey is the working foundation of rare earth mining monitoring; existing monitoring means mainly comprise ground survey, satellite remote sensing monitoring and unmanned aerial vehicle remote sensing monitoring. High-spatial-resolution images can clearly express the spatial structure and surface texture characteristics of a ground object target and can distinguish finer compositions within ground objects, and they have been applied to identifying the rare earth mining process and surface environment disturbance; however, field investigation and satellite remote sensing monitoring suffer from low efficiency and poor timeliness. With the rapid development of deep learning in the field of target detection, target detection algorithms based on neural networks have shown good performance and become a research hotspot in recent years. Such algorithms can be divided into two types. The first type is the two-stage detection algorithm: the image is first input into a region proposal network (RPN) to generate candidate regions, which are then finely classified. Representative algorithms are R-CNN, Fast R-CNN, and the like. These algorithms generally achieve high precision, but because detection is split into two steps they suffer from low detection speed, high storage cost, and models that are difficult to compress. The second type is the one-stage detection algorithm, which treats target detection as a single regression problem and is an end-to-end target detection algorithm. One-stage algorithms are superior to two-stage algorithms in detection speed, but their positioning precision is lower. Representative algorithms are YOLO, YOLOv3, SSD, etc. Among them, the YOLOv3 algorithm has attracted wide attention because of its fast detection speed, good small-target detection effect and strong versatility.
Chinese patent CN 110147778A discloses a rare earth mining identification method. Starting from the state of the sedimentation tanks and their spatial distribution during ion-adsorption rare earth mining, a deep learning model based on high-spatial-resolution remote sensing images is constructed to identify and detect the rare earth mining state. The model adopts a convolutional neural network combining the feature pyramid network FPN with bilinear-interpolation ROIAlign; in addition, for the ore leaching liquid features present in sedimentation tanks during ionic rare earth mining, the water body index NDWI of the remote sensing image is added as an input to train the model, which is then used for ionic rare earth mining identification. The combined FPN + ROIAlign + NDWI recognition performs best, achieves high recognition accuracy, and can provide technical support for the supervision of ionic rare earth mining.
Disclosure of Invention
The invention aims to provide a method for rapidly and accurately monitoring the mining state of a rare earth mining area.
In order to solve the above technical problems, the present invention provides a method for recognizing and positioning rare earth mining high-resolution images, comprising the following steps,
step S1: acquiring and preprocessing remote sensing image data: after the remote sensing image data is obtained, radiation correction, geometric correction and image fusion preprocessing are carried out; the data is then exported as an RGB three-channel image; finally the image is cropped to remove the parts that do not contain detection targets;
step S2: building a YOLOv3 model: first determining the network structure of the YOLOv3 algorithm, and then determining the loss function of the YOLOv3 algorithm, the loss function comprising three parts: target localization loss, target confidence loss and target classification loss;
step S3: adjusting a YOLOv3 algorithm, and replacing an original target positioning Loss function by using CIOU Loss when a prediction frame and a real frame are not intersected; embedding CBAM in Darknet-53 of the YOLOv3 algorithm network structure;
step S4: the model outputs a result, and the pixel position information of the prediction bounding box, namely the pixel coordinate information relative to the upper left corner of the image, is marked on the remote sensing image in point form.
Preferably, after the image is cropped in step S1, the data set is expanded by flipping, rotation, mirroring, brightness, chromaticity, or Gaussian blur data enhancement methods.
Preferably, the determination of the network structure of the YOLOv3 algorithm in the step S2 includes a Darknet-53 feature extraction network part and a multi-scale detection part.
Preferably, the multi-scale detection part fuses the feature map of each scale with the 2× upsampled feature map of the preceding (deeper) scale through the feature pyramid network structure adopted by the YOLOv3 algorithm.
Preferably, the target localization loss in step S3 takes the mean square error as the objective function of the loss function, which specifically comprises: first calculating the ratio of the intersection area to the union area of the prediction box and the real box generated by the network, obtaining the intersection-over-union (IOU) of the two boxes; then screening the prediction boxes through a preset IOU threshold, keeping the prediction boxes whose IOU is greater than the threshold; and finally calculating the corresponding target localization loss.
Preferably, when the prediction box and the real box are not intersected and the IOU values of the two boxes are 0, the CIOU Loss is used for replacing the original target positioning Loss function.
Preferably, in step S1, the cropped image needs to be segmented into a plurality of smaller images during image training and recognition, and then input into the model for training and detection.
Preferably, if the sedimentation tank is identified as 2 or more prediction frames in the segmented images, the incomplete prediction frames need to be replaced by the complete prediction frame, which specifically comprises:
firstly, a threshold value α is specified and the IOMIN index of two prediction boxes is calculated; if IOMIN is greater than α, the prediction box with the smaller area is deleted and the prediction box with the larger area is retained.
Preferably, a random forest classification method is adopted in the process of positioning the sedimentation tank of the rare earth mining area of the remote sensing image.
After this method is adopted, the YOLOv3 target detection algorithm is improved: the attention mechanism is embedded into the feature extraction network so that gradients carrying the attention effect can flow into deeper layers of the network, which improves the extraction of key features without affecting detection speed, and the improved loss function makes the model converge quickly and stably. In addition, an image offset segmentation method and a new index IOMIN are proposed and combined to solve the problem that multiple detection frames may appear on the same target during remote sensing image segmentation, or that a target becomes difficult to identify after being split, causing missed detections. Finally, the detection results for rare earth mining area sedimentation tanks are converted into positioning points in a plane coordinate system using a coordinate conversion formula, providing technical support for rare earth management departments to know the distribution of sedimentation tanks in mining areas in time and to carry out efficient processing work.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic diagram of a preprocessed remote sensing data image;
FIG. 2 is a schematic diagram of a YOLOv3 network structure;
FIG. 3 is a schematic view of a channel attention mechanism;
FIG. 4 is a schematic diagram of a spatial attention mechanism;
FIG. 5 is a schematic illustration of CBAM embedding in the residual structure of Darknet-53;
FIG. 6 is a schematic diagram of a sample bounding box K-means clustering result;
FIG. 7 is a diagram of the size prior and location prediction of bounding boxes;
FIG. 8 is a schematic diagram of the detection flow of the YOLOv3 algorithm;
FIG. 9 is a diagram illustrating the improved results of the YOLOv3 model;
FIG. 10 is a schematic illustration of a sedimentation basin being segmented into two or more images;
FIG. 11 is a schematic diagram of the four segmentation methods of an image;
FIG. 12 is a schematic view of local segmentation of remote sensing images of rare earth mining areas;
FIG. 13 is a schematic view of a sliding window;
FIG. 14 is a schematic view of an XY plane coordinate system;
FIG. 15 is a schematic diagram of the precise positioning of a rare earth mining area sedimentation tank;
FIG. 16 is a diagram illustrating the result of misidentification of a sedimentation tank;
FIG. 17 is a diagram illustrating the results of random forest classification.
Detailed Description
As shown in fig. 1, the method for recognizing and positioning rare earth mining high-resolution images of the present invention comprises the following steps,
step S1: acquiring and preprocessing remote sensing image data: after the remote sensing image data is obtained, radiation correction, geometric correction and image fusion preprocessing are carried out; the data is then exported as an RGB three-channel image; finally the image is cropped to remove the parts that do not contain detection targets;
step S2: building a YOLOv3 model: first determining the network structure of the YOLOv3 algorithm, and then determining the loss function of the YOLOv3 algorithm, the loss function comprising three parts: target localization loss, target confidence loss and target classification loss;
step S3: adjusting a YOLOv3 algorithm, and replacing an original target positioning Loss function by using CIOU Loss when a prediction frame and a real frame are not intersected; embedding CBAM in Darknet-53 of the YOLOv3 algorithm network structure;
step S4: the model outputs a result, and the pixel position information of the prediction bounding box, namely the pixel coordinate information relative to the upper left corner of the image, is marked on the remote sensing image in point form.
In step S1, the present invention uses the French Pleiades remote sensing image as the study area data. The Pleiades remote sensing image consists of 1 panchromatic band with a spatial resolution of 0.5 m and red, green, blue and near-infrared bands with a spatial resolution of 2 m. The Pleiades image is preprocessed by radiation correction, geometric correction, image fusion and the like, exported as an RGB three-channel image, and cropped with MATLAB software into tiles of 320 pixels × 320 pixels. Within the study area, a large part of the area shown by the remote sensing image is forest land, and most of the cropped small images contain no detection target and need to be removed. The number of images in the data set after this elimination is small, so the data set is expanded by data enhancement methods such as flipping, rotation, mirroring, brightness, chromaticity and Gaussian blur, as shown in fig. 1: (a) original image, (b) 90° counterclockwise rotation, (c) 180° counterclockwise rotation, (d) 270° counterclockwise rotation, (e) 30° clockwise rotation after vertical flipping, (f) horizontal flipping, (g) Gaussian blur, (h) color balance, and (i-j) brightness adjustment. The data set is finally expanded to 2488 images and randomly divided into training, test and verification sets at a ratio of 4:1:0.4. Sample label data were created by manual labeling with labelImg, covering both round and square sedimentation tanks.
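A minimal PIL sketch of the augmentations in fig. 1; the blur radius and the enhancement factors are illustrative assumptions, not values from the patent.

```python
from PIL import Image, ImageEnhance, ImageFilter

def augment(img):
    """Return the augmented variants of one tile, mirroring fig. 1 (a)-(j)."""
    return [
        img,                                             # (a) original
        img.rotate(90, expand=True),                     # (b) 90 deg CCW
        img.rotate(180, expand=True),                    # (c) 180 deg CCW
        img.rotate(270, expand=True),                    # (d) 270 deg CCW
        img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)   # (e) vertical flip,
           .rotate(-30, expand=True),                    #     then 30 deg CW
        img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),  # (f) horizontal flip
        img.filter(ImageFilter.GaussianBlur(radius=2)),  # (g) Gaussian blur
        ImageEnhance.Color(img).enhance(1.3),            # (h) colour balance
        ImageEnhance.Brightness(img).enhance(0.8),       # (i) darker
        ImageEnhance.Brightness(img).enhance(1.2),       # (j) brighter
    ]
```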
In step S2, the YOLOv3 model includes a network structure and a loss function, which are as follows:
(1) network architecture
The network structure of the YOLOv3 algorithm is mainly divided into two parts, as shown in fig. 2. 1) The Darknet-53 feature extraction network part. Darknet-53 adopts a fully convolutional network to down-sample the feature maps and borrows the residual structure of ResNet to reduce the risk of gradient explosion and avoid vanishing gradients. 2) The multi-scale detection part. To enhance the accuracy of the algorithm in detecting small objects, YOLOv3 adopts a structure similar to the Feature Pyramid Network (FPN) and fuses the feature map of each scale with the 2× upsampled feature map of the preceding (deeper) scale, as sketched below. This fusion associates feature maps of different resolutions, so that the feature map used for prediction at each level combines features of different resolutions and different semantic strengths. Finally, category and position prediction is carried out on three scales: 13 × 13, 26 × 26 and 52 × 52.
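A minimal Keras sketch of the fusion just described: the deeper map passes through a 1 × 1 convolution (channel count illustrative), is 2× upsampled, and is concatenated with the shallower map.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse(shallow, deep, channels=256):
    # 1x1 convolution to adjust channels, 2x nearest-neighbour upsampling,
    # then concatenation with the shallower feature map.
    up = layers.UpSampling2D(size=2)(layers.Conv2D(channels, 1)(deep))
    return layers.Concatenate()([shallow, up])

# e.g. the 26x26 head consumes fuse(f26, f13); the 52x52 head builds on it.
```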
(2) Loss function
The loss function in YOLOv3 is divided into three parts, the first part is the target localization loss, the second part is the target confidence loss, and the last part is the target classification loss.
Loss = L_coor + L_conf + L_class  (1)

In the formula, L_coor is the target localization loss; L_conf is the target confidence loss; L_class is the target classification loss.
L_xy = λ_coor · Σ_{i=0}^{K²} Σ_{j=0}^{M} I_{ij}^{obj} · [(x_i − x̂_i)² + (y_i − ŷ_i)²]  (2)

L_wh = λ_coor · Σ_{i=0}^{K²} Σ_{j=0}^{M} I_{ij}^{obj} · [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]  (3)

L_conf = −Σ_{i=0}^{K²} Σ_{j=0}^{M} I_{ij}^{obj} · [C_i ln Ĉ_i + (1 − C_i) ln(1 − Ĉ_i)]  (4)

L_class = −Σ_{i=0}^{K²} I_{ij}^{obj} Σ_{c∈classes} [p_i(c) ln p̂_i(c) + (1 − p_i(c)) ln(1 − p̂_i(c))]  (5)

In the formulas, L_xy and L_wh respectively represent the coordinate error of the prediction frame center and the width-height error of the prediction frame; λ_coor is an error coordination coefficient; K² indicates that the input image is divided into K × K grids; i denotes the i-th grid of the input image; j is the index of the prediction branch; (x̂_i, ŷ_i) are the center coordinates of the prediction frame; (x_i, y_i) are the center coordinates of the i-th real frame; I_{ij}^{obj} indicates whether the i-th grid predicts a target object: I_{ij}^{obj} = 1 if the grid is responsible for predicting a target, otherwise I_{ij}^{obj} = 0; (ŵ_i, ĥ_i) are the width and height of the prediction frame; (w_i, h_i) are the width and height of the real frame; Ĉ_i is the predicted probability that the prediction frame contains a target object; C_i is the true value, determined by whether the i-th grid is responsible for predicting a certain class of target: C_i = 1 if it is, otherwise C_i = 0; classes is the set of detection target classes; p̂_i(c) is the predicted probability that the prediction frame of the i-th grid belongs to class c; p_i(c) is the true value of the class to which the prediction frame belongs: p_i(c) = 1 if it belongs to class c, otherwise p_i(c) = 0.
The target localization loss L_coor uses Mean Square Error (MSE) as the objective function. First, the ratio of the intersection area to the union area of the prediction frame and the real frame generated by the network is calculated, giving the intersection-over-union (IOU) of the two frames. The prediction frames are then screened with a preset IOU threshold, keeping those whose IOU exceeds the threshold. Finally, the corresponding L_coor is calculated.
IOU = S_I(b1, b2) / S_U(b1, b2)  (6)

In the formula, b1 and b2 respectively represent the prediction frame and the real frame; S_I(b1, b2) represents the area of the intersection of the two frames; S_U(b1, b2) represents the area of the union of the two frames.
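A small sketch of equation (6); boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in pixel coordinates.

```python
def iou(b1, b2):
    """Intersection-over-union of two axis-aligned boxes, as in equation (6)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])   # intersection rectangle
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0
```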
In step S3 the YOLOv3 algorithm is adjusted and improved as follows:
the real-time detection performance of the YOLOv3 algorithm benefits from the full convolution network structure, the smaller convolution kernel size and the algorithm design of the regression bounding box, and compared with other target detection models, the YOLOv3 algorithm has the characteristics of high speed and high precision. Aiming at the image characteristics of the rare earth mining area sedimentation tank on the remote sensing image, the YOLOv3 algorithm is improved, so that the detection task of the rare earth mining area sedimentation tank can achieve better performance.
(1) Loss function improvement. When the prediction frame and the real frame do not intersect, the IOU of the two frames is 0, which cannot reflect the distance between them, so the target localization loss function cannot optimize the case where the prediction frame and the real frame do not intersect. To address this problem, CIOU Loss is used to replace the original target localization loss function. On the basis of IOU, CIOU Loss comprehensively considers the distance between the center points of the prediction frame and the real frame, the aspect ratio and the overlap rate; it can better describe the positional relationship between the prediction frame and the real frame and improves the target localization precision throughout the positioning process of the prediction frame, so that the model converges faster and more stably.
L_CIOU = 1 − IOU + ρ²(b, b^gt)/c² + αv  (7)

v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))²  (8)

α = v / ((1 − IOU) + v)  (9)

In the formulas, L_CIOU is the CIOU Loss; b and b^gt respectively represent the center points of the prediction frame and the real frame; ρ(b, b^gt) represents the Euclidean distance between the center points of the prediction frame and the real frame; c represents the diagonal length of the minimum bounding rectangle of the union of the prediction frame and the real frame; α is a balance parameter; v is a parameter measuring the consistency of the aspect ratios of the prediction frame and the real frame; w and w^gt respectively represent the widths of the prediction frame and the real frame; h and h^gt respectively represent the heights of the prediction frame and the real frame.
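A sketch of equations (7)-(9), reusing the iou() helper above; it assumes non-degenerate boxes (positive widths and heights).

```python
import math

def ciou_loss(pred, gt):
    """CIOU Loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wgt, hgt = gt[2] - gt[0], gt[3] - gt[1]
    # Squared distance between the two center points (rho^2).
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest rectangle enclosing both boxes (c^2).
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    v = (4 / math.pi ** 2) * (math.atan(wgt / hgt) - math.atan(w / h)) ** 2
    i = iou(pred, gt)
    alpha = v / ((1 - i) + v) if (1 - i) + v > 0 else 0.0
    return 1 - i + rho2 / c2 + alpha * v
```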
(2) Feature extraction network improvement. Extracting the key information of the target plays a crucial role in classifying it, but in the target detection process the extraction of key image information is easily disturbed by useless information such as background. SENet adds a channel attention module to the ResNet residual block; subsequently, CBAM (Convolutional Block Attention Module), proposed by Sanghyun Woo et al., uses both spatial attention and channel attention. CBAM is embedded into Darknet-53 to improve its ability to extract the key features of rare earth mining area sedimentation tanks.
In the channel attention mechanism, given a C × H × W feature map (C is the number of channels) as input, channel attention learns a different weight for each channel along the channel dimension. Channel attention can be seen as a process of selecting relevant features according to context semantics in the detection task: when an object is to be predicted, the feature maps corresponding to that object are assigned larger weights. FIG. 3 shows the channel attention structure. Global information is compressed per channel through global average pooling and global max pooling; the feature dimensionality is then reduced through a fully connected layer and, after a ReLU activation function and another fully connected layer, restored to the dimensionality at the input of the attention module; finally, normalized weights are obtained through a Sigmoid, generating the channel attention matrix. Applying channel attention to the feature map through the feature weighting operation produces a re-screened feature map, which continues to propagate downward.
The spatial attention mechanism mainly focuses on the position information of the target in the image and can be regarded as a complement to channel attention. In a C × H × W feature map, the spatial attention mechanism learns different weights over the H × W positions, and the weights are shared across the channel dimension. FIG. 4 shows the spatial attention structure. The channel information of the feature map is first compressed along the channel dimension using average pooling and max pooling. The two pooling results are then concatenated into a new feature map, the number of channels is set to 1 by a 7 × 7 convolution, and normalized weights are obtained through a Sigmoid function, generating the spatial attention matrix. Finally, spatial attention is applied to the feature map through the feature weighting operation to obtain a re-screened feature map, which continues to propagate downward.
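A hedged TensorFlow/Keras sketch of the two attention steps just described; the reduction ratio r = 16 is a common default and an assumption here, as is creating the layers inside the functions (a real network would create them once and reuse them).

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, reduction=16):
    # Global average and max pooling compress spatial information per channel;
    # a shared two-layer MLP scores the channels, and a sigmoid normalizes.
    channels = int(x.shape[-1])
    dense1 = layers.Dense(channels // reduction, activation="relu")
    dense2 = layers.Dense(channels)
    avg = dense2(dense1(layers.GlobalAveragePooling2D()(x)))
    mx = dense2(dense1(layers.GlobalMaxPooling2D()(x)))
    scale = tf.sigmoid(avg + mx)[:, None, None, :]   # shape (B, 1, 1, C)
    return x * scale                                 # feature weighting

def spatial_attention(x):
    # Average and max pooling along the channel axis, concatenation, and a
    # 7x7 convolution produce a single-channel spatial weight map.
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    attn = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        tf.concat([avg, mx], axis=-1))
    return x * attn

def cbam(x):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(x))

# e.g. y = cbam(tf.random.normal([1, 52, 52, 256]))
```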
Once the model is built and improved, the YOLOv3 model is trained and tested. All experiments are performed on a Windows 10 operating system; the processor is an Intel Xeon(R) Silver 4110 CPU @ 2.10 GHz, the GPU is an NVIDIA Quadro P5000 with 16 GB of video memory, and TensorFlow and Keras are used as the deep learning framework. For parameter settings, the initial learning rate is set to 0.001, the learning rate decay coefficient is 0.1, the batch size is set to 8 (training sample pictures per iteration), 22400 iterations are run, and the confidence threshold is set to 0.7. The YOLOv3 algorithm does not need to generate regions of interest (ROI) in advance but trains the network directly in a regression manner; K-means clustering of the training sample bounding boxes is performed on the training data set, as shown in fig. 6, and 3 groups of predefined bounding box sizes are finally preset for each of the 3 scales (a clustering sketch follows). As shown in fig. 7, feature extraction is first performed on the input image by the feature extraction network, the feature vectors are input into the FPN structure, grid regions at 3 scales (13 × 13, 26 × 26, 52 × 52) are generated, each grid region predicts 3 bounding boxes, 10647 bounding boxes are generated in total, and a vector P is predicted for each bounding box. Finally, non-maximum suppression is applied to the generated prediction frames to obtain the final prediction result. The whole detection process is shown in fig. 8, and partial detection results of the improved YOLOv3 model are shown in fig. 9.
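The patent states only that K-means clustering of the sample bounding boxes is used; the sketch below uses 1 − IOU as the distance, a common choice for YOLO-style anchor clustering, and is an assumption rather than the patent's exact procedure.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) array of labelled box widths and heights in pixels."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # IOU between every box and every anchor, treating boxes as
        # co-centred rectangles (only the sizes matter for anchors).
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, None].prod(-1) + anchors[None, :].prod(-1) - inter
        assign = np.argmax(inter / union, axis=1)    # nearest anchor by IOU
        anchors = np.array([wh[assign == j].mean(axis=0)
                            if np.any(assign == j) else anchors[j]
                            for j in range(k)])
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted small to large

# The 9 resulting sizes are then assigned 3 per detection scale.
```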
P = (t_x, t_y, t_w, t_h, I_obj · IOU, p_1, p_2, …, p_i)  (10)

b_x = Sigmoid(t_x) + C_x  (11)

b_y = Sigmoid(t_y) + C_y  (12)

b_w = p_w · e^{t_w}  (13)

b_h = p_h · e^{t_h}  (14)

σ(x) = 1 / (1 + e^{−x})  (15)

In the formulas, t_x, t_y, t_w, t_h are 4 variables related to the pixel coordinates of the center point and the height and width of the prediction frame; σ represents the Sigmoid function; C_x, C_y represent the offset of the grid to which the bounding box belongs relative to the upper left corner of the picture; b_x, b_y represent the center point pixel coordinates of the final prediction frame; p_w, p_h represent the width and height of the predefined anchor frame; b_w, b_h represent the width and height of the prediction frame; I_obj represents the predicted score of the bounding box obtained by logistic regression: when the overlap of the bounding box with the ground truth is maximal, I_obj = 1, otherwise I_obj = 0; IOU is the intersection-over-union of the bounding box and the ground truth; p_1, p_2, …, p_i represent the scores that the predicted target belongs to the i-th class among all classes, obtained by a Sigmoid function.
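A small sketch of equations (11)-(14) applied to one prediction; the stride argument (grid-to-pixel scale) and the convention that anchor sizes p_w, p_h are already in pixels are assumptions consistent with common YOLOv3 implementations.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    bx = (sigmoid(tx) + cx) * stride   # center x in pixels (eq. 11)
    by = (sigmoid(ty) + cy) * stride   # center y in pixels (eq. 12)
    bw = pw * math.exp(tw)             # width from anchor prior (eq. 13)
    bh = ph * math.exp(th)             # height from anchor prior (eq. 14)
    return bx, by, bw, bh
```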
After rare earth mining area sedimentation tank identification and positioning are carried out with the YOLOv3 model, target detection evaluation indexes need to be determined. Precision P and recall R are evaluation indexes commonly used in classification problems: P is the ratio of the number of instances of a class correctly identified in the test sample to the total number predicted as that class, and R is the ratio of the number correctly identified to the actual number of that class in the sample, namely
P = TP / (TP + FP)  (16)

R = TP / (TP + FN)  (17)
In the formula: TP is the number of correctly classified positive classes, i.e. the number of samples that are actually positive classes and classified as positive classes by the classifier; FP is the number of samples which are wrongly classified into positive classes, namely the number of samples which are actually negative classes but are classified into the positive classes by the classifier; FN is the number of samples which are wrongly classified into negative classes, namely the number of samples which are actually positive classes but are classified into the negative classes by the classifier; TN is the number of samples that are correctly classified as negative classes, i.e., actually negative classes and classified as negative by the classifier.
In order to quantify the recognition and segmentation performance of the algorithm model adopted by the invention on the remote sensing images of the rare earth mining area, the confusion matrix shown in Table 1 is used.

TABLE 1 Prediction category confusion matrix

                     Predicted positive      Predicted negative
Actual positive      TP (true positive)      FN (false negative)
Actual negative      FP (false positive)     TN (true negative)
In object detection, R measures the coverage of the model. In deep learning object recognition, P and R are two indexes that are difficult to satisfy simultaneously: as one rises, the other tends to fall. For a single category, the present invention adopts the Average Precision (AP) in the image recognition accuracy evaluation. The calculation method is as follows: assuming there are M positive samples among the N samples, M recall values R are obtained (1/M, 2/M, …, M/M). For each R, the corresponding maximum P is calculated, and the M values of P are averaged to obtain the final AP value; AP measures the performance of the trained model on each class, and the higher the AP value, the higher the accuracy. For all categories, the mean Average Precision (mAP) is adopted; mAP measures the performance of the trained model over all categories. The calculation formulas are

AP = (1/M) · Σ_R P(R)  (18)

mAP = (1/Q) · Σ_{q=1}^{Q} AP(q)  (19)

In the formulas: M is the number of positive samples, and P(R) is the maximum precision corresponding to R; Q is the number of categories; AP(q) is the average precision of the corresponding category.
In step S1, the remote sensing image needs to be segmented with offsets, specifically as follows:
Satellite images are usually large in scale; the coverage area mostly exceeds 200 km², while the area of a single sedimentation tank in a rare earth mining area in the image is 20 to 1500 m². If the image is directly input into the model for training, on one hand the target is excessively compressed and cannot be identified, and on the other hand a large amount of video memory is occupied and the operation speed is affected. The image therefore needs to be segmented into smaller images in the training and recognition processes before being input into the model for training and detection. In the segmentation process a target may be split across two or more sub-images, as shown in fig. 10; multiple detection frames may then appear on the same target, or the target may be difficult to identify after being split, causing missed detections. To address this problem, the following solutions are proposed (a code sketch follows the IOMIN formula below): 1) The original image is first divided into 320 × 320 pixel tiles; it is then divided again after shifting by 160 pixels (half the width of a small image) in the X-axis direction, in the Y-axis direction, and in both the X-axis and Y-axis directions (these 4 division methods are hereinafter referred to as segmentation methods 1, 2, 3 and 4), as shown in fig. 11. In fig. 11, solid lines represent offset dividing lines and dashed lines represent non-offset dividing lines: no offset (top left); offset along the Y axis (top right); offset along the X axis (bottom left); offset along both the X and Y axes (bottom right). The green squares represent small images obtained without an offset, and the yellow squares represent small images obtained with the different offset segmentations.
2) Inspired by the IOU (intersection-over-union, usually used to measure the overlap of the prediction frame and the real frame), a new index IOMIN (the ratio of the intersection area of two prediction frames to the area of the smaller prediction frame) is constructed to determine whether two prediction frames come from the same target. As shown in fig. 12, assume the image is a local image; the green translucent background represents a small image obtained by segmentation method 1, and the yellow translucent background represents a small image obtained by segmentation method 4. As can be seen from the figure, the sedimentation tank at the upper right corner of the image under the green background is split across 4 small images under segmentation method 1 and is identified as a yellow prediction frame in the green background image, while in the yellow background image it is recognized as a whole as a red prediction frame. The sedimentation tank may thus be identified as 2 or more prediction frames in the images of segmentation method 1, so the incomplete prediction frames need to be replaced by the complete one. The method adopted by the invention is as follows: first, a threshold α is specified and the IOMIN index of two prediction frames is calculated; if IOMIN is greater than α, the prediction frame with the smaller area is deleted and the one with the larger area is retained. For each small image obtained by segmentation method 1, the 8 adjacent small images can be obtained with the other three segmentation methods. For every two adjacent images among these 9 images, the IOMIN index of any two prediction frames in the first and second image is calculated until all prediction frames have participated in the calculation; images in which no target is detected do not participate. Finally, taking the small images of segmentation method 1 as the basis, a sliding window mechanism is used to calculate the IOMIN index of any two prediction frames in all adjacent images over the whole image, and the prediction frames of incomplete targets are eliminated according to the IOMIN threshold, as shown in fig. 13. In this way, complete identification of the rare earth mining area sedimentation tanks can be realized (see the code sketch after the formula below).
IOMIN = S_I(b1, b2) / S_MIN(b1, b2)  (20)

In the formula: b1 and b2 respectively represent the two prediction frames; S_I(b1, b2) represents the area where the two frames intersect; S_MIN(b1, b2) represents the area of the smaller of the two frames.
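A hedged sketch of segmentation methods 1-4 and the IOMIN rule; the threshold α = 0.9 is an illustrative assumption, and the simple pairwise loop stands in for the sliding-window neighbour scheme described above.

```python
from PIL import Image

TILE, OFF = 320, 160   # tile size and half-tile offset from the text

def split(image, dx, dy):
    """Cut one 320x320 grid starting at offset (dx, dy); returns tiles with
    their top-left pixel coordinates in the whole image."""
    w, h = image.size
    return [((x, y), image.crop((x, y, x + TILE, y + TILE)))
            for y in range(dy, h - TILE + 1, TILE)
            for x in range(dx, w - TILE + 1, TILE)]

def offset_split(image):
    # Segmentation methods 1-4: no offset, X offset, Y offset, X and Y offset.
    return [split(image, dx, dy) for dx, dy in
            [(0, 0), (OFF, 0), (0, OFF), (OFF, OFF)]]

def iomin(b1, b2):
    """Equation (20): intersection area over the smaller box's area."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    smaller = min((b1[2] - b1[0]) * (b1[3] - b1[1]),
                  (b2[2] - b2[0]) * (b2[3] - b2[1]))
    return inter / smaller if smaller > 0 else 0.0

def drop_partial(boxes, alpha=0.9):
    """Keep only the larger of any two boxes whose IOMIN exceeds alpha.
    boxes: (x_min, y_min, x_max, y_max) in whole-image pixel coordinates."""
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    keep = list(range(len(boxes)))
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if i in keep and j in keep and iomin(boxes[i], boxes[j]) > alpha:
                keep.remove(i if area(boxes[i]) < area(boxes[j]) else j)
    return [boxes[i] for i in keep]
```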
In order to facilitate the rare earth management department's timely knowledge of the distribution of rare earth mining area sedimentation tanks and efficient processing work, plane coordinate information needs to be derived from the model prediction result and marked on the remote sensing image in point form. The output of the model contains the pixel position information of the prediction bounding box, namely the pixel coordinate information relative to the upper left corner of the image; the result expresses the position of the prediction bounding box with 4 values, the pixel coordinates of the upper left and lower right corners (x_min, y_min, x_max, y_max). The pixel coordinates of the center point of the prediction frame therefore need to be converted into plane point coordinates and then exported as a Shapefile layer. The selected coordinate system is WGS_1984_UTM_Zone_50N. The coordinate conversion process is as follows.
The XY coordinate system in fig. 14 is the plane coordinate system; the xy coordinate system is the pixel coordinate system; the large rectangular box represents an image, and the small rectangular box represents a predicted bounding box in the image. From the image acquisition and segmentation process, the plane coordinates (X_01, Y_01) of the upper left corner of each image are known; the plane coordinates (X_i, Y_i) of the center point of each predicted bounding box are then calculated from (X_01, Y_01). The results of sedimentation tank positioning in the study area are shown in fig. 15.

X_i = X_01 + x_i · Δx  (21)

Y_i = Y_01 − y_i · Δy  (22)

In the formulas: i denotes the i-th prediction frame in the image; x_i and y_i are the pixel coordinates of the center point of the prediction frame; X_i and Y_i are the plane coordinate values of the center point of the i-th prediction frame; Δx and Δy are the spatial resolutions (unit: m) of the remote sensing image in the horizontal and vertical directions, respectively.
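A minimal sketch of equations (21)-(22); the 0.5 m default resolution (the fused Pleiades image) and the sign convention that plane Y decreases as pixel y grows downward are assumptions consistent with the text.

```python
def to_plane(box, x01, y01, dx=0.5, dy=0.5):
    """Convert the center of a prediction box from pixel to plane coordinates.
    box: (x_min, y_min, x_max, y_max); (x01, y01): plane coordinates of the
    image's upper left corner; dx, dy: spatial resolution in metres."""
    xi = (box[0] + box[2]) / 2.0     # center pixel x
    yi = (box[1] + box[3]) / 2.0     # center pixel y (grows downward)
    return x01 + xi * dx, y01 - yi * dy
```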
The spectral characteristics of water bodies in the remote sensing image are similar to those of dark buildings, ordinary buildings and mountain shadows, so dark buildings, buildings and mountain shadows are sometimes mistakenly classified as sedimentation tanks when identifying sedimentation tanks in rare earth mining areas, as shown in fig. 16. In research on water extraction from remote sensing images, indexes such as NDWI and MNDWI are commonly used to extract water, but a water index cannot stably distinguish water from shadows. Experiments show that water bodies, buildings and shadows can be effectively distinguished by a random forest classification method. The invention divides the sedimentation tank anchor points into 3 types: sedimentation tanks; dark buildings; and buildings and mountain shadows. First, a feature set is constructed from the spectral features of the remote sensing image (the reflectances of the 4 Pleiades bands: red, green, blue and near-infrared), the water body index (NDWI) and the vegetation index (NDVI); second, 387 sample points of the various anchor point types are randomly selected within the study area, with 270 training samples and 117 verification samples, as shown in Table 2; finally, random forest classification is carried out on the anchor points and the classification result is verified with the verification sample points (a code sketch follows Table 2).
TABLE 2 Number of samples for each type of anchor point (270 training and 117 verification samples in total; the table itself is provided as an image in the original publication)
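A hedged scikit-learn sketch of the random forest step; the feature extraction is assumed to be done already, and n_estimators = 100 is an illustrative choice, not a value from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

def classify_anchor_points(train_X, train_y, test_X, test_y):
    """train_X/test_X: (n, 6) feature arrays [red, green, blue, nir, NDWI, NDVI],
    with NDWI = (green - nir) / (green + nir), NDVI = (nir - red) / (nir + red);
    y: class labels (sedimentation tank / dark building / building and shadow)."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(train_X, train_y)
    pred = rf.predict(test_X)
    return pred, accuracy_score(test_y, pred), cohen_kappa_score(test_y, pred)
```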
As can be seen from the confusion matrix of the classification results (Table 3), the overall accuracy reaches 92.31%, and the Kappa coefficient is 0.86. The overall classification result is good: sedimentation tanks, dark buildings, buildings and mountain shadows can be well distinguished. The classification results are shown in fig. 17.
TABLE 3 Confusion matrix of the classification results (provided as an image in the original publication)
Although specific embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that these are merely examples and that various changes or modifications may be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined only by the appended claims.

Claims (9)

1. A rare earth mining high-resolution image identification and positioning method is characterized by comprising the following steps,
step S1: acquiring and preprocessing remote sensing image data: after the remote sensing image data is obtained, radiation correction, geometric correction and image fusion preprocessing are carried out; the data is then exported as an RGB three-channel image; finally the image is cropped and the parts that do not contain detection targets are removed;
step S2: building a YOLOv3 model: first determining the network structure of the YOLOv3 algorithm, and then determining the loss function of the YOLOv3 algorithm, the loss function comprising three parts: target localization loss, target confidence loss and target classification loss;
step S3: adjusting a YOLOv3 algorithm, and replacing an original target positioning Loss function by using CIOU Loss when a prediction frame and a real frame are not intersected; embedding CBAM in Darknet-53 of the YOLOv3 algorithm network structure;
step S4: the model outputs a result, and the pixel position information of the prediction bounding box, namely the pixel coordinate information relative to the upper left corner of the image, is marked on the remote sensing image in point form.
2. The rare earth mining high-resolution image identification and positioning method according to claim 1, characterized in that: after the image is cropped in step S1, the data set is expanded by flipping, rotation, mirroring, brightness, chromaticity and Gaussian blur data enhancement methods.
3. The method for recognizing and locating rare earth mining high-resolution images according to claim 1, wherein the determination of the Yolov3 algorithm network structure in the step S2 includes a Darknet-53 feature extraction network part and a multi-scale detection part.
4. The rare earth mining high-resolution image identification and positioning method according to claim 3, characterized in that: the multi-scale detection part fuses the feature map of each scale with the 2× upsampled feature map of the preceding (deeper) scale through the feature pyramid network structure adopted by the YOLOv3 algorithm.
5. The rare earth mining high-resolution image identification and positioning method according to claim 1, characterized in that: in step S3, the target localization loss takes the mean square error as the objective function of the loss function, which specifically comprises: first calculating the ratio of the intersection area to the union area of the prediction frame and the real frame generated by the network, obtaining the intersection-over-union of the two frames; then screening the prediction frames through a preset IOU threshold, keeping the prediction frames whose IOU is greater than the threshold; and finally calculating the corresponding target localization loss.
6. The rare earth mining high-resolution image identification and positioning method according to claim 5, characterized in that: and when the predicted frame and the real frame are not intersected and the IOU values of the two frames are 0, replacing the original target positioning Loss function by using the CIOU Loss.
7. The method for recognizing and positioning rare earth mining high-resolution images according to claim 1, wherein the cropped images in step S1 need to be segmented into a plurality of images with smaller sizes in the image training and recognition process, and then input into a model for training and detection.
8. The method of claim 7, wherein if the sedimentation basin is identified as 2 or more prediction frames in the segmented image, the incomplete prediction frame needs to be replaced by the complete prediction frame, and the method specifically comprises:
firstly, a threshold value α is specified and the IOMIN index of two prediction boxes is calculated; if IOMIN is greater than α, the prediction box with the smaller area is deleted and the prediction box with the larger area is retained.
9. The rare earth mining high-resolution image identification and positioning method according to claim 1, characterized in that: and a random forest classification method is adopted in the positioning process of the rare earth mining area sedimentation tank of the remote sensing image.
CN202110219415.8A 2021-02-26 2021-02-26 Rare earth mining high-resolution image identification and positioning method Pending CN113033315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110219415.8A CN113033315A (en) 2021-02-26 2021-02-26 Rare earth mining high-resolution image identification and positioning method


Publications (1)

Publication Number Publication Date
CN113033315A true CN113033315A (en) 2021-06-25

Family

ID=76462403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110219415.8A Pending CN113033315A (en) 2021-02-26 2021-02-26 Rare earth mining high-resolution image identification and positioning method

Country Status (1)

Country Link
CN (1) CN113033315A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147778A (en) * 2019-05-27 2019-08-20 江西理工大学 Rare Earth Mine exploits recognition methods, device, equipment and storage medium
CN111666836A (en) * 2020-05-22 2020-09-15 北京工业大学 High-resolution remote sensing image target detection method of M-F-Y type lightweight convolutional neural network
CN112287788A (en) * 2020-10-20 2021-01-29 杭州电子科技大学 Pedestrian detection method based on improved YOLOv3 and improved NMS

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
严开忠 et al.: "Target Detection on an Airborne Platform Based on Improved YOLOv3", Electronics Optics & Control *
徐信罗 et al.: "Identification and Location of Trees Damaged by Pine Wood Nematode Disease Based on Faster R-CNN", Transactions of the Chinese Society for Agricultural Machinery *
李恒凯, 肖松松, ***, 柯江晨: "Rare Earth Mining Identification Method for High-Resolution Remote Sensing Images Based on Mask R-CNN", Journal of China University of Mining & Technology *
王生霄, 侯兴松, 黑夏萌: "Improved YOLOv3 Ultra-Wideband Radar Life-Signal Detection Algorithm with Embedded CBAM Structure", Foreign Electronic Measurement Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298187A (en) * 2021-12-20 2022-04-08 西南交通大学 Target detection algorithm integrating improved attention mechanism
CN114298187B (en) * 2021-12-20 2023-08-29 西南交通大学 Target detection method integrating improved attention mechanism
CN114020881A (en) * 2022-01-10 2022-02-08 珠海金智维信息科技有限公司 Topic positioning method and system
CN114020881B (en) * 2022-01-10 2022-05-27 珠海金智维信息科技有限公司 Topic positioning method and system
CN115861328A (en) * 2023-03-01 2023-03-28 中国科学院空天信息创新研究院 Grave detection method and device and electronic equipment
CN116246175A (en) * 2023-05-05 2023-06-09 西昌学院 Land utilization information generation method, electronic device, and computer-readable medium
CN116664573A (en) * 2023-07-31 2023-08-29 山东科技大学 Downhole drill rod number statistics method based on improved YOLOX
CN116664573B (en) * 2023-07-31 2024-02-09 山东科技大学 Downhole drill rod number statistics method based on improved YOLOX

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN110175576B (en) Driving vehicle visual detection method combining laser point cloud data
CN107871119B (en) Target detection method based on target space knowledge and two-stage prediction learning
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN112818903A (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
Wang et al. Photovoltaic panel extraction from very high-resolution aerial imagery using region–line primitive association analysis and template matching
CN113378686B (en) Two-stage remote sensing target detection method based on target center point estimation
CN112347895A (en) Ship remote sensing target detection method based on boundary optimization neural network
CN112766184B (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN113435282B (en) Unmanned aerial vehicle image ear recognition method based on deep learning
CN113569724B (en) Road extraction method and system based on attention mechanism and dilation convolution
CN107992856A (en) High score remote sensing building effects detection method under City scenarios
CN111008994A (en) Moving target real-time detection and tracking system and method based on MPSoC
CN111563408A (en) High-resolution image landslide automatic detection method with multi-level perception characteristics and progressive self-learning
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
Laupheimer et al. The importance of radiometric feature quality for semantic mesh segmentation
CN113902792A (en) Building height detection method and system based on improved RetinaNet network and electronic equipment
CN112924037A (en) Infrared body temperature detection system and detection method based on image registration
CN113052110A (en) Three-dimensional interest point extraction method based on multi-view projection and deep learning
CN117079125A (en) Kiwi fruit pollination flower identification method based on improved YOLOv5
CN111476167A (en) student-T distribution assistance-based one-stage direction remote sensing image target detection method
CN110889418A (en) Gas contour identification method
CN106909936B (en) Vehicle detection method based on double-vehicle deformable component model
CN115984712A (en) Multi-scale feature-based remote sensing image small target detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625