CN109583425B - Remote sensing image ship integrated recognition method based on deep learning - Google Patents


Info

Publication number
CN109583425B
CN109583425B (application CN201811573380.2A)
Authority
CN
China
Prior art keywords
image
ship
remote sensing
information
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811573380.2A
Other languages
Chinese (zh)
Other versions
CN109583425A (en)
Inventor
Name withheld upon request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201811573380.2A priority Critical patent/CN109583425B/en
Publication of CN109583425A publication Critical patent/CN109583425A/en
Application granted granted Critical
Publication of CN109583425B publication Critical patent/CN109583425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a remote sensing image ship integrated recognition method based on deep learning, which comprises image classification, target detection and image segmentation. Compared with the prior art, the method combines a modern artificial intelligence deep learning model with traditional image processing methods to detect and segment ship targets in remote sensing images. The deep-learning-based remote sensing image segmentation method can accurately identify ships in sea areas, adapts to various processing environments, has good anti-interference capability in complex environments, and can accurately segment the ship after detection and recognition.

Description

Remote sensing image ship integrated recognition method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, relates to the fields of target detection and image segmentation, and in particular to a remote sensing image ship integrated recognition method based on deep learning.
Background
With the development of remote sensing information technology, the processing of remote sensing images has gradually come to occupy an important position in the image field, and detection algorithms built on remote sensing images emerge in an endless stream. For the task of detecting and identifying ship targets in remote sensing images, most current detection algorithms follow the traditional idea of feature extraction, detecting the ship target after preprocessing and enhancing the image.
Owing to their particular character, remote sensing images, compared with ordinary images, are easily affected by factors such as illumination, weather, sea state and imaging time, and cloud and wave information can also interfere with image quality. In addition, because satellites image in different wave bands, image resolutions vary widely: at high resolution the shape, texture and other features of a ship target are rich, while the detail features of low-resolution ship images are blurred. This unique diversity of remote sensing images therefore poses a significant challenge to conventional approaches.
Conventional methods, such as support vector machines, dynamic thresholding and adaptive clustering, generally classify the feature information of the picture in order to determine and detect the actual target. However, traditional feature processing places high demands on the data samples, is strongly sensitive to the training samples, and gives unsatisfactory detection results when the remote sensing image features are complex and varied.
For target detection tasks, the conventional approach is the large class of detection algorithms that combine feature description with machine learning, generally comprising two tasks: generating target candidate regions and identifying the candidate regions. The traditional algorithm uses the difference between the ship target and the background sea area to extract candidate regions, then uses a classifier to classify the identified ships; the method is direct and simple. However, it is generally suited to situations where the sea surface is relatively simple and the ship's features are prominent; when the sea surface is complex, the results achieved with conventional detection methods may be poor.
For image segmentation tasks, the traditional approaches include the pyramid threshold method, the mean shift method, energy-field-based segmentation methods and the like. However, the objects processed by traditional segmentation methods are generally ordinary images; for remote sensing images with extremely complex environments, these segmentation algorithms easily fall into segmentation confusion, so the foreground and background cannot be separated correctly, the type of the segmented ship cannot be identified, and the ship target naturally cannot be segmented well.
In summary, conventional target detection and image segmentation methods give unsatisfactory detection and segmentation results on remote sensing images with abundant information and complex conditions, and lack adaptability to the processing environment.
Disclosure of Invention
The invention aims to remedy the defects of the prior art and provides a remote sensing image ship integrated recognition method based on deep learning, which combines a modern artificial intelligence deep learning model with traditional image processing methods to detect and segment ship targets in remote sensing images. The deep-learning-based remote sensing image segmentation method can accurately identify ships in sea areas, adapts to various processing environments, has good anti-interference capability in complex environments, and can accurately segment the ship after detection and recognition.
In order to achieve the above purpose, the invention is implemented according to the following technical scheme:
a remote sensing image ship integrated identification method based on deep learning comprises the following steps:
S1, image classification: collecting a remote sensing image dataset, training a neural network with a ResNet-34 structure, and judging whether a ship exists in the processed remote sensing field of view;
S2, target detection: using a ResNet-101 neural network as the feature extraction network, inputting the remote sensing images screened in S1 as containing ships into the designed neural network, extracting the feature layers of the ship remote sensing images, and thereby obtaining their position information;
S3, image segmentation: training on the ship remote sensing images screened in S1 with a neural network of U-Net structure to obtain feature maps of the screened ship remote sensing images; performing transposed convolution operations on the feature information maps of different scales, gradually raising the resolution of the feature information so that the position information in the feature maps is displayed concretely, and finally obtaining the segmentation information of the ship.
Further, the specific steps of S1 are as follows:
S11, collecting remote sensing images containing information on common sea areas worldwide to form a remote sensing image dataset; the dataset contains 300,000 remote sensing images, each with a resolution of 768px×768px in RGB three-channel color format. The position and outline of the ship in all remote sensing images, together with the position information of the pixels it occupies, are marked to produce a ship label set for the remote sensing images, where each label image is a 768px×768px single-channel grayscale map in jpg file format;
S12, image enhancement: enhancing the remote sensing images containing ships with one or more of the following: horizontal/vertical flipping, random rotation by 0-30 degrees, random brightness change, random contrast change and image distortion;
S13, training by cross-validation: taking the image-enhanced remote sensing images containing ships as input images, training on the remote sensing image data with a neural network of ResNet-34 structure, and adopting 5-fold cross-validation during input;
S14, TTA image classification inference: performing test-time-augmentation (TTA) inference on the input image by horizontally and vertically flipping it, running inference on each flipped version, fusing the results with the inference result of the original input image, then testing in the trained ResNet-34 neural network and judging whether a ship exists in the field of view of the remote sensing image.
Further, the specific steps of S2 are as follows:
S21, convolutional feature extraction: extracting features of the remote sensing image containing a ship with the ResNet-101 network structure, deriving the feature information of the image from the feature layers output at five different stages of ResNet-101; fusing and extracting the feature information of the ship remote sensing images at different scales through a feature pyramid network, and refining and integrating the extracted feature information to obtain the final feature layers;
S22, feature region proposal: using the feature layers extracted in S21, generating anchors in all feature layers of different levels and different scales, setting boxes around all anchors, and generating a series of proposed regions on the feature map once the pixel lengths and ratios of the boxes are determined;
S23, feature region adjustment: selecting and adjusting the series of proposed regions generated in S22 to finally obtain a box of suitable size that just contains the ship under measurement; setting a loss function that is continuously optimized during algorithm execution to dynamically adjust the position of the box and determine the classification of the ship the box contains, thereby determining the final position of the ship;
S24, obtaining classification information: on the basis of S23, obtaining the classification information of the ship by reducing the classification loss; in the program the classification information is one-hot coded and is converted into actual ship classification information, so that the remote sensing image is classified into ship target and background;
S25, bounding box adjustment: bounding box adjustment is a part executed automatically when the detection error is updated by gradient descent; during the gradient update the coordinate position of the bounding box containing the ship is adjusted so that the bounding box accurately frames the detected ship;
S26, optimizing the detection loss: optimizing the loss function of the neural network of S23 with a gradient descent method with momentum, thereby obtaining the rectangular calibration box finally best suited to the ship;
S27, obtaining segmentation masks and bounding boxes: constructing a mask branch network, inputting the classified framed regions and the feature information extracted in S21-S26, and generating the corresponding masks to obtain the image after target detection.
Further, the specific steps of S3 are as follows:
S31, multi-scale feature extraction: segmenting the ship remote sensing images screened in S1 with a network of U-Net structure, where the U-Net encoder is a neural network of ResNet-34 structure; the screened ship remote sensing images are input to the encoder, which outputs feature information at different scales;
S32, deconvolution of the feature maps: in the decoding stage of the network, performing the up-scaling operation on the low-resolution feature maps through transposed convolution, so that they are combined with the corresponding encoder feature information to obtain the segmented image;
S33, integrating fused feature information: fusing the target-detection image obtained in S27 with the segmented image obtained in S32 to obtain more accurate segmentation information;
S34, Markov random field diffusion of segmentation points: correcting the segmentation information obtained in S33 with a Markov random field algorithm, selecting seed points of the segmentation information for diffusion so that the segmentation information is more complete and accurate;
S35, performing an opening operation to eliminate edge-overlap effects: according to morphological principles, applying erosion and then dilation in turn to the result map obtained in S34, i.e., an opening operation on the image, which effectively relieves the overlap of ship segmentation information and improves the segmentation effect;
S36, outputting the final result map.
Compared with the prior art, the invention has the following advantages:
1) The invention adopts advanced deep learning technology from the field of artificial intelligence to construct neural networks that classify, detect and segment the target; after proper training, the designed network has good robustness and can accurately judge whether targets exist in the remote sensing image and locate and segment them, which the traditional technology cannot do;
2) Compared with other traditional detection technologies, the anti-interference capability with respect to the environment is raised by a level; after the deep neural network has fully learned the training data, even if the conditions of the remote sensing image to be detected are complex and severe, the algorithm can still extract the position of the target from the complex background and segment it;
3) Compared with other prior art adopting traditional algorithms, the algorithm of the invention can measure multiple targets in the image simultaneously and adopts ensemble techniques from machine learning, improving the robustness of the whole method. In the later stages of processing, the result map is further refined in combination with traditional methods to achieve an accurate segmentation effect.
Drawings
FIG. 1 is a general flow chart of an embodiment of the present invention.
Fig. 2 illustrates remote sensing image training data and an image enhancement technique according to an embodiment of the present invention.
Fig. 3 shows the neural network structure adopted at different stages in the embodiment of the present invention.
Fig. 4 is the resulting image information of the second and third portions of an embodiment of the invention.
Fig. 5 is a diagram showing the image information of the fourth partial image segmentation according to the embodiment of the present invention.
Fig. 6 shows the image post-processing steps and the final resulting image information according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. The specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
Referring to fig. 1, the first section describes the dataset information required for training and testing. The dataset adopted by the method is a public remote sensing image dataset totaling 300,000 images with a resolution of 768px×768px, in three-channel RGB color and jpg file format. The training process also requires a label set containing segmentation information; the label set has the same number of images as the dataset, in one-to-one correspondence. Fig. 2 (a) shows a portion of the remote sensing image information used in the present invention; the ship label set is fused with the original image using a semi-transparent mask, and the bright stripe mask regions are the actual segmented regions of the ships.
In actual training, the image dataset is divided into three parts: a training set, a validation set and a test set. The training set is input to the neural network during training, the validation set periodically checks the soundness of the method during training, and the test set evaluates the performance of the method when training is complete.
Of the 300,000 images in total, 200,000 are selected as training set data and 30,000 as validation set data. The remaining 70,000 images serve as test set data; the overall distribution ratio is 20:3:7, meeting common deep learning training practice. The label set is distributed in the same proportion as the dataset; each label image is a 768px×768px single-channel grayscale map in jpg format. The label map contains the segmentation information corresponding to the dataset image, where the background has pixel value 0 and the ship segmentation entity has pixel value 1.
The second section describes the specific process of classifying the remote sensing images with a neural network of the classic ResNet-34 structure. In connection with the overall flow in fig. 1, the main execution steps of the second part are: image input and image enhancement, cross-validation training, and TTA image classification inference.
(1) Image input and image enhancement
In this step, the invention trains on the processed remote sensing images. Before training with the neural network, the images to be trained are processed with image enhancement techniques; this increases the difficulty of learning the image features, so that the algorithm mines the feature information of the remote sensing images more deeply and achieves an accurate classification effect.
In view of the characteristics of the remote sensing ship image dataset, the invention adopts the following image enhancement modes: horizontal/vertical flipping, random rotation by 0-30 degrees, random brightness change, random contrast change and image distortion. Fig. 2 (b) illustrates the effect of the different enhancements on a set of example ship images from the remote sensing dataset of fig. 2 (a).
(2) Neural network architecture for use
The main operation in the neural network is convolution, similar in mode of operation to a traditional filter; in the course of analyzing and training on an input image, the neural network extracts the features of the remote sensing image progressively from shallow to deep. The basic convolution operation is as follows:
$$y(m,n) = \sum_{i}\sum_{j} x(i,j)\,h(m-i,\,n-j)$$
where x is the convolution input image, h is the convolution kernel and y is the convolution result. Convolution is the basic computation in deep-learning-based image processing; by updating the parameters of the convolution kernel, the features of the input image are extracted.
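As a minimal illustration (not part of the patent), the convolution above corresponds to the following PyTorch sketch; the tensor sizes and random values are assumptions for demonstration only:

```python
import torch
import torch.nn.functional as F

# Toy input image x (batch 1, 1 channel, 5x5) and a 3x3 kernel h.
x = torch.rand(1, 1, 5, 5)   # convolution input image
h = torch.rand(1, 1, 3, 3)   # convolution kernel, learned during training

# y = x convolved with h; padding=1 preserves the 5x5 spatial size,
# as in most ResNet convolution blocks.
y = F.conv2d(x, h, padding=1)
print(y.shape)               # torch.Size([1, 1, 5, 5])
```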
Selecting a suitable neural network architecture has a significant bearing on the classification performance of the first stage. The invention selects the residual network ResNet-34 as the backbone of the classification network; the design of the ResNet-34 residual structure effectively extracts the features of the image. The invention modifies the number of layers at the tail of the network so that the architecture is able to classify remote sensing images; fig. 3 (a) shows the ResNet-34 residual network structure used.
In addition, to increase training accuracy and speed, for the first-stage image classification task the invention first pre-trains on the ImageNet image classification dataset before formal training. This transfer learning approach gives the designed network weights a high sensitivity to image classification in advance. After pre-training, weight information of satisfactory precision is obtained, and these pre-trained weights are then used in the formal training stage, effectively improving the generalization ability of the network.
The loss function employed by the ResNet-34 backbone described above is the cross entropy (CrossEntropyLoss) loss, which is used to calculate the difference between the result distribution and the label distribution: the larger the difference between the two distributions, the larger the loss value. The cross entropy loss formula is as follows:
$$\mathrm{loss}(x,\mathrm{class}) = -\big[\mathrm{class}\cdot\log x + (1-\mathrm{class})\cdot\log(1-x)\big]$$
where x is the result of the network calculation after normalization to (0, 1) by the sigmoid function and class is the label; at this stage the label values are 1 and 0, where 1 represents the presence of a ship in the corresponding image and 0 represents its absence.
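A hedged PyTorch sketch of this classification loss (an assumed equivalent, not the patent's code): with sigmoid-normalized outputs and 0/1 labels it matches binary cross entropy, and the logits variant below is the numerically stabler form commonly used:

```python
import torch
import torch.nn as nn

# Raw classification-head outputs (logits) for a batch of 4 images.
logits = torch.randn(4, 1, requires_grad=True)
labels = torch.tensor([[1.], [0.], [1.], [1.]])  # 1 = ship, 0 = no ship

# Applies the sigmoid internally, then the binary cross entropy above.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, labels)
loss.backward()   # gradients flow back to the ResNet-34 weights
```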
(3) Training by cross-validation
After the data is prepared and the neural network designed, the training set data is input into the network for training. During input, 5-fold cross-validation is adopted: the image data in the training set is first divided into 5 mutually disjoint sets of 40,000 images each, 200,000 images in total across the 5 sets.
During training, one of the 5 sets is selected as the validation set and the other four are trained on as training sets. A total of 5 batches are trained, 20 rounds per batch, with a different validation set selected for each batch. Cross-validation makes full use of the information in the dataset, so that the neural network learns more features of the remote sensing images and over-fitting is effectively avoided.
The batch size adopted during training is 32, and the optimization function is the Adam optimizer with momentum parameters 0.9 and 0.99 and an initial learning rate of 0.01. The learning rate is deliberately decayed in each round, down to a learning rate of 0.00001 in the last round.
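A sketch of this training loop (assumptions: the fold split uses scikit-learn's KFold, the decay schedule is cosine annealing, and make_resnet34, train_one_epoch and validate are hypothetical helpers standing in for the model factory and the usual train/validate code):

```python
import numpy as np
import torch
from sklearn.model_selection import KFold

indices = np.arange(200_000)              # the 200,000 training images
kfold = KFold(n_splits=5, shuffle=True)   # 5 disjoint folds of 40,000 each

for fold, (train_idx, val_idx) in enumerate(kfold.split(indices)):
    model = make_resnet34()               # hypothetical model factory
    # Adam with the stated momentum parameters and initial learning rate.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
                                 betas=(0.9, 0.99))
    # Decay the learning rate each round, reaching 1e-5 in the last round;
    # the exact schedule is not stated in the text, so cosine is assumed.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=20, eta_min=0.00001)
    for epoch in range(20):               # 20 rounds per fold
        train_one_epoch(model, train_idx, optimizer, batch_size=32)  # hypothetical
        validate(model, val_idx)                                     # hypothetical
        scheduler.step()
```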
(4) TTA image classification inference
After training on the ship dataset, the network designed by the invention classifies the remote sensing images well, separating image information that contains ships from that which does not. In the inference stage, a TTA inference enhancement method is adopted: the input images are horizontally flipped and vertically flipped, inference is run on each flipped version, and the results are fused with the inference result of the original input image. The accuracy on the test set reaches 98.6%, and the confusion matrix for the test set classification is shown in fig. 4 (a). The trained network can accurately judge whether a ship exists in the field of view of the remote sensing image, and its output serves as the subsequent classification information.
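A minimal sketch of the TTA fusion just described; averaging the three softmax outputs is an assumed fusion rule, since the text only says the results are fused:

```python
import torch

@torch.no_grad()
def tta_classify(model, image):
    # image: tensor of shape [1, 3, 768, 768]
    views = [
        image,
        torch.flip(image, dims=[3]),   # horizontal flip (width axis)
        torch.flip(image, dims=[2]),   # vertical flip (height axis)
    ]
    probs = [torch.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(dim=0)   # fused ship/no-ship probability
```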
The third section describes the specific process of target detection with a neural network of ResNet-101 structure. In combination with the overall flow in fig. 1, the implementation steps of this section are mainly: convolutional feature extraction, feature region proposal, feature region adjustment, obtaining classification information, bounding box adjustment, optimizing the detection loss, and obtaining segmentation masks and bounding boxes.
(1) Feature extraction network
In the target detection stage, the invention uses convolution operations sliding over the image to extract its content information and edge information. ResNet-101 is an extremely deep neural network; fig. 3 (b) shows the basic structure of ResNet-101, and the network architecture used by the invention comprises 101 network layers. ResNet-101 achieves its extreme depth by constructing residual layers; because the network possesses such great depth, it can extract the deep feature information of the image more effectively, serving the purpose of target detection.
(2) Feature extraction using neural networks
The invention first uses the ResNet-101 neural network to extract features from the input image, taking the output layers of 5 stages of ResNet-101 to represent the current image features; the stage output feature layers are 128, 256, 512 and 512 respectively, and the deeper the feature extraction goes, the more feature layers there are.
After the 5 feature layers are extracted, the feature information layers are further extracted from them by the feature pyramid network, which can extract the features in the image more finely. By applying up-sampling and feature-layer merging operations in turn to the feature layers output by the previous 5 stages, 4 network layers containing the fine features are obtained; these 4 network layers are then re-convolved with a convolution layer of kernel size 3 to obtain the final 256-channel feature layers.
These feature layers will be input as features for the region proposals and mask coverage in the later steps.
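A hedged sketch of such a feature pyramid top-down pass (the channel counts follow the figures quoted above; the module layout is an assumption, not the patent's exact architecture): upsample the deeper map, add the laterally projected shallower map, then refine with a 3×3 convolution:

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    def __init__(self, in_channels=(128, 256, 512, 512), out_channels=256):
        super().__init__()
        # 1x1 lateral projections bring every stage to a common width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions refine each merged map.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):              # feats ordered shallow -> deep
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):
            # Upsample the deeper map and merge it into the shallower one.
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, laterals)]
```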
(3) Feature region proposal
Feature region proposal is an important step in the whole target detection chain. Given an input image containing the ships to be measured, the algorithm must first consider how to locate the positions of the ships. First, the invention uses the previously extracted feature layers to generate anchors in all the feature layers of different scales and different levels.
An anchor is a collection channel generated in the feature map; the interval is generally set to 2, and the number generated per feature layer is 2000. The generated number is a hyper-parameter that can be adjusted as required: increasing it can improve detection accuracy, but correspondingly slows the execution of the algorithm. The anchor-generation process is similar to convolution: the extracted feature layers are scanned layer by layer at the set interval, finally yielding feature maps carrying anchors.
After anchors have been generated on all the feature maps extracted in the previous step, boxes must be set around the anchors and the anchor ratios of each feature layer fixed. To ensure that ships of different sizes can be framed by the boxes formed from the anchors, the side-length pixels of the framed regions generated by the anchors are generally set to 32, 64, 128, 256 and 512. In addition, since a ship may be elongated, square and so on, the ratios of the framed regions are set to 0.5, 1 and 2. Once the box pixel lengths and ratios are determined, a series of proposed regions can be generated on the feature map.
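A small sketch of enumerating these anchor shapes from the stated scales and ratios (the square-root parameterization, which keeps each anchor's area roughly equal to scale², is an assumption; the patent does not give the exact formula):

```python
import numpy as np

def generate_anchor_shapes(scales=(32, 64, 128, 256, 512),
                           ratios=(0.5, 1, 2)):
    # One (x1, y1, x2, y2) box per scale/ratio pair, centered at the origin.
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # width/height ratio equals r
            h = s / np.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

print(generate_anchor_shapes().shape)   # (15, 4): 5 scales x 3 ratios
```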
After the previous step, a great deal of framing information has formed in the feature map. A proposed region in the feature map indicates that the box centered on that pixel point is likely to contain an object; in the following step the proposed regions must be adjusted so that their boxes are more accurate and exactly contain the ship to be measured.
In addition, after a framed box with 4 coordinate points has been generated, the classification to which the box belongs must be produced, in order to determine which ship the box contains and to classify the information of that ship.
(4) Feature region adjustment
After the regional proposal set in the feature map has been obtained in the previous step, it must be selected and adjusted to obtain a box of suitable size containing the ship under measurement; to this end, loss functions set according to the different purposes must be created. As the algorithm executes, the loss functions are continuously optimized to dynamically adjust the position of the box and the classification of the ship the box contains, so as to determine the final position of a ship of a given type.
The loss function is divided into three parts: the classification error Lcls, the detection error Lbox and the mask segmentation error Lmask. The classification error represents the negative of the belief value that the object belongs to a specific ship class. During training of the neural network, the loss value is derived by comparing the classification result obtained by the algorithm with the (manually annotated) label; the loss is then optimized by a specific optimization function and actively reduced through the backward-feedback operation of the neural network, updating the weight information so that the network acquires the ability to detect the ship classification.
Similarly, the detection error and the mask segmentation error are also optimized by comparing the result calculated by the algorithm with the actual label, except that the parameters of the detection error are the 4 coordinate values of the ship; the loss value is obtained by comparing them with the 4 coordinate values of the actual ship through the MSE mean square error.
The following formula is the expression of the MSE mean square error; comparing corresponding coordinate values yields the loss value between the two:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
The mask segmentation error Lmask is calculated according to the IoU (Intersection over Union) criterion. IoU is a calculation formula for evaluating whether the image segmentation reaches the specified standard, where Target represents the target mask coverage area and Prediction the predicted mask coverage area:

$$\mathrm{IoU} = \frac{|\mathrm{Target} \cap \mathrm{Prediction}|}{|\mathrm{Target} \cup \mathrm{Prediction}|}$$
When the predicted mask coverage area completely overlaps the target mask coverage area, the value of IoU is 1, representing a prediction that fully meets the standard required by the invention; when the predicted mask coverage area is completely disjoint from the target mask coverage area, the value of IoU is 0, representing a completely wrong prediction.
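A minimal NumPy sketch of this IoU computation for binary masks (the empty-union convention is an assumption):

```python
import numpy as np

def mask_iou(prediction, target):
    prediction = prediction.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(prediction, target).sum()
    if union == 0:
        return 1.0   # both masks empty: treated as perfect agreement
    return np.logical_and(prediction, target).sum() / union
```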
The final total loss L can be summarized as the following formula:
L=Lcls+Lbox+Lmask
the total loss function is divided into three parts, namely a ship classification loss Lcls, a box frame definition loss Lbox and a mask loss Lmask, and the neural network can be trained through forward operation and backward feedback operation, so that the total loss is reduced, and finally trained neural network weights can be obtained.
(5) Obtaining classification information
In the algorithm, the invention obtains the classification information of the ship by reducing the classification loss. In the program this classification information is one-hot coded and needs to be converted into actual ship classification information; there are two ship classes in the remote sensing image, namely ship target and background.
(6) Bounding box adjustment
Bounding box adjustment is a part executed automatically during the gradient update: the loss with respect to the correct coordinates is calculated, offset values for the four coordinate points are set, and these offsets are updated during the gradient update, adjusting the coordinate position of the bounding box containing the ship so that the bounding box can accurately frame the detected ship.
(7) Optimizing detection loss
Optimizing the detection loss is the loss-optimization process mentioned before; the optimization method adopted by the invention is a gradient descent method with momentum, with the learning rate gradually decaying over the iterations. The learning rate is a hyper-parameter, and different initial learning rates must be set for different image sets; the default learning rate Lr is 0.001, the momentum parameter is 0.9 and the decay rate is 0.0001.
During training, the training set is trained on and the validation set is validated independently. The total training loss and the loss curve of the validation set are observed, and a cutoff number n is set: if the current loss function shows no obvious change after n cycles, training is terminated early. This effectively avoids the over-fitting caused by excessive training and keeps the loss function curve from oscillating.
After the final proposed regions (Proposals) are obtained, many proposed regions lie around the actual ship, and a suitable one must be found to determine the final position of the target; at this point the invention picks the proposed region best fitting the target according to the Non-Maximum Suppression algorithm.
The non-maximum suppression algorithm suppresses elements that are not maxima, similar to a local maximum search. A cluster of proposed-region boxes forms around a given ship, and the proposed region most suitable for framing the ship must be selected with the non-maximum suppression algorithm. Features are extracted from the sliding windows and, after classification and identification by a classifier, each window receives a classification score. Sliding windows produce many windows that contain, or mostly cross, other windows. NMS is then required to choose the windows in the neighborhood with the highest score (the highest probability of being an object of a certain class) and suppress the windows with low scores.
In brief, the purpose of non-maximum suppression is to eliminate redundant boxes and find the best ship detection position. The process divides into the following steps:
1. Assume there are 6 candidate boxes, sorted by the classifier's classification probability; in ascending order of probability of belonging to the target class they are A, B, C, D, E, F.
2. Starting from the maximum-probability rectangular box F, judge in turn whether the overlap IoU of each of A-E with F is larger than a set threshold.
3. Suppose the overlaps of B and D with F exceed the threshold; B and D are thrown away, and the first rectangular box F is marked as the first one the invention retains.
4. From the remaining rectangular boxes A, C, E, select E, the one with the highest probability, then judge the overlap of E with A and C; any overlap larger than the threshold is thrown away, and E is marked as the second rectangular box the invention retains.
5. Repeat this process until all the retained rectangular boxes have been found.
After performing these several steps, the invention obtains the rectangular calibration box finally best suited to the ship.
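The five steps above correspond to the classic greedy NMS procedure; a minimal sketch follows (the IoU threshold of 0.5 is an assumption, as the text only speaks of a set threshold):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) class scores.
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]        # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))            # mark this box as retained
        # Overlap of the retained box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # throw away heavy overlaps
    return keep
```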
(8) Obtaining segmentation masks and bounding boxes
In the previous step, the invention obtained position information around the detected ship, i.e., a rectangular box containing the ship. In this step the target ship must be covered with a mask, so that the mask pixels cover the entire surface of the ship as far as possible and the profile information of the ship is obtained.
Reviewing the network-layer architecture mentioned earlier, the invention extracts features of the input image through the ResNet-101 network, obtaining feature information of the input image at different dimensions and different sizes. Owing to the invariance of the convolution and pooling operations, this feature information contains the position of the ship in the image.
The invention constructs a mask branch network, inputs the classified framed regions and the feature information extracted in the previous steps, and generates the corresponding mask. The specific steps are as follows:
1. extracting ship information in different feature maps;
2. the size of the feature images is adjusted, and feature information of different feature images is fused through scaling;
3. scaling up the feature mask according to the actual size and the scaling of the feature map;
4. comparing the amplified mask with an actual mask, and calculating a loss value;
In the third step, because the mask pixels are floating-point values, the soft mask preserves more detail during enlargement, and the resulting mask covers the ship to be measured better; each detection target corresponds to one piece of mask information. Fig. 4 (b) shows the segmentation result output by the third section; comparing with the actual mask on the left, the detection information on the right is already quite accurate and reliable.
The fourth part mainly performs the image segmentation step. In the previous part, the invention obtained mask information for the ship target through the target detection method, using the box position information. However, that mask information cannot fully represent the segmentation information of the ship, so in the fourth part the remote sensing ship image is segmented with a network of U-Net structure, whose encoder is a network of ResNet-34 structure and whose decoder is redesigned according to the characteristics of the encoder; the segmentation information of the image is finally output.
The general steps of the fourth part are: multi-scale feature extraction, deconvolution of the feature maps, and integration of fused feature information. The integrated fused feature information is combined with the output information of the third part to finally obtain more accurate segmentation information.
(1) Multi-scale feature extraction
The structure of the U-Net network is shown in fig. 3 (c), in which the corresponding parts of the encoder and decoder are combined. The multi-scale feature extraction of the U-Net structure aims to fuse encoder features of different scales with the features of the corresponding decoder parts. Thus in the decoder stage the neural network not only performs the up-scaling operation on the feature maps from bottom to top, but also incorporates the feature vectors of the encoder stage, making the network's feature information richer and more accurate.
The image data employed during the training phase has dimensions (3, 768, 768): the number of channels of the image is 3 and the resolution is 768, the same as the label dataset. The output stage of the network has dimensions (1, 768, 768). The loss function adopted in the network is a hybrid loss, composed of the Dice loss and the Focal loss. The loss value is reduced by the optimization algorithm so that the network segments the image accurately; the formula of the mixed loss is as follows:
$$M_{loss} = D_{loss} + t \cdot F_{loss}$$

where $M_{loss}$ is the mixed loss function, $D_{loss}$ the Dice loss function and $F_{loss}$ the Focal loss function; t is the weight coefficient of the Focal loss, and experiments verified that t = 10 gives the best effect.
$D_{loss}$ is typically applied to image segmentation tasks, guiding the optimizer to adjust the network by comparing the output segmentation information with the actual label information. The formula is as follows:

$$D_{loss} = 1 - \frac{2\,|A \cap B|}{|A| + |B|}$$
where A and B represent the output segmentation information and the actual segmentation information respectively; when the pixel information of the two segmentations completely overlaps, the loss function is 0, and when they do not overlap at all, the loss function is 1.
Focal loss is a variant of the cross entropy loss; by modifying the cross entropy, it adapts well to segmentation tasks with imbalanced pixel classes. The formula is as follows:

$$F_{loss} = -\alpha\,(1-y')^{\gamma}\,y\,\log y' - (1-\alpha)\,(y')^{\gamma}\,(1-y)\log(1-y')$$

where y and y' represent the actual pixel value and the network output pixel value respectively, both in the range 0-1. When the two are identical, the loss function value is 0. The γ parameter reduces the loss of easily classified samples, and the balance factor α balances the uneven proportion of positive and negative samples.
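A hedged PyTorch sketch of this mixed loss (α = 0.25 and γ = 2 are common defaults assumed here, since the text does not state them; t = 10 follows the text):

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # D_loss = 1 - 2|A ∩ B| / (|A| + |B|), with pred, target in [0, 1].
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def focal_loss(pred, target, alpha=0.25, gamma=2.0, eps=1e-6):
    pred = pred.clamp(eps, 1 - eps)       # avoid log(0)
    return (-alpha * (1 - pred) ** gamma * target * torch.log(pred)
            - (1 - alpha) * pred ** gamma * (1 - target)
            * torch.log(1 - pred)).mean()

def mixed_loss(pred, target, t=10.0):
    # M_loss = D_loss + t * F_loss, with t = 10 as stated above.
    return dice_loss(pred, target) + t * focal_loss(pred, target)
```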
(2) Deconvolution of the feature maps
During network training, the remote sensing image input to the encoder yields feature information of different scales; in the decoding stage, the transposed convolution operation performs the up-scaling operation on the low-resolution feature maps so that they combine with the corresponding encoder feature information. The up-scaling is done by transposed convolution, which can learn the up-scaled pixel information during training, ensuring the accuracy of the up-scaled information. The heat map of the image segmentation is shown in fig. 5 (a), while fig. 5 (b) shows the segmentation result map: the left side is the original image, the middle the segmented image, and the far right the actual label set.
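A sketch of one such decoder step (channel sizes and the single 3×3 refinement convolution are illustrative assumptions): the transposed convolution doubles the resolution, and the result is concatenated with the matching encoder feature map:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Learned up-scaling: doubles height and width.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x, skip):
        x = self.up(x)                    # up-scale the low-resolution map
        x = torch.cat([x, skip], dim=1)   # merge the encoder features
        return self.conv(x)
```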
(3) Integrating fused feature information
The image segmentation information obtained in the fourth part and the third part is integrated through a simple weighted combination of the two: the coefficient of the fourth part is 0.7 and that of the third part 0.3. More accurate segmentation information is obtained after the weighting; fig. 6 (c) shows the final segmentation result.
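A one-line sketch of this weighted fusion (the binarization threshold of 0.5 is an assumption; the weights 0.7 and 0.3 follow the text):

```python
import numpy as np

def fuse(seg_prob, det_prob, threshold=0.5):
    # 0.7 * U-Net segmentation map + 0.3 * detection-branch mask map.
    fused = 0.7 * seg_prob + 0.3 * det_prob
    return (fused > threshold).astype(np.uint8)
```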
The fifth part mainly post-processes the images output by the previous parts, further correcting the remote sensing segmentation map to achieve a better effect. The main steps of this stage are: Markov random field diffusion of segmentation points, opening operation to eliminate edge overlap, and output of the final result map. The implementation steps of this stage are specifically described below:
(1) Markov random field diffusion of segmentation points
In the previous steps, the classification of the remote sensing image ships and the detection and segmentation of the targets have been completed, and the segmentation information of the remote sensing image is finally output. But there is still room to improve the accuracy of this segmentation information: 1. the segmentation of the ship image is missing at the edges; 2. the edge information of ships easily overlaps in regions where ships are densely distributed.
The first task performed in this last step is therefore to perfect the segmentation information at the edges, so that the mask covers the whole ship object to be detected as completely as possible and the accuracy of the image segmentation improves. For this purpose, the mask information is refined using a Markov random field method.
The pixel distribution of the detection target is treated as a Markov random field; in the actual detection process, the objects processed by the method are the pixels in the image. The pixel distribution of the target to be detected is generally connected and basically normally distributed, so the relationship between adjacent pixels can be represented in a probabilistic way.
The process of adjusting the coverage mask can be understood as searching for other similar pixels starting from seed pixels, expanding the seed nodes according to the interconnections between pixels so as to cover the whole target region. The relationship between the pixels of the image can be described by the following formula:

$$P(W \mid S) = \frac{P(S \mid W)\,P(W)}{P(S)}$$
where S is the input image, W the classification to which the current pixel belongs, P(S|W) the conditional probability distribution, P(W) the prior probability distribution of the classification result, P(S) the probability distribution of the image, and P(W|S) the final classification result; whether each pixel belongs to the target region is judged from the image information.
In this process the current pixel has two possible classes: belonging to the detection target, or not belonging to it. Pixels belonging to the detection target are merged into the mask region, and pixels not belonging to it are ignored. For the pixel points in a target ship, the classification information $w_1$, $w_2$ is known in advance, and the pixels merged into the same ship can be calculated using the prior probability P(W) and the conditional probability P(S|W).
The distribution of the target's pixels can be obtained via the Gibbs distribution, which is used to construct distribution information from the relations among the ships: the pixels belonging to a given ship in the image are correlated, as are the pixels around the ship and the pixels covered by the ship's surface. Using these relations, the conditional probabilities between pixels can be calculated to construct a Gibbs distribution and obtain the joint distribution over the whole image.
According to the Hammersley-Clifford theorem, the Markov random field and the Gibbs distribution are equivalent; that is, the distribution P(W) of the pixels belonging to the ship satisfies the Gibbs distribution:

$$P(W) = \frac{1}{Z}\exp\!\left(-\frac{U(w)}{T}\right)$$

where Z is the partition function, i.e., a normalization constant, and the larger T is, the smoother P(W) becomes; in addition, $U_2(w)$ represents the relationship between potential-energy cliques, with

$$U_2(w) = \sum_{c \in C} V_c(w)$$

where $V_c$ denotes the potential energy of a clique, β is a coupling coefficient, and s and t are two adjacent pixel points.
Gibbs sampling is a sampling method that uses conditional distributions in a series of operations to finally approximate the joint distribution; correspondingly, the Gibbs distribution indicates that the distribution information satisfying it can be approximated by obtaining the corresponding conditional probabilities between pixels. In the target ship image, $P(w_1 \mid w_2)$ is the probability of the label information distributed around a pixel given that the pixel is marked as belonging to the ship; it is used to determine whether the classification label of the pixel is correct or needs updating.
Since P(S|W) · P(W) = P(S, W), computing

$$\arg\max_W P(W \mid S)$$

is equivalent to computing

$$\arg\max_W P(S \mid W)\,P(W),$$

i.e., maximizing the joint probability distribution of image and pixel classification. A Gibbs distribution is used here to represent P(W), while P(S) represents the distribution information of the image, which is typically a constant.
On the premise of the Gibbs distribution, the Markov random field can be converted into potential energies, and the conditional probability of the MRF is determined through an energy function, giving the MRF global consistency. That is, through the simple local interaction of a single pixel and its neighborhood, the MRF can obtain complex global behavior: global statistics are obtained by using the local Gibbs distribution.
In practice, the invention computes P(S|W) × P(W), where P(W) can be calculated through the potential-energy function described above. Computing P(S|W) means using the label information to estimate the value of a pixel: if the distribution of the pixels in a given label class satisfies a Gaussian distribution, the invention can determine from the value of a pixel which class it belongs to, that is, whether the pixel belongs to the ship region.
The distribution of the pixels within a class corresponds to a Gaussian distribution:

$$P(S \mid w) = \frac{1}{\sqrt{2\pi\sigma_w^2}}\exp\!\left(-\frac{(y-\mu_w)^2}{2\sigma_w^2}\right)$$

where the classification information w = 1, 2 represents whether or not the current pixel belongs to the current ship. Its parameters are estimated as

$$\mu_w = \frac{1}{N_w}\sum_{i=1}^{N_w} y_i \qquad \text{and} \qquad \sigma_w^2 = \frac{1}{N_w}\sum_{i=1}^{N_w}\left(y_i - \mu_w\right)^2,$$

where $N_w$ is the number of pixels in a given class, N is the number of pixels in the whole image, and y is the pixel value.
Thus, through $P(S \mid w_1)$ and $P(S \mid w_2)$, the class to which each pixel belongs can be estimated; finally the posterior probability is converted into the product of the prior probability and the likelihood function, P(S|W) × P(W), and the labels are gradually updated toward larger probability to obtain the best segmentation effect.
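A small NumPy sketch of this per-class Gaussian likelihood (the variance floor of 1e-6 is an assumption added for numerical safety):

```python
import numpy as np

def gaussian_likelihood(y, pixels_in_class):
    # P(S|w): likelihood of pixel value y under the Gaussian fitted to the
    # pixels currently labelled w, using the mean/variance estimates above.
    mu = pixels_in_class.mean()
    var = pixels_in_class.var() + 1e-6
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# A pixel is then (re)assigned to the class w maximizing P(S|w) * P(W),
# with P(W) coming from the Gibbs potential over its neighbourhood labels.
```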
The left image in fig. 6 (b) is an initial image of the iterative process: the ship segmentation information is clearly shown but still incomplete. The right image in fig. 6 (b) is the final iteration result; it can be seen that the ship to be detected is essentially covered by the mask, the edge information is complete, and the coverage accuracy is greatly improved compared with before.
(2) Eliminating edge overlap by the opening operation
Edge-overlap effects easily occur where ships are dense; for this reason, in this step the original segmented image is optimized using morphology from the traditional methods. The basic operation is the opening operation, whose specific steps are erosion followed by dilation; experimental verification shows that, applied appropriately, it can effectively eliminate the edge-overlap effect.
In the opening operation, the first step is erosion, whose formula is as follows:

$$A \ominus B = \{\,z \mid (B)_z \subseteq A\,\}$$
where A is the input original segmented image and B is a kernel set in advance, i.e., image A is eroded by B. The specific steps are similar to a convolution operation: kernel B moves in turn over the pixel region of the original segmented image, the regions where the product of the kernel and the pixel region is 1 are set to 1 and the remaining regions to 0, and after iteration the overlapping edge information of the segmented image is eliminated.
In the opening operation, the second step is dilation, whose formula is as follows:

$$A \oplus B = \{\,z \mid (\hat{B})_z \cap A \neq \varnothing\,\}$$
where A is the input original segmented image and B is a kernel set in advance; the dilation operation is the opposite of erosion. Kernel B performs the convolution-like operation in turn over the pixels of the original segmented image; as long as the intersection of the kernel's range with the pixel region is non-empty, the region is set to 1, and after several iterations the information at the edges is refilled.
After the above opening operation, the overlapping edge information between ships can be removed fairly accurately, so that each independent ship is cleanly segmented. As shown in fig. 6 (a), the far left is the original input remote sensing image and the middle image is the original segmented image; the overlap of the ship edges at the top can be observed, and after the opening operation this overlap is well relieved.
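A minimal OpenCV sketch of this opening step (the mask path and the 3×3 kernel size are assumptions for illustration):

```python
import cv2
import numpy as np

mask = cv2.imread("segmentation.jpg", cv2.IMREAD_GRAYSCALE)  # assumed path
kernel = np.ones((3, 3), np.uint8)

eroded = cv2.erode(mask, kernel, iterations=1)     # step 1: erosion
opened = cv2.dilate(eroded, kernel, iterations=1)  # step 2: dilation

# Equivalently, erosion followed by dilation in a single call:
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```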
(3) Outputting the final result map
Through all the steps of the first to the fifth part, combining the advantages of deep learning and the traditional methods, the method can process remote sensing images of complex environments, accurately segment the ship targets in the image, and effectively avoid the phenomena of edge overlap and edge loss; as shown in fig. 6 (c), the ships in the image are accurately segmented.
The results of the present invention can be illustrated by the following experiments:
1. Experimental conditions:
The invention runs on an Ubuntu 16.04 system with an Intel Core i7-7800X, 32GB of memory and a GTX 1080 Ti graphics card. The software platform is PyCharm with OpenCV, testing a remote sensing image dataset with a resolution of 768px×768px.
2. Experimental results:
As shown in fig. 6 (a), the final result after the open operation: the left image is the original, the middle is the raw result, and the right is the result after the open operation; the superposition of the edges of multiple ships in the segmentation is visibly improved by the open operation. Fig. 6 (b) shows the improvement the Markov random field brings to the ship segmentation: the left image is the segmentation before Markov random field filling, and the right image is the processed segmentation, which is more complete and has better edge-detection accuracy. The final result of all the steps is shown in fig. 6 (c), where semi-transparent segmentation information is overlaid on the original image; a good segmentation of the remote sensing image can be seen.
The technical scheme of the invention is not limited to the specific embodiment, and all technical modifications made according to the technical scheme of the invention fall within the protection scope of the invention.

Claims (2)

1. A remote sensing image ship integrated recognition method based on deep learning, characterized by comprising the following steps:
S1, image classification: collecting a remote sensing image dataset, training a neural network with the ResNet-34 structure, and judging whether a ship is present in the processed remote sensing field of view;
S2, target detection: using the ResNet-101 neural network framework as the feature extraction network, inputting the ship-containing remote sensing images screened out by S1 into the designed neural network, extracting the feature layers of the ship remote sensing images, and thereby obtaining their position information;
S3, image segmentation: training on the ship-containing remote sensing images screened by S1 with a neural network of the U-Net framework to obtain feature maps of the screened ship remote sensing images; applying transposed convolution to the feature information maps of different scales, gradually raising the resolution of the feature information so that the position information in the feature maps is rendered concretely, and finally obtaining the segmentation information of the ships;
The specific steps of S1 are as follows:
S11, collecting remote sensing images covering common sea areas worldwide to form a remote sensing image dataset, wherein the dataset contains 300,000 remote sensing images, each with a resolution of 768px×768px in RGB three-channel color format; marking the position and outline of the ships in all remote sensing images, together with the positions of the pixels they occupy, and producing the ship label set of the remote sensing images, wherein the label images are single-channel gray maps with a resolution of 768px×768px in jpg file format;
S12, image enhancement: enhancing the ship-containing remote sensing images by one or more of: horizontal/vertical flipping, random rotation by 0–30°, random brightness change, random contrast change, and image distortion;
S13, training by cross-validation: taking the image-enhanced remote sensing images containing ships as input, training on the remote sensing image data with a neural network of the ResNet-34 structure, and using 5-fold cross-validation during input;
S14, TTA image classification inference: performing test-time augmentation (TTA) inference on the input image: inference is also run on horizontally and vertically flipped copies of the input, these results are fused with the inference result of the original input, and the trained ResNet-34 network is then tested to judge whether a ship is present in the remote sensing field of view (an illustrative sketch of this step follows claim 1 below);
The specific steps of S2 are as follows:
S21, convolution feature extraction: extracting features of the ship-containing remote sensing image with a ResNet-101 network structure, and deriving feature information from the feature layers output at five different stages of ResNet-101; fusing the extracted feature information of the ship remote sensing images across different scales through a feature pyramid network, and refining and integrating it to obtain the final feature layers;
S22, feature region proposal: using the feature layers extracted in S21, generating anchors in all feature layers of different levels and scales, framing all anchors with boxes, and, once the pixel side lengths and aspect ratios of the boxes are determined, generating a series of proposed regions on the feature map;
S23, feature region adjustment: selecting and adjusting among the series of proposed regions generated in S22 to finally obtain a box of suitable size that just contains the ship; a loss function is set and continuously optimized during execution to dynamically adjust the position of the box and determine the ship classification it contains, so as to fix the final position of the ship;
S24, obtaining classification information: on the basis of S23, obtaining the ship classification information by reducing the classification loss; in the program, the classification information is one-hot encoded and is converted into the actual ship category, so that the remote sensing image is divided into ship targets and background;
S25, bounding box adjustment: bounding box adjustment is the part executed automatically when the detection error is updated by gradient descent; during the gradient update, the coordinates of the box containing the ship are adjusted so that the box frames the detected ship accurately;
S26, optimizing the detection loss: optimizing the loss function of the neural network of S23 with a momentum gradient descent method, so as to obtain the rectangular calibration box best fitted to the ship;
S27, obtaining the segmentation mask and bounding box: constructing a mask branch network, inputting the classified framed regions and the feature information extracted in S21–S26, and generating the corresponding masks to obtain the image after target detection.
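As referenced in step S14 above, the TTA inference might be sketched as below in PyTorch; the model handle, the sigmoid output and the probability-averaging fusion are illustrative assumptions rather than the patent's exact fusion rule.

import torch

def tta_classify(model, image):
    # Test-time augmentation for the ship/no-ship classifier (step S14):
    # run the classifier on the original image plus its horizontal and
    # vertical flips, then average the predicted probabilities.
    model.eval()
    views = [
        image,                          # original view (C, H, W tensor)
        torch.flip(image, dims=[-1]),   # horizontal flip
        torch.flip(image, dims=[-2]),   # vertical flip
    ]
    with torch.no_grad():
        probs = [torch.sigmoid(model(v.unsqueeze(0))) for v in views]
    return torch.stack(probs).mean(dim=0)  # fused prediction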
2. The deep learning-based remote sensing image ship integrated recognition method according to claim 1, characterized in that the specific steps of S3 are as follows:
S31, multi-scale feature extraction: segmenting the ship-containing remote sensing images screened by S1 with a network of the U-Net structure, wherein the U-Net encoder is a neural network with the ResNet-34 structure; the screened ship-containing remote sensing images are input at the encoder end, which outputs feature information at different scales;
S32, transposed-convolution feature maps: in the decoding stage of the ResNet-34-based network, up-sampling the low-resolution feature maps by transposed convolution so that they are combined with the feature information of the encoding stage to obtain the segmented image (an illustrative decoder sketch follows claim 2);
S33, integrating and fusing feature information: fusing the target-detected image obtained in S27 with the segmented image obtained in S32 to obtain more accurate segmentation information;
S34, Markov random field diffusion of segmentation points: correcting the segmentation information obtained in S33 with a Markov random field algorithm, and selecting seed points of the segmentation information for diffusion so that the segmentation information becomes more complete and accurate;
S35, open operation to eliminate the edge overlap effect: according to the morphological principle, applying erosion and then dilation in turn to the result map obtained in S34, i.e. performing an open operation on the image, which effectively relieves the overlap of ship segmentation information and improves the segmentation;
S36, outputting the final result map.
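To illustrate step S32 above, a sketch of one U-Net-style decoder stage follows; the class name DecoderBlock and the channel sizes are illustrative assumptions, not the patent's exact architecture.

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    # One U-Net decoder stage (cf. step S32): a transposed convolution
    # doubles the spatial resolution, then the upsampled features are
    # concatenated with the matching encoder features and refined.
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # e.g. 24x24 -> 48x48
        x = torch.cat([x, skip], dim=1)  # fuse encoder (skip) features
        return self.conv(x)

# Illustrative usage: upsample a 512-channel map, fuse a 256-channel skip.
block = DecoderBlock(in_ch=512, skip_ch=256, out_ch=256)
x = torch.randn(1, 512, 24, 24)
skip = torch.randn(1, 256, 48, 48)
out = block(x, skip)  # shape: (1, 256, 48, 48)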
CN201811573380.2A 2018-12-21 2018-12-21 Remote sensing image ship integrated recognition method based on deep learning Active CN109583425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811573380.2A CN109583425B (en) 2018-12-21 2018-12-21 Remote sensing image ship integrated recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811573380.2A CN109583425B (en) 2018-12-21 2018-12-21 Remote sensing image ship integrated recognition method based on deep learning

Publications (2)

Publication Number Publication Date
CN109583425A CN109583425A (en) 2019-04-05
CN109583425B true CN109583425B (en) 2023-05-02

Family

ID=65931254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811573380.2A Active CN109583425B (en) 2018-12-21 2018-12-21 Remote sensing image ship integrated recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN109583425B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110176005B (en) * 2019-05-16 2023-03-24 西安电子科技大学 Remote sensing image segmentation method based on normalized index and multi-scale model
CN110427818B (en) * 2019-06-17 2022-06-28 青岛星科瑞升信息科技有限公司 Deep learning satellite data cloud detection method supported by hyperspectral data
CN110246133B (en) * 2019-06-24 2021-05-07 中国农业科学院农业信息研究所 Corn kernel classification method, device, medium and equipment
CN110142785A (en) * 2019-06-25 2019-08-20 山东沐点智能科技有限公司 A cruising robot visual servo method based on target detection
CN110189342B (en) * 2019-06-27 2022-03-01 中国科学技术大学 Automatic segmentation method for brain glioma region
CN110263753B (en) * 2019-06-28 2020-12-22 北京海益同展信息科技有限公司 Object statistical method and device
CN110348522B (en) * 2019-07-12 2021-12-07 创新奇智(青岛)科技有限公司 Image detection and identification method and system, electronic equipment, and image classification network optimization method and system
CN110378297B (en) * 2019-07-23 2022-02-11 河北师范大学 Remote sensing image target detection method and device based on deep learning and storage medium
CN110633633B (en) * 2019-08-08 2022-04-05 北京工业大学 Remote sensing image road extraction method based on self-adaptive threshold
CN110689044A (en) * 2019-08-22 2020-01-14 湖南四灵电子科技有限公司 Target detection method and system combining relationship between targets
CN111291608A (en) * 2019-11-12 2020-06-16 广东融合通信股份有限公司 Remote sensing image non-building area filtering method based on deep learning
CN111027399B (en) * 2019-11-14 2023-08-22 武汉兴图新科电子股份有限公司 Remote sensing image water surface submarine recognition method based on deep learning
CN110956179A (en) * 2019-11-29 2020-04-03 河海大学 Robot path skeleton extraction method based on image refinement
CN112907501B (en) * 2019-12-04 2024-06-14 北京地平线机器人技术研发有限公司 Object detection method and device and electronic equipment
CN111695398A (en) * 2019-12-24 2020-09-22 珠海大横琴科技发展有限公司 Small target ship identification method and device and electronic equipment
CN111178304B (en) * 2019-12-31 2021-11-05 江苏省测绘研究所 High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN113128323B (en) * 2020-01-16 2023-08-18 中国矿业大学 Remote sensing image classification method and device based on co-evolution convolutional neural network learning
CN111428875A (en) * 2020-03-11 2020-07-17 北京三快在线科技有限公司 Image recognition method and device and corresponding model training method and device
CN111401560A (en) * 2020-03-24 2020-07-10 北京觉非科技有限公司 Inference task processing method, device and storage medium
CN111598892B (en) * 2020-04-16 2023-06-30 浙江工业大学 Cell image segmentation method based on Res2-uneXt network structure
CN111898633B (en) * 2020-06-19 2023-05-05 北京理工大学 Marine ship target detection method based on hyperspectral image
CN111860176B (en) * 2020-06-22 2024-02-02 钢铁研究总院有限公司 Non-metal inclusion full-view-field quantitative statistical distribution characterization method
CN111950612B (en) * 2020-07-30 2021-06-01 中国科学院大学 FPN-based weak and small target detection method for fusion factor
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112215797A (en) * 2020-09-11 2021-01-12 嗅元(北京)科技有限公司 MRI olfactory bulb volume detection method, computer device and computer readable storage medium
CN112232390B (en) * 2020-09-29 2024-03-01 北京临近空间飞行器***工程研究所 High-pixel large image identification method and system
CN112347895A (en) * 2020-11-02 2021-02-09 北京观微科技有限公司 Ship remote sensing target detection method based on boundary optimization neural network
CN112348794A (en) * 2020-11-05 2021-02-09 南京天智信科技有限公司 Ultrasonic breast tumor automatic segmentation method based on attention-enhanced U-shaped network
CN112906577B (en) * 2021-02-23 2024-04-26 清华大学 Fusion method of multisource remote sensing images
CN113283306B (en) * 2021-04-30 2023-06-23 青岛云智环境数据管理有限公司 Rodent identification analysis method based on deep learning and migration learning
CN113239786B (en) * 2021-05-11 2022-09-30 重庆市地理信息和遥感应用中心 Remote sensing image country villa identification method based on reinforcement learning and feature transformation
CN113628180B (en) * 2021-07-30 2023-10-27 北京科技大学 Remote sensing building detection method and system based on semantic segmentation network
CN114419421A (en) * 2022-01-21 2022-04-29 中国地质大学(北京) Subway tunnel crack identification system and method based on images
CN114241339A (en) * 2022-02-28 2022-03-25 山东力聚机器人科技股份有限公司 Remote sensing image recognition model, method and system, server and medium
CN114419043B (en) * 2022-03-29 2022-06-17 南通人民彩印有限公司 Method and system for detecting new printing material by optical means
CN116337087B (en) * 2023-05-30 2023-08-04 广州健新科技有限责任公司 AIS and camera-based ship positioning method and system
CN117809310B (en) * 2024-03-03 2024-04-30 宁波港信息通信有限公司 Port container number identification method and system based on machine learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157314B2 (en) * 2016-01-29 2018-12-18 Panton, Inc. Aerial image processing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563303A (en) * 2017-08-09 2018-01-09 中国科学院大学 A robust ship target detection method based on deep learning
CN107609601A (en) * 2017-09-28 2018-01-19 北京计算机技术及应用研究所 A ship target recognition method based on a multilayer convolutional neural network
CN108052940A (en) * 2017-12-17 2018-05-18 南京理工大学 SAR remote sensing image waterborne target detection method based on deep learning
CN108921066A (en) * 2018-06-22 2018-11-30 西安电子科技大学 Remote sensing image ship detection method based on a feature-fusion convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ship detection in optical remote sensing images based on convolutional neural networks; Ouyang Yinghui et al.; Packaging Engineering; 2016-08-10 (No. 15); 210-218 *

Also Published As

Publication number Publication date
CN109583425A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109583425B (en) Remote sensing image ship integrated recognition method based on deep learning
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN109241913B (en) Ship detection method and system combining saliency detection and deep learning
CN111259906B (en) Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention
WO2023015743A1 (en) Lesion detection model training method, and method for recognizing lesion in image
WO2017071160A1 (en) Sea-land segmentation method and system for large-size remote-sensing image
CN106875395B (en) Super-pixel-level SAR image change detection method based on deep neural network
CN110309808B (en) Self-adaptive smoke root node detection method in large-scale space
CN113239830B (en) Remote sensing image cloud detection method based on full-scale feature fusion
CN110766058B (en) Battlefield target detection method based on optimized RPN (resilient packet network)
CN106991686B (en) A level set contour tracking method based on a super-pixel optical flow field
CN111553397A (en) Cross-domain target detection method based on regional full convolution network and self-adaption
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN110111351B (en) Pedestrian contour tracking method fusing RGBD multi-modal information
CN111652213A (en) Ship water gauge reading identification method based on deep learning
CN108427919B (en) Unsupervised oil tank target detection method based on shape-guided saliency model
CN114742799B (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN104657980A (en) Improved multi-channel image partitioning algorithm based on Meanshift
CN114841972A (en) Power transmission line defect identification method based on saliency map and semantic embedded feature pyramid
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN107992856A (en) High-resolution remote sensing building detection method under city scenarios
CN111047603A (en) Aerial image hybrid segmentation algorithm based on novel Markov random field and region combination
CN114648806A (en) Multi-mechanism self-adaptive fundus image segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant