CN110910413A - ISAR image segmentation method based on U-Net - Google Patents
- Publication number
- CN110910413A CN110910413A CN201911192861.3A CN201911192861A CN110910413A CN 110910413 A CN110910413 A CN 110910413A CN 201911192861 A CN201911192861 A CN 201911192861A CN 110910413 A CN110910413 A CN 110910413A
- Authority
- CN
- China
- Prior art keywords
- image
- net
- isar image
- isar
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/92—Dynamic range modification of images or parts thereof based on global image properties
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
Abstract
The invention belongs to the field of radar signal processing and relates to an ISAR image segmentation method, in particular an ISAR image segmentation method based on U-Net, comprising the following steps: S1, data preprocessing; S2, generating a training set; S3, constructing the U-Net network structure and setting the network parameters; S4, training the U-Net; and S5, preprocessing the target ISAR image to be segmented according to S1, feeding the preprocessed image into the U-Net trained in S4, and obtaining the segmentation result of the target ISAR image from the network output layer. Compared with traditional methods, the method is superior at feature extraction: deep learning extracts deep semantic features from the image and achieves more accurate segmentation of the ISAR image. Compared with the FCN, the method has fewer parameters to train, completes network training in a single pass with few training samples, and trains efficiently.
Description
Technical Field
The invention belongs to the field of radar signal processing, relates to an ISAR image segmentation method, and particularly relates to an ISAR image segmentation method based on U-Net.
Background
As nations attach ever greater importance to aerospace and civil spaceflight develops rapidly, more and more satellites are being launched, and space is becoming increasingly crowded, competitive, and adversarial. How to sense the space situation effectively, and thereby control space, has become a major challenge for every spacefaring nation. Inverse Synthetic Aperture Radar (ISAR) is an all-weather, day-and-night observation means with high resolution and rich information content, and it plays an important role in extracting the characteristics of space targets. A high-resolution ISAR image of a space target can be inverted to recover information such as the target's shape, structure, and size, and the target's attitude can be further estimated by combining multiple image frames.
To estimate the structural scale and motion attitude of a satellite target from ISAR images, the first step is to extract the target's contour features from the image, i.e., to segment the target. Image segmentation is a processing technique for extracting a region of interest from an image, and its quality directly affects the cost of subsequent processing. The field already offers a large number of algorithms, including edge detection based on the Canny operator, boundary tracking, and region growing, splitting, and merging. In addition, Rafael C. and Beucher et al. proposed segmentation methods based on the morphological watershed, which have attracted wide interest for their fast computation and accurate localization of image edges. However, these traditional image segmentation methods are sensitive to noise, ignore semantic information within and between images, and therefore have a limited range of application.
In recent years, deep learning has developed rapidly, and deep-learning-based Image Semantic Segmentation (ISS) methods have advanced with it. Compared with traditional image segmentation, ISS attaches semantic information to the target or foreground in the image, so the information an image is meant to convey can be obtained from high-level semantic features such as texture and scene, giving it greater practical value. In 2014, Girshick et al. proposed the Region-based Convolutional Neural Network (RCNN), built on the Convolutional Neural Network (CNN). RCNN combines candidate regions generated by a selective search algorithm with visual features generated by the CNN, completing target detection and ISS simultaneously. However, RCNN depends heavily on candidate regions, its segmentation accuracy is insufficient, and its processing pipeline is complex, so Girshick et al. further proposed the improved Fast-RCNN, which maps the candidate regions onto the CNN's convolutional feature map and generates a fixed-size feature map for each candidate region through region-of-interest pooling, speeding up candidate-region processing. The Faster-RCNN network proposed by Ren et al. adds a Region Proposal Network (RPN) on top of Fast-RCNN, further increasing the speed at which high-quality candidate regions are generated. These candidate-region-based methods, however, must generate a large number of candidate regions, can only bound the target region with rectangles, and cannot perform end-to-end pixel-level image segmentation. Long et al. designed the Fully Convolutional Network (FCN), compatible with images of any size, for image semantic segmentation; it replaces the fully connected layers of a conventional CNN with convolutional layers and extends image-level classification to pixel-level classification via cross-layer connections and bilinear interpolation. The FCN classifies each pixel of the image directly without generating target candidate regions, and the original image passes through an end-to-end model straight to a classification result. Nevertheless, the FCN has three problems: first, pooling steadily reduces the resolution of the feature map, losing part of the pixels' spatial positions; second, the segmentation process cannot take image context into account or fully exploit the rich spatial position information, so local and global features are used unevenly; third, the network structure is complex, and training requires a large number of manually labeled samples.
Most of the above methods address optical image segmentation; research on segmenting ISAR images, which reflect the electromagnetic scattering characteristics of a target, remains scarce, and radar-specific problems such as coherent speckle noise, sidelobe interference, and small sample sizes have not been taken into account.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an ISAR image segmentation method based on U-Net. U-Net adopts an encoder-decoder network structure: it first down-samples, learning deep image features through successive convolutions, and then restores those deep features to the size of the input image through up-sampling, realized by deconvolution. The U-Net structure contains only convolutional and pooling layers, with no fully connected layer; the shallow, high-resolution layers of the network handle pixel localization while the deep layers handle pixel classification, so segmentation at the semantic level can be achieved and the speckle noise and sidelobe effects of the ISAR image can be suppressed. The method makes full use of the spatial position information of the image without generating candidate regions, greatly reducing segmentation time and improving the segmentation accuracy of the target ISAR image. Meanwhile, because the network structure is simple and has few parameters, training can be completed with only a small number of labeled ISAR images.
In order to achieve this purpose, the invention is realized through the following technical scheme:
S1, data preprocessing
S1.1, ISAR image normalization. Owing to the combined influence of the target environment, radar system parameters, and target scattering characteristics, the target ISAR images obtained through imaging processing differ in amplitude. To eliminate the influence of image amplitude on image segmentation, an ISAR image consisting of N matrix elements is amplitude-normalized as follows:

x_n = 255 × |s_n| / max_{1 ≤ m ≤ N} |s_m|,   n = 1, 2, ..., N

where s is the ISAR image data matrix, x is the ISAR image data matrix after amplitude normalization, n represents the nth element in the ISAR image data matrix, and the value of each element in the normalized matrix is in the range 0 to 255.
S1.2, image enhancement. Owing to velocity-compensation errors and sidelobes, the ISAR image suffers from target ghosting and bright streaks, degrading image quality. In addition, because echoes from different parts of the target scatter with different intensities, "bright" parts of the ISAR image can mask "dark" parts. To facilitate segmentation, the ISAR image must therefore be enhanced so that the target is sharpened and better highlighted. Given these characteristics of ISAR images, a piecewise linear transformation is adopted to widen the dynamic display range of the image's gray values.
S1.3, crop the image-enhanced ISAR image from S1.2 and save it in a picture format. In general the radar's sampling bandwidth leaves a margin over the bandwidth of the transmitted signal, so the ISAR image carries some data redundancy in the range dimension. Meanwhile, the relatively stable motion of the space target and the radar's high pulse repetition frequency give the ISAR image some redundancy in the azimuth dimension as well. Redundant data lengthens neural network training and occupies computer storage, so the ISAR image is cropped to the actual size of the target to eliminate it. The cropped ISAR image is saved in a picture format to facilitate subsequent processing by U-Net.
S2, generating a training set. Take the ISAR image samples cropped in S1.3 as the data set and manually annotate each image in it to obtain the corresponding label set; the data set and its label set together form the training set. Each sample in the training set consists of an ISAR image and its manually labeled counterpart, and the training set serves as the input for training the U-Net.
S3, constructing the network structure of the U-Net and setting the network parameters: the U-Net adopts an encoder-decoder network structure comprising a down-sampling layer and an up-sampling layer, where the down-sampling layer is built from convolution modules and the up-sampling layer from deconvolution modules. The two use the same number of convolution stages, and Skip Connection structures link the down-sampling layer to the up-sampling layer. The specific implementation is as follows:

S3.1, build the down-sampling layer of the U-Net: it consists of several (e.g., 4) convolution modules, each composed of two 3 × 3 convolutional layers, a Rectified Linear Unit (ReLU), and a 2 × 2 max pooling layer;

S3.2, build the up-sampling layer of the U-Net: it consists of several (e.g., 4) deconvolution modules, each composed of one 2 × 2 deconvolution (up-convolution) layer, two 3 × 3 convolutional layers, and a ReLU;

S3.3, Skip Connection: connect the feature layer output by each convolution module in the down-sampling layer to the corresponding deconvolution module in the up-sampling layer, concatenating the two as the input features;

S3.4, construct a Dropout layer: to avoid overfitting during network training, a Dropout layer is added to the down-sampling layer;

S3.5, build the network output module of the U-Net: pass the output of the up-sampling layer through 2 × 2 and 1 × 1 convolutional layers to obtain the final output of the U-Net. The U-Net can thus segment pixels end to end: an image goes in, and an image of the same size comes out.

S3.6, set the network parameters of the U-Net: these include the number of convolution and deconvolution modules, the optimizer, the loss function, the activation function, Dropout, and so on. There is no uniform rule for these settings; they must be chosen according to the actual data type, the network's purpose, system requirements, and other conditions.
S4, training the U-Net. Train the U-Net constructed in S3 with the ISAR image training set generated in S2 to obtain the trained network model. To ensure the training effect, part of the training data is held out as a validation set, and the network's performance is evaluated on it during training.
S5, preprocess the target ISAR image to be segmented according to S1, feed the preprocessed image into the U-Net trained in S4, and obtain the segmentation result of the target ISAR image from the network output layer.
The invention has the beneficial effects that:
(1) Compared with traditional methods, the U-Net-based ISAR image segmentation method is superior at feature extraction: deep learning extracts deep semantic features from the image, achieving more accurate segmentation of the ISAR image.
(2) Unlike RCNN-based methods, the invention needs no candidate regions and eliminates fully connected layers, retaining only convolutional and pooling layers. Skip Connection structures link the down-sampling and up-sampling layers, so features extracted on the way down are passed directly to the up-sampling layer, making U-Net's pixel localization more accurate and its segmentation more precise.
(3) The method is data-driven and achieves end-to-end segmentation, simplifying the segmentation pipeline for target ISAR images; it also generalizes well to ISAR images of different sizes, meeting high-volume, real-time engineering requirements.
(4) The U-Net has a simple structure with few layers. Compared with the FCN, it has fewer parameters to train, completes network training in a single pass with few training samples, and trains efficiently.
Drawings
FIG. 1 is a flow chart of an ISAR image segmentation method based on U-Net according to the present invention;
FIG. 2 shows an ISAR image of a satellite target after amplitude normalization;
FIG. 3 shows an ISAR image of a satellite target after image enhancement;
FIG. 4 shows a cropped ISAR image of a satellite target;
FIG. 5 is a schematic diagram of an ISAR image of a marked satellite target according to the present invention;
FIG. 6 is a schematic diagram of a U-Net network structure provided by the present invention;
FIG. 7 is a graph showing the variation of the loss function values with the number of iterations in the network training of the present invention;
FIG. 8 is a graph showing the variation of segmentation accuracy with iteration number in the network training of the present invention;
FIG. 9 shows the ISAR image segmentation result based on U-Net according to the present invention;
fig. 10 shows the result of ISAR image segmentation based on the conventional morphological method.
Detailed Description
The technical solution of the invention is described below with reference to the accompanying drawings and a specific embodiment. It should be noted that the technical solution is not limited to the implementation described in the embodiment; modifications and designs made by those skilled in the art on the basis of the invention also fall within its protection scope.
The embodiment of the invention provides a U-Net-based ISAR image segmentation method, wherein the size of each ISAR image matrix is 1024 x 512, and the flow of the ISAR image segmentation method is shown in figure 1.
The method specifically comprises the following steps:
S1, data preprocessing
S1.1, ISAR image normalization. Owing to the combined influence of the target environment, radar system parameters, and target scattering characteristics, the target ISAR images obtained through imaging processing differ in amplitude. To eliminate the influence of image amplitude on image segmentation, an ISAR image consisting of N matrix elements is amplitude-normalized as follows:

x_n = 255 × |s_n| / max_{1 ≤ m ≤ N} |s_m|,   n = 1, 2, ..., N

where s is the ISAR image data matrix, x is the ISAR image data matrix after amplitude normalization, n represents the nth element in the image data matrix, and the value of each element in the normalized matrix is in the range 0 to 255.
The normalization of the ISAR image is also a graying step that converts the image into a 256-level grayscale image, with 0 representing black and 255 white. A grayscale image contains only brightness information, which reflects the strength of the target's electromagnetic scattering and satisfies the needs of image segmentation. At the same time, because no color information is carried, the amount of image information is greatly reduced and the computational load of image processing drops accordingly, easing subsequent processing. Fig. 2 shows the result of normalizing a satellite target ISAR image.
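For illustration only, this normalization and graying step can be sketched in a few lines of Python; NumPy and the function and variable names here are assumptions of the sketch, not part of the patent.

```python
import numpy as np

def normalize_isar(s: np.ndarray) -> np.ndarray:
    """Amplitude-normalize an ISAR data matrix to a 0-255 grayscale image (S1.1)."""
    amp = np.abs(s)                # amplitude of each matrix element (s may be complex)
    x = 255.0 * amp / amp.max()    # stretch amplitudes into the 0-255 gray range
    return x.astype(np.uint8)      # 8-bit grayscale: 0 = black, 255 = white
```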
S1.2, image enhancement. Owing to velocity-compensation errors and sidelobes, the ISAR image suffers from target ghosting and bright streaks, degrading image quality. In addition, because echoes from different parts of the target scatter with different intensities, "bright" parts of the ISAR image can mask "dark" parts. To facilitate segmentation, the ISAR image must therefore be enhanced so that the target is sharpened and better highlighted. Given these characteristics of ISAR images, a piecewise linear transformation is adopted to widen the dynamic display range of the image's gray values.
S1.2.1, set a threshold: every element of the normalized ISAR image matrix from S1.1 that exceeds the threshold is set to the threshold value, while elements below the threshold are left unchanged. In this embodiment the threshold of each ISAR image obtained in S1.1 is determined adaptively with Otsu's method, an optimal algorithm for selecting an image segmentation threshold that is simple to compute and insensitive to image brightness and contrast.
S1.2.2, piecewise linear transformation. After thresholding, the contrast of the image matrix within the threshold range is stretched; that is, a piecewise linear transformation maps the dynamic range of the image values back to 0 to 255:

r_n = 255 × x_n / Thresh,  if x_n < Thresh;   r_n = 255,  if x_n ≥ Thresh

where x_n is an element of the amplitude-normalized ISAR image data matrix, r_n is the corresponding element of the ISAR image data matrix after the piecewise linear transformation, Thresh is the threshold, and n represents the nth element in the image data matrix. After the contrast stretch, each element of r still lies in the range 0 to 255, matching the gray-value range.
Fig. 3 shows an ISAR image of the satellite target after image enhancement.
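A minimal sketch of the enhancement step, assuming scikit-image's implementation of Otsu's method and the `normalize_isar` output of the previous sketch:

```python
import numpy as np
from skimage.filters import threshold_otsu

def enhance_isar(x: np.ndarray) -> np.ndarray:
    """Otsu thresholding (S1.2.1) followed by a piecewise linear stretch (S1.2.2)."""
    thresh = threshold_otsu(x)        # adaptive per-image threshold
    clipped = np.minimum(x, thresh)   # elements above the threshold are set to it
    r = 255.0 * clipped / thresh      # stretch [0, Thresh] back to [0, 255]
    return r.astype(np.uint8)
```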
S1.3, crop the image-enhanced ISAR image from S1.2 and save it in a picture format. In general the radar's sampling bandwidth leaves a margin over the bandwidth of the transmitted signal, so the ISAR image carries some data redundancy in the range dimension. Meanwhile, the relatively stable motion of the space target and the radar's high pulse repetition frequency give the ISAR image some redundancy in the azimuth dimension as well. Redundant data lengthens neural network training and occupies computer storage, so the ISAR image is cropped to the actual size of the target to eliminate it. The cropped ISAR image is saved in a picture format to facilitate subsequent processing by U-Net.
In this embodiment, the ISAR image is cropped from its original 1024 × 512 size to 128 × 128 according to the size of the satellite target, and the cropped image is saved in the lossless PNG picture format. Fig. 4 shows the result of cropping the satellite target's ISAR image.
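The cropping and saving step might look as follows. A centered crop and the Pillow library are assumptions of this sketch; the patent crops according to the actual target extent, which would require the target's position in the image.

```python
import numpy as np
from PIL import Image

def crop_and_save(img: np.ndarray, out_path: str, size: int = 128) -> None:
    """Crop a 1024 x 512 enhanced ISAR image and save it losslessly as PNG (S1.3)."""
    rows, cols = img.shape
    r0, c0 = (rows - size) // 2, (cols - size) // 2   # centered crop (assumption)
    patch = img[r0:r0 + size, c0:c0 + size]
    Image.fromarray(patch).save(out_path)             # a .png path saves losslessly
```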
S2, generating a training set. Take the ISAR image samples cropped in S1.3 as the data set and manually annotate each image in it to obtain the corresponding label set; the data set and its label set together form the training set. Each sample in the training set consists of an ISAR image and its manually labeled counterpart, and the training set serves as the input for training the U-Net. The invention uses Labelme, the open-source annotation tool released by MIT, to label the PNG-format ISAR images at the pixel level and to store the labeled images in PNG format as well. The data set and its label set thus form the training set for the U-Net.
In this embodiment, 50 satellite target ISAR images are annotated in total; Fig. 5 shows the annotation result for one of them.
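Loading such image-mask pairs into arrays could be sketched as below; the directory layout, matching file names, and binary (target vs. background) masks are assumptions, since the patent only specifies that each sample pairs a PNG image with its labeled PNG.

```python
import glob
import os
import numpy as np
from PIL import Image

def load_training_set(image_dir: str, label_dir: str):
    """Pair each cropped ISAR PNG with its Labelme mask of the same name (S2)."""
    images, masks = [], []
    for img_path in sorted(glob.glob(os.path.join(image_dir, "*.png"))):
        name = os.path.basename(img_path)
        img = np.asarray(Image.open(img_path), dtype=np.float32) / 255.0
        lbl = np.asarray(Image.open(os.path.join(label_dir, name)))
        images.append(img)
        masks.append((lbl > 0).astype(np.float32))  # binarize: target vs. background
    # add the channel axis the network expects: (N, 128, 128, 1)
    return np.stack(images)[..., None], np.stack(masks)[..., None]
```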
S3, constructing the network structure of the U-Net and setting the network parameters: the U-Net adopts an encoder-decoder network structure comprising a down-sampling layer and an up-sampling layer, where the down-sampling layer is built from convolution modules and the up-sampling layer from deconvolution modules. The two use the same number of convolution stages, and Skip Connection structures link the down-sampling layer to the up-sampling layer. The specific implementation is as follows:

S3.1, build the down-sampling layer of the U-Net: it consists of several (e.g., 4) convolution modules, each composed of two 3 × 3 convolutional layers, a Rectified Linear Unit (ReLU), and a 2 × 2 max pooling layer;

S3.2, build the up-sampling layer of the U-Net: it consists of several (e.g., 4) deconvolution modules, each composed of one 2 × 2 deconvolution (up-convolution) layer, two 3 × 3 convolutional layers, and a ReLU;

S3.3, Skip Connection: connect the feature layer output by each convolution module in the down-sampling layer to the corresponding deconvolution module in the up-sampling layer, concatenating the two as the input features;

S3.4, construct a Dropout layer: to avoid overfitting during network training, a Dropout layer is added to the down-sampling layer;

S3.5, build the network output module of the U-Net: pass the output of the up-sampling layer through 2 × 2 and 1 × 1 convolutional layers to obtain the final output of the U-Net. The U-Net can thus segment pixels end to end: an image goes in, and an image of the same size comes out.
S3.6, set the network parameters of the U-Net: these include the number of convolution and deconvolution modules, the optimizer, the loss function, the activation function, Dropout, and so on. In this embodiment the ISAR pictures in the data set are 128 × 128, so the down-sampling layer of the U-Net is set to 5 convolution modules whose feature dimensions after convolution are 16-32-64-128-256 in turn; the corresponding up-sampling layer consists of deconvolution modules whose output feature dimensions are 128-64-32-16 in turn, the two forming a symmetric structure. Because the output layer produces an image, it uses a Sigmoid activation function, while all other layers use ReLU. The Adam optimizer is chosen, combining the advantages of the AdaGrad and RMSProp optimization algorithms for lower memory demand and more efficient computation. To prevent overfitting during training, the Dropout layer is set to a 50% drop rate, i.e., it randomly disconnects 50% of the input neurons at each parameter update. Finally, binary cross-entropy is selected as the network's loss function.
Fig. 6 shows a schematic diagram of the U-Net network structure in this embodiment.
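A sketch of this architecture in Keras is given below, with TensorFlow assumed. Treating the fifth convolution module as the bottleneck, placing the Dropout layer there, and using a single 1 × 1 output convolution are simplifications made by this sketch, not details fixed by the patent.

```python
from tensorflow.keras import layers, models

def build_unet(input_shape=(128, 128, 1), filters=(16, 32, 64, 128, 256)):
    """U-Net sketch: 3x3 conv pairs + ReLU, 2x2 max pooling on the way down,
    2x2 transposed convs with skip connections on the way up, 50% Dropout,
    and a Sigmoid output for binary segmentation."""
    inputs = layers.Input(shape=input_shape)
    x, skips = inputs, []

    # down-sampling path: convolution modules (S3.1)
    for i, f in enumerate(filters):
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        if i < len(filters) - 1:            # all but the bottleneck feed a skip
            skips.append(x)
            x = layers.MaxPooling2D(2)(x)

    x = layers.Dropout(0.5)(x)              # S3.4: 50% drop rate at the bottleneck

    # up-sampling path: deconvolution modules with skip connections (S3.2, S3.3)
    for f, skip in zip(reversed(filters[:-1]), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])  # concatenated skip features
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)

    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # S3.5 output module
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])     # S3.6: Adam + binary cross-entropy
    return model
```

With these settings, `build_unet().summary()` would show the symmetric 16-32-64-128-256 down-sampling and 128-64-32-16 up-sampling feature dimensions described above.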
S4, training the U-Net. Train the U-Net constructed in S3 with the ISAR image training set generated in S2 to obtain the trained network model. To ensure the training effect, part of the training data is held out as a validation set, and the network's performance is evaluated on it during training. In this embodiment 20% of the data is reserved as the validation set; it does not participate in training and is used after each iteration to evaluate model metrics such as the loss function value and accuracy. The number of training iterations is set to 80.
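Tying the earlier sketches together, the training step might read as follows; the Keras `validation_split` mechanism and the directory names are assumptions.

```python
images, masks = load_training_set("data/images", "data/labels")  # hypothetical paths
model = build_unet()
history = model.fit(images, masks,
                    epochs=80,             # iteration count used in this embodiment
                    validation_split=0.2)  # 20% of the data held out for validation
```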
Figs. 7 and 8 show, respectively, how the loss function value and the segmentation accuracy vary with the number of training iterations. As the figures show, once the iteration count reaches about 60 the loss values and segmentation accuracy of both the validation and training sets stabilize, indicating that the setting of 80 iterations is reasonable and yields an effective training result.
S5, preprocess the target ISAR image to be segmented according to S1, feed the preprocessed image into the U-Net trained in S4, and obtain the segmentation result of the target ISAR image from the network output layer.
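An inference sketch for this step, reusing the helpers above; the input file, the crop position, and the 0.5 decision threshold on the Sigmoid output are assumptions.

```python
import numpy as np

raw = np.load("target_isar.npy")                    # hypothetical 1024 x 512 ISAR matrix
img = enhance_isar(normalize_isar(raw))             # S1.1 and S1.2 preprocessing
patch = img[None, 448:576, 192:320, None] / 255.0   # S1.3-style centered 128 x 128 crop
mask = model.predict(patch)[0, ..., 0] > 0.5        # binary segmentation of the target
```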
To verify the effect of the invention, the method is compared with a segmentation method based on traditional morphology, which segments the target ISAR image through erosion, dilation, and boundary extraction. Fig. 10 shows the result of the traditional method on the same satellite target ISAR image. Comparing Figs. 9 and 10 shows that the method of the invention is clearly superior to the traditional segmentation method, extracting the target contour more accurately and sensibly without distorting its edges.
The results of the embodiment show that the segmentation method of the invention is clearly superior to the traditional method. Moreover, the network structure is simple and requires few training samples, so the method can satisfy real-time image segmentation requirements and is well suited to small sample sets such as ISAR images.
Claims (6)
1. An ISAR image segmentation method based on U-Net is characterized by comprising the following steps:
S1, data preprocessing
S1.1, ISAR image normalization: to eliminate the influence of image amplitude on image segmentation, the ISAR image consisting of N matrix elements is amplitude-normalized as follows:

x_n = 255 × |s_n| / max_{1 ≤ m ≤ N} |s_m|,   n = 1, 2, ..., N

where s is the ISAR image data matrix, x is the ISAR image data matrix after amplitude normalization, n represents the nth element in the ISAR image data matrix, and the value of each element in the normalized matrix is in the range 0 to 255;
S1.2, image enhancement: given the characteristics of the ISAR image, a piecewise linear transformation is adopted to widen the dynamic display range of the image's gray values;
S1.3, cropping the ISAR image enhanced in S1.2 and saving it in a picture format;
S2, generating a training set: taking the ISAR image samples cropped in S1.3 as the data set and manually annotating each image in it to obtain the corresponding label set, the data set and its label set forming the training set; each sample in the training set comprises an ISAR image and its manually labeled counterpart, and the training set serves as the input for training the U-Net;
S3, constructing the network structure of the U-Net and setting the network parameters: the U-Net adopts an encoder-decoder network structure comprising a down-sampling layer and an up-sampling layer, where the down-sampling layer is built from convolution modules and the up-sampling layer from deconvolution modules; the two use the same number of convolution stages, and Skip Connection structures link the down-sampling layer to the up-sampling layer, implemented as follows:
S3.1, building the down-sampling layer of the U-Net: the down-sampling layer consists of several convolution modules, each composed of two 3 × 3 convolutional layers, a Rectified Linear Unit (ReLU), and a 2 × 2 max pooling layer;
S3.2, building the up-sampling layer of the U-Net: the up-sampling layer consists of several deconvolution modules, each composed of one 2 × 2 deconvolution layer, two 3 × 3 convolutional layers, and a ReLU;
S3.3, Skip Connection: connecting the feature layer output by each convolution module in the down-sampling layer to the corresponding deconvolution module in the up-sampling layer, concatenating the two as the input features;
S3.4, constructing a Dropout layer: to avoid overfitting during network training, adding a Dropout layer to the down-sampling layer;
S3.5, building the network output module of the U-Net: passing the output of the up-sampling layer through 2 × 2 and 1 × 1 convolutional layers to obtain the final output of the U-Net, ensuring that the U-Net segments pixels end to end, i.e., an image is input and an image of the same size is output;
S3.6, setting the network parameters of the U-Net: the network parameters include the number of convolution and deconvolution modules, the optimizer, the loss function, the activation function, and Dropout; there is no uniform rule for these settings, which must be chosen according to the actual data type, the network's purpose, system requirements, and other conditions;
S4, training the U-Net: training the U-Net constructed in S3 with the ISAR image training set generated in S2 to obtain the trained network model;
S5, performing data preprocessing on the target ISAR image to be segmented according to S1, feeding the preprocessed image into the U-Net trained in S4, and obtaining the segmentation result of the target ISAR image from the network output layer.
2. The U-Net based ISAR image segmentation method according to claim 1, wherein: in S4, to ensure the training effect, part of the training data is held out as a validation set, and the network's performance is evaluated on it during the training process.
3. The U-Net based ISAR image segmentation method according to claim 1, wherein: in S3.1, the down-sampling layer consists of 4 convolution modules.
4. The U-Net based ISAR image segmentation method according to claim 1, wherein: in S3.2, the up-sampling layer consists of 4 deconvolution modules.
5. The U-Net based ISAR image segmentation method according to claim 1, wherein in S1.2 the image enhancement by piecewise linear transformation comprises the following specific steps:
S1.2.1, setting a threshold: every element of the ISAR image matrix normalized in S1 that exceeds the threshold is set to the threshold value, while elements below the threshold are left unchanged;
S1.2.2, piecewise linear transformation: after thresholding, the contrast of the image matrix within the threshold range is stretched, i.e., a piecewise linear transformation maps the dynamic range of the image values back to 0 to 255, expressed as

r_n = 255 × x_n / Thresh,  if x_n < Thresh;   r_n = 255,  if x_n ≥ Thresh

where x_n is an element of the amplitude-normalized ISAR image data matrix, r_n is the corresponding element of the ISAR image data matrix after the piecewise linear transformation, Thresh is the threshold, and n represents the nth element in the image data matrix; after the contrast stretch, each element of r still lies in the range 0 to 255, matching the gray-value range.
6. The U-Net based ISAR image segmentation method according to claim 5, wherein: in S1.2.1, the threshold of each ISAR image obtained in S1.1 is determined adaptively using Otsu's method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911192861.3A CN110910413A (en) | 2019-11-28 | 2019-11-28 | ISAR image segmentation method based on U-Net |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911192861.3A CN110910413A (en) | 2019-11-28 | 2019-11-28 | ISAR image segmentation method based on U-Net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110910413A true CN110910413A (en) | 2020-03-24 |
Family
ID=69820235
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911192861.3A Pending CN110910413A (en) | 2019-11-28 | 2019-11-28 | ISAR image segmentation method based on U-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110910413A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967507A (en) * | 2020-07-31 | 2020-11-20 | 复旦大学 | Discrete cosine transform and U-Net based time sequence anomaly detection method |
CN112052762A (en) * | 2020-08-27 | 2020-12-08 | 西安电子科技大学 | Small sample ISAR image target identification method based on Gaussian prototype |
CN112132798A (en) * | 2020-09-18 | 2020-12-25 | 浙江大学 | Method for detecting complex background PCB mark point image based on Mini ARU-Net network |
CN112216293A (en) * | 2020-08-28 | 2021-01-12 | 北京捷通华声科技股份有限公司 | Tone conversion method and device |
CN113724238A (en) * | 2021-09-08 | 2021-11-30 | 佛山科学技术学院 | Ceramic tile color difference detection and classification method based on feature point neighborhood color analysis |
CN114966693A (en) * | 2022-07-20 | 2022-08-30 | 南京信息工程大学 | Airborne ship target ISAR refined imaging method based on deep learning |
CN115661144A (en) * | 2022-12-15 | 2023-01-31 | 湖南工商大学 | Self-adaptive medical image segmentation method based on deformable U-Net |
CN116703837A (en) * | 2023-05-24 | 2023-09-05 | 北京大学第三医院(北京大学第三临床医学院) | MRI image-based rotator cuff injury intelligent identification method and device |
CN117611828A (en) * | 2024-01-19 | 2024-02-27 | 云南烟叶复烤有限责任公司 | Non-smoke sundry detection method based on hyperspectral image segmentation technology |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107256396A (en) * | 2017-06-12 | 2017-10-17 | 电子科技大学 | Ship target ISAR characteristics of image learning methods based on convolutional neural networks |
US20190180443A1 (en) * | 2017-11-07 | 2019-06-13 | Align Technology, Inc. | Deep learning for tooth detection and evaluation |
CN110211137A (en) * | 2019-06-08 | 2019-09-06 | 西安电子科技大学 | Satellite Image Segmentation method based on residual error network and U-Net segmentation network |
CN110210422A (en) * | 2019-06-05 | 2019-09-06 | 哈尔滨工业大学 | It is a kind of based on optical imagery auxiliary naval vessel ISAR as recognition methods |
- 2019-11-28: CN201911192861.3A filed (CN); published as CN110910413A, status pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107256396A (en) * | 2017-06-12 | 2017-10-17 | 电子科技大学 | Ship target ISAR characteristics of image learning methods based on convolutional neural networks |
US20190180443A1 (en) * | 2017-11-07 | 2019-06-13 | Align Technology, Inc. | Deep learning for tooth detection and evaluation |
CN110210422A (en) * | 2019-06-05 | 2019-09-06 | 哈尔滨工业大学 | It is a kind of based on optical imagery auxiliary naval vessel ISAR as recognition methods |
CN110211137A (en) * | 2019-06-08 | 2019-09-06 | 西安电子科技大学 | Satellite Image Segmentation method based on residual error network and U-Net segmentation network |
Non-Patent Citations (3)
Title |
---|
MANOHARA PAI M. M. ET AL.: "Automatic Segmentation of River and Land in SAR Images: A Deep Learning Approach", 2019 IEEE SECOND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE) *
OLAF RONNEBERGER ET AL.: "U-Net: Convolutional Networks for Biomedical Image Segmentation", 18TH INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION (MICCAI) *
ZHANG FUSHENG: "Small-Sample ISAR Image Recognition Based on Convolutional Neural Networks" (基于卷积神经网络的小样本ISAR图像识别), COMPUTER PROGRAMMING SKILLS & MAINTENANCE (《电脑编程技巧与维护》) *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967507A (en) * | 2020-07-31 | 2020-11-20 | 复旦大学 | Discrete cosine transform and U-Net based time sequence anomaly detection method |
CN112052762A (en) * | 2020-08-27 | 2020-12-08 | 西安电子科技大学 | Small sample ISAR image target identification method based on Gaussian prototype |
CN112216293A (en) * | 2020-08-28 | 2021-01-12 | 北京捷通华声科技股份有限公司 | Tone conversion method and device |
CN112132798A (en) * | 2020-09-18 | 2020-12-25 | 浙江大学 | Method for detecting complex background PCB mark point image based on Mini ARU-Net network |
CN112132798B (en) * | 2020-09-18 | 2022-04-29 | 浙江大学 | Method for detecting complex background PCB mark point image based on Mini ARU-Net network |
CN113724238B (en) * | 2021-09-08 | 2024-06-11 | 佛山科学技术学院 | Ceramic tile color difference detection and classification method based on feature point neighborhood color analysis |
CN113724238A (en) * | 2021-09-08 | 2021-11-30 | 佛山科学技术学院 | Ceramic tile color difference detection and classification method based on feature point neighborhood color analysis |
CN114966693A (en) * | 2022-07-20 | 2022-08-30 | 南京信息工程大学 | Airborne ship target ISAR refined imaging method based on deep learning |
CN114966693B (en) * | 2022-07-20 | 2022-11-04 | 南京信息工程大学 | Airborne ship target ISAR refined imaging method based on deep learning |
CN115661144A (en) * | 2022-12-15 | 2023-01-31 | 湖南工商大学 | Self-adaptive medical image segmentation method based on deformable U-Net |
CN116703837A (en) * | 2023-05-24 | 2023-09-05 | 北京大学第三医院(北京大学第三临床医学院) | MRI image-based rotator cuff injury intelligent identification method and device |
CN116703837B (en) * | 2023-05-24 | 2024-02-06 | 北京大学第三医院(北京大学第三临床医学院) | MRI image-based rotator cuff injury intelligent identification method and device |
CN117611828A (en) * | 2024-01-19 | 2024-02-27 | 云南烟叶复烤有限责任公司 | Non-smoke sundry detection method based on hyperspectral image segmentation technology |
CN117611828B (en) * | 2024-01-19 | 2024-05-24 | 云南烟叶复烤有限责任公司 | Non-smoke sundry detection method based on hyperspectral image segmentation technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110910413A (en) | ISAR image segmentation method based on U-Net | |
Jiang et al. | Edge-enhanced GAN for remote sensing image superresolution | |
CN113052210B (en) | Rapid low-light target detection method based on convolutional neural network | |
CN111222396B (en) | All-weather multispectral pedestrian detection method | |
CN112183258A (en) | Remote sensing image road segmentation method based on context information and attention mechanism | |
CN111626993A (en) | Image automatic detection counting method and system based on embedded FEFnet network | |
CN109741340B (en) | Ice cover radar image ice layer refined segmentation method based on FCN-ASPP network | |
CN112560717B (en) | Lane line detection method based on deep learning | |
CN113255837A (en) | Improved CenterNet network-based target detection method in industrial environment | |
CN113420794B (en) | Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning | |
CN114022408A (en) | Remote sensing image cloud detection method based on multi-scale convolution neural network | |
CN113052057A (en) | Traffic sign identification method based on improved convolutional neural network | |
CN116596792B (en) | Inland river foggy scene recovery method, system and equipment for intelligent ship | |
CN115830466A (en) | Glacier change remote sensing detection method based on deep twin neural network | |
CN114612306A (en) | Deep learning super-resolution method for crack detection | |
CN115047455A (en) | Lightweight SAR image ship target detection method | |
CN116363535A (en) | Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network | |
CN116258940A (en) | Small target detection method for multi-scale features and self-adaptive weights | |
CN113705340B (en) | Deep learning change detection method based on radar remote sensing data | |
CN114926826A (en) | Scene text detection system | |
CN116740572A (en) | Marine vessel target detection method and system based on improved YOLOX | |
CN117011160A (en) | Single image rain removing method based on dense circulation network convergence attention mechanism | |
CN116310892A (en) | Marine rescue method based on improved YOLOV4 target detection | |
CN115035429A (en) | Aerial photography target detection method based on composite backbone network and multiple measuring heads | |
CN114693712A (en) | Dark vision and low-illumination image edge detection method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200324 |