CN109086405B - Remote sensing image retrieval method and system based on saliency and convolutional neural network - Google Patents

Remote sensing image retrieval method and system based on saliency and convolutional neural network

Info

Publication number
CN109086405B
CN109086405B CN201810862331.4A CN201810862331A
Authority
CN
China
Prior art keywords
image
feature
convolution
retrieval
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810862331.4A
Other languages
Chinese (zh)
Other versions
CN109086405A (en)
Inventor
邵振峰
杨珂
李从敏
周维勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201810862331.4A priority Critical patent/CN109086405B/en
Publication of CN109086405A publication Critical patent/CN109086405A/en
Application granted granted Critical
Publication of CN109086405B publication Critical patent/CN109086405B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a remote sensing image retrieval method and system based on saliency and a convolutional neural network. The method comprises: extracting a saliency map from each image in a retrieval image library; inputting the images into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features; up-sampling the feature map of each convolutional layer to the original size of the input image to obtain synthesized convolutional-layer features; performing weighted integration on the synthesized convolutional-layer features with the saliency map and encoding the result to form the final effective feature representation; and, for a given remote sensing image to be retrieved, extracting its feature representation in the same way and performing similarity-measurement retrieval against the feature representation of each image in the retrieval image library. By combining the saliency map with the convolutional neural network, the method takes the information of both the salient region and the background region into account, extracts an effective feature representation from the convolutional-layer features, reduces the computational cost, and is robust to scale change and noise interference.

Description

Remote sensing image retrieval method and system based on saliency and convolutional neural network
Technical Field
The invention belongs to the technical field of remote sensing image processing, and relates to a remote sensing image retrieval method and system based on saliency features and convolutional neural network model optimization.
Background
With the rapid development of remote sensing technology, the volume of remote sensing image data is growing rapidly. This growing volume of data benefits people's lives, but it also makes the effective management of remote sensing data a challenge. Remote sensing image retrieval, i.e. quickly finding the remote sensing images of interest in a massive database, is one of the effective means of addressing this data-management problem, so achieving efficient and fast image retrieval is of significant research value.
The retrieval technique in common use today is content-based remote sensing image retrieval, which builds a feature vector by extracting either basic features (colour, shape, texture, etc.) or deep-learning features (unsupervised learning features, convolutional neural network features, etc.) from an image. Convolutional neural network features are notably stable. In retrieval methods based on salient regions, a saliency model is typically used to mask out the salient region of the image and extract features only from it; this narrows the range of feature extraction, but it discards the content features contained in the non-salient region, and the saliency itself is not reflected in the features. Addressing this problem, the invention provides a remote sensing image retrieval method based on the weighted integration of saliency and a convolutional neural network.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a remote sensing image retrieval scheme based on the weighted integration of saliency and a convolutional neural network. The method weight-integrates the saliency map with the feature maps of the convolutional neural network, introducing the saliency of the image into the features, thereby realizing effective retrieval of remote sensing images and improving retrieval precision.
In order to achieve the above object, the technical solution of the present invention provides a remote sensing image retrieval method based on saliency and a convolutional neural network, comprising the following steps:
step a, extracting a saliency map from each image in a retrieval image library;
step b, inputting the images in the retrieval image library into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features;
step c, up-sampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain synthesized convolutional-layer features;
step d, performing weighted integration on the convolutional-layer features synthesized in step c using the saliency map, and encoding the result to form the final effective feature representation;
step e, extracting the feature representation of a given remote sensing image to be retrieved in a manner consistent with steps a-d, performing similarity-measurement retrieval against the feature representation of each image in the retrieval image library, and returning the several images with the highest similarity as the result.
In step a, a saliency map is extracted from each image in the retrieval image library using the GBVS model.
In step b, the convolutional-layer features of each image in the retrieval image library are extracted using a convolutional neural network model trained on the ImageNet data set.
In step c, the up-sampling is realized by transposed convolution: first, 1×D convolution kernels convert the features of each convolutional layer into the same number of feature maps; then the features of each convolutional layer are up-sampled to the size of the original image and the features of corresponding areas are added to obtain the final convolutional feature map, where D is the number of feature maps of the last convolutional layer.
In step d, the pixel value at each position of the convolutional feature map obtained in step c is taken to represent the feature value of that position and the element value at each position of the saliency map to represent the probability that the position is attended to; using this probability as the weight of the feature value at the corresponding position of the feature map, the convolutional feature map and the saliency map are weight-integrated to obtain a new feature representation.
The invention also provides a remote sensing image retrieval system based on saliency and a convolutional neural network, comprising the following modules:
a first module for extracting a saliency map from each image in a retrieval image library;
a second module for inputting the images in the retrieval image library into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features;
a third module for up-sampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain synthesized convolutional-layer features;
a fourth module for performing weighted integration on the convolutional-layer features synthesized by the third module using the saliency map and encoding the result to form the final effective feature representation;
a fifth module for extracting the feature representation of a given remote sensing image to be retrieved, including extracting its saliency map, inputting the image into the pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features, up-sampling the feature map of each convolutional layer to the original size of the input image to obtain synthesized convolutional-layer features, and performing weighted integration on the synthesized features with the saliency map and encoding the result to form the final effective feature representation;
and for performing similarity-measurement retrieval based on the feature representation of each image in the retrieval image library, returning the several images with the highest similarity as the result.
In the first module, a saliency map is extracted from each image in the retrieval image library using the GBVS model.
In the second module, the convolutional-layer features of each image in the retrieval image library are extracted using a convolutional neural network model trained on the ImageNet data set.
In the third module, the up-sampling is realized by transposed convolution: first, 1×D convolution kernels convert the features of each convolutional layer into the same number of feature maps; then the features of each convolutional layer are up-sampled to the size of the original image and the features of corresponding areas are added to obtain the final convolutional feature map, where D is the number of feature maps of the last convolutional layer.
In the fourth module, the pixel value at each position of the convolutional feature map is taken to represent the feature value of that position and the element value at each position of the saliency map to represent the probability that the position is attended to; using this probability as the weight of the feature value at the corresponding position of the feature map, the convolutional feature map and the saliency map are weight-integrated to obtain a new feature representation.
The method combines the saliency map with the convolutional neural network, takes the information of both the salient region and the background region into account, extracts an effective feature representation from the convolutional-layer features, reduces the computational cost, and is robust to scale change and noise interference. Compared with the prior art, the invention has the following characteristics and beneficial effects:
(1) a visual attention model is adopted to obtain the saliency map of an image, yielding the saliency of its different regions;
(2) the convolutional-layer features are extracted directly with a pre-trained convolutional neural network model, requiring no extra training and reducing the computational cost;
(3) a method is provided for extracting a feature representation by weighted integration of the convolutional-layer feature map and the saliency map, which both conforms to the visual attention characteristics of the human eye and takes the feature information of the background region into account, making the extracted features more effective.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a flowchart of convolutional layer signature coding according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and examples.
According to human visual theory, the human eye attends to different areas of an image to different degrees. Traditional retrieval methods based on salient image regions mask out the salient region and keep only its image content for feature extraction, ignoring the fact that the background region also carries some discriminative information. The features extracted by a convolutional neural network contain rich information, and for image retrieval a better result can be obtained by fusing the saliency map with the convolutional-layer features.
The invention therefore provides a remote sensing image retrieval method based on the weighted integration of saliency and a convolutional neural network: first, a saliency map is extracted from each image in the retrieval image library with a visual attention model; second, the convolutional-layer features of the image are extracted with a convolutional neural network model and up-sampled to the same size as the original image; then the sampled convolutional feature map and the saliency map are weight-integrated and encoded to obtain the final feature representation; finally, the features of a given query remote sensing image are extracted by the same steps, a similarity measurement is performed between them and the image features in the library, and the images with high similarity are returned as similar images.
Referring to fig. 1, the example flow is as follows:
step a, obtaining a saliency map of each image in a retrieval image library.
For each image in the retrieval image library, a visual attention model can be used to extract the saliency map. In a specific implementation the extraction method may be chosen freely; in the embodiment the GBVS (Graph-Based Visual Saliency) model is preferably adopted to compute the saliency map of the original image. The GBVS model is prior art and is not described further here.
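The GBVS model itself is prior art and not detailed in the patent. As a hedged stand-in, the sketch below computes a saliency map with the simpler spectral-residual method in NumPy; the function name, the box-filter size and the normalisation are illustrative assumptions, not part of the patent's method.

```python
import numpy as np

def spectral_residual_saliency(gray, k=3):
    """Toy saliency map via the spectral-residual method (a stand-in for GBVS).

    gray: 2-D float array (grayscale image); returns a map normalised to [0, 1].
    """
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # Local average of the log-amplitude with a k x k box filter.
    pad = k // 2
    padded = np.pad(log_amp, pad, mode="edge")
    avg = np.zeros_like(log_amp)
    for dy in range(k):
        for dx in range(k):
            avg += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    avg /= k * k
    # The spectral residual keeps what deviates from the smoothed spectrum.
    residual = log_amp - avg
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    return sal

sal = spectral_residual_saliency(np.random.rand(64, 64))
```

The map's element at (x, y) then plays the role of the attention probability s_{x,y} used later as a feature weight.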
And b, acquiring the convolutional-layer features of each image in the retrieval image library: the images in the image library are input into a pre-trained convolutional neural network model and the feature map of each convolutional layer is extracted.
In the embodiment a commonly used pre-trained convolutional neural network model, trained on the ImageNet data set, is adopted; ImageNet is a large-scale image library and the model is a trained image-classification model, which is not described further here. In a specific implementation many convolutional neural networks are available and can be selected as needed; for example, the two most common models, AlexNet and VGG-M, can be compared in experiments and the model with the best results adopted. AlexNet is a classical network model consisting mainly of 5 convolutional layers and 3 fully connected layers; the VGG-M network is an improvement on AlexNet that uses smaller convolution kernels.
In the embodiment the convolutional neural network model trained on the ImageNet data set is used to extract the features of several convolutional layers of each image in the retrieval image library. The images are input into the network model directly, without training on the retrieval image library, and the feature map of each convolutional layer is extracted as the convolutional-layer features. The extracted feature map of the $i$-th convolutional layer is a feature tensor of dimension $H_i \times W_i \times D_i$, where $H_i$, $W_i$ and $D_i$ are the height, width and depth of the feature map of the $i$-th convolutional layer.
And c, up-sampling the feature mapping graph of each convolution layer, and expanding the feature mapping graph to the original image size to obtain the synthesized convolution layer features.
In the embodiment the features of each convolutional layer obtained in step b are up-sampled by transposed convolution, expanding them to the same size as the original image. Let the depth $D$ be the number of feature maps of the last convolutional layer, i.e. the maximum (the depth differs between convolutional layers and is largest at the last one). First, 1×D convolution kernels are applied to the features of the several convolutional layers so that feature maps of different depths share the same depth, yielding $H_i \times W_i \times D$ feature maps. Then transposed convolution is performed so that the convolutional feature maps share the same height, width and depth, yielding $H \times W \times D$ feature maps, where $H$, $W$ and $D$ are the height, width and depth of the feature map and $H$, $W$ match the original image. Finally the feature values of corresponding areas are added, i.e. the features of the several convolutional layers are merged into new convolutional-layer features. In a specific implementation the up-sampling method can be chosen by those skilled in the art. The corresponding areas are those mapping to the same receptive-field area of the original image, and the addition means summing the feature values over one receptive-field area.
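A minimal NumPy sketch of the shape bookkeeping in step c, under stated simplifications: a random 1×1 projection stands in for the learned 1×D convolution kernels, nearest-neighbour repetition stands in for the learned transposed convolution, and the layer sizes are assumed to divide the image size evenly. All names are illustrative.

```python
import numpy as np

def upsample_and_merge(conv_feats, out_hw, d):
    """Combine multi-layer conv features into one H x W x D map (a sketch).

    conv_feats: list of arrays of shape (Hi, Wi, Di);
    out_hw:     (H, W) of the original image (assumed multiples of Hi, Wi);
    d:          target depth, the map count of the last conv layer.
    """
    H, W = out_hw
    rng = np.random.default_rng(0)
    merged = np.zeros((H, W, d))
    for fmap in conv_feats:
        hi, wi, di = fmap.shape
        # Stand-in for the 1xD kernels: project depth Di down/up to D.
        proj = fmap @ rng.standard_normal((di, d))
        # Stand-in for transposed convolution: nearest-neighbour upsampling.
        up = np.repeat(np.repeat(proj, H // hi, axis=0), W // wi, axis=1)
        merged += up  # add the features of corresponding areas
    return merged

feats = [np.random.rand(8, 8, 16), np.random.rand(4, 4, 32)]
m = upsample_and_merge(feats, (32, 32), d=32)
```

In the patent both the 1×D kernels and the transposed convolution carry learned weights; only the tensor shapes and the summation are reproduced faithfully here.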
And d, performing weighted integration and encoding on the convolutional-layer features synthesized in step c and the saliency map obtained in step a to obtain the final features.
In this step the convolutional-layer features are weight-integrated with the saliency map and encoded to form the final effective feature representation. The pixel value at each position of the convolutional feature map obtained in step c represents the feature value of that position, and the element value at each position of the saliency map represents the probability that the position is attended to. Using this probability as the weight of the feature value at the corresponding position of the feature map, the convolutional feature map and the saliency map are weight-integrated to obtain a new feature representation.
In this representation the features of regions with large saliency values (salient regions), i.e. large weights, are emphasised, while the features of regions with small saliency values (background regions), i.e. small weights, are attenuated, thereby integrating the feature expressions of the different regions. Because this new representation has a high dimensionality, the invention further encodes it with a BOVW (bag of visual words) model to form the final compact features; BOVW is an existing visual coding model and is not described further here.
The specific implementation of the embodiment is as follows.
Assume the extracted saliency map is $S = \{s_{x,y}\}$, where $s_{x,y}$ is the saliency at position $(x, y)$ in the image, i.e. the probability that the position is attended to, used as the weight of the feature value at that position. Assume the convolutional features are $F = \{f_{x,y}^{i}\}$, $i = 1, \dots, D$, where $f_{x,y}^{i}$ is the feature value of the $i$-th dimension at position $(x, y)$ in the image. The saliency map and the convolutional features are weight-integrated as follows:
$$m_{x,y}^{i} = s_{x,y} \cdot f_{x,y}^{i}$$
$$M = \{m_{x,y}^{i}\}$$
where $M$ denotes the feature layer after weighted integration and $m_{x,y}^{i}$ denotes the feature value at position $(x, y)$ of the $i$-th feature map after integration.
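Taking the weighted integration as the element-wise product of each feature map with the saliency map, broadcast over the D maps, it can be sketched in NumPy with illustrative sizes:

```python
import numpy as np

# Weighted integration of the synthesized conv feature map F (H x W x D)
# with the saliency map S (H x W): each position's D-dimensional feature
# vector is scaled by that position's attention probability s_{x,y}.
H, W, D = 32, 32, 8
F = np.random.rand(H, W, D)   # synthesized convolutional-layer features
S = np.random.rand(H, W)      # saliency map, i.e. attention probabilities
M = F * S[:, :, None]         # m^i_{x,y} = s_{x,y} * f^i_{x,y}
```

A position with saliency near 0 (background) contributes little, while a position with saliency near 1 (salient region) keeps its feature values almost unchanged, matching the emphasis/attenuation behaviour described above.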
The $H \times W \times D$-dimensional feature obtained after weighted integration has a high dimensionality and must be encoded into a compact representation to reduce the computational cost. In the embodiment the BOVW (bag of visual words) model is adopted to encode the weighted features: with the dictionary size set to n, each image finally yields a $1 \times n$ feature vector. The flow of convolutional-layer feature coding is shown in fig. 2: first, the weight-integrated convolutional-layer features of a number of images in the image library are extracted, and n feature cluster centres are obtained by k-means clustering to serve as the dictionary; then the weight-integrated convolutional-layer features of the query image are extracted and encoded against the dictionary with the BOVW method into an n-dimensional feature vector. In a specific implementation those skilled in the art can choose the encoding method themselves.
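The dictionary-plus-encoding flow above can be sketched with a plain Lloyd's k-means and hard assignment; n is set to 16 here purely for the demo (the patent experiments with n from 1000 to 500000), and every name, the iteration count and the random features are illustrative assumptions.

```python
import numpy as np

def kmeans(X, n_clusters, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns the cluster centres (the dictionary)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return centres

def bovw_encode(features, dictionary):
    """Encode D-dim local features as an n-dim visual-word histogram."""
    labels = np.argmin(((features[:, None] - dictionary[None]) ** 2).sum(-1),
                       axis=1)
    hist = np.bincount(labels, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()  # the 1 x n feature vector of one image

rng = np.random.default_rng(1)
library_feats = rng.random((500, 8))               # pooled weighted conv features
dictionary = kmeans(library_feats, n_clusters=16)  # n = 16 visual words (demo)
code = bovw_encode(rng.random((100, 8)), dictionary)
```

In practice the local features fed to k-means would be the weight-integrated convolutional descriptors of many library images, not random vectors.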
In a specific implementation, the dictionary size can be given several values for comparison experiments, e.g. 1000, 10000, 50000, 100000 and 500000, and the value giving the best result is taken as n.
And e, retrieving the remote sensing image.
For a given remote sensing image to be retrieved, used as the query image, features are first extracted according to steps a-d: the saliency map of the query image is extracted in the same way, the convolutional-layer features are obtained and up-sampled to the original image size, and the sampled convolutional features and the saliency map are weight-integrated and encoded to obtain the final effective feature representation of the query image. A similarity measurement is then performed between the features of the query image and the features of the other images in the image library. Many metric distances are available; in the embodiment the Euclidean distance is preferably used as the feature-distance measure. The Euclidean distance is a standard formula over feature vectors, belongs to the prior art, and is not described further here. After the similarity of two images is computed, a preset number of similar images is returned in order of similarity (from high to low, or low to high).
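Step e's similarity measurement can be sketched as a Euclidean nearest-neighbour ranking over the encoded feature vectors; the names and the random stand-in features are illustrative.

```python
import numpy as np

def retrieve(query_vec, library_vecs, top_k=5):
    """Rank library images by Euclidean distance to the query feature vector.

    Returns the indices of the top_k most similar images (smallest distance
    first) together with their distances.
    """
    dists = np.linalg.norm(library_vecs - query_vec, axis=1)
    order = np.argsort(dists)      # similarity from high to low
    return order[:top_k], dists[order[:top_k]]

library = np.random.rand(100, 16)  # BOVW vectors of the retrieval library
idx, d = retrieve(library[7], library, top_k=3)
# The query itself (index 7) is the closest match, at distance 0.
```

When any library image may serve as a query, the library vectors are computed once and stored, so each retrieval costs only one distance pass over the stored features.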
In a specific implementation, any image in the retrieval image library can be used as the query image and the other images in the library as candidate images, among which the images similar to the query are retrieved. When other images of the library serve as query images the processing is the same, and the features of the library images can be extracted and stored in advance.
In a specific implementation, the above processes can be realized in software, and a corresponding system can also be provided in modular form. The embodiment of the invention accordingly also provides a remote sensing image retrieval system based on saliency and a convolutional neural network, comprising the following modules:
a first module for extracting a saliency map from each image in a retrieval image library;
a second module for inputting the images in the retrieval image library into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features;
a third module for up-sampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain synthesized convolutional-layer features;
a fourth module for performing weighted integration on the convolutional-layer features synthesized by the third module using the saliency map and encoding the result to form the final effective feature representation;
a fifth module for extracting the feature representation of a given remote sensing image to be retrieved, including extracting its saliency map, inputting the image into the pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features, up-sampling the feature map of each convolutional layer to the original size of the input image to obtain synthesized convolutional-layer features, and performing weighted integration on the synthesized features with the saliency map and encoding the result to form the final effective feature representation;
and for performing similarity-measurement retrieval based on the feature representation of each image in the retrieval image library, returning the several images with the highest similarity as the result.
The implementation of each module corresponds to the steps described above and is not repeated here.
The foregoing is a detailed description of the invention in conjunction with preferred embodiments, and it is not intended that the invention be limited to the specific embodiments disclosed. It will be understood by those skilled in the art that various changes in detail may be made without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A remote sensing image retrieval method based on saliency and a convolutional neural network, characterized by comprising the following steps:
step a, extracting a saliency map from each image in a retrieval image library;
step b, inputting the images in the retrieval image library into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features;
step c, up-sampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain synthesized convolutional-layer features;
step d, performing weighted integration on the convolutional-layer features synthesized in step c using the saliency map, and encoding the result to form the final effective feature representation;
wherein, assuming the extracted saliency map is $S = \{s_{x,y}\}$, with $s_{x,y}$ the saliency at position $(x, y)$ in the image, i.e. the probability that the position is attended to, used as the weight of the feature value at that position, and assuming the convolutional features are $F = \{f_{x,y}^{i}\}$, with $f_{x,y}^{i}$ the feature value of the $i$-th dimension at position $(x, y)$ in the image, the saliency map and the convolutional features are weight-integrated as follows:
$$m_{x,y}^{i} = s_{x,y} \cdot f_{x,y}^{i}$$
$$M = \{m_{x,y}^{i}\}$$
where $M$ denotes the feature layer after weighted integration and $m_{x,y}^{i}$ denotes the feature value at position $(x, y)$ of the $i$-th feature map after integration;
and step e, extracting the feature representation of a given remote sensing image to be retrieved in a manner consistent with steps a-d, performing similarity-measurement retrieval against the feature representation of each image in the retrieval image library, and returning the several images with the highest similarity as the result.
2. The remote sensing image retrieval method based on saliency and a convolutional neural network according to claim 1, characterized in that: in step a, a saliency map is extracted from each image in the retrieval image library using the GBVS model.
3. The remote sensing image retrieval method based on saliency and a convolutional neural network according to claim 2, characterized in that: in step b, the convolutional-layer features of each image in the retrieval image library are extracted using a convolutional neural network model trained on the ImageNet data set.
4. The remote sensing image retrieval method based on saliency and a convolutional neural network according to claim 3, characterized in that: in step c, the up-sampling is realized by transposed convolution: first, 1×D convolution kernels convert the features of each convolutional layer into the same number of feature maps; then the features of each convolutional layer are up-sampled to the size of the original image and the features of corresponding areas are added to obtain the final convolutional feature map, where D is the number of feature maps of the last convolutional layer.
5. The remote sensing image retrieval method based on saliency and a convolutional neural network according to claim 1, 2, 3 or 4, characterized in that: in step d, the pixel value at each position of the convolutional feature map obtained in step c is taken to represent the feature value of that position and the element value at each position of the saliency map to represent the probability that the position is attended to; using this probability as the weight of the feature value at the corresponding position of the feature map, the convolutional feature map and the saliency map are weight-integrated to obtain a new feature representation.
6. A remote sensing image retrieval system based on saliency and a convolutional neural network, characterized by comprising the following modules:
a first module for extracting a saliency map from each image in the retrieval image library;
a second module for inputting the images in the retrieval image library into a pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features;
a third module for upsampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain the synthesized convolutional-layer features;
a fourth module for performing weighted integration of the convolutional-layer features synthesized by the third module using the saliency map, and encoding the result to form the final effective feature representation;
assume that the extracted saliency map is S = {s_{x,y}}, where s_{x,y} is the degree of saliency at position (x, y) in the image, i.e. the probability that the position is attended to, used as the weight of the feature value at that position; assume the convolutional feature is F = {f^i_{x,y}}, where f^i_{x,y} is the feature value of the i-th dimension at position (x, y) in the image; the formula for the weighted integration of the saliency map and the convolutional feature is:
m^i_{x,y} = s_{x,y} · f^i_{x,y}
M = {m^i_{x,y}}
where M denotes the feature layer after weighted integration, and m^i_{x,y} denotes the feature value at position (x, y) of the i-th feature map after integration;
a fifth module for extracting the feature representation of a given remote sensing image to be retrieved, including: extracting its saliency map; inputting the image into the pre-trained convolutional neural network model and extracting the feature map of each convolutional layer as the convolutional-layer features; upsampling the feature map of each convolutional layer and expanding it to the original size of the input image to obtain the synthesized convolutional-layer features; performing weighted integration of the synthesized convolutional-layer features using the saliency map and encoding the result to form the final effective feature representation;
and for performing similarity-measure retrieval against the feature representation of each image in the retrieval image library, returning the several images with the highest similarity as the result.
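The final retrieval step ranks library images by a similarity measure between feature vectors. The claim does not fix the measure; cosine similarity is a common choice, sketched here with hypothetical image IDs and toy 2-D features:

```python
# Illustrative sketch of similarity-measure retrieval (fifth module's last step).
# Cosine similarity is assumed here; the patent does not specify the measure.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_feat, library, top_k=3):
    """library: {image_id: feature vector}; returns the top_k most similar ids."""
    ranked = sorted(library, key=lambda k: cosine(query_feat, library[k]), reverse=True)
    return ranked[:top_k]

# Hypothetical library of three encoded images with 2-D toy features.
library = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
top = retrieve([1.0, 0.05], library, top_k=2)
```

For a real system the feature vectors would be the encoded saliency-weighted convolutional features, and the top-k IDs are the images returned to the user.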
7. The remote sensing image retrieval system based on saliency and a convolutional neural network as claimed in claim 6, wherein: in the first module, a GBVS model is used to extract a saliency map from each image in the retrieval image library.
8. The remote sensing image retrieval system based on saliency and a convolutional neural network as claimed in claim 7, wherein: in the second module, a plurality of convolutional-layer features are respectively extracted from each image in the retrieval image library using a convolutional neural network model trained on the ImageNet dataset.
9. The remote sensing image retrieval system based on saliency and a convolutional neural network as claimed in claim 8, wherein: in the third module, transposed convolution is used to perform the upsampling: first, 1 × D convolution kernels convert the features of each convolutional layer into the same number of feature maps; then the features of each convolutional layer are upsampled until they match the original image in size, and the features at corresponding positions are added to obtain the final convolutional feature map; wherein D is the number of feature maps produced by the last convolutional layer.
10. The remote sensing image retrieval system based on saliency and a convolutional neural network as claimed in claim 6, 7, 8 or 9, wherein: in the fourth module, the pixel value at each position of the convolutional feature map obtained by the third module represents the feature value at that position, and the element value at each position of the saliency map represents the probability that the position is attended to; using this probability as the weight of the feature value at the corresponding position of the feature map, the convolutional feature map and the saliency map are weighted and integrated to obtain a new feature representation.
CN201810862331.4A 2018-08-01 2018-08-01 Remote sensing image retrieval method and system based on significance and convolutional neural network Active CN109086405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810862331.4A CN109086405B (en) 2018-08-01 2018-08-01 Remote sensing image retrieval method and system based on significance and convolutional neural network

Publications (2)

Publication Number Publication Date
CN109086405A CN109086405A (en) 2018-12-25
CN109086405B true CN109086405B (en) 2021-09-14

Family

ID=64831268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810862331.4A Active CN109086405B (en) 2018-08-01 2018-08-01 Remote sensing image retrieval method and system based on significance and convolutional neural network

Country Status (1)

Country Link
CN (1) CN109086405B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368109B (en) * 2018-12-26 2023-04-28 北京眼神智能科技有限公司 Remote sensing image retrieval method, remote sensing image retrieval device, computer readable storage medium and computer readable storage device
CN109766467B (en) * 2018-12-28 2019-12-13 珠海大横琴科技发展有限公司 Remote sensing image retrieval method and system based on image segmentation and improved VLAD
CN109753578A (en) * 2019-01-25 2019-05-14 浙江理工大学 A kind of image search method based on exposure mask selection convolution feature
CN110263799A (en) * 2019-06-26 2019-09-20 山东浪潮人工智能研究院有限公司 A kind of image classification method and device based on the study of depth conspicuousness similar diagram
CN110609917B (en) * 2019-08-08 2023-01-03 中国地质大学(武汉) Image retrieval method and system based on convolutional neural network and significance detection
CN111695572A (en) * 2019-12-27 2020-09-22 珠海大横琴科技发展有限公司 Ship retrieval method and device based on convolutional layer feature extraction
CN113641845B (en) * 2021-07-16 2022-09-23 广西师范大学 Depth feature contrast weighted image retrieval method based on vector contrast strategy
CN113901250B (en) * 2021-10-09 2023-07-21 南京航空航天大学 Cosmetic product retrieval method based on remarkable attention

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3166049A1 (en) * 2015-11-03 2017-05-10 Baidu USA LLC Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
GB2547068A (en) * 2016-01-13 2017-08-09 Adobe Systems Inc Semantic natural language vector space
CN107247952A (en) * 2016-07-28 2017-10-13 哈尔滨工业大学 The vision significance detection method for the cyclic convolution neutral net supervised based on deep layer
CN107967474A (en) * 2017-11-24 2018-04-27 上海海事大学 A kind of sea-surface target conspicuousness detection method based on convolutional neural networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
引入视觉显著机制的图像检索架构研究 (Research on an Image Retrieval Architecture Incorporating a Visual Saliency Mechanism); 杨飞 (Yang Fei); 《中国全国优秀硕士论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology Series); 2018-06-15; Chapter 3 (p. 18) through Chapter 4 (p. 36) *

Also Published As

Publication number Publication date
CN109086405A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086405B (en) Remote sensing image retrieval method and system based on significance and convolutional neural network
US20170220864A1 (en) Method for Implementing a High-Level Image Representation for Image Analysis
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN111680176A (en) Remote sensing image retrieval method and system based on attention and bidirectional feature fusion
CN105447190B (en) Picture retrieval method, device and server based on convolutional neural networks
CN113889228B (en) Semantic enhancement hash medical image retrieval method based on mixed attention
JP4511135B2 (en) Method for representing data distribution, method for representing data element, descriptor for data element, method for collating or classifying query data element, apparatus set to perform the method, computer program and computer-readable storage medium
Chen et al. Discriminative BoW framework for mobile landmark recognition
Wang et al. Image retrieval based on exponent moments descriptor and localized angular phase histogram
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
Xu et al. Iterative manifold embedding layer learned by incomplete data for large-scale image retrieval
Chen et al. Integrated content and context analysis for mobile landmark recognition
CN108805280B (en) Image retrieval method and device
CN112733665A (en) Face recognition method and system based on lightweight network structure design
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
Li et al. Image decomposition with multilabel context: Algorithms and applications
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112632406B (en) Query method, query device, electronic equipment and storage medium
Bashmal et al. Language Integration in Remote Sensing: Tasks, datasets, and future directions
Adnan et al. Automated image annotation with novel features based on deep ResNet50-SLT
CN113515661B (en) Image retrieval method based on filtering depth convolution characteristics
CN110135253A (en) A kind of finger vena identification method based on long-term recursive convolution neural network
Chiang et al. Multiple-instance content-based image retrieval employing isometric embedded similarity measure
CN115359486A (en) Method and system for determining custom information in document image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant