CN113515661B - Image retrieval method based on filtering depth convolution characteristics - Google Patents
Image retrieval method based on filtering depth convolution characteristics
- Publication number: CN113515661B (application CN202110805566.1A)
- Authority: CN (China)
- Prior art keywords: depth convolution, image, characteristic, data image
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/583: Retrieval of still image data characterised by using metadata automatically derived from the content
- G06F16/538: Retrieval of still image data; presentation of query results
- G06F18/253: Pattern recognition; fusion techniques of extracted features
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
Abstract
The invention discloses an image retrieval method based on filtered deep convolutional features. First, each image of the data set is input into a pre-trained deep convolutional neural network model and its deep convolutional features are extracted; second, the deep convolutional features are filtered to remove background noise; then a spatial weight is designed to enhance the response of the target object; next, the channels are enhanced with channel weights to generate a representation vector of the image; finally, the representation vector is normalized and reduced in dimension to obtain the final feature vector used for similarity matching, and the image retrieval results are returned. The method performs the image retrieval task with the deep convolutional features obtained from a deep convolutional neural network model; the generated image representation effectively describes the target object in the image and improves image retrieval accuracy.
Description
Technical Field
The invention relates to the technical field of image retrieval, and in particular to an image retrieval method based on filtered deep convolutional features.
Background
With the rapid development and wide application of network communication technology, people increasingly share their daily lives through the internet, so a large amount of image data is uploaded to the network, and image data on networks has exhibited explosive growth. The internet is, at the same time, an effective technical means for people to acquire information, and querying the images one needs among these massive collections poses great difficulties and challenges. Early image retrieval techniques were text-based: each image had to be labeled, the labels varied with human subjectivity, and a single piece of text could not effectively represent the content of an image, which limited the development of text-based image retrieval. Content-based image retrieval techniques therefore began to emerge. In the early development of image retrieval, researchers often represented image content with global features such as color and texture; however, under certain lighting, occlusion, and deformation conditions, global features perform poorly and are therefore difficult to apply to image retrieval in some scenes. In recent years, applying deep convolutional neural networks to image retrieval has shown excellent performance. The approach mainly extracts deep convolutional features from a deep convolutional neural network, aggregates them into a distinguishable image feature representation vector, performs feature matching with this vector, and returns the most similar images.
Image retrieval based on deep convolutional neural networks is a hotspot of current research; however, since background noise in the deep convolutional features affects the retrieval results, how to construct a more distinguishable image representation from the deep convolutional features is a main difficulty and challenge at present.
Disclosure of Invention
Aiming at the problem that background noise in deep convolutional features affects the retrieval results, the invention provides an image retrieval method based on filtered deep convolutional features.
In order to solve the problems, the invention is realized by the following technical scheme:
an image retrieval method based on filtering depth convolution characteristics comprises the following steps:
Step 1, inputting each data image in a data set into a deep convolutional neural network model and extracting the N deep convolutional feature maps X_mn(p, q) of each data image, where each feature map corresponds to one channel;
Step 2, calculating the filter map F_m(p, q) of each data image;
Step 2.1, calculating the variance E_mn of each of the N deep convolutional feature maps of each data image;
Step 2.2, selecting the k feature maps with the largest variance from the N deep convolutional feature maps of each data image as the filter-selected feature maps of that image;
Step 2.3, adding the feature values of pixels at the same position across the k filter-selected feature maps of each data image to obtain the superimposed feature map of that image, then dividing the feature value of each pixel of the superimposed feature map by k to obtain the filter map F_m(p, q) of each data image;
Step 3, for each data image, multiplying the filter map F_m(p, q) obtained in step 2 element-wise with each of the N deep convolutional feature maps X_mn(p, q) obtained in step 1 to obtain the N filtered feature maps X'_mn(p, q) of each data image;
Step 4, calculating the spatial weight map S_m(p, q) of each data image;
Step 4.1, adding the feature values of all pixels of each filtered feature map X'_mn(p, q) to obtain the comprehensive feature value h_mn of each of the N filtered feature maps of each data image;
Step 4.2, adding the comprehensive feature values of the corresponding channel over all data images to obtain the N channel feature values h'_n;
Step 4.3, first sorting the N channel feature values h'_n and recording the serial numbers of the b largest values as the selected channel serial numbers; then taking, from the N deep convolutional feature maps of each data image, the feature maps corresponding to the selected channel serial numbers as the spatially selected feature maps of that image;
Step 4.4, squaring and adding the feature values of pixels at the same position across the b spatially selected feature maps of each data image to obtain the spatially superimposed feature map S'_m(p, q) of each data image;
Step 4.5, normalizing the spatially superimposed feature map S'_m(p, q) of each data image to obtain the spatial weight map S_m(p, q) of each data image;
Step 5, for each data image, multiplying the spatial weight map S_m(p, q) obtained in step 4 element-wise with each of the N deep convolutional feature maps X_mn(p, q) obtained in step 1 to obtain the N spatially weighted feature maps X''_mn(p, q) of each data image;
Step 6, multiplying the channel weight value P_mn of each of the N spatially weighted feature maps of each data image by the corresponding comprehensive feature value Φ_mn to obtain the N channel-weighted feature values G_mn of each data image;
Step 7, applying L2 normalization and PCA-whitening dimensionality reduction to the N channel-weighted feature values G_mn of each data image to obtain the N feature representations G'_mn of each data image, and constructing the feature representation vector G'_m of each data image from its N feature representations G'_mn;
Step 8, inputting the image to be retrieved into the deep convolutional neural network model and extracting the N deep convolutional feature maps X_*n(p, q) of the image to be retrieved;
Step 9, first squaring and adding the feature values of pixels at the same position across the b feature maps of the image to be retrieved that correspond to the selected channel serial numbers obtained in step 4.3, yielding the spatially superimposed feature map of the image to be retrieved; then normalizing this map to obtain the spatial weight map S_*(p, q) of the image to be retrieved;
Step 10, multiplying the spatial weight map S_*(p, q) element-wise with each of the N deep convolutional feature maps X_*n(p, q) of the image to be retrieved to obtain the N spatially weighted feature maps X''_*n(p, q) of the image to be retrieved;
Step 11, multiplying the channel weight value P_*n of each of the N spatially weighted feature maps of the image to be retrieved by the corresponding comprehensive feature value Φ_*n to obtain the N channel-weighted feature values G_*n of the image to be retrieved;
Step 12, applying L2 normalization and PCA-whitening dimensionality reduction to the N channel-weighted feature values G_*n of the image to be retrieved to obtain the N feature representations G'_*n, and constructing the feature representation vector G'_* of the image to be retrieved from them;
Step 13, calculating the L2 distance between the feature representation vector G'_* of the image to be retrieved and the feature representation vector G'_m of each data image in the data set, and returning the final retrieval results in order of distance from small to large;
where m = 1, 2, ..., M, with M the number of data images in the data set; n = 1, 2, ..., N, with N the number of channels; p = 1, 2, ..., W, with W the width of the deep convolutional feature maps; q = 1, 2, ..., H, with H the height of the deep convolutional feature maps; k and b are set values; and ε is a set constant.
In the above step 6, the nth channel weight value P_mn of the mth data image is computed from the non-zero ratio Z_mn and the response intensity β_mn, where m = 1, 2, ..., M, with M the number of data images in the data set; n = 1, 2, ..., N, with N the number of deep convolutional feature maps; p = 1, 2, ..., W, with W the width of the feature maps; q = 1, 2, ..., H, with H their height; Z_mn denotes the non-zero ratio of the pixel feature values of the nth spatially weighted feature map of the mth data image; β_mn denotes the response intensity value of the nth spatially weighted feature map of the mth data image; and ε is a set constant.
In the above step 6, the nth comprehensive feature value Φ_mn of the mth data image is:
Φ_mn = Σ_{p=1..W} Σ_{q=1..H} X''_mn(p, q)
where m = 1, 2, ..., M, with M the number of data images in the data set; n = 1, 2, ..., N, with N the number of deep convolutional feature maps; p = 1, 2, ..., W, with W the width of the feature maps; q = 1, 2, ..., H, with H their height; and X''_mn(p, q) denotes the nth spatially weighted feature map of the mth data image.
In the above step 11, the nth channel weight value P_*n of the image to be retrieved is computed in the same way from the non-zero ratio Z_*n and the response intensity β_*n, where n = 1, 2, ..., N, with N the number of deep convolutional feature maps; p = 1, 2, ..., W, with W the width of the feature maps; q = 1, 2, ..., H, with H their height; Z_*n denotes the non-zero ratio of the pixel feature values of the nth spatially weighted feature map of the image to be retrieved; β_*n denotes the response intensity value of the nth spatially weighted feature map of the image to be retrieved; and ε is a set constant.
In the above step 11, the nth comprehensive feature value Φ_*n of the image to be retrieved is:
Φ_*n = Σ_{p=1..W} Σ_{q=1..H} X''_*n(p, q)
where n = 1, 2, ..., N, with N the number of deep convolutional feature maps; p = 1, 2, ..., W, with W the width of the feature maps; q = 1, 2, ..., H, with H their height; and X''_*n(p, q) denotes the nth spatially weighted feature map of the image to be retrieved.
Compared with the prior art, the invention filters the deep convolutional features of an image to obtain a new representation vector that saliently describes the target object, effectively suppresses background noise in the deep convolutional features, and forms a distinguishable image representation; experimental results show that the method effectively improves image retrieval accuracy.
Drawings
FIG. 1 is a flow chart of an image retrieval method based on a filtered depth convolution feature.
FIG. 2 is an exemplary diagram of calculating the filter map of the deep convolutional feature maps of each data image.
FIG. 3 is an exemplary diagram of calculating the spatial weight map.
FIG. 4 is an exemplary diagram of calculating the channel-weighted deep convolutional feature values.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
The invention provides an image retrieval method based on filtered deep convolutional features, as shown in FIG. 1. First, each image of the data set is input into a pre-trained deep convolutional neural network model and its deep convolutional feature maps are extracted; second, the deep convolutional features are filtered to remove background noise; then a spatial weight is designed to enhance the response of the target object; next, the channels are enhanced with channel weights to generate a representation vector of the image; finally, the representation vector is normalized and reduced in dimension to obtain the final feature vector used for similarity matching, and the image retrieval results are returned. The method performs the image retrieval task with the deep convolutional features obtained from the deep convolutional neural network model; the generated image representation effectively describes the target object in the image and improves retrieval accuracy, with the spatial weight and the channel weight assigning higher weights to the feature maps containing key semantic information so as to improve the distinguishability of the image representation. The specific steps are as follows:
1) Obtaining the feature representation vectors of the data images in the data set
Step 1, inputting each data image in the data set into the deep convolutional neural network model and extracting the deep convolutional feature maps of each data image.
Each image in the image data set is input into the deep convolutional neural network model and its deep convolutional feature maps X_mn(p, q) are extracted, where m = 1, 2, ..., M, with M the number of data images in the data set; n = 1, 2, ..., N, with N the number of feature maps per image (which is also the number of channels, since each channel corresponds to one feature map); p = 1, 2, ..., W, with W the width of the feature maps; and q = 1, 2, ..., H, with H their height.
Step 2, calculating the filter map of the deep convolutional feature maps of each data image, as shown in FIG. 2.
Step 2.1, calculating the variance E_mn of each of the N deep convolutional feature maps of each data image:
E_mn = (1/(W·H)) Σ_{p=1..W} Σ_{q=1..H} (X_mn(p, q) − X̄_mn)²
where X̄_mn denotes the mean of the feature map X_mn(p, q), i.e. the feature values of all pixels of X_mn(p, q) added together and divided by the number of pixels, and X_mn(p, q) − X̄_mn subtracts this mean in turn from the feature value of each pixel.
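As an illustration, the per-map variance of step 2.1 can be sketched in NumPy (array names, shapes, and values here are illustrative, not taken from the patent):

```python
import numpy as np

# X: deep convolutional feature maps of one data image, shape (N, H, W),
# one map per channel; tiny illustrative values
X = np.array([[[0., 0.], [0., 0.]],    # flat map -> variance 0
              [[1., 3.], [5., 7.]]])   # varied map -> variance 5

# E[n] = mean over all pixels of (X_n - mean(X_n))^2, as in step 2.1
E = ((X - X.mean(axis=(1, 2), keepdims=True)) ** 2).mean(axis=(1, 2))
```

A flat (uninformative) map thus gets variance 0 and will be discarded by the selection in step 2.2.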
Step 2.2, selecting the k feature maps with the largest variance from the N deep convolutional feature maps of each data image as the filter-selected feature maps of that image.
Step 2.3, adding the feature values of pixels at the same position across the k filter-selected feature maps of each data image to obtain the superimposed feature map of that image, then dividing the feature value of each pixel of the superimposed feature map by k to obtain the filter map of the data image:
F_m(p, q) = (1/k) Σ_{n ∈ selected} X_mn(p, q)
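A minimal sketch of steps 2.2 and 2.3 (the value of k and the arrays are illustrative):

```python
import numpy as np

# Three feature maps of one image; variances differ per map
X = np.stack([np.zeros((2, 2)),                   # variance 0
              np.array([[1., 3.], [5., 7.]]),     # variance 5
              np.array([[0., 10.], [0., 10.]])])  # variance 25
k = 2

E = ((X - X.mean(axis=(1, 2), keepdims=True)) ** 2).mean(axis=(1, 2))
top_k = np.argsort(E)[-k:]        # k maps with the largest variance (step 2.2)
F = X[top_k].sum(axis=0) / k      # pixel-wise sum divided by k (step 2.3)
```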
Step 3, for each data image, multiplying the filter map F_m(p, q) obtained in step 2 element-wise with each of the N deep convolutional feature maps X_mn(p, q) obtained in step 1 to obtain the N filtered feature maps X'_mn(p, q) of each data image:
X'_mn(p, q) = F_m(p, q) ⊙ X_mn(p, q)
where ⊙ denotes element-wise (dot) multiplication: the filter map F_m(p, q) is multiplied with each of the N feature maps X_mn(p, q) pixel by pixel at the same positions.
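The pixel-wise product of step 3 broadcasts one filter map over all N channel maps; a sketch with illustrative values:

```python
import numpy as np

F = np.array([[0., 1.], [2., 0.]])   # filter map F_m, shape (H, W)
X = np.ones((3, 2, 2))               # N = 3 feature maps of one image
X_filt = F[None, :, :] * X           # element-wise product per channel
```

Pixels where the filter map is zero are suppressed identically in every channel, which is how the background noise is removed.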
Step 4, calculating the spatial weight map, as shown in FIG. 3.
Step 4.1, adding the feature values of all pixels of each filtered feature map X'_mn(p, q) to obtain the comprehensive feature value h_mn of each of the N filtered feature maps of each data image:
h_mn = Σ_{p=1..W} Σ_{q=1..H} X'_mn(p, q)
Step 4.2, adding the comprehensive feature values of the filtered feature maps of the corresponding channel over all data images to obtain the N channel feature values h'_n:
h'_n = Σ_{m=1..M} h_mn
Step 4.3, first selecting the b largest of the N channel feature values h'_n and recording their channel serial numbers, which serve as the uniform selected channel serial numbers for all data images (that is, every data image takes the feature maps on the channels with these serial numbers as its spatially selected feature maps). The feature maps corresponding to the selected channel serial numbers of each data image are then used to construct the spatial weight map of each data image.
Step 4.4, squaring the b spatially selected feature maps of each data image and adding them pixel by pixel to obtain the spatially superimposed feature map S'_m(p, q) of each data image.
Step 4.5, normalizing the spatially superimposed feature map S'_m(p, q) of each data image to obtain the spatial weight map S_m(p, q) of each data image.
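Steps 4.1 to 4.5 for a single image can be sketched as follows. The value of b and the arrays are illustrative; in the patent the channel selection uses sums over the whole data set and is applied to the original feature maps, and the exact normalization is not spelled out in this text, so division by the total sum is an assumption:

```python
import numpy as np

X_filt = np.array([[[0., 1.], [1., 0.]],
                   [[2., 0.], [0., 2.]],
                   [[0., 0.], [0., 1.]]])  # N = 3 filtered maps of one image
b = 2

h = X_filt.sum(axis=(1, 2))              # comprehensive values (step 4.1)
sel = np.argsort(h)[-b:]                 # b strongest channels (step 4.3)
S_raw = (X_filt[sel] ** 2).sum(axis=0)   # squared pixel-wise sum (step 4.4)
S = S_raw / S_raw.sum()                  # normalization (step 4.5, assumed form)
```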
Step 5, for each data image, multiplying the spatial weight map S_m(p, q) obtained in step 4 element-wise with each of the N deep convolutional feature maps X_mn(p, q) obtained in step 1 to obtain the N spatially weighted feature maps of each data image:
X''_mn(p, q) = S_m(p, q) ⊙ X_mn(p, q)
Step 6, multiplying the channel weight value P_mn of each of the N spatially weighted feature maps of each data image by the corresponding comprehensive feature value Φ_mn to obtain the N channel-weighted feature values G_mn of each data image, as shown in FIG. 4.
The channel weight value P_mn is computed from Z_mn and β_mn, where Z_mn denotes the non-zero ratio of the feature values of the pixels of the spatially weighted feature map (i.e. the ratio of the number of pixels of X''_mn(p, q) whose feature value is greater than zero to the total number of pixels), β_mn denotes the response intensity value of the spatially weighted feature map, and ε is a small constant set to 0.0001, which ensures that neither the denominator nor the numerator is 0.
The comprehensive feature value is obtained by adding the feature values of all pixels of the spatially weighted feature map:
Φ_mn = Σ_{p=1..W} Σ_{q=1..H} X''_mn(p, q)
where m = 1, 2, ..., M, with M the number of data images in the data set; n = 1, 2, ..., N, with N the number of feature maps; p = 1, 2, ..., W, with W the width of the feature maps; q = 1, 2, ..., H, with H their height; and X''_mn(p, q) denotes the spatially weighted feature map.
The N channel-weighted feature values G_mn of each data image are then:
G_mn = P_mn × Φ_mn
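The exact P_mn formula appears only as an image in the source text; the sketch below computes the documented ingredients Z_mn (non-zero ratio) and Φ_mn (pixel sum), and uses a CroW-style log-ratio as an assumed stand-in for P_mn, not the patent's actual formula:

```python
import numpy as np

X_sw = np.array([[[1., 2.], [3., 4.]],
                 [[0., 1.], [0., 1.]]])  # N = 2 spatially weighted maps
eps = 0.0001                             # constant from the description

Z = (X_sw > 0).mean(axis=(1, 2))         # non-zero pixel ratio per channel
Phi = X_sw.sum(axis=(1, 2))              # comprehensive value per channel

# Assumed stand-in for P_mn (NOT the patent's formula): channels firing on
# fewer pixels receive a larger weight, in the style of CroW weighting
P = np.log((Z.sum() + eps) / (Z + eps))
G = P * Phi                              # channel-weighted values, G = P x Phi
```

Under this stand-in, the sparser second channel gets the larger weight, matching the stated goal of strengthening channels that respond weakly but carry key semantics.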
Step 7, applying L2 normalization and PCA-whitening dimensionality reduction to the N channel-weighted feature values G_mn of each data image to obtain the N feature representations G'_mn, and constructing the feature representation vector of each data image G'_m = {G'_mn, n = 1, 2, ..., N}.
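Step 7's L2 normalization and PCA whitening fitted on the set of dataset vectors can be sketched as follows (the matrix, the reduced dimension d, and the SVD-based whitening form are illustrative assumptions):

```python
import numpy as np

# M = 4 dataset representation vectors of length N = 3 (illustrative)
G = np.array([[3., 4., 0.],
              [0., 5., 12.],
              [8., 6., 0.],
              [1., 0., 1.]])

G = G / np.linalg.norm(G, axis=1, keepdims=True)   # L2 normalization

d = 2                                  # reduced dimension (illustrative)
mu = G.mean(axis=0)
U, S, Vt = np.linalg.svd(G - mu, full_matrices=False)
W = Vt[:d].T / S[:d]                   # PCA-whitening projection
G_white = (G - mu) @ W                 # whitened, d-dimensional vectors
```

After this projection the retained components have identity covariance, which is what "whitening" means here.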
2) Obtaining a feature representation vector of an image to be retrieved
Step 8, inputting the image to be retrieved into the deep convolutional neural network model and extracting the N deep convolutional feature maps X_*n(p, q) of the image to be retrieved.
Step 9, first squaring and adding the feature values of corresponding pixels across the b feature maps of the image to be retrieved that correspond to the selected channel serial numbers obtained when computing the spatial weight map, yielding the spatially superimposed feature map S'_*(p, q) of the image to be retrieved; then normalizing S'_*(p, q) to obtain the spatial weight map S_*(p, q) of the image to be retrieved.
Step 10, multiplying the spatial weight map S_*(p, q) element-wise with each of the N deep convolutional feature maps X_*n(p, q) of the image to be retrieved to obtain the N spatially weighted feature maps X''_*n(p, q) of the image to be retrieved.
Step 11, multiplying the channel weight value P_*n of each of the N spatially weighted feature maps of the image to be retrieved by the corresponding comprehensive feature value Φ_*n to obtain the N channel-weighted feature values G_*n of the image to be retrieved.
Step 12, applying L2 normalization and PCA-whitening dimensionality reduction to the N channel-weighted feature values G_*n of the image to be retrieved to obtain the N feature representations G'_*n, and constructing the feature representation vector of the image to be retrieved G'_* = {G'_*n, n = 1, 2, ..., N}.
3) Retrieval by feature representation vectors
Step 13, calculating the L2 distance between the feature representation vector G'_* of the image to be retrieved and the feature representation vector G'_m of each data image in the data set; a smaller distance indicates more similar images, and the final retrieval results are returned in order of distance from small to large.
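The final matching of step 13 reduces to sorting L2 distances; a sketch with illustrative vectors:

```python
import numpy as np

db = np.array([[1., 0.],
               [0., 1.],
               [0.7, 0.7]])            # feature vectors G'_m of the data set
query = np.array([0.9, 0.1])           # feature vector G'_* of the query

dists = np.linalg.norm(db - query, axis=1)  # L2 distance to each data image
ranking = np.argsort(dists)                 # smallest distance first
```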
The invention provides an image retrieval method based on filtered deep convolutional features. To eliminate background noise and highlight the target object in the deep convolutional features, it introduces a novel spatial and channel weighting method for deep convolutional features and applies it to image retrieval within a new retrieval framework. The key components are the filter, the spatial weights, and the channel weights. The filter removes noise interference when attention turns to the channels, which helps select more distinctive channels. The spatial weights enhance the response of the target object at its spatial locations, highlighting key features and further suppressing background noise. The channel weights strengthen channels that respond weakly but contain key features, and also help suppress visual burstiness. Experimental results show that the method improves image retrieval performance.
It should be noted that although the above-described embodiments of the present invention are illustrative, the present invention is not limited thereto. Other embodiments made by those skilled in the art in light of the teachings of the present invention, without departing from its principles, are considered to be within the scope of the present invention.
Claims (5)
1. An image retrieval method based on filtering depth convolution characteristics is characterized by comprising the following steps:
step 1, inputting each data image in a data set into a deep convolutional neural network model and extracting the N deep convolutional feature maps X_mn(p, q) of each data image, wherein each feature map corresponds to one channel;
step 2, calculating the filter map F_m(p, q) of each data image;
step 2.1, calculating the variance E_mn of each of the N deep convolutional feature maps of each data image;
step 2.2, selecting the k feature maps with the largest variance from the N deep convolutional feature maps of each data image as the filter-selected feature maps of that image;
step 2.3, adding the feature values of pixels at the same position across the k filter-selected feature maps of each data image to obtain the superimposed feature map of that image, then dividing the feature value of each pixel of the superimposed feature map by k to obtain the filter map F_m(p, q) of each data image;
step 3, for each data image, multiplying the filter map F_m(p, q) obtained in step 2 element-wise with each of the N deep convolutional feature maps X_mn(p, q) obtained in step 1 to obtain the N filtered feature maps X'_mn(p, q) of each data image;
step 4, calculating the spatial weight map S_m(p, q) of each data image;
step 4.1, adding the feature values of all pixels of each filtered feature map X'_mn(p, q) to obtain the comprehensive feature value h_mn of each of the N filtered feature maps of each data image;
step 4.2, adding the comprehensive feature values of the corresponding channel over all data images to obtain the N channel feature values h'_n;
step 4.3, first sorting the N channel feature values h'_n and recording the serial numbers of the b largest values as the selected channel serial numbers; then taking, from the N deep convolutional feature maps of each data image, the feature maps corresponding to the selected channel serial numbers as the spatially selected feature maps of that image;
step 4.4, squaring and adding the feature values of pixels at the same position across the b spatially selected feature maps of each data image to obtain the spatially superimposed feature map S'_m(p, q) of each data image;
step 4.5, normalizing the spatially superimposed feature map S'_m(p, q) of each data image to obtain the spatial weight map S_m(p, q) of each data image;
Step 5, for each data image, carrying out dot multiplication of the spatial weight map S_m(p, q) of the data image obtained in step 4 with the N depth convolution feature maps X_mn(p, q) of the data image obtained in step 1, to obtain N spatially weighted depth convolution feature maps X''_mn(p, q) of each data image;
Step 6, multiplying the channel weight values P_mn of the N spatially weighted depth convolution feature maps of each data image by the integrated feature values Φ_mn, to obtain N channel-weighted depth convolution feature values G_mn of each data image;
Step 7, carrying out L2 normalization and PCA whitening dimensionality reduction on the N channel-weighted depth convolution feature values G_mn of each data image, to obtain N feature representations G'_mn of each data image, and constructing the feature representation vector G'_m of each data image from the N feature representations G'_mn;
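The database-side descriptor of steps 6 and 7 can be sketched as below. The exact formulas for P_mn and Φ_mn appear only as claim images and are not reproduced in this text, so using the non-zero ratio Z_mn directly as the channel weight is an assumption; the PCA whitening is fit on the full M-by-N matrix of database descriptors:

```python
import numpy as np

def channel_weighted_features(weighted_maps):
    """Step 6 sketch for one image; weighted_maps has shape (N, W, H)."""
    # Φ_n: integrated value of the nth spatially weighted map (claim 3)
    phi = weighted_maps.sum(axis=(1, 2))
    # Z_n: non-zero ratio of the nth map; used here as the channel
    # weight P_n, an assumption in place of the unreproduced formula
    p = (weighted_maps > 0).mean(axis=(1, 2))
    g = p * phi
    # Step 7: L2 normalization
    return g / (np.linalg.norm(g) + 1e-12)

def pca_whiten(G, d):
    """Step 7 sketch: fit PCA whitening on the (M, N) matrix of
    database descriptors and reduce to d dimensions."""
    mu = G.mean(axis=0)
    U, S, Vt = np.linalg.svd(G - mu, full_matrices=False)
    # Scale each principal axis to unit variance
    W = Vt[:d].T / (S[:d] / np.sqrt(len(G) - 1) + 1e-12)
    return (G - mu) @ W, mu, W
```

The returned mean mu and projection W would be reused to whiten the query descriptor in step 12.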
Step 8, inputting the image to be retrieved into the depth convolution neural network model, and extracting the N depth convolution feature maps X_*n(p, q) of the image to be retrieved;
Step 9, firstly, carrying out squared addition of the feature values of pixel points at the same position on the b depth convolution feature maps of the image to be retrieved corresponding to the selected channel serial numbers obtained in step 4.3, to obtain the spatially superposed depth convolution feature map of the image to be retrieved; then, normalizing the spatially superposed depth convolution feature map of the image to be retrieved to obtain the spatial weight map S_*(p, q) of the image to be retrieved;
Step 10, carrying out dot multiplication of the spatial weight map S_*(p, q) of the image to be retrieved with the N depth convolution feature maps X_*n(p, q) of the image to be retrieved, to obtain N spatially weighted depth convolution feature maps X''_*n(p, q) of the image to be retrieved;
Step 11, multiplying the channel weight values P_*n of the N spatially weighted depth convolution feature maps of the image to be retrieved by the integrated feature values Φ_*n, to obtain N channel-weighted depth convolution feature values G_*n of the image to be retrieved;
Step 12, carrying out L2 normalization and PCA whitening dimensionality reduction on the N channel-weighted depth convolution feature values G_*n of the image to be retrieved, to obtain N feature representations G'_*n of the image to be retrieved, and constructing the feature representation vector G'_* of the image to be retrieved from the N feature representations G'_*n;
Step 13, calculating the L2 distance between the feature representation vector G'_* of the image to be retrieved obtained in step 12 and the feature representation vector G'_m of each data image in the data set obtained in step 7, and returning the final retrieval results in ascending order of distance;
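Step 13 reduces to a nearest-neighbour ranking under L2 distance; a minimal sketch:

```python
import numpy as np

def retrieve(query_vec, db_vecs):
    """Rank database images by L2 distance to the query descriptor,
    smallest distance first (step 13)."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(dists)
```

The returned indices order the data set from most to least similar to the query.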
wherein m = 1, 2, ..., M, and M denotes the number of data images in the data set; n = 1, 2, ..., N, and N denotes the number of channels; p = 1, 2, ..., W, and W denotes the width of the depth convolution feature map; q = 1, 2, ..., H, and H denotes the height of the depth convolution feature map; k and b are set values, and e is a set constant.
2. The image retrieval method based on filtered depth convolution features as claimed in claim 1, wherein in step 6 the nth channel weight value P_mn of the mth data image is:
wherein m = 1, 2, ..., M, and M denotes the number of data images in the data set; n = 1, 2, ..., N, and N denotes the number of depth convolution feature maps; p = 1, 2, ..., W, and W denotes the width of the depth convolution feature map; q = 1, 2, ..., H, and H denotes the height of the depth convolution feature map; Z_mn denotes the non-zero ratio of the feature values of the pixel points of the nth spatially weighted depth convolution feature map of the mth data image; β_mn denotes the response intensity value of the nth spatially weighted depth convolution feature map of the mth data image; and the remaining symbol in the formula is a set constant.
3. The image retrieval method based on filtered depth convolution features as claimed in claim 1, wherein in step 6 the nth integrated feature value Φ_mn of the mth data image is:
wherein m = 1, 2, ..., M, and M denotes the number of data images in the data set; n = 1, 2, ..., N, and N denotes the number of depth convolution feature maps; p = 1, 2, ..., W, and W denotes the width of the depth convolution feature map; q = 1, 2, ..., H, and H denotes the height of the depth convolution feature map; and X''_mn(p, q) denotes the nth spatially weighted depth convolution feature map of the mth data image.
4. The image retrieval method based on filtered depth convolution features as claimed in claim 1, wherein in step 11 the nth channel weight value P_*n of the image to be retrieved is:
wherein n = 1, 2, ..., N, and N denotes the number of depth convolution feature maps; p = 1, 2, ..., W, and W denotes the width of the depth convolution feature map; q = 1, 2, ..., H, and H denotes the height of the depth convolution feature map; Z_*n denotes the non-zero ratio of the feature values of the pixel points of the nth spatially weighted depth convolution feature map of the image to be retrieved; β_*n denotes the response intensity value of the nth spatially weighted depth convolution feature map of the image to be retrieved; and the remaining symbol in the formula is a set constant.
5. The image retrieval method based on filtered depth convolution features as claimed in claim 1, wherein in step 11 the nth integrated feature value Φ_*n of the image to be retrieved is:
wherein n = 1, 2, ..., N, and N denotes the number of depth convolution feature maps; p = 1, 2, ..., W, and W denotes the width of the depth convolution feature map; q = 1, 2, ..., H, and H denotes the height of the depth convolution feature map; and X''_*n(p, q) denotes the nth spatially weighted depth convolution feature map of the image to be retrieved.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110805566.1A CN113515661B (en) | 2021-07-16 | 2021-07-16 | Image retrieval method based on filtering depth convolution characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113515661A CN113515661A (en) | 2021-10-19 |
CN113515661B true CN113515661B (en) | 2022-03-11 |
Family
ID=78067814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110805566.1A Active CN113515661B (en) | 2021-07-16 | 2021-07-16 | Image retrieval method based on filtering depth convolution characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113515661B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117576105B (en) * | 2024-01-17 | 2024-03-29 | 高科建材(咸阳)管道科技有限公司 | Pipeline production control method and system based on artificial intelligence |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156748A (en) * | 2016-07-22 | 2016-11-23 | 浙江零跑科技有限公司 | Traffic scene participant's recognition methods based on vehicle-mounted binocular camera |
CN107577758A (en) * | 2017-08-31 | 2018-01-12 | 桂林电子科技大学 | A kind of generation method for the image convolution feature for intersecting weights based on multizone |
CN109858496A (en) * | 2019-01-17 | 2019-06-07 | 广东工业大学 | A kind of image characteristic extracting method based on weighting depth characteristic |
CN110297931A (en) * | 2019-04-23 | 2019-10-01 | 西北大学 | A kind of image search method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11461998B2 (en) * | 2019-09-25 | 2022-10-04 | Samsung Electronics Co., Ltd. | System and method for boundary aware semantic segmentation |
Non-Patent Citations (3)
Title |
---|
A Spectral-Spatial Domain-Specific Convolutional Deep Extreme Learning Machine for Supervised Hyperspectral Image Classification; Yu Shen et al.; IEEE Access; 2019-09-12; vol. 7; pp. 132240-132252 *
Fast face image retrieval method based on deep features (基于深度特征的快速人脸图像检索方法); Li Zhendong et al.; Acta Optica Sinica (《光学学报》); 2018-05-30; vol. 38, no. 10; pp. 1-6 *
Research on image retrieval based on feature performance enhancement and object localization (基于特征性能增强和目标定位的图像检索研究); Yuan Hui; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2019-08-15; no. 8; I138-712 *
Also Published As
Publication number | Publication date |
---|---|
CN113515661A (en) | 2021-10-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108734723B (en) | Relevant filtering target tracking method based on adaptive weight joint learning | |
CN111695467B (en) | Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion | |
CN110866896B (en) | Image saliency target detection method based on k-means and level set super-pixel segmentation | |
CN109086405B (en) | Remote sensing image retrieval method and system based on significance and convolutional neural network | |
CN107977660A (en) | Region of interest area detecting method based on background priori and foreground node | |
CN113870124B (en) | Weak supervision-based double-network mutual excitation learning shadow removing method | |
CN109215003B (en) | Image fusion method and device | |
CN111667019A (en) | Hyperspectral image classification method based on deformable separation convolution | |
CN113392244A (en) | Three-dimensional model retrieval method and system based on depth measurement learning | |
CN113515661B (en) | Image retrieval method based on filtering depth convolution characteristics | |
CN115861076A (en) | Unsupervised hyperspectral image super-resolution method based on matrix decomposition network | |
CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
CN112329818B (en) | Hyperspectral image non-supervision classification method based on graph convolution network embedded characterization | |
Mosleh et al. | Texture image retrieval using contourlet transform | |
Tu et al. | Texture pattern separation for hyperspectral image classification | |
CN116543021A (en) | Siamese network video single-target tracking method based on feature fusion | |
CN117011655A (en) | Adaptive region selection feature fusion based method, target tracking method and system | |
CN113887536B (en) | Multi-stage efficient crowd density estimation method based on high-level semantic guidance | |
CN115661754A (en) | Pedestrian re-identification method based on dimension fusion attention | |
CN112613574B (en) | Training method of image classification model, image classification method and device | |
CN114202694A (en) | Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning | |
Mujtaba et al. | Automatic solar panel detection from high-resolution orthoimagery using deep learning segmentation networks | |
CN112560712A (en) | Behavior identification method, device and medium based on time-enhanced graph convolutional network | |
Wu et al. | Conditional transferring features: Scaling GANs to thousands of classes with 30% less high-quality data for training | |
CN116680435B (en) | Similar image retrieval matching method based on multi-layer feature extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||