CN112000827A

CN112000827A - Hardware image retrieval method and system based on deep learning

Info

Publication number: CN112000827A
Application number: CN202010876226.3A
Authority: CN
Inventors: 蔡征兵; 郑律; 曾胜
Original assignee: Guangzhou Souliaoyi Network Technology Co ltd
Current assignee: Guangzhou Souliaoyi Network Technology Co ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-27

Abstract

The invention relates to a hardware image retrieval method and system based on deep learning. The method comprises the steps of obtaining a hardware image with a category label; carrying out data preprocessing on the hardware image with the category label; constructing a neural network according to the preprocessed hardware image with the category label; and searching the hardware image to be searched by utilizing the trained neural network. The hardware image retrieval method and system based on deep learning provided by the invention improve the speed and accuracy of hardware image retrieval.

Description

Hardware image retrieval method and system based on deep learning

Technical Field

The invention relates to the field of image retrieval, in particular to a hardware image retrieval method and system based on deep learning.

Background

At present, industrial image retrieval applications mainly take two approaches. In the first, image features are extracted by a deep ranking model, and similarity is then compared by calculating the distance between two image features. The main technical problem of this approach is that the triplet data required to train the model are very difficult to acquire, and some acquisition methods need additional data, such as the click counts associated with the images. The second approach is adopted by the paper Deep Learning of Binary Hash Codes for Fast Image Retrieval (hereinafter referred to as Binary Hash Codes) and the paper Visual Search at eBay (hereinafter referred to as eBay). Features (including fully connected layer features and the binary hash code features proposed in the Binary Hash Codes paper) are extracted by a common image classification model, and similarity is compared by calculating the distance between two image features. The main problem with this approach is that its search accuracy is generally lower than that of the first approach, but it has two major advantages. First, no great effort has to be spent on collecting triplet data; second, the model training time is shorter and the demands on server performance are lower, so the approach is suitable for small and medium-sized data. However, the method proposed in the Binary Hash Codes paper is not suitable for medium and large data volumes (hundreds of thousands of images), because its coarse-grained retrieval (retrieval based on binary hash codes) searches the whole image database, and much time is wasted searching irrelevant image categories. The method proposed in the eBay paper is not suitable for small and medium-sized data (fewer than 50,000 images), and because it performs no re-ranking by fine-grained retrieval (retrieval based on the fully connected layer fc6), its final search accuracy is not high.

Disclosure of Invention

The invention aims to provide a hardware image retrieval method and a hardware image retrieval system based on deep learning, which are used for improving the speed and the accuracy of hardware image retrieval.

In order to achieve the purpose, the invention provides the following scheme:

a hardware image retrieval method based on deep learning comprises the following steps:

acquiring a hardware image with a category label;

carrying out data preprocessing on the hardware image with the category label; the preprocessing comprises: image resizing, data type conversion, normalization, standardization and image enhancement;

constructing a neural network according to the preprocessed hardware image with the category label; the neural network comprises a convolution layer, a full connection layer, a Batch Normalization layer, an activation function Relu layer and a Dropout layer which are connected in sequence; the Dropout layer is respectively connected with the first layer network and the second layer network; wherein the first layer network comprises a fully connected layer; the second layer network comprises two fully connected layers and a sigmoid layer between the two fully connected layers; the neural network takes a hardware image as input and takes a category label of the hardware image as output; the output of the Relu layer of the activation function is the feature of the fc6 layer, and the feature of the fc6 layer is used for fine-grained retrieval; the output of the sigmoid layer is converted into a binary hash code, and the binary hash code is used for coarse-grained retrieval;

and searching the hardware image to be searched by utilizing the trained neural network.

Optionally, the data preprocessing is performed on the hardware image with the category label, and specifically includes:

adjusting the image size of the hardware image with the category label to obtain the hardware image with the category label after size adjustment;

converting the data type of the hardware image with the category label after the size adjustment into a numpy array data type to obtain the hardware image with the category label after the data type conversion;

carrying out normalization processing on the hardware image with the class label after the data type conversion to obtain the hardware image with the class label after the normalization processing;

standardizing the hardware image with the category label after the normalization processing to obtain the hardware image with the category label after the standardization processing;

and carrying out image enhancement processing on the hardware image with the category label after the standardization processing to obtain the hardware image with the category label after the preprocessing.

Optionally, the method of constructing a neural network according to the preprocessed hardware image with the category label further includes:

and training the neural network to obtain the trained neural network.

Optionally, retrieving the hardware image to be retrieved by using the trained neural network specifically includes:

carrying out data preprocessing on the hardware image to be retrieved to obtain a preprocessed hardware image to be retrieved;

determining a plurality of category labels (the first m labels with the maximum probability), binary retrieval vectors and features of an fc6 layer of the preprocessed hardware image to be retrieved according to the first layer network and the second layer network of the neural network;

performing coarse-grained search according to the category labels and the binary retrieval vector to obtain a plurality of images whose difference from the preprocessed hardware image to be retrieved is smaller than a set threshold; and performing fine-grained search by using the features of the fc6 layer.

A hardware image retrieval system based on deep learning comprises:

the hardware image acquisition module is used for acquiring hardware images with category labels;

the preprocessing module is used for carrying out data preprocessing on the hardware image with the category label; the preprocessing comprises: image resizing, data type conversion, normalization, standardization and image enhancement;

the neural network construction module is used for constructing a neural network according to the preprocessed hardware images with the class labels; the neural network comprises a convolution layer, a full connection layer, a Batch Normalization layer, an activation function Relu layer and a Dropout layer which are connected in sequence; the Dropout layer is respectively connected with the first layer network and the second layer network; wherein the first layer network comprises a fully connected layer; the second layer network comprises two fully connected layers and a sigmoid layer between the two fully connected layers; the neural network takes a hardware image as input and takes a category label of the hardware image as output; the output of the Relu layer of the activation function is the feature of the fc6 layer, and the feature of the fc6 layer is used for fine-grained retrieval; the output of the sigmoid layer is converted into a binary hash code, and the binary hash code is used for coarse-grained retrieval;

and the retrieval module is used for retrieving the hardware image to be retrieved by utilizing the neural network.

Optionally, the preprocessing module specifically includes:

the image size adjusting unit is used for adjusting the image size of the hardware image with the category label to obtain the hardware image with the category label after size adjustment;

the data type conversion unit is used for converting the data type of the hardware image with the category label after the size adjustment into a numpy array data type to obtain the hardware image with the category label after the data type conversion;

the normalization processing unit is used for performing normalization processing on the hardware image with the category label after the data type conversion to obtain the hardware image with the category label after the normalization processing;

the standardization processing unit is used for carrying out standardization processing on the hardware image with the category label after the normalization processing to obtain the hardware image with the category label after the standardization processing;

and the image enhancement processing unit is used for carrying out image enhancement processing on the hardware image with the category label after the standardization processing to obtain the hardware image with the category label after the preprocessing.

Optionally, the system further includes:

and the trained neural network determining module is used for training the neural network to obtain the trained neural network.

Optionally, the retrieving module specifically includes:

the preprocessing unit is used for preprocessing the hardware image to be retrieved to obtain a preprocessed hardware image to be retrieved;

the characteristic determining unit is used for determining a plurality of categories, binary retrieval vectors and fc 6-layer characteristics of the preprocessed hardware images to be retrieved according to the trained neural network;

the retrieval unit is used for performing coarse-grained search according to the plurality of category labels and the binary retrieval vector to obtain a plurality of images whose difference from the preprocessed hardware image to be retrieved is smaller than a set threshold; and performing fine-grained search by using the features of the fc6 layer.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

according to the hardware image retrieval method and system based on deep learning, the preprocessed hardware images with the class labels are used for constructing the neural network, the data for constructing the neural network is simple, the neural network can be used for retrieving a large number of images, and the retrieval accuracy is improved on the basis of improving the retrieval speed.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart of a hardware image retrieval method based on deep learning according to the present invention;

FIG. 2 is a schematic structural diagram of a hardware image retrieval system based on deep learning according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

FIG. 1 is a schematic flow chart of the hardware image retrieval method based on deep learning. As shown in FIG. 1, the hardware image retrieval method based on deep learning provided by the present invention includes:

S101, acquiring the hardware image with the category label.

S102, carrying out data preprocessing on the hardware image with the category label; the preprocessing comprises: image resizing, data type conversion, normalization, standardization and image enhancement.

S102 specifically comprises the following steps:

Adjusting the image size of the hardware image with the category label to obtain the hardware image with the category label after size adjustment. The resized hardware image with the category label is typically 244 x 244.

And converting the data type of the hardware image with the category label after the size adjustment into a numpy array data type to obtain the hardware image with the category label after the data type conversion. The hardware image with the category label after data type conversion is an array of (n, 244, 244, 3), where n is the number of images, 244 is the size of the image, and 3 is the RGB value.

The hardware image with the category label after the data type conversion is normalized to obtain the hardware image with the category label after the normalization processing. That is, each pixel value $x_i$ is divided by 255.0 ($x_i / 255.0$), which converts the range of pixel values from 0-255 to 0-1.

The hardware image with the category label after the normalization processing is standardized to obtain the hardware image with the category label after the standardization processing. That is, each pixel value $x_i$ is transformed as $(x_i - \mathrm{mean}) / \mathrm{std}$, where $\mathrm{mean}$ and $\mathrm{std}$ are the mean and standard deviation of all pixel values in $X_{train}$. Image standardization centers the data by removing the mean; according to convex optimization theory and knowledge of data probability distributions, centered data conforms better to the data distribution rule, so a good generalization effect is obtained more easily after training.
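The normalization and standardization steps can be illustrated with a minimal NumPy sketch. The random array standing in for X_train, the batch size of 4, and the function names are assumptions made for illustration only; the text does not specify whether the statistics are computed per channel, so a single global mean and standard deviation are used here.

```python
import numpy as np

def normalize(images):
    """Scale pixel values from 0-255 to 0-1 by dividing by 255.0."""
    return images.astype(np.float32) / 255.0

def standardize(images, mean, std):
    """Apply (x - mean) / std with statistics taken from the training set."""
    return (images - mean) / std

# Hypothetical stand-in for X_train: 4 random RGB images of size 244 x 244.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(4, 244, 244, 3), dtype=np.uint8)

x_norm = normalize(x_train)
mean, std = x_norm.mean(), x_norm.std()   # over all pixel values of X_train
x_std = standardize(x_norm, mean, std)
```

The same `mean` and `std` would presumably be reused when preprocessing a query image, so that query and database images share one scale.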

The hardware image with the category label after the standardization processing is subjected to image enhancement processing to obtain the hardware image with the category label after the preprocessing. The specific process of image enhancement processing is as follows:

An angle is randomly selected from the range of 0 to 90 degrees, and the image is rotated by that angle.

A factor is randomly selected from 0.9 to 1.1, and the image is scaled by that factor in both the length and width directions. When the factor is greater than 0 and less than 1, a zoom-in operation is performed; when the factor is greater than 1, a zoom-out operation is performed.

The image is randomly flipped horizontally and vertically.
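The three enhancement operations can be sketched with NumPy alone, using nearest-neighbour inverse mapping for rotation and zoom. This is an illustrative sketch, not the implementation used by the invention: the function name `augment`, the zero fill for pixels mapped from outside the image, and the 50% flip probability are all assumptions.

```python
import numpy as np

def augment(image, rng):
    """Randomly rotate, zoom, and flip one H x W x C image (nearest neighbour)."""
    h, w = image.shape[:2]
    angle = np.deg2rad(rng.uniform(0.0, 90.0))   # rotation angle in [0, 90) degrees
    zoom = rng.uniform(0.9, 1.1)                 # same factor for both axes

    # Inverse mapping: for each output pixel, find the source pixel.
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dy, dx = rows - cy, cols - cx
    # A factor below 1 samples a smaller source area, i.e. zooms in.
    src_r = cy + zoom * (np.cos(angle) * dy - np.sin(angle) * dx)
    src_c = cx + zoom * (np.sin(angle) * dy + np.cos(angle) * dx)
    src_r = np.rint(src_r).astype(int)
    src_c = np.rint(src_c).astype(int)

    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(image)                   # pixels mapped from outside stay 0
    out[inside] = image[src_r[inside], src_c[inside]]

    if rng.random() < 0.5:
        out = out[:, ::-1]                       # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                       # vertical flip
    return out

rng = np.random.default_rng(0)
image = rng.random((244, 244, 3))                # hypothetical preprocessed image
augmented = augment(image, rng)
```

Multiplying the source coordinates by the factor reproduces the convention stated above: a factor below 1 magnifies the image, and a factor above 1 shrinks it.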

S103, constructing a neural network according to the preprocessed hardware image with the category label; the neural network comprises a convolution layer, a full connection layer, a Batch Normalization layer, an activation function Relu layer and a Dropout layer which are connected in sequence; the Dropout layer is respectively connected with the first layer network and the second layer network; wherein the first layer network comprises a fully connected layer; the second layer network comprises two fully connected layers and a sigmoid layer between the two fully connected layers; the neural network takes the hardware image as input, and takes the category label of the hardware image as output. The convolutional layer is a convolutional layer of VGG 16. The output of the Relu layer of the activation function is the feature of the fc6 layer, and the feature of the fc6 layer is used for fine-grained retrieval; the output of the sigmoid layer is converted into a binary hash code, and the binary hash code is used for coarse-grained retrieval.

Adding the Batch Normalization layer not only greatly accelerates convergence, and thus greatly increases training speed, but also effectively alleviates the vanishing gradient and exploding gradient problems. Adding the Dropout layer noticeably reduces overfitting, thereby improving the robustness of the model.
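The layer ordering described in S103 can be sketched as a NumPy forward pass through the part of the network that follows the convolutional layers. All sizes (`CONV_DIM`, `FC6_DIM`, `HASH_BITS`, `N_CLASSES`), the random weights, and the random batch standing in for the VGG16 convolutional output are hypothetical. The Batch Normalization here uses batch statistics without learned scale and shift, which is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened conv features, fc6 width, hash bits, classes.
CONV_DIM, FC6_DIM, HASH_BITS, N_CLASSES = 512, 256, 48, 10

def dense(in_dim, out_dim):
    """Randomly initialised weights and zero bias for a fully connected layer."""
    return rng.normal(0.0, 0.01, (in_dim, out_dim)), np.zeros(out_dim)

W_fc6, b_fc6 = dense(CONV_DIM, FC6_DIM)
W_cls, b_cls = dense(FC6_DIM, N_CLASSES)     # first layer network
W_hid, b_hid = dense(FC6_DIM, HASH_BITS)     # second layer network, hidden FC
W_out, b_out = dense(HASH_BITS, N_CLASSES)   # second layer network, output FC

def forward(conv_features, train=False, drop_rate=0.5, eps=1e-5):
    """Return (fc6 features, sigmoid activations, logits of both branches)."""
    z = conv_features @ W_fc6 + b_fc6
    # Batch Normalization with batch statistics (no learned scale/shift here).
    z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    y_fc6 = np.maximum(z, 0.0)               # ReLU output = fc6 feature
    h = y_fc6
    if train:                                # inverted dropout, training only
        mask = rng.random(h.shape) >= drop_rate
        h = h * mask / (1.0 - drop_rate)
    logits_1 = h @ W_cls + b_cls
    y_sigmoid = 1.0 / (1.0 + np.exp(-(h @ W_hid + b_hid)))
    logits_2 = y_sigmoid @ W_out + b_out
    return y_fc6, y_sigmoid, logits_1, logits_2

batch = rng.normal(size=(8, CONV_DIM))       # stand-in for VGG16 conv output
y_fc6, y_sigmoid, logits_1, logits_2 = forward(batch)
```

The two values kept for retrieval are `y_fc6` (fine-grained features) and `y_sigmoid` (the source of the binary hash code); the logits of both branches serve only for classification.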

After S103, further comprising:

and training the neural network to obtain the trained neural network.

The specific process of training the neural network is as follows:

When the first layer network is trained, the convolutional layer part uses convolutional layers pre-trained on ImageNet. The weights of the second layer network are initialized randomly. The training process can be divided into four parts: forward propagation, loss function, backpropagation and weight updating.

In forward propagation, $X_{train}$ is passed through the whole network. Assuming that the pictures have $n$ categories in total, the output value of the second layer network is $s_{out} = [s_1, s_2, \ldots, s_n]^T$. Substituting $s_{out}$ into the softmax function gives $q = \mathrm{softmax}(s_{out})$. The loss is then calculated by the loss function, where $p_i$ is the true class of picture $i$ and $q_i$ is the softmax value of the $i$-th picture. Backpropagation is then performed and the weights are updated according to the update formula, where $\alpha$ is the learning rate, which can be adjusted as needed. These steps are repeated.
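The four training parts can be illustrated on a single fully connected layer. The loss and update formulas are not reproduced in this text, so the sketch below assumes the usual cross-entropy loss and plain gradient descent with learning rate `alpha`; the data, sizes, and iteration count are hypothetical.

```python
import numpy as np

def softmax(s):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, q):
    """Mean cross-entropy between one-hot labels p and predictions q."""
    return float(-np.mean(np.sum(p * np.log(q + 1e-12), axis=1)))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))            # hypothetical input features
labels = rng.integers(0, 4, size=32)     # hypothetical labels, 4 classes
p = np.eye(4)[labels]                    # one-hot true classes
w = np.zeros((16, 4))                    # weights of one fully connected layer
alpha = 0.1                              # learning rate

losses = []
for _ in range(50):
    q = softmax(x @ w)                   # forward propagation
    losses.append(cross_entropy(p, q))   # loss function
    grad = x.T @ (q - p) / len(x)        # backpropagation for softmax + CE
    w -= alpha * grad                    # weight update
```

Under these assumptions the gradient with respect to the softmax input is simply `q - p`, which is why backpropagation through the last layer reduces to one matrix product.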

When the second layer network is trained, the Dropout layer and the preceding network layers already trained with the first layer network are kept, and a fully connected hidden layer and a new fully connected layer are added after the Dropout layer. The network is then trained in the same four steps of forward propagation, loss function, backpropagation and weight updating. Note that this network model is also a picture classification model, so the output of the fully connected layer is again substituted into the softmax function, and the same loss function is used.

And S104, retrieving the hardware image to be retrieved by utilizing the trained neural network.

S104 specifically comprises the following steps:

and carrying out data preprocessing on the hardware image to be retrieved to obtain a preprocessed hardware image to be retrieved.

And determining a plurality of class labels, binary retrieval vectors and the characteristics of the fc6 layer of the preprocessed hardware image to be retrieved according to the first layer network and the second layer network of the neural network.

As a specific embodiment, a picture to be retrieved without a category label is given, and data preprocessing is performed on the picture to be retrieved to obtain an image data set X.

A forward propagation operation of the neural network is carried out on the image array X. Through the first layer network, the feature vector $Y_{fc6}$ of the network and the first $k$ classes with the highest predicted probability (top-k accuracy > 99.9%) are obtained; the classes are denoted $C^k = [c_1, c_2, \ldots, c_k]$. The first-ranked class $c_1$ in $C^k$ is the prediction category of the picture to be retrieved. Through the second layer network, the feature vector $Y_{sigmoid}$ of the sigmoid layer is obtained. $Y_{sigmoid}$ is further converted into the binary retrieval vector $Y_{bin}$ by a per-element calculation formula, where $j$ indexes the $j$-th element of the vector. The output of this step is the feature vector $Y_{fc6}$ of the fc6 layer of the picture to be retrieved, the first $k$ classes $C^k = [c_1, c_2, \ldots, c_k]$, the binary retrieval vector $Y_{bin}$, and the prediction class $c_1$ of the picture.
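Selecting the top-k classes and binarizing the sigmoid output can be sketched in plain Python. The conversion formula is not reproduced in this text; the 0.5 threshold below is an assumption borrowed from common practice for sigmoid-based hash codes, and the input values are hypothetical.

```python
def top_k_classes(probabilities, k):
    """Indices of the k classes with the highest predicted probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return order[:k]

def binarize(y_sigmoid, threshold=0.5):
    """Turn sigmoid activations into a binary retrieval vector (assumed threshold)."""
    return [1 if value >= threshold else 0 for value in y_sigmoid]

probabilities = [0.05, 0.70, 0.20, 0.05]   # hypothetical softmax output
y_sigmoid = [0.91, 0.12, 0.50, 0.49]       # hypothetical sigmoid-layer output

c_k = top_k_classes(probabilities, k=2)    # C^k
y_bin = binarize(y_sigmoid)                # Y_bin
predicted_class = c_k[0]                   # c_1
```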

A coarse-grained search is performed according to $C^k$ and $Y_{bin}$. $C^k$ is used to reduce the search space: only pictures whose category is in $C^k = [c_1, c_2, \ldots, c_k]$ are searched, and pictures of other categories need not be searched. $Y_{bin}$ is then compared, by Hamming distance, with the binary retrieval vectors of the pictures in the search space, and the $m$ pictures with the closest Hamming distances are selected from the search space. Assuming that $x$ and $y$ are both $n$-bit codes, the Hamming distance is calculated as $d(x, y) = \sum_{i=1}^{n} x_i \oplus y_i$, where $\oplus$ denotes exclusive or. The output of this step is the $m$ pictures screened from the picture database.
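The coarse-grained step can be sketched as a class filter followed by a Hamming-distance ranking. The record layout (`id`, `label`, `bits`) and the tiny in-memory database are hypothetical; a real system would likely store packed bit codes in an index.

```python
def hamming_distance(x, y):
    """Number of positions where two equal-length bit lists differ (XOR sum)."""
    if len(x) != len(y):
        raise ValueError("codes must have the same length")
    return sum(a ^ b for a, b in zip(x, y))

def coarse_search(query_bits, query_classes, database, m):
    """Keep the m entries closest in Hamming distance within the top-k classes."""
    candidates = [item for item in database if item["label"] in query_classes]
    candidates.sort(key=lambda item: hamming_distance(query_bits, item["bits"]))
    return candidates[:m]

# Hypothetical database records: id, class label, binary code.
database = [
    {"id": "a", "label": 1, "bits": [1, 0, 1, 0]},
    {"id": "b", "label": 1, "bits": [1, 1, 1, 0]},
    {"id": "c", "label": 2, "bits": [0, 1, 0, 1]},
    {"id": "d", "label": 3, "bits": [1, 0, 1, 0]},   # outside the top-k classes
]
result = coarse_search([1, 0, 1, 0], query_classes=[1, 2], database=database, m=2)
```

Record `d` has an identical code to the query but is never compared, which shows how the class filter shrinks the search space before any distance is computed.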

A fine-grained search is then performed using $Y_{fc6}$. The search space becomes the $m$ pictures screened in the previous step. The $Y_{fc6}$ of the picture to be retrieved is compared with the $Y_{fc6}$ of each picture in the search space, and the $m$ pictures are re-ranked from near to far by Euclidean distance to obtain the final ranking result. The picture most similar to the picture to be retrieved is then determined.
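The fine-grained step amounts to re-ranking the coarse candidates by Euclidean distance between fc6 feature vectors. The sketch below uses short hypothetical vectors and a hypothetical record layout; real fc6 features would be much longer.

```python
import math

def fine_search(query_fc6, candidates):
    """Re-rank candidates from nearest to farthest by Euclidean distance."""
    return sorted(candidates, key=lambda item: math.dist(query_fc6, item["fc6"]))

# Hypothetical candidates returned by the coarse-grained search.
candidates = [
    {"id": "a", "fc6": [0.0, 2.0, 1.0]},
    {"id": "b", "fc6": [0.1, 0.9, 0.0]},
    {"id": "c", "fc6": [3.0, 0.0, 0.0]},
]
ranked = fine_search([0.0, 1.0, 0.0], candidates)
best = ranked[0]["id"]                     # most similar picture
```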

FIG. 2 is a schematic structural diagram of the hardware image retrieval system based on deep learning. As shown in FIG. 2, the hardware image retrieval system based on deep learning provided by the present invention includes: the hardware image acquisition module 201, the preprocessing module 202, the neural network construction module 203 and the retrieval module 204.

The hardware image obtaining module 201 is used for obtaining a hardware image with a category label.

The preprocessing module 202 is configured to perform data preprocessing on the hardware image with the category label; the preprocessing comprises: image resizing, data type conversion, normalization, standardization and image enhancement.

The neural network construction module 203 is used for constructing a neural network according to the preprocessed hardware images with the category labels; the neural network comprises a convolution layer, a full connection layer, a Batch Normalization layer, an activation function Relu layer and a Dropout layer which are connected in sequence; the Dropout layer is respectively connected with the first layer network and the second layer network; wherein the first layer network comprises a fully connected layer; the second layer network comprises two fully connected layers and a sigmoid layer between the two fully connected layers; the neural network takes the hardware image as input, and takes the category label of the hardware image as output. The output of the Relu layer of the activation function is the feature of the fc6 layer, and the feature of the fc6 layer is used for fine-grained retrieval; the output of the sigmoid layer is converted into a binary hash code, and the binary hash code is used for coarse-grained retrieval.

The retrieval module 204 is configured to retrieve the hardware image to be retrieved by using the neural network.

The preprocessing module 202 specifically includes: an image size adjusting unit, a data type conversion unit, a normalization processing unit, a standardization processing unit and an image enhancement processing unit.

And the image size adjusting unit is used for adjusting the image size of the hardware image with the category label to obtain the hardware image with the category label after size adjustment.

The data type conversion unit is used for converting the data type of the hardware image with the category label after the size adjustment into a numpy array data type to obtain the hardware image with the category label after the data type conversion.

And the normalization processing unit is used for performing normalization processing on the hardware image with the category label after the data type conversion to obtain the hardware image with the category label after the normalization processing.

The standardization processing unit is used for carrying out standardization processing on the hardware image with the category label after the normalization processing to obtain the hardware image with the category label after the standardization processing.

The hardware image retrieval system based on deep learning provided by the invention further comprises a trained neural network determination module.

The retrieval module specifically comprises: the device comprises a preprocessing unit, a feature determining unit and a retrieval unit.

And the preprocessing unit is used for preprocessing the data of the hardware image to be retrieved to obtain a preprocessed hardware image to be retrieved.

The feature determination unit is used for determining a plurality of categories, binary retrieval vectors and features of the fc6 layer of the preprocessed hardware image to be retrieved according to the trained neural network.

The embodiments in the present description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by referring to one another. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A hardware image retrieval method based on deep learning is characterized by comprising the following steps:

acquiring a hardware image with a category label;

2. The hardware image retrieval method based on deep learning of claim 1, wherein the hardware image with the category label is subjected to data preprocessing, and specifically comprises:

3. The deep learning-based hardware image retrieval method according to claim 1, wherein a neural network is constructed according to the preprocessed hardware images with the class labels, and then the method further comprises:

and training the neural network to obtain the trained neural network.

4. The hardware image retrieval method based on deep learning of claim 3, wherein the retrieval of the hardware image to be retrieved by using the trained neural network specifically comprises:

determining a plurality of categories, binary retrieval vectors and fc6 layer characteristics of the preprocessed hardware images to be retrieved according to the first layer network and the second layer network of the neural network;

5. A hardware image retrieval system based on deep learning, characterized by comprising:

6. The hardware image retrieval system based on deep learning of claim 5, wherein the preprocessing module specifically comprises:

7. The deep learning-based hardware image retrieval system according to claim 5, further comprising:

8. The hardware image retrieval system based on deep learning of claim 7, wherein the retrieval module specifically comprises: