CN111597367A

CN111597367A - Three-dimensional model retrieval method based on view and Hash algorithm

Info

Publication number: CN111597367A
Application number: CN202010418065.3A
Authority: CN
Inventors: 张满囤; 燕明晓; 王红; 田琪; 崔时雨; 齐畅; 魏玮; 吴清; 王小芳
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2020-05-18
Filing date: 2020-05-18
Publication date: 2020-08-28
Anticipated expiration: 2040-05-18
Also published as: CN111597367B

Abstract

The invention relates to a three-dimensional model retrieval method based on views and a hash algorithm. The method comprises: obtaining multiple views of each three-dimensional model captured from different angles, and normalizing the views; constructing a convolutional neural network based on AlexNet, in which a view layer after the five convolutional layers connects to two fully connected layers, and a hash layer added after the last fully connected layer converts high-dimensional features into low-dimensional hash codes, with a quantization loss function designed to reduce the quantization error of the hash codes; training the AlexNet-based network on an existing three-dimensional model dataset, so that the features of each model are represented by the hash features learned by the trained network; and computing the similarity between any given query model and the models in a three-dimensional model database using the Hamming distance, and outputting the models with the smallest Hamming distances to a retrieval list. The method improves the efficiency of three-dimensional model retrieval.

Description

Three-dimensional model retrieval method based on view and Hash algorithm

Technical Field

The technical solution of the invention relates to three-dimensional (3D) model retrieval, and in particular to a three-dimensional model retrieval method based on views and a hash algorithm.

Background

With the advent of the big data era, image acquisition has become simpler and more diverse. In recent years, the wide availability of low-cost 3D acquisition devices and 3D modeling tools has caused the number of three-dimensional models to grow rapidly, and vast three-dimensional model resources are now available online. Three-dimensional models are increasingly used in 3D games, virtual reality, industrial design, film and television entertainment, and other fields, so the demand for accurate and efficient three-dimensional object retrieval keeps growing.

Current three-dimensional model retrieval methods fall mainly into two categories: model-based retrieval and view-based retrieval. Model-based retrieval represents model features directly from three-dimensional data such as polygon meshes, voxel grids, point clouds, or implicit surfaces, and therefore better preserves the original data and the spatial geometric characteristics of the model. In the real world, however, it is sometimes difficult to represent a model directly with three-dimensional data, and few open-source three-dimensional feature model databases currently exist. View-based retrieval represents a three-dimensional model by a set of two-dimensional images, which reduces matching between three-dimensional models to the two-dimensional level; the query model is found by matching view similarity, which largely avoids the over-fitting problem. However, current view-based algorithms measure the extracted high-dimensional features in Euclidean space to perform similarity retrieval, which makes retrieval inefficient. Improving retrieval efficiency is therefore key to improving three-dimensional model retrieval performance.

Disclosure of Invention

To address the low retrieval efficiency of current view-based three-dimensional model retrieval algorithms, the invention provides a three-dimensional model retrieval method based on views and a hash algorithm. A hash layer is added as the last layer of the convolutional neural network: the features extracted by the convolutional layers are aggregated by a view layer, the resulting high-dimensional features are converted into hash code features by the hash layer, and model similarity is then computed with the Hamming distance in a low-dimensional Hamming space, which improves retrieval efficiency.

The technical solution adopted by the invention is as follows: obtaining multiple views of each three-dimensional model captured from different angles, and normalizing the views;

constructing a convolutional neural network based on AlexNet, in which a view layer after the five convolutional layers connects to two fully connected layers, and a hash layer added after the last fully connected layer converts high-dimensional features into low-dimensional hash codes, with a quantization loss function designed to reduce the quantization error of the hash codes;

training the AlexNet-based network on an existing three-dimensional model dataset, so that the features of each model are represented by the hash features learned by the trained network; and computing the similarity between any given query model and the models in the three-dimensional model database using the Hamming distance, where a larger Hamming distance indicates less similar models and a smaller one indicates more similar models. The database models are sorted by Hamming distance in ascending order, and the top-ranked models are output to a retrieval list as the result.

In the above retrieval method, model scale standardization is performed on each three-dimensional model before the views are obtained. Because models available online are diverse and numerous, all models in the dataset must be standardized so that retrieval is not affected by model size. Scaling models of different sizes into a cube with side length 2 keeps the model features uniform and usable. The specific steps are as follows:

Step 2-1: read the coordinates of every point of the three-dimensional model and find the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max) of the model.

Step 2-2: compute the differences between the maximum and minimum coordinates along each of the three axes, take the largest difference as the side length l of the model bounding box, construct a cubic bounding box, and place the center of the model at the center of the cube;

Step 2-3: scale the model to obtain the standardized model. Each point (x, y, z) is mapped to a new point (x′, y′, z′) as follows:

$$x' = (x - x_{\min}) \times 2/l - 1$$

$$y' = (y - y_{\min}) \times 2/l - 1$$

$$z' = (z - z_{\min}) \times 2/l - 1$$

After standardization, the coordinates of all points of the model lie in [-1, 1], i.e., the model lies inside a cube with side length 2, which gives the standardized model.
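Steps 2-1 to 2-3 can be sketched in plain Python as below. This is a minimal sketch, assuming the model is already available as a list of (x, y, z) vertex tuples (for example parsed from an OFF file, which is not shown); the function name `normalize_model` is hypothetical. Applied literally, the formula maps each axis minimum to -1, so only the longest axis spans the full [-1, 1]; a variant that centers every axis would subtract the bounding-box midpoint instead of the minimum.

```python
def normalize_model(points):
    """Scale a list of (x, y, z) vertices into the cube [-1, 1]^3.

    Implements Steps 2-1 to 2-3 literally: l is the largest bounding-box
    extent, and every coordinate c becomes (c - c_min) * 2 / l - 1.
    """
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))  # Step 2-1: per-axis minima
    maxs = (max(xs), max(ys), max(zs))  # Step 2-1: per-axis maxima
    # Step 2-2: side length of the cubic bounding box = largest extent.
    l = max(hi - lo for lo, hi in zip(mins, maxs))
    if l == 0:
        raise ValueError("degenerate model: all vertices coincide")
    # Step 2-3: map each coordinate into [-1, 1].
    return [tuple((c - lo) * 2 / l - 1 for c, lo in zip(p, mins)) for p in points]
```

For example, the vertices (0, 0, 0) and (2, 1, 4) give l = 4 and are mapped to (-1, -1, -1) and (0, -0.5, 1).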

In the above retrieval method, the multi-view pictures are obtained as follows: a virtual camera array is arranged around the model, 12 views are captured for each model, and the views are normalized to a uniform size and used as the input of the convolutional neural network.

Step 3-1: place the standardized model at the center of a regular icosahedron and place virtual cameras at the 12 vertices of the icosahedron, obtaining a set of 12 views of the model, each 256 × 256 in size;
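The camera positions of Step 3-1 can be generated from the standard coordinates of a regular icosahedron, whose 12 vertices are the cyclic permutations of (0, ±1, ±φ), with φ the golden ratio. The sketch below assumes the cameras sit at a hypothetical distance `radius` from the model center (the text does not give one) and look toward the origin; rendering the views themselves is outside the sketch.

```python
import math


def icosahedron_cameras(radius=2.0):
    """Return the 12 vertices of a regular icosahedron centered at the origin,
    scaled so that every vertex lies at distance `radius` (an assumed value)."""
    phi = (1 + math.sqrt(5)) / 2  # golden ratio
    verts = []
    for a in (-1, 1):
        for b in (-phi, phi):
            # Cyclic permutations of (0, +-1, +-phi) give all 12 vertices.
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    # Every raw vertex has norm sqrt(1 + phi^2); rescale to the chosen radius.
    scale = radius / math.sqrt(1 + phi * phi)
    return [tuple(scale * c for c in v) for v in verts]
```

Each returned point is one virtual camera position; because the model was standardized into [-1, 1]^3, any radius larger than √3 keeps every camera outside the model.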

Step 3-2: crop each view of the model to 227 × 227 as the input of the convolutional neural network, using the following cropping method:

$$\mathrm{left} = C_w/2 - C'_w/2$$

$$\mathrm{top} = C_h/2 - C'_h/2$$

$$\mathrm{right} = \mathrm{left} + C'_w$$

$$\mathrm{bottom} = \mathrm{top} + C'_h$$

where left, top, right, and bottom are the left, upper, right, and lower boundaries of the crop window of the new size (C′_w, C′_h) within the image of the original size (C_w, C_h).
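A minimal sketch of the crop in Step 3-2, assuming an image stored as a list of pixel rows and integer (floor) division for the halves; the helper names are hypothetical. With floor division, a 256 × 256 view cropped to 227 × 227 gives left = top = 128 − 113 = 15 and right = bottom = 242, so the window covers exactly 227 pixels per side.

```python
def center_crop_box(cw, ch, new_w, new_h):
    """Return (left, top, right, bottom) of a centered new_w x new_h window
    inside a cw x ch image, using floor division for the halves."""
    left = cw // 2 - new_w // 2
    top = ch // 2 - new_h // 2
    return left, top, left + new_w, top + new_h


def center_crop(image, new_w, new_h):
    """Crop a row-major image (a list of equal-length rows) to new_w x new_h."""
    ch, cw = len(image), len(image[0])
    left, top, right, bottom = center_crop_box(cw, ch, new_w, new_h)
    # right and bottom are exclusive slice bounds, so each side keeps new_w / new_h pixels.
    return [row[left:right] for row in image[top:bottom]]
```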

In the above retrieval method, the AlexNet-based convolutional neural network has the following structure:

Step 4-1: the 12 views (227 × 227) of each model are input into the convolutional neural network in sequence, and the convolution and pooling layers extract local image features. These layers are configured as follows:

The first layer consists of a convolutional layer and a max pooling layer. The convolution kernel size is 11 × 11, the stride is 4, and the activation function is ReLU. The convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2.

The second layer consists of a convolutional layer and a max pooling layer. The convolution kernel size is 5 × 5, the stride is 1, and the activation function is ReLU. The convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2.

The third layer consists of a convolutional layer with a 3 × 3 kernel, a stride of 1, and ReLU activation.

The fourth layer consists of a convolutional layer with a 3 × 3 kernel, a stride of 1, and ReLU activation.

The fifth layer consists of a convolutional layer and a max pooling layer. The convolution kernel size is 3 × 3, the stride is 1, and the activation function is ReLU. The convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2.
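The spatial size of the feature maps through these five layers follows the usual rule output = ⌊(input + 2 × padding − kernel) / stride⌋ + 1. The text gives kernel sizes and strides but not padding; the sketch below assumes the padding of the original AlexNet (0 for the first convolution, 2 for the second, 1 for the third to fifth), under which a 227 × 227 view shrinks to 6 × 6 after the fifth layer. The layer names are illustrative.

```python
def conv_out(size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


# (layer, kernel, stride, padding). Kernels and strides follow Step 4-1;
# the padding values are ASSUMED (original AlexNet), not given in the text.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]


def trace_shapes(size=227):
    """Return (layer, side length) after each layer for a square input view."""
    shapes = []
    for name, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        shapes.append((name, size))
    return shapes
```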

Step 4-2: a view layer is added after the fifth convolutional layer. For each model, the view layer takes the convolutional features of its 12 views and, for each feature dimension, keeps the maximum value across the 12 views, producing a single feature descriptor of the three-dimensional model, which is then fed into the fully connected layers. The two fully connected layers have identical settings: each has 4096 neurons, followed by a ReLU activation to avoid vanishing gradients and a dropout layer that randomly sets neuron outputs to 0, which reduces the effective network complexity and prevents overfitting.
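The view layer of Step 4-2 is an element-wise maximum over the per-view feature vectors (the same idea as the view pooling of reference [1]). A minimal pure-Python sketch, assuming each view's convolutional output has already been flattened into a list of numbers; `view_pool` is a hypothetical name. In TensorFlow, the same operation on a tensor of shape (views, features) could be written with `tf.reduce_max(x, axis=0)`.

```python
def view_pool(view_features):
    """Element-wise max over per-view feature vectors (the 'view layer').

    view_features: list of V equal-length feature vectors, one per view
    (V = 12 in the text). Returns one vector of the same length, which does
    not depend on the order in which the views are listed.
    """
    if not view_features:
        raise ValueError("need at least one view")
    # zip(*...) groups the values of each feature dimension across all views.
    return [max(values) for values in zip(*view_features)]
```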

Step 4-3: a hash layer is added after the fully connected layers. It contains k hidden neurons (k is the number of hash code bits) with a sigmoid activation, and maps the 4096-dimensional features output by the fully connected layers to a low-dimensional space, forming the low-dimensional hash feature f_n. f_n is then converted element-wise into the discrete hash code b_n by b_n = sgn(f_n − 0.5), i.e., a bit is 1 when the corresponding sigmoid output exceeds 0.5 and 0 otherwise, consistent with b_n ∈ {0,1}^k. A quantization loss function L_ql is also set to control the error of the hash code quantization process.

In L_ql, N is the number of input samples and k is the number of bits of the hash code.
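The binarization of Step 4-3 and a quantization penalty can be sketched as follows. Because the formula for L_ql is not reproduced in this text, `quantization_loss` below is an assumed stand-in (the mean squared gap between the sigmoid outputs f_n and their codes b_n, averaged over N samples and k bits), not the patent's definition. The threshold treats sgn(f_n − 0.5) as a 1 bit when f_n > 0.5 and a 0 bit otherwise, with ties mapped to 0 by assumption.

```python
import math


def sigmoid(z):
    """Activation of the hash layer: squashes a logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binarize(f):
    """b_n from f_n: bit 1 where f > 0.5, else 0 (ties at 0.5 -> 0, assumed)."""
    return [1 if v > 0.5 else 0 for v in f]


def quantization_loss(features):
    """ASSUMED stand-in for L_ql, not the patent's formula.

    Mean squared gap between each relaxed feature f_n (sigmoid outputs)
    and its binary code b_n, averaged over N samples and k bits.
    """
    n, k = len(features), len(features[0])
    total = 0.0
    for f in features:
        b = binarize(f)
        total += sum((fv - bv) ** 2 for fv, bv in zip(f, b))
    return total / (n * k)
```

A penalty of this kind is small only when every sigmoid output is already close to 0 or 1, which is why controlling it keeps the thresholding step from discarding much information.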

The network is trained on the public Princeton three-dimensional model dataset ModelNet40. After model scale standardization and multi-view picture normalization, the training set is input into the AlexNet-based convolutional neural network for training, and the network parameters are optimized to produce a network model, which is then evaluated on the test set. The invention is implemented with the TensorFlow deep learning framework in Python 3.6.

The Hamming distance is calculated as follows:

The hash code features corresponding to each model are obtained, and the similarity between models is represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar the models. The Hamming distance is calculated as

$$D(b_i, b_j) = \sum_{t=1}^{k} b_{i,t} \oplus b_{j,t}$$

where b_i and b_j are the hash code features of the two models, b_{i,t} is the t-th bit of b_i, and ⊕ is the exclusive-or (XOR) operation. For any query three-dimensional model Q, its similarity to the three-dimensional models in the database M is measured, and the matching model Q^* is computed as follows:

$$S(Q, M) = \arg\min D(b_i, b_j)$$

where S denotes the similarity between models, b_i is the hash code of the query model Q, b_j is the hash code of M_m, the m-th model in the database (1 ≤ m ≤ N^*), and N^* is the number of samples in the database, so that Q^* is the database model with the smallest Hamming distance to Q. Finally, the 10 models with the highest similarity to the query model are output to the retrieval list as the result.
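The Hamming distance and the top-10 ranking can be sketched with Python integers: packing each k-bit code into an int turns the XOR sum into a population count. The helper names and the dictionary layout of the database are hypothetical; ties in distance are broken by model name only to make the output deterministic.

```python
def bits_to_int(bits):
    """Pack a '0'/'1' string (or a list of 0/1 ints) into an int, first bit = MSB."""
    return int("".join(str(b) for b in bits), 2)


def hamming(a, b):
    """D(b_i, b_j): number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")  # popcount of the XOR


def retrieve(query_code, database, top=10):
    """Names of the `top` database models closest to the query in Hamming distance.

    database: dict mapping model name -> packed int hash code.
    """
    ranked = sorted(database, key=lambda name: (hamming(query_code, database[name]), name))
    return ranked[:top]
```

`bin(x).count("1")` works on Python 3.6, the version the text mentions; on Python 3.10 and later, `int.bit_count()` does the same.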

Compared with the prior art, the invention has the beneficial effects that:

1. To improve the efficiency of three-dimensional model retrieval, an algorithm based on views and hash learning is provided. The method combines the advantages of convolutional neural networks, multi-view representation, and hash-based retrieval, and achieves better results in three-dimensional model retrieval. In the network design, the convolutional layers process the multiple views, and a view pooling layer (the view layer) combines the views of a three-dimensional model into one descriptor, from which the subsequent layers extract features. A hash layer, the last layer of the network, is added after the fully connected layers to learn hash features from the high-dimensional features; the hash quantization loss is controlled so that almost lossless hash codes are generated, improving both the precision and the efficiency of three-dimensional retrieval.

2. The retrieval method standardizes the scale of the initially obtained three-dimensional model data, so it is applicable to the various models in a dataset or on the network, and it prevents the extracted features from being distorted by large differences in model size. To test the performance of the algorithm, it was compared with existing algorithms on the ModelNet40 dataset, and the results show that the method performs well.

3. The method introduces a hash layer followed by a dedicated quantization loss function that controls the quantization error in the hash code conversion, and the low-dimensional hash features allow fast retrieval using the Hamming distance, which improves retrieval efficiency.

Drawings

FIG. 1 is a general flow diagram of the present invention.

FIG. 2 is a result of a normalization process for an example three-dimensional model of the present invention.

FIG. 3 is a two-dimensional projection process of a three-dimensional model of the present invention.

FIG. 4 is a set of two-dimensional views obtained by projection of an example model in accordance with the present invention.

FIG. 5 is a network hierarchy diagram of the present invention.

FIG. 6 shows ROC curves comparing the performance of the present invention with other state-of-the-art algorithms on the ModelNet40 dataset. The references for the other 5 algorithms in FIG. 6 are as follows.

[1] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition // IEEE International Conference on Computer Vision, Santiago, 2015: 945-953.

[2] Wu N Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shape modeling // 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015: 1912-1920.

[3] Cheng H C, Lo C H, Chu C H, Kim Y S. Shape similarity measurement for 3D mechanical part using D2 shape distribution and negative feature decomposition. Computers in Industry, 2010, 62(3): 269-280.

[4] Kun Zhou, Minmin Gong, Xin Huang, Baining Guo. Data-parallel octrees for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(5): 669-681.

Detailed Description

The invention will be further described with reference to the accompanying drawings, but the scope of the invention is not limited thereto.

As shown in FIG. 1, the three-dimensional model retrieval method based on views and a hash algorithm of the present invention mainly consists of 7 modules: model input; model standardization; two-dimensional view acquisition; convolutional neural network design; convolutional neural network training; model feature generation; and model similarity retrieval.

1. Input model module

The user selects the input three-dimensional model. The invention is evaluated on the ModelNet40 dataset released by Princeton University, which contains 40 common model categories; the models in each category are divided into a training set and a test set, and the 9461 models of the training set are used for training.

2. Model standardization

Models available online are diverse and numerous. To prevent model size from affecting retrieval, all models in the dataset must be scale-standardized. For the airplane model in FIG. 2, standardization is carried out as follows:

Step 2-1: read the coordinates of every point of the airplane model and find the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max) of the model.

Step 2-2: compute (x_max − x_min), (y_max − y_min), and (z_max − z_min), take the largest of the three as the side length l of the model bounding box, construct a cubic bounding box, and place the center of the model at the center of the cube.

Step 2-3: scale the model to obtain the standardized model. Each point (x, y, z) is mapped to a new point (x′, y′, z′) as follows:

$$x' = (x - x_{\min}) \times 2/l - 1$$

$$y' = (y - y_{\min}) \times 2/l - 1$$

$$z' = (z - z_{\min}) \times 2/l - 1$$

After standardization, the coordinates of all points of the model lie in [-1, 1]. As shown in FIG. 2, all point coordinates of the standardized model lie in [-1, 1], and the model lies inside a cube with side length 2.

3. Obtaining a two-dimensional view of a model

Step 3-1: as shown in FIG. 3, the model is placed at the center of a regular icosahedron, and virtual cameras placed at the 12 vertices of the icosahedron capture a set of 12 views of the model. FIG. 4 shows the 12 views, each 256 × 256, captured for an example airplane model.

Step 3-2: each view of the model is cropped to 227 × 227 as the input of the convolutional neural network. The cropping method is as follows:

$$\mathrm{left} = C_w/2 - C'_w/2$$

$$\mathrm{top} = C_h/2 - C'_h/2$$

$$\mathrm{right} = \mathrm{left} + C'_w$$

$$\mathrm{bottom} = \mathrm{top} + C'_h$$

where C_w = C_h = 256 and C′_w = C′_h = 227; with the halves rounded down (128 − 113), this gives left = 15, right = 242, top = 15, and bottom = 242.

4. Designing convolutional neural network structures

Step 4-1: the cropped views of the model are input into the convolutional neural network, whose structure is shown in FIG. 5. The convolution and pooling layers extract local image features and, together with the view layer, fully connected layers, and hash layer, are configured as in Steps 4-1 to 4-3 described above.

In the quantization loss function, N is the number of input samples and k is the number of bits of the hash code. In the experiments, N is set to 9461 and k to 48.

5. Training convolutional neural network structure

The invention uses the TensorFlow deep learning framework with Python 3.6. The network is trained on the 9461 models of the ModelNet40 training set, with batch_size set to 16 and the learning rate set to 0.0001.

6. Generating model features

Training on the training set produces a network model that learns the hash features of models well. The hash features are output by the last layer (the hash layer), and each model is represented by a 48-bit hash feature; for example, the hash feature of the airplane model in FIG. 2 is [011101111001100110110111101110110110001100111010].

7. Model similarity retrieval

The features of each model are represented by the hash codes learned by the trained network of module 4. The hash layer maps the high-dimensional features of the model to hash code features in a low-dimensional Hamming space, so the similarity between models is represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar the models. The Hamming distance is calculated as

$$D(b_i, b_j) = \sum_{t=1}^{k} b_{i,t} \oplus b_{j,t}$$

where b_i and b_j are the hash code features of the two models, b_{i,t} is the t-th bit of b_i, and ⊕ is the exclusive-or (XOR) operation. For any query three-dimensional model Q, its similarity to the three-dimensional models in the database M is measured, and the matching model Q^* is computed as follows:

$$S(Q, M) = \arg\min D(b_i, b_j)$$

where S denotes the similarity between models, b_i is the hash code of the query model Q, b_j is the hash code of M_m, the m-th model in the database (1 ≤ m ≤ N^*), and N^* is the number of samples in database M; minimizing over the database yields the matching model Q^*. Finally, the 10 models with the highest similarity to the query model are output to the retrieval list as the result. For example, retrieval with the airplane model airplane_0219.off in FIG. 2 returns the 10 most similar models: ['airplane_0219.off', 'airplane_0115.off', 'airplane_0218.off', 'airplane_0002.off', 'airplane_0027.off', 'airplane_0566.off', 'airplane_0020.off', 'airplane_0374.off', 'airplane_0613.off', 'airplane_0276.off'].

To verify the effectiveness of the present invention, it was compared with 5 other state-of-the-art algorithms on the public three-dimensional model dataset ModelNet40. FIG. 6 shows the receiver operating characteristic curve (ROC curve) of each algorithm; the ordinate is the true positive rate (TPR, i.e., sensitivity) and the abscissa is the false positive rate (FPR, i.e., 1 − specificity). The FPR is the proportion of actually negative samples that are predicted positive, among all negative samples; the TPR is the proportion of actually positive samples that are predicted positive, among all positive samples. The closer the curve is to the upper left corner, the higher the TPR and the lower the FPR, i.e., the stronger the discriminative ability and the better the performance of the algorithm. The results in the figure show that the three-dimensional model retrieval method based on views and a hash algorithm performs excellently.
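A minimal sketch of how ROC points (FPR, TPR) can be computed by sweeping a decision threshold over scores. The text does not say how scores and labels were defined for FIG. 6; one natural assumption for retrieval is label 1 when a database model shares the query's class and score = negative Hamming distance, but the function works for any scores. `roc_points` is a hypothetical name.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    labels: 1 for a positive (relevant) sample, 0 for a negative one;
    assumes at least one of each. scores: higher means "more likely
    positive"; a sample is predicted positive when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        # FPR = FP / all negatives, TPR = TP / all positives
        points.append((fp / neg, tp / pos))
    return points
```

Each lowered threshold moves the curve up (a positive is admitted) or right (a negative is admitted), which is why a curve hugging the upper left corner indicates better discrimination.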

In the above embodiments, the AlexNet convolutional neural network, the ModelNet40 dataset, the TensorFlow deep learning framework, the ReLU activation function, the dropout layer, and the sigmoid activation function are well known in the art.

The foregoing detailed description of embodiments of the invention, given with reference to the accompanying drawings, is intended to facilitate a better understanding of the method of the invention. Those skilled in the art will understand that various modifications and equivalent arrangements may be made without departing from the spirit and scope of the present invention, and these shall be covered by the appended claims.

Matters not described in this specification are applicable to the prior art.

Claims

1. A three-dimensional model retrieval method based on views and a hash algorithm, comprising the steps of obtaining multiple views of different three-dimensional models captured from different angles, and normalizing the views;

2. The retrieval method according to claim 1, wherein the view layer selects, for each feature dimension, the maximum feature value across the multiple views of the same three-dimensional model after feature extraction by the 5 convolutional layers, and the resulting feature descriptor of the three-dimensional model is input into the fully connected layers for processing;

the high-dimensional features output by the fully connected layers are transformed by the hash layer into a low-dimensional hash feature f_n, which is then converted into a discrete hash code b_n according to b_n = sgn(f_n − 0.5); the quantization loss function L_ql in the conversion process is

s.t. b_n ∈ {0,1}^k, where N is the number of input samples and k is the number of bits of the hash code.

3. The retrieval method according to claim 1, wherein, before the multiple views are obtained, model scale standardization is performed on the different three-dimensional models, scaling models of different sizes into a cube with side length 2, with the following specific steps:

1) reading the coordinates of each point of the three-dimensional model and finding the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max) of the model;

2) calculating the differences between the maximum and minimum coordinates, taking the largest difference among the three dimensions as the side length l of the model bounding box, constructing a cubic bounding box, and placing the center of the model at the center of the cube;

3) scaling the model to obtain the standardized model, wherein each point (x, y, z) is mapped to a new point (x′, y′, z′) as follows:

$$x' = (x - x_{\min}) \times 2/l - 1$$

$$y' = (y - y_{\min}) \times 2/l - 1$$

$$z' = (z - z_{\min}) \times 2/l - 1$$

4. The retrieval method according to claim 3, wherein the multi-view pictures are obtained by arranging a virtual camera array around each model, capturing 12 views per model, and normalizing the views to a uniform size as the input of the convolutional neural network, with the following specific steps:

1) placing the standardized model at the center of a regular icosahedron and placing virtual cameras at the 12 vertices of the icosahedron to obtain a set of 12 views of the model, each 256 × 256 in size;

2) cropping each view of the model to 227 × 227 as the input of the convolutional neural network, using the following cropping method:

$$\mathrm{left} = C_w/2 - C'_w/2$$

$$\mathrm{top} = C_h/2 - C'_h/2$$

$$\mathrm{right} = \mathrm{left} + C'_w$$

$$\mathrm{bottom} = \mathrm{top} + C'_h$$

wherein left, top, right, and bottom are the left, upper, right, and lower boundaries of the crop window of the new size (C′_w, C′_h) within the image of the original size (C_w, C_h).

5. The retrieval method according to claim 4, wherein the AlexNet-based convolutional neural network has the following structure:

1) the 12 views (227 × 227) of each model are input into the convolutional neural network in sequence, and the convolution and pooling layers extract local image features, configured as follows:

the first layer consists of a convolutional layer and a max pooling layer, the convolution kernel size is 11 × 11, the stride is 4, and the activation function is ReLU; the convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2;

the second layer consists of a convolutional layer and a max pooling layer, the convolution kernel size is 5 × 5, the stride is 1, and the activation function is ReLU; the convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2;

the third layer consists of a convolutional layer with a 3 × 3 kernel, a stride of 1, and ReLU activation;

the fourth layer consists of a convolutional layer with a 3 × 3 kernel, a stride of 1, and ReLU activation;

the fifth layer consists of a convolutional layer and a max pooling layer, the convolution kernel size is 3 × 3, the stride is 1, and the activation function is ReLU; the convolution result is then max-pooled with a 3 × 3 pooling window and a stride of 2;

2) a view layer is added after the fifth convolutional layer; for each three-dimensional model, the view layer takes the convolutional features of its 12 views and, for each feature dimension, keeps the maximum value across the 12 views, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers; the two fully connected layers have identical settings, each with 4096 neurons, a ReLU activation function to avoid vanishing gradients, and a dropout layer that randomly sets neuron outputs to 0;

3) a hash layer is added after the fully connected layers, containing k hidden neurons (k is the number of hash code bits) with a sigmoid activation function; it maps the 4096-dimensional features output by the fully connected layers to a low-dimensional space, forming the low-dimensional hash feature f_n, which is further converted into the discrete hash code b_n by b_n = sgn(f_n − 0.5); meanwhile, a quantization loss function L_ql is set, defined as

where b_n ∈ {0,1}^k and N is the number of input samples.

6. The retrieval method according to claim 1, wherein the network is trained on the public Princeton three-dimensional model dataset ModelNet40: after model scale standardization and multi-view picture normalization, the training set is input into the AlexNet-based convolutional neural network for training, the network parameters are optimized, and a network model is generated; the generated network model is then evaluated on the test set.

7. The retrieval method according to claim 1, wherein the Hamming distance is calculated as follows:

the hash code features corresponding to the features of each model are obtained, and the Hamming distance is calculated as

$$D(b_i, b_j) = \sum_{t=1}^{k} b_{i,t} \oplus b_{j,t}$$

where b_i and b_j are the hash code features of the two models, b_{i,t} is the t-th bit of b_i, and ⊕ is the exclusive-or operation;

$$S(Q, M) = \arg\min D(b_i, b_j)$$