CN111242207A - Three-dimensional model classification and retrieval method based on visual saliency information sharing - Google Patents

Three-dimensional model classification and retrieval method based on visual saliency information sharing

Info

Publication number
CN111242207A
CN111242207A (application number CN202010017062.9A)
Authority
CN
China
Prior art keywords
view
branches
feature
mvcnn
visual saliency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010017062.9A
Other languages
Chinese (zh)
Inventor
聂为之
王亚
屈露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010017062.9A priority Critical patent/CN111242207A/en
Publication of CN111242207A publication Critical patent/CN111242207A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24317 - Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/29 - Graphical models, e.g. Bayesian networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional model classification and retrieval method based on visual saliency information sharing, which comprises the following steps: extracting a view every 30 degrees around the Z-axis of the three-dimensional model, and extracting a feature descriptor of each virtual view through a deep convolutional neural network; taking the feature descriptors as the input of the visual saliency branch, generating view weights through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module; taking the feature descriptors as the input of the MVCNN branch, guiding visual information fusion in the MVCNN module with the view weights, and obtaining the feature descriptor of the MVCNN branch through a CNN; concatenating the descriptors of the two branches, making a decision through a fully connected layer and a softmax layer for classification, and performing similarity measurement for retrieval. The invention builds on two branches, a multi-view convolutional neural network (MVCNN) branch and a visual saliency branch, and fuses their feature descriptors into a final descriptor for 3D shape classification and retrieval.

Description

Three-dimensional model classification and retrieval method based on visual saliency information sharing
Technical Field
The invention relates to the fields of three-dimensional model feature extraction, three-dimensional model classification and retrieval and the like, in particular to a three-dimensional model classification and retrieval method based on visual saliency information sharing.
Background
In recent years, as applications of 3D technology in the film and television industry have become widespread, people encounter 3D models almost everywhere, so it is natural and reasonable to explore more efficient methods for learning representations of three-dimensional models. Furthermore, with the development of computer vision and 3D reconstruction techniques, 3D shape recognition has become a fundamental task of shape analysis and the most critical technique for processing and analyzing 3D data. Thanks to powerful deep neural networks and large-scale labeled 3D shape datasets, various deep networks for 3D shape recognition have been studied. In general, 3D shape recognition methods can be roughly divided into two types: model-based methods and view-based methods.
Model-based methods learn the shape characteristics of a model directly from a 3D data format, such as voxel grids [1], polygonal meshes or surfaces [2], and point clouds [3]. For example, [4] proposes a novel deep learning model, the mesh convolutional restricted Boltzmann machine (MCRBM), for unsupervised feature learning on 3D meshes. To learn global features, [5] proposes MeshNet (mesh neural network), which uses face units and feature splitting to cope with the complexity and irregularity of meshes and to represent three-dimensional shape. In [6], Kd-networks are proposed, which can process unstructured point clouds and use learned functions to perform retrieval tasks. However, limited shape representations (e.g., smooth manifolds) or high computational complexity place constraints on model-based approaches. This limitation is especially pronounced for voxel-based methods.
In view-based approaches, the input data are views captured from different angles of the 3D object, which are easier to obtain than other representations (e.g., point clouds and polygon meshes). Based on the MVCNN [7] (multi-view convolutional neural network) architecture, a compact shape descriptor can be extracted from multiple rendered views of an object using a CNN (convolutional neural network) with a pooling layer. DeepPano [8] learns PANORAMA (panoramic) view features using a CNN. [9] proposes a method for capturing panoramic image features, which aims at achieving continuity of the three-dimensional model by constructing an enhanced panoramic representation. [10] proposes GIFT, a real-time 3D shape search engine accelerated by the GPU (graphics processing unit) and two inverted files. Most view-based methods treat all views equally, which ignores the dependency and discriminative information among multiple views and limits the performance of existing methods.
The main challenges currently faced by three-dimensional model classification and retrieval are:
1) due to the large amount of information in a three-dimensional model, classification and retrieval tasks have high time and space complexity;
2) the designed feature descriptors must be highly discriminative while keeping computation time and space complexity under control.
Disclosure of Invention
The invention provides a three-dimensional model classification and retrieval method based on visual saliency information sharing, which is built on two branches, a multi-view convolutional neural network (MVCNN) branch and a visual saliency branch, and fuses the feature descriptors of the two branches to generate a final descriptor for 3D shape classification and retrieval, as described in detail below:
a three-dimensional model classification and retrieval method based on visual saliency information sharing, the method comprising:
extracting a view every 30 degrees around the Z-axis direction of the three-dimensional model, and extracting a feature descriptor of each virtual image through a deep convolutional neural network;
taking the feature descriptor as the input of the visual saliency branch, generating the weight of a view through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module;
the feature descriptors are used as input of MVCNN branches, visual information fusion in the MVCNN module is guided by using view weights, and the feature descriptors of the MVCNN branches are obtained through a CNN;
and concatenating the descriptors of the two branches, making a decision through a fully connected layer and a softmax layer for classification, and performing similarity measurement for retrieval.
Taking the feature descriptors as the input of the visual saliency branch, generating view weights through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module specifically comprises:
sequentially inputting the 12 feature descriptors into the visual saliency branch in extraction order, and generating the weight of each view through the first LSTM module and the soft attention mechanism, wherein the previous hidden state h_{t-1} is computed from the relation between the hidden state h_t and the internal memory state c_t, and the weight of each view is then obtained from h_{t-1};
the last hidden state is linearly weighted and then used as the input of the second LSTM module to obtain the feature descriptors of the visual saliency branches.
Further, guiding visual information fusion in the MVCNN model by using the view weights and obtaining the feature descriptors of the MVCNN branch through a CNN specifically comprises:
performing feature fusion on the two-dimensional view by applying view saliency pooling;
and obtaining the feature descriptors of the MVCNN branches through a layer of deep neural convolution network.
The technical scheme provided by the invention has the beneficial effects that:
1. the method preserves the visual information and the correlated information of the views by updating the weights of different views in the visual saliency model, thereby improving the flexibility and stability of the feature descriptors;
2. the view weights defined by the visual saliency model are used to guide visual information fusion in the MVCNN model, retaining the visual and correlated information in the views, so that the three-dimensional model is described more comprehensively;
3. the method continuously updates parameters through deep learning, so that the weights used when computing the three-dimensional model feature descriptor approach the optimal solution, improving the soundness and accuracy of the feature descriptor;
4. comparison experiments show that the algorithm outperforms each individual branch and classical 3D classification and retrieval methods.
Drawings
FIG. 1 is a flow chart of a three-dimensional model classification and retrieval method based on visual saliency information sharing;
FIG. 2 is an exemplary diagram of the contents of a three-dimensional model database.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
At this stage, most multi-view based methods treat all views equally, which results in ignoring the dependency and differentiation information of multiple views, limiting the performance of existing methods.
The embodiment of the invention provides a three-dimensional model classification and retrieval method based on visual saliency information sharing. The three-dimensional model is rotated and one view is captured every 30 degrees; a feature descriptor is extracted from each view, and the feature descriptors of the 12 views are input to the visual saliency part. Owing to the strengths of the LSTM (long short-term memory) network architecture, a soft attention mechanism and two LSTM modules are used in the visual saliency branch. The soft attention mechanism and the first LSTM module generate view weights for the convolutional features, and the second LSTM module generates the features of the visual saliency branch. The feature descriptors of the 12 views are also input to the MVCNN branch, where the view weights guide visual information fusion in the MVCNN model, and the feature descriptor of the MVCNN part is then obtained through a CNN. The network is used for both classification and retrieval tasks, and the final result is obtained by a fusion decision. Comparisons of the method with several other methods are provided below; evaluations on the ModelNet40 and ShapeNetCore55 datasets demonstrate the classification and retrieval accuracy of the method for three-dimensional models.
Example 1
A three-dimensional model classification and retrieval method based on visual saliency information sharing, shown in FIG. 1, mainly comprises three parts: first, attention-based view weight calculation; second, view attention pooling; and third, generation of the final shape descriptor. The specific implementation steps are as follows:
101: giving a three-dimensional model, extracting a view every 30 degrees around the Z-axis direction of the three-dimensional model, and extracting a feature descriptor of each virtual image through a deep convolutional neural network;
102: taking the feature descriptor as the input of the visual saliency branch, generating the weight of a view through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module;
103: similarly, the feature descriptors are used as input of MVCNN branches, visual information fusion in the MVCNN model is guided by using view weights, and then the feature descriptors of the MVCNN part are obtained through a CNN;
104: the feature descriptors of the two branches are obtained through steps 101-103; the two descriptors are concatenated, a decision is made through a fully connected layer and a softmax layer for classification, and a similarity measurement is performed for retrieval. An overall sketch of this pipeline is given below.
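The four steps above can be outlined in code. The following is a minimal PyTorch-style sketch of the two-branch forward pass, not the patented implementation itself: the module names (ViewCNN, SaliencyBranch, MVCNNBranch), the 4096-dimensional descriptor size and the choice of PyTorch are illustrative assumptions.

import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self, view_cnn, saliency_branch, mvcnn_branch,
                 feat_dim=4096, num_classes=40):
        super().__init__()
        self.view_cnn = view_cnn                # shared CNN applied to each of the 12 views (step 101)
        self.saliency_branch = saliency_branch  # LSTM + soft attention + LSTM (step 102)
        self.mvcnn_branch = mvcnn_branch        # weight-guided view fusion + CNN (step 103)
        self.fc = nn.Linear(2 * feat_dim, num_classes)  # fully connected decision layer (step 104)

    def forward(self, views):                   # views: (batch, 12, 3, H, W)
        b, v = views.shape[:2]
        feats = self.view_cnn(views.flatten(0, 1)).view(b, v, -1)  # f_1 ... f_12 per model
        sal_feat, weights = self.saliency_branch(feats)  # branch descriptor + view weights
        mv_feat = self.mvcnn_branch(feats, weights)      # visual-saliency-guided fusion
        fused = torch.cat([sal_feat, mv_feat], dim=-1)   # concatenate the two descriptors
        logits = self.fc(fused)                          # softmax applied at inference / in the loss
        return logits, fused                    # logits for classification, fused for retrieval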
The operation of extracting the feature descriptor of the virtual view through the convolutional neural network in step 101 is specifically:
1) extracting 12 views;
2) extracting a feature descriptor for each view (see the sketch below).
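As a concrete illustration of step 101, the sketch below extracts one 4096-dimensional descriptor per rendered view with a pretrained CNN. The patent does not name the backbone network, so torchvision's VGG16 is only an assumed example; the rendering of the 12 views themselves (every 30 degrees around the Z-axis) is taken as given.

import torch
import torchvision.models as models

# Assumed backbone: VGG16 truncated before its last classifier layer, giving a 4096-d output.
backbone = models.vgg16(pretrained=True)
backbone.classifier = torch.nn.Sequential(*list(backbone.classifier.children())[:-1])
backbone.eval()

def extract_view_descriptors(views):
    """views: tensor of shape (12, 3, 224, 224), one rendered image every 30 degrees."""
    with torch.no_grad():
        return backbone(views)   # (12, 4096) feature descriptors f_1 ... f_12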
The visual saliency branch in step 102 takes the output of step 101 as input and finally obtains the weight of each view and the features of the visual saliency branch. The specific steps are as follows:
1) the 12 feature descriptors are input into the visual saliency branch in the extraction order of step 101, and the weight of each view is generated through the first LSTM module and the soft attention mechanism. Since the view weight depends on the previous hidden state h_{t-1}, h_{t-1} is computed from the relation between the hidden state h_t and the internal memory state c_t, and the weight of each view is then obtained from it (a code sketch is given after these steps).
e_i = w^T tanh(U_h [h_{t-1}, v_{i,t}] + b_v)
a_i = exp(e_i) / Σ_j exp(e_j)
wherein e_i is the relevance score of the i-th view; U_h is the weight matrix of h_{t-1}; v_{i,t} is the feature descriptor of the i-th view at time t; b_v is the bias of h_{t-1}; e_j is the relevance score of the j-th view; a_i is the weight of the i-th view; T denotes matrix transposition; w, U_h and b_v are parameters that need to be optimized.
2) The last hidden state is linearly weighted and then used as the input of the second LSTM module to obtain the feature descriptors of the visual saliency branches.
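A sketch of the attention computation above follows. It implements e_i = w^T tanh(U_h [h_{t-1}, v_{i,t}] + b_v) and the softmax normalization over the 12 views; the hidden and feature dimensions are assumed values, and the surrounding LSTM modules are omitted.

import torch
import torch.nn as nn

class ViewAttention(nn.Module):
    """Soft attention that scores each view against the previous LSTM hidden state."""
    def __init__(self, feat_dim=4096, hidden_dim=512):
        super().__init__()
        self.U_h = nn.Linear(hidden_dim + feat_dim, hidden_dim)   # U_h and bias b_v
        self.w = nn.Linear(hidden_dim, 1, bias=False)             # w^T

    def forward(self, h_prev, view_feats):
        # h_prev: (batch, hidden_dim) previous hidden state h_{t-1}
        # view_feats: (batch, 12, feat_dim) descriptors v_{1,t} ... v_{12,t}
        h = h_prev.unsqueeze(1).expand(-1, view_feats.size(1), -1)
        e = self.w(torch.tanh(self.U_h(torch.cat([h, view_feats], dim=-1)))).squeeze(-1)
        return torch.softmax(e, dim=-1)          # view weights a_1 ... a_12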
The MVCNN branch in step 103 also takes the output of step 101 as input, and finally obtains the characteristics of the MVCNN branch, specifically including the following steps:
1) performing feature fusion on the two-dimensional view by applying view saliency pooling;
2) obtaining the feature descriptors of the MVCNN branch through a deep convolutional network layer (a sketch of this pooling is given below).
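The sketch below illustrates the two MVCNN-branch steps: view saliency pooling as a weighted combination of the 12 view descriptors, followed by a final network stage. A single linear layer stands in for the trailing CNN, which is an assumption made for brevity.

import torch
import torch.nn as nn

class ViewSaliencyPooling(nn.Module):
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.post = nn.Linear(feat_dim, feat_dim)   # stand-in for the final CNN stage

    def forward(self, view_feats, weights):
        # view_feats: (batch, 12, feat_dim); weights: (batch, 12) from the saliency branch
        pooled = (weights.unsqueeze(-1) * view_feats).sum(dim=1)   # weight-guided fusion of views
        return self.post(pooled)                                   # MVCNN-branch descriptor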
In the above step 104, the classification and search task is finally completed by fusing the two branches, and the specific steps are as follows:
1) connecting the feature descriptors of the two branches in series to obtain a feature descriptor of the final three-dimensional model;
2) the feature descriptors pass through a full connection layer and a softmax layer to obtain a classification result;
3) the similarity measure is executed to obtain a retrieval result.
In summary, in the embodiment of the present invention, the feature descriptors of the two branches are extracted through the above steps 101 to 103, and then the features are fused and used for classification and retrieval through the step 104, so that the description of the three-dimensional model is more comprehensive, and the quantization of the similarity is more accurate and scientific.
Example 2
The scheme in embodiment 1 is further described below with reference to the network structure, fig. 1, and fig. 2, and is described in detail below:
for extracting the first feature descriptor, the invention takes the Z axis as the rotation center, carries out visual angle sampling on the three-dimensional model at intervals of 30 degrees, and extracts the feature descriptor of the view through a mature deep convolutional neural network, which is concretely as follows:
1. Each three-dimensional model is first normalized by the NPCA (three-dimensional principal component analysis) method (a sketch of this normalization is given after these steps). A visualization tool developed with OpenGL then acts as a virtual observer and extracts one view every 30 degrees around the Z-axis of each three-dimensional model; 12 views are extracted to represent the visual and structural information of the model. These views can thus be regarded as a sequence of images v_1, v_2, ..., v_12, which is essential for the network structure of the invention.
2. The feature descriptor of each view is extracted with a CNN, obtaining f_1, f_2, ..., f_12; the CNN parameters are shared across all views.
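As an illustration of the pose-normalization step, the sketch below aligns a model, assumed to be given as an (N, 3) vertex array, with its principal axes via PCA and rescales it. This is a simplified stand-in in the spirit of NPCA, not the exact procedure of the patent.

import numpy as np

def pca_normalize(vertices):
    """vertices: (N, 3) array of mesh vertex coordinates."""
    centered = vertices - vertices.mean(axis=0)      # move the centroid to the origin
    cov = np.cov(centered, rowvar=False)             # 3x3 covariance of the coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)           # principal axes (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]                # sort axes by decreasing variance
    aligned = centered @ eigvecs[:, order]           # rotate into the principal frame
    return aligned / np.abs(aligned).max()           # scale into a unit bounding box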
For the visual saliency branch, the three-dimensional model is characterized by obtaining the weight of each view and generating a feature descriptor through a soft attention mechanism and two LSTM modules, which is as follows:
1) The feature descriptors f_1, f_2, ..., f_12 are input, and the previous hidden state h_{t-1} is computed from the relation between the hidden state h_t and the internal memory state c_t, h_t = o_t ⊙ c_t, where o_t is the output gate.
2) Each view weight a_i is computed based on the previous hidden state h_{t-1}, where v_{i,t} is the feature descriptor of the virtual view at time t, and w, U_h and b_v are updated along with the overall network parameters.
For the MVCNN branch, the three-dimensional model is represented by view saliency pooling, with the multi-view feature descriptors fused according to their weights, as follows:
(1) the 12 feature descriptors f_1, f_2, ..., f_12 obtained in the first step are input into the MVCNN branch, and visual saliency pooling is applied to obtain a dynamically weighted average of the multi-view feature descriptors;
(2) and inputting the aggregated feature descriptors into a final CNN network for training to obtain the feature descriptors of the MVCNN branches.
After the feature descriptors of the two branches are obtained, fusion is performed, specifically as follows:
(1) setting the dimension of the feature descriptors obtained by the two branches to be (1,4096), and obtaining one feature descriptor (2,4096) in a serial mode;
(2) obtaining the scores of all classifications through a full connection layer, and classifying through the scores by a softmax layer;
(3) the retrieval task is completed through similarity measurement (a sketch of the fusion and decision step is given below).
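A sketch of the fusion and decision step follows: the two branch descriptors are concatenated, a fully connected layer plus softmax produces class scores, and retrieval ranks gallery models by distance between descriptors. Euclidean distance is only an assumed choice here, since the patent specifies a similarity measurement without fixing the metric.

import torch
import torch.nn as nn

fc = nn.Linear(2 * 4096, 40)   # e.g. 40 classes for ModelNet40

def classify_and_retrieve(sal_feat, mv_feat, gallery_feats):
    # sal_feat, mv_feat: (batch, 4096) descriptors of the two branches
    fused = torch.cat([sal_feat, mv_feat], dim=-1)   # final three-dimensional model descriptor
    scores = torch.softmax(fc(fused), dim=-1)        # classification scores
    dists = torch.cdist(fused, gallery_feats)        # distances to gallery descriptors (batch, G)
    ranking = dists.argsort(dim=-1)                  # nearest gallery models first
    return scores, ranking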
In conclusion, the embodiment of the invention enhances the expressiveness of the three-dimensional model through the above steps, eliminates the adverse effect of assigning identical weights to all views on the classification and retrieval results, and improves the accuracy of three-dimensional model classification and retrieval.
Example 3
The following examples are presented to demonstrate the feasibility of the embodiments of examples 1 and 2, and are described in detail below:
the database in the embodiment of the invention is based on ModelNet40 and ShapeNetCore 55. ModelNet40 is a subset of ModelNet, and contains 12,311 CAD models, divided into 40 classes. The model was cleaned up manually, but without pose normalization, the ModelNet40 model used in the present example was in the format of off. ShapeleNet Core55 is a subset of Shapelet, and contains 55 classes, approximately 51,300 three-dimensional models, each of which is subdivided into several subcategories, including a 70% training set, a 10% validation set, and a 20% test set. The ShapeNetCore55 model used in the examples of the present invention is in the form of a.
The table below shows the accuracy of classification experiments performed with different parts of the network on the ModelNet40 dataset. The results show that the attention weights focus the model on more representative views, yielding better 3D shape recognition performance, and that treating the captured views as a sequence and extracting their structural information makes the network architecture effective for obtaining a better 3D object representation.
Table 1 shows the classification results of different components of the framework in the ModelNet40 data set
Embodiments of the invention performed classification and retrieval experiments on ModelNet40 and compared the results with various models, including 3D ShapeNets [1], SPH [11], LFD [12], MVCNN [7], PointNet [3], PointNet++ [13], Kd-Network [6], and others. The following table shows the classification and retrieval results for each method. In the retrieval task, low-rank Mahalanobis metric learning is further applied to MVCNN to improve its retrieval performance. The present method directly uses the final feature descriptor obtained by concatenating the sequence features and the convolutional features, and achieves state-of-the-art performance of 90.7%.
The results show that the method provided by the invention achieves the best performance, with a classification accuracy of 92.69% and a retrieval mAP of 90.7%. Compared with the best results of MVCNN, the two-stream network of the present method improves the classification and retrieval tasks by 1.7% and 7.7%, respectively.
Table 2 shows the classification accuracy of each model in the ModelNet40 data set
The following table shows the results of retrieval experiments on the ShapeNetCore55 dataset, compared with three-dimensional model retrieval methods including RotationNet, Improved GIFT, ReVGG, DLAN, SHREC16-Bai GIFT and SHREC16-Su MVCNN. Under micro-averaged metrics, the method performs well and is always very close to the best result on the dataset; under macro-averaged metrics, its F-score is lower than that of RotationNet but better than the other three-dimensional model retrieval methods.
Table 3 shows the accuracy of the search in the ShapeNetCore55 dataset for each model
In order to study the influence of the number of views on classification and retrieval performance, virtual views are extracted around the Z-axis at angular intervals θ of 180, 90, 60, 45, 36, 30 and 18 degrees in turn, so that each three-dimensional model generates 2, 4, 6, 8, 10, 12 and 20 views, respectively.
The following table gives the classification and retrieval results with different numbers of views as input to the algorithm. The results show that performance can be improved by increasing the number of views, but too many view images lead to information redundancy and thus degrade performance. When the number of views is set to 12, the NN, FT, ST, F_measure, DCG, ANMRR and ACC are improved by 15.8%-46.7%, 11.8%-118.8%, 17.0%-71.5%, 18.0%-52.4%, 12.0%-95.6% and 43.6%-77.9%, respectively. Therefore, the optimal number of views is set to 12.
TABLE 4 Classification and retrieval accuracy for varying view numbers in ModelNet40 data sets
In order to study the influence of the view order on the classification and retrieval results of the three-dimensional model, this embodiment sets up 50 shuffled-view experiments, and the following table provides the classification and retrieval results. The results show that inputting the views in a shuffled order performs even better than inputting them in the original order. Evidently, the network can adaptively compute the importance of each view without being constrained by the camera setup, thereby learning powerful visual and structural information of the three-dimensional model.
TABLE 5 Classification and retrieval accuracy for shuffled and ordered views in the ModelNet40 dataset
Figure RE-GDA0002429140290000091
References
[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912-1920, 2015.
[2] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In NIPS, pages 3189-3197, 2016.
[3] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, 2017.
[4] Z. Han, Z. Liu, J. Han, C. M. Vong, S. Bu, and C. L. Chen. Mesh convolutional restricted Boltzmann machines for unsupervised learning of features with structure preservation on 3-D meshes. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2268-2281, 2017.
[5] Y. Feng, Y. Feng, H. You, X. Zhao, and Y. Gao. MeshNet: Mesh neural network for 3D shape representation. arXiv:1811.11424, 2018.
[6] R. Klokov and V. Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3D point cloud models. arXiv:1704.01222, 2017.
[7] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 945-953, 2015.
[8] K. Sfikas, T. Theoharis, and I. Pratikakis. Exploiting the PANORAMA representation for convolutional neural network classification and retrieval. In Eurographics Workshop on 3D Object Retrieval, I. Pratikakis, F. Dupont, and M. Ovsjanikov, Eds. The Eurographics Association, 2017.
[9] K. Sfikas, I. Pratikakis, and T. Theoharis. Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval. Computers & Graphics, vol. 71, pages 208-218, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0097849317301978
[10] S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. Jan Latecki. GIFT: A real-time and scalable 3D shape search engine. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 5023-5032, 2016.
[11] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Proc. Symp. Geometry Process., vol. 6, pp. 156-164, 2003.
[12] D. Chen, X. Tian, Y. Shen, and M. Ouhyoung. On visual similarity based 3D model retrieval. Comput. Graph. Forum, vol. 22, no. 3, pp. 223-232, 2003.
[13] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NIPS, 2017.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A three-dimensional model classification and retrieval method based on visual saliency information sharing, characterized in that the method comprises:
extracting a view every 30 degrees around the Z-axis direction of the three-dimensional model, and extracting a feature descriptor of each virtual image through a deep convolutional neural network;
taking the feature descriptor as the input of the visual saliency branch, generating the weight of a view through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module;
the feature descriptors are used as input of MVCNN branches, visual information fusion in the MVCNN module is guided by using view weights, and the feature descriptors of the MVCNN branches are obtained through a CNN;
and concatenating the descriptors of the two branches, making a decision through a fully connected layer and a softmax layer for classification, and performing similarity measurement for retrieval.
2. The method for classifying and retrieving three-dimensional models based on visual saliency information sharing according to claim 1, wherein taking the feature descriptors as the input of the visual saliency branch, generating view weights through a first LSTM module and a soft attention mechanism, and generating the feature descriptor of the visual saliency branch through a second LSTM module specifically comprises:
sequentially inputting the 12 feature descriptors into the visual saliency branch in extraction order, and generating the weight of each view through the first LSTM module and the soft attention mechanism, wherein the previous hidden state h_{t-1} is computed from the relation between the hidden state h_t and the internal memory state c_t, and the weight of each view is then obtained from h_{t-1};
the last hidden state is linearly weighted and then used as the input of the second LSTM module to obtain the feature descriptors of the visual saliency branches.
3. The method as claimed in claim 1, wherein using the view weights to guide visual information fusion in the MVCNN model and obtaining the feature descriptors of the MVCNN branch through a CNN specifically comprises:
performing feature fusion on the two-dimensional view by applying view saliency pooling;
and obtaining the feature descriptors of the MVCNN branches through a layer of deep neural convolution network.
CN202010017062.9A 2020-01-08 2020-01-08 Three-dimensional model classification and retrieval method based on visual saliency information sharing Pending CN111242207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017062.9A CN111242207A (en) 2020-01-08 2020-01-08 Three-dimensional model classification and retrieval method based on visual saliency information sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010017062.9A CN111242207A (en) 2020-01-08 2020-01-08 Three-dimensional model classification and retrieval method based on visual saliency information sharing

Publications (1)

Publication Number Publication Date
CN111242207A true CN111242207A (en) 2020-06-05

Family

ID=70872998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017062.9A Pending CN111242207A (en) 2020-01-08 2020-01-08 Three-dimensional model classification and retrieval method based on visual saliency information sharing

Country Status (1)

Country Link
CN (1) CN111242207A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596329A (en) * 2018-05-11 2018-09-28 北方民族大学 Threedimensional model sorting technique based on end-to-end Deep integrating learning network
CN109063139A (en) * 2018-08-03 2018-12-21 天津大学 Based on the classification of the threedimensional model of panorama sketch and multichannel CNN and search method
CN110188228A (en) * 2019-05-28 2019-08-30 北方民族大学 Cross-module state search method based on Sketch Searching threedimensional model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAO MA et al.: "Learning Multi-View Representation With LSTM for 3-D Shape Recognition and Retrieval", IEEE Transactions on Multimedia *
HANG SU et al.: "Multi-view Convolutional Neural Networks for 3D Shape Recognition", 2015 IEEE International Conference on Computer Vision (ICCV) *
WEIZHI NIE et al.: "Two-Stream Network Based on Visual Saliency Sharing for 3D Model Recognition", IEEE Access *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270762A (en) * 2020-11-18 2021-01-26 天津大学 Three-dimensional model retrieval method based on multi-mode fusion
CN112488122A (en) * 2020-11-25 2021-03-12 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network
CN112488122B (en) * 2020-11-25 2024-04-16 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network
CN113032613A (en) * 2021-03-12 2021-06-25 哈尔滨理工大学 Three-dimensional model retrieval method based on interactive attention convolution neural network
CN112801928A (en) * 2021-03-16 2021-05-14 昆明理工大学 Attention mechanism-based millimeter wave radar and visual sensor fusion method
CN112801928B (en) * 2021-03-16 2022-11-29 昆明理工大学 Attention mechanism-based millimeter wave radar and visual sensor fusion method
CN113052231A (en) * 2021-03-23 2021-06-29 哈尔滨理工大学 Three-dimensional model classification method based on voxel and global shape distribution characteristics
CN113191401A (en) * 2021-04-14 2021-07-30 中国海洋大学 Method and device for three-dimensional model recognition based on visual saliency sharing
CN113313140A (en) * 2021-04-14 2021-08-27 中国海洋大学 Three-dimensional model classification and retrieval method and device based on deep attention
CN113313140B (en) * 2021-04-14 2022-11-01 中国海洋大学 Three-dimensional model classification and retrieval method and device based on deep attention
CN116935477A (en) * 2023-09-13 2023-10-24 中南民族大学 Multi-branch cascade face detection method and device based on joint attention
CN116935477B (en) * 2023-09-13 2023-12-26 中南民族大学 Multi-branch cascade face detection method and device based on joint attention

Similar Documents

Publication Publication Date Title
CN111242207A (en) Three-dimensional model classification and retrieval method based on visual saliency information sharing
Qiu et al. Geometric back-projection network for point cloud classification
EP3179407B1 (en) Recognition of a 3d modeled object from a 2d image
You et al. PVRNet: Point-view relation neural network for 3D shape recognition
Papadakis et al. Efficient 3D shape matching and retrieval using a concrete radialized spherical projection representation
Yang et al. Multi-view CNN feature aggregation with ELM auto-encoder for 3D shape recognition
Nie et al. DAN: Deep-attention network for 3D shape recognition
CN109063139B (en) Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN
Demir et al. Skelneton 2019: Dataset and challenge on deep learning for geometric shape understanding
Shi et al. Learning to detect 3D symmetry from single-view RGB-D images with weak supervision
JP2008527473A (en) 3D model search method, search device, and search program
Yu et al. Part-wise AtlasNet for 3D point cloud reconstruction from a single image
Li et al. Shrec’16 track: 3D sketch-based 3D shape retrieval
Liu et al. VFMVAC: View-filtering-based multi-view aggregating convolution for 3D shape recognition and retrieval
Bickel et al. A novel shape retrieval method for 3D mechanical components based on object projection, pre-trained deep learning models and autoencoder
Liang et al. MVCLN: multi-view convolutional LSTM network for cross-media 3D shape recognition
Bu et al. Multimodal feature fusion for 3D shape recognition and retrieval
Williams et al. Voronoinet: General functional approximators with local support
Nie et al. MMFN: Multimodal information fusion networks for 3D model classification and retrieval
Su et al. 3d-assisted image feature synthesis for novel views of an object
Ma et al. A novel 3D shape recognition method based on double-channel attention residual network
Liu et al. Semantic and context information fusion network for view-based 3D model classification and retrieval
Zou et al. Shape-based retrieval and analysis of 3D models using fuzzy weighted symmetrical depth images
Zhou et al. Sketch augmentation-driven shape retrieval learning framework based on convolutional neural networks
Liang et al. 3D shape recognition based on multi-modal information fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200605