CN116521913A - Sketch three-dimensional model retrieval method based on prototype comparison learning - Google Patents

Sketch three-dimensional model retrieval method based on prototype comparison learning

Info

Publication number
CN116521913A
CN116521913A (application CN202310296899.5A)
Authority
CN
China
Prior art keywords
sketch
dimensional model
intra
feature extractor
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310296899.5A
Other languages
Chinese (zh)
Inventor
周圆
王冰蕊
霍树伟
陈克然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202310296899.5A priority Critical patent/CN116521913A/en
Publication of CN116521913A publication Critical patent/CN116521913A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a sketch three-dimensional model retrieval method based on prototype comparison learning, and relates to the technical field of multimedia information retrieval. The retrieval method comprises the following steps: S1, selecting data set samples; S2, preprocessing the data set samples; S3, training a sketch intra-domain feature extractor and a three-dimensional model intra-domain feature extractor: cluster prototypes are calculated from the samples of each domain, a prototype contrast loss is calculated, and the network parameters of the intra-domain feature extractors are iteratively updated; S4, training a cross-domain alignment mapping network: the intra-domain features of the sketches and the three-dimensional models are calculated separately, sketches and three-dimensional models are matched into pairs by category, common cluster prototypes are calculated through the mapping network, the prototype contrast loss is calculated, and the cross-domain alignment mapping network is iteratively updated. The invention effectively reduces the inter-domain semantic gap between sketches and three-dimensional models while increasing the differences between categories, so that the three-dimensional model of the corresponding category can be retrieved quickly and accurately from an input sketch.

Description

Sketch three-dimensional model retrieval method based on prototype comparison learning
Technical Field
The invention belongs to the technical field of multimedia information retrieval, and particularly relates to a sketch three-dimensional model retrieval method based on prototype comparison learning.
Background
In recent years, with the rapid development of multimedia and internet technology, three-dimensional models have been widely applied in entertainment and industry and play an increasingly important role. To help users easily find, reuse, and share the three-dimensional models they need, three-dimensional model retrieval has become a popular research direction in the technical field of multimedia information retrieval. The widespread popularity of touch-screen devices today allows people to draw sketches anywhere and at any time, and many people have begun to draw sketches by hand for expression and communication. Because a sketch can convey richer meaning than text and is easier to produce and acquire than a three-dimensional model sample, the hand-drawn sketch is an interaction mode better suited to three-dimensional model retrieval than traditional queries based on keywords or on three-dimensional model examples.
In the existing research, the 2013 and 2014 reports by Li et al., "SHREC'13 track: Large scale sketch-based 3D shape retrieval" and "SHREC'14 track: Extended large scale sketch-based 3D shape retrieval", show that early sketch three-dimensional model retrieval methods were built on features hand-crafted by researchers, which is time-consuming and labor-intensive. Wang et al., in "Sketch-based 3d shape retrieval using convolutional neural networks", proposed using a twin (Siamese) network to solve the cross-domain problem, introducing deep learning into the sketch three-dimensional model retrieval task for the first time; however, that method largely ignores the three-dimensional model, pairing only a single view of the model with the sketch for learning, and its performance is low. In 2018, Dai et al., in "Deep correlated holistic metric learning for sketch-based 3D shape retrieval", proposed a sketch three-dimensional model retrieval method based on holistic metric learning, in which a discrimination loss improves the distinguishability of different objects within a domain and a correlation loss reduces the distance between similar objects within a domain; its performance is greatly improved compared with earlier work, but alignment operations must be performed at multiple nodes of the network during training, which is somewhat cumbersome. In 2020, Dai et al., using a knowledge distillation model in "Cross-modal guidance network for sketch-based 3D shape retrieval", provided a sketch three-dimensional model retrieval mode based on cross-domain guided training, and a large number of experiments proved that this method has good retrieval performance.
However, although sketch-based three-dimensional model retrieval is rapid and convenient and has made certain progress, it still faces many challenges. First, although sketches, like three-dimensional models, belong to visual media, the two differ greatly in dimensionality and in how they are produced, and a great semantic gap exists between the two domains. Second, sketches are highly abstract by nature, and the styles of sketches of similar objects drawn by different people can differ greatly.
The invention provides a sketch three-dimensional model retrieval method based on prototype comparison learning, which gathers samples from the two different domains of sketches and three-dimensional models near the common prototypes of their corresponding classes in a common semantic space while keeping them as far as possible from the common prototypes of other classes, thereby effectively reducing the inter-domain semantic gap while increasing the differences between classes, so that the three-dimensional model of the corresponding class can be retrieved quickly and accurately from an input sketch.
Disclosure of Invention
The invention aims to provide a sketch three-dimensional model retrieval method based on prototype comparison learning, so as to solve the problems set out in the background art: sketches are highly abstract, the sketch styles of similar objects drawn by different people may differ greatly, and although sketches and three-dimensional models both belong to visual media, a great semantic gap exists between the two domains, all of which make sketch-based three-dimensional model retrieval difficult.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a sketch three-dimensional model retrieval method based on prototype comparison learning comprises the following steps:
s1, selecting a data set sample;
a sketch three-dimensional model retrieval data set is selected to provide the data set samples, the sketch three-dimensional model retrieval data set comprising a sketch data subset and a three-dimensional model data subset;
s2, preprocessing a data set sample;
unifying sketch data in the sketch data subset in the S1 into a binary image, and rendering each three-dimensional model data in the three-dimensional model data subset to obtain 12 two-dimensional gray level views;
s3, training a sketch intra-domain feature extractor and a three-dimensional intra-domain model feature extractor;
designing a sketch intra-domain feature extractor network and a three-dimensional model intra-domain feature extractor network, extracting the respective features of the samples preprocessed in the step S2 by adopting the sketch intra-domain feature extractor network and the three-dimensional model intra-domain feature extractor network, calculating a clustering prototype by using the sample features, gathering each sample near the corresponding category prototype, calculating prototype contrast loss, and iteratively updating intra-domain feature extractor network parameters to obtain the sketch intra-domain feature extractor and the three-dimensional intra-domain model feature extractor;
s4, training a mapping network aligned across domains;
and (3) respectively calculating the intra-domain features of the sketch and the three-dimensional model by using the intra-domain feature extractor of the sketch and the intra-domain feature extractor of the three-dimensional model obtained in the step (S3), matching the sketch and the three-dimensional model into pairs according to categories, calculating a common clustering prototype through a mapping network, calculating prototype comparison loss, and iteratively updating the mapping network aligned across domains.
Preferably, the sketch three-dimensional model retrieval data set comprises different types of data, each type of sketch data is divided into a training sample and a test sample according to a ratio of 5:3, and each type of three-dimensional model is divided into a training sample and a test sample according to a ratio of 8:2.
Preferably, in the step S2, each three-dimensional model data in the three-dimensional model data subset is rendered to obtain 12 two-dimensional gray scale views, which is specifically as follows:
and placing the three-dimensional model on a plane, uniformly placing 12 virtual camera positions around the three-dimensional model at intervals of 30 degrees, adding illumination during rendering, and representing the surface depth of the three-dimensional model by using the brightness of the two-dimensional view to obtain 12 two-dimensional gray level views.
Preferably, in the step S3, clustering is used to calculate the class prototypes, and contrastive learning is used to pull each sample toward its same-class prototype and push it away from prototypes of other classes, specifically as follows:
s31, designing a sketch intra-domain feature extractor network; designing a sketch intra-domain feature extractor network according to the characteristics of sketch data by utilizing AlexNet, wherein the convolution kernel of the first layer of the sketch intra-domain feature extractor network is 15 x 15, and the dropout rate after the last full connection layer is 0.5, so as to extract the features of the sketch;
s32, designing a three-dimensional model domain feature extractor network; taking a multi-view method MVCNN of a three-dimensional model as a frame, taking a pre-trained ResNet-50 on a large image dataset ImageNet as a backbone convolution neural network for processing each view gray rendering graph to obtain a three-dimensional model intra-domain feature extractor network for extracting features of the three-dimensional model;
s33, extracting sketch data training samples and three-dimensional model training samples by using a sketch intra-domain feature extractor network and a three-dimensional model intra-domain feature extractor network to train respective features, calculating respective clustering prototypes of the training samples, enabling each sample to be gathered near the prototypes of the corresponding class and to be far away from prototypes of other classes as far as possible, calculating prototype comparison loss, and iteratively updating intra-domain feature extractor network parameters to obtain a sketch intra-domain feature extractor and a three-dimensional intra-domain model feature extractor;
and S34, verifying the intra-domain feature classification accuracy of the sketch intra-domain feature extractor and the three-dimensional intra-domain model feature extractor by using the sketch data test sample and the three-dimensional model test sample.
Preferably, the loss function used to iteratively update the intra-domain feature extractor network parameters during the training in S33 is the prototype contrast loss L_proto;
the corresponding calculation formula is as follows:
L_proto = -log [ exp(sim(v, p_{c+}) / τ) / Σ_{p_c ∈ P} exp(sim(v, p_c) / τ) ]   (1)
where v is a feature vector extracted from a sample; sim is a similarity measurement function; τ is the temperature control coefficient; p_c is the cluster prototype of the class-c features held in memory; c+ denotes the positive class, i.e. the class of the sample; and P is the set of all cluster prototypes.
Preferably, in S4, the prototype contrast loss is calculated according to the formula (1) in S3, and the mapping network aligned across domains is iteratively updated.
Preferably, the mapping network is a three-layer fully connected network with a scale of 2048-1024-256, and the sketch intra-domain feature extractor and the three-dimensional model intra-domain feature extractor each have a corresponding mapping network.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention gathers samples from the two different domains of sketches and three-dimensional models near the common prototypes of their corresponding categories in a common semantic space while keeping them as far as possible from the common prototypes of other categories, thereby effectively reducing the inter-domain semantic gap while increasing the differences between categories, so that the three-dimensional model of the corresponding category can be quickly and accurately retrieved from an input sketch.
(2) The method provided by the invention is simple and easy to implement; its core is training the two intra-domain feature extractors and the cross-domain alignment mapping network with a prototype comparison learning training mode. A large number of experiments carried out on the two general data sets SHREC'13 and SHREC'14 show that the method has excellent retrieval performance, not inferior to the current best methods, and has good application prospects.
Drawings
FIG. 1 is a schematic view of an overall frame in the present invention;
FIG. 2 is a schematic diagram of the core principle of the present invention;
FIG. 3 is an exemplary diagram of the retrieval results obtained by the present invention on the SHREC'13 and SHREC'14 data sets.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
a sketch three-dimensional model retrieval method based on prototype comparison learning comprises the following steps:
step one, selecting a data set.
The data sets are selected from the general SHREC '13 and SHREC'14 sketch three-dimensional model retrieval data sets, and both data sets comprise sketch data subsets and three-dimensional model data subsets.
The SHREC'13 data set comprises 7200 sketches and 1258 three-dimensional models in 90 different categories; each category contains 50 sketch training samples and 30 sketch test samples, while the number of three-dimensional model samples per category varies, and in the experiments the three-dimensional models of each category are randomly divided into training samples and test samples at a ratio of 80%:20%;
the SHREC'14 data set comprises 13680 sketches and 8987 three-dimensional models in 171 different categories; each category likewise contains 50 sketch training samples and 30 sketch test samples, the number of three-dimensional model samples per category varies, and in the experiments the three-dimensional models of each category are randomly divided into training samples and test samples at a ratio of 80%:20%.
And step two, preprocessing a data set.
And (3) carrying out data preprocessing on the samples in the data set selected in the step (1).
For the sketch data, the original samples of the data set are 8-bit grayscale images of size 1111×1111; for convenience of network learning and processing, they are uniformly transformed into binary images of size 224×224.
For the three-dimensional model data, the original samples of the data set are three-dimensional models in OFF format, which the method renders into a multi-view two-dimensional representation: the model is placed in a uniform orientation on a plane (e.g., perpendicular to the horizontal), 12 virtual camera positions are placed uniformly around the model at 30° intervals, illumination is added during rendering to highlight the depth information of the three-dimensional model more clearly, and the brightness of each two-dimensional view represents the depth of the model surface, yielding 12 two-dimensional views. These views are then further processed: their sizes are unified to 256×256 and they are converted into 8-bit grayscale images.
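The camera placement described above can be sketched as follows (a minimal NumPy illustration of the 12-view geometry only, not the patent's actual rendering pipeline; the radius and height values are assumptions chosen for illustration):

```python
import numpy as np

def camera_positions(radius=2.0, height=1.0, n_views=12):
    """Place n_views virtual cameras on a circle around the model,
    spaced at 360/n_views degree intervals (30 degrees for 12 views)."""
    angles = np.deg2rad(np.arange(n_views) * (360.0 / n_views))
    # Cameras sit on a circle of the given radius at a fixed height,
    # all facing the model assumed to be at the origin.
    return np.stack([radius * np.cos(angles),
                     radius * np.sin(angles),
                     np.full(n_views, height)], axis=1)

cams = camera_positions()
print(cams.shape)  # (12, 3)
```

Each row is one camera position; a renderer would then produce one grayscale depth-shaded view per camera.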
And thirdly, respectively training two intra-domain feature extractors corresponding to the sketch and the three-dimensional model.
The method provided by the invention considers that the basis for solving the cross-domain retrieval problem is a thorough understanding of the data characteristics within each single domain. Thus, as shown in FIG. 1, the overall framework includes two parts, intra-domain feature extraction and cross-domain feature alignment; this step specifically describes the intra-domain feature extraction shown on the left side of FIG. 1.
For the sketch feature extractor: because sketches contain large blank areas and huge style differences exist among different sketch samples, although sketches are two-dimensional images, the invention does not directly use convolutional neural networks designed for real photographs as in the prior art, but designs a new network on the basis of AlexNet to extract sketch features: the convolution kernel of the first layer is enlarged from 11 to 15 to filter out blank information, and the dropout rate after the last fully connected layer is increased to 0.5 to reduce over-fitting to particular sketch styles.
For the three-dimensional model feature extractor, the invention takes a three-dimensional model multi-view method MVCNN with good performance as a framework, and takes ResNet-50 pre-trained on a large image dataset ImageNet as a backbone convolution neural network for processing a gray rendering graph of each view.
For both the sketch feature extractor and the three-dimensional model feature extractor, the invention uses the samples preprocessed in step two to calculate their cluster prototypes, so that each sample is gathered near the prototype of its corresponding class and kept as far as possible from the prototypes of other classes, as shown in FIG. 2. More specifically, the loss function used to iteratively update the intra-domain feature extractor network parameters during training is the prototype contrast loss L_proto; the corresponding calculation formula is as follows:
L_proto = -log [ exp(sim(v, p_{c+}) / τ) / Σ_{p_c ∈ P} exp(sim(v, p_c) / τ) ]   (1)
where v is a feature vector extracted from a sample; sim is a similarity measurement function, for which the L2 norm is used in the experiments; τ is the temperature control coefficient, set to 0.07 in the experiments following contrastive learning convention; p_c is the cluster prototype of the class-c features held in memory; c+ denotes the positive class, i.e. the class of the sample; and P is the set of all cluster prototypes.
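Formula (1) with these variable definitions can be sketched in NumPy as follows (an illustrative reimplementation, not the patent's code; negative L2 distance is used here as the similarity function sim, one reasonable reading of "using the L2 norm"):

```python
import numpy as np

def prototype_contrast_loss(v, prototypes, positive_class, tau=0.07):
    """Prototype contrast loss of formula (1).

    v              : feature vector of one sample
    prototypes     : dict {class_id: cluster prototype vector} (the set P)
    positive_class : class id c+ of the sample
    tau            : temperature/control coefficient (0.07 as in the text)
    """
    def sim(a, b):
        # Negative L2 distance as the similarity measure (assumption).
        return -np.linalg.norm(a - b)

    logits = {c: sim(v, p) / tau for c, p in prototypes.items()}
    denom = sum(np.exp(s) for s in logits.values())
    return -np.log(np.exp(logits[positive_class]) / denom)

# A sample near its own class prototype incurs a lower loss than one
# near a prototype of another class, as the pull/push description requires.
protos = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
near = prototype_contrast_loss(np.array([0.9, 0.1]), protos, positive_class=0)
far = prototype_contrast_loss(np.array([0.1, 0.9]), protos, positive_class=0)
print(near < far)  # True
```

Minimizing this loss over all samples gathers each feature near its class prototype and pushes it away from the other prototypes in P.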
In step three, the feature extractor is trained as follows: a series of cluster prototypes is calculated from the features produced by the extractor, the prototype contrast loss is then calculated from these prototypes, and the parameters of the feature extractor are updated, so that classification becomes more accurate during the extractor's feature extraction and classification.
Training a mapping network aligned across domains.
So far, the two intra-domain feature extractors corresponding to sketches and three-dimensional models have been trained based on prototype comparison learning. Next, as the cross-domain retrieval task requires, sketch and three-dimensional model features that semantically belong to the same category should be mapped to adjacent locations in a common semantic space. This step is the cross-domain feature alignment shown on the right side of FIG. 1.
Since the prototype comparison learning method shown in FIG. 2 is not constrained to a specific multimedia form, it is also used to train the cross-domain alignment mapping networks: the intra-domain features of the sketches and the three-dimensional models are calculated with the two feature extractors obtained in step three, sketches and three-dimensional models are matched into pairs by category, a common cluster prototype is calculated through the corresponding mapping network, each sample is kept as far as possible from the common prototypes of other categories, the prototype contrast loss is calculated according to formula (1), and the cross-domain alignment mapping networks are iteratively updated. Each mapping network is a three-layer fully connected network with a scale of 2048-1024-256, with one network corresponding to the sketch intra-domain feature extractor and one to the three-dimensional model intra-domain feature extractor.
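The 2048-1024-256 mapping networks and the common-prototype computation can be sketched as follows (an illustrative NumPy forward pass with random, untrained weights; in the patent the networks are trained with formula (1), and the mean-of-both-domains prototype is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes=(2048, 1024, 256)):
    """Fully connected mapping network with scale 2048-1024-256."""
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(net, x):
    for i, (W, b) in enumerate(net):
        x = x @ W + b
        if i < len(net) - 1:          # ReLU on the hidden layer only
            x = np.maximum(x, 0.0)
    return x

sketch_map, model_map = make_mlp(), make_mlp()   # one network per domain

# Paired intra-domain features of one category (a batch of 4 from each domain).
sketch_feat = rng.standard_normal((4, 2048))
model_feat = rng.standard_normal((4, 2048))

# Common cluster prototype: mean of the mapped features of both domains.
common = np.concatenate([forward(sketch_map, sketch_feat),
                         forward(model_map, model_feat)])
prototype = common.mean(axis=0)
print(prototype.shape)   # (256,)
```

Training would then apply the prototype contrast loss to these mapped features and common prototypes and back-propagate into both mapping networks.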
The method provided by the invention was experimentally tested on the two general sketch three-dimensional model data sets SHREC'13 and SHREC'14, and the retrieval indexes FT, DCG, E, and mAP on the two data sets are recorded in Tables 1 and 2 respectively. For these indexes, the closer E is to 0% and the closer FT, DCG, and mAP are to 100%, the better.
Table 1: Comparative retrieval performance of different schemes on the SHREC'13 dataset (%)
Scheme                    FT    DCG   E     mAP
Comparative scheme 1      14.2  20.8  76.1  20.8
Comparative scheme 2      43.3  51.0  54.7  52.5
Comparative scheme 3      52.0  60.8  53.4  58.8
Scheme of the invention   54.7  71.9  45.4  65.9
Table 2: Comparative retrieval performance of different schemes on the SHREC'14 dataset (%)
Scheme                    FT    DCG   E     mAP
Comparative scheme 1      12.9  20.0  76.1  20.7
Comparative scheme 2      32.5  41.4  61.2  41.9
Comparative scheme 3      35.0  41.7  61.6  44.2
Scheme of the invention   40.3  45.5  58.1  49.1
In the above tables, comparative scheme 1 is the twin network method proposed by Wang et al. in "Sketch-based 3d shape retrieval using convolutional neural networks", comparative scheme 2 is the holistic metric learning method proposed by Dai et al. in "Deep correlated holistic metric learning for sketch-based 3D shape retrieval", and comparative scheme 3 is the cross-domain guided training method proposed by Dai et al. in "Cross-modal guidance network for sketch-based 3D shape retrieval". From Tables 1 and 2 it can be seen that the FT, DCG, E, and mAP retrieval indexes of the scheme of the invention on the two general data sets SHREC'13 and SHREC'14 have certain advantages over the comparative schemes, and three-dimensional models of the corresponding categories can be retrieved from input sketches, so the scheme has good application prospects.
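As one illustration of how such retrieval indexes are obtained, average precision (the per-query quantity averaged into mAP) can be sketched as follows (a standard textbook definition assumed for illustration; the patent does not spell out its metric formulas):

```python
def average_precision(relevant_flags):
    """Average precision of one ranked retrieval list.

    relevant_flags: 0/1 flags, one per retrieved item in rank order,
    where 1 marks a result belonging to the query's category.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at each relevant rank
    return precision_sum / hits if hits else 0.0

# Relevant models retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2.
print(round(average_precision([1, 0, 1]), 4))  # 0.8333
```

mAP is then the mean of this value over all query sketches, which is why earlier-ranked correct models raise the score more.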
An example of the retrieval results obtained on the two data sets using the method provided by the invention is shown in FIG. 3, with the input sketch on the left and the top-5 retrieved three-dimensional models on the right; failure cases are framed by boxes and annotated with the corresponding category names. It can be seen that even the failed retrievals have a certain visual similarity to the successful cases, which demonstrates the effect of the method provided by the invention to a certain extent.
The foregoing is intended only to aid in understanding the method and core idea of the invention, and the scope of the invention is not limited thereto; equivalent modifications or changes made by those skilled in the art in accordance with the technical scheme and inventive concept of the invention shall fall within the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (7)

1. A sketch three-dimensional model retrieval method based on prototype comparison learning is characterized by comprising the following steps:
s1, selecting a data set sample;
the method comprises the steps that a sketch three-dimensional model retrieval data set is selected from a data set sample, and the sketch three-dimensional model retrieval data set comprises a sketch data subset and a three-dimensional model data subset;
s2, preprocessing a data set sample;
unifying sketch data in the sketch data subset in the S1 into a binary image, and rendering each three-dimensional model data in the three-dimensional model data subset to obtain 12 two-dimensional gray level views;
s3, training a sketch intra-domain feature extractor and a three-dimensional intra-domain model feature extractor;
designing a sketch intra-domain feature extractor network and a three-dimensional model intra-domain feature extractor network, extracting the respective features of the samples preprocessed in the step S2 by adopting the sketch intra-domain feature extractor network and the three-dimensional model intra-domain feature extractor network, calculating a clustering prototype by using the sample features, gathering each sample near the corresponding category prototype, calculating prototype contrast loss, and iteratively updating intra-domain feature extractor network parameters to obtain the sketch intra-domain feature extractor and the three-dimensional intra-domain model feature extractor;
s4, training a mapping network aligned across domains;
and (3) respectively calculating the intra-domain features of the sketch and the three-dimensional model by using the intra-domain feature extractor of the sketch and the intra-domain feature extractor of the three-dimensional model obtained in the step (S3), matching the sketch and the three-dimensional model into pairs according to categories, calculating a common clustering prototype through a mapping network, calculating prototype comparison loss, and iteratively updating the mapping network aligned across domains.
2. The sketch three-dimensional model retrieval method according to claim 1, wherein: the sketch three-dimensional model retrieval data set comprises different types of data, each type of sketch data is divided into a training sample and a test sample according to a ratio of 5:3, and each type of three-dimensional model is divided into a training sample and a test sample according to a ratio of 8:2.
3. The sketch three-dimensional model retrieval method according to claim 2, wherein: in the step S2, each three-dimensional model data in the three-dimensional model data subset is rendered to obtain 12 two-dimensional gray scale views, specifically as follows:
and placing the three-dimensional model on a plane, uniformly placing 12 virtual camera positions around the three-dimensional model at intervals of 30 degrees, adding illumination during rendering, and representing the surface depth of the three-dimensional model by using the brightness of the two-dimensional view to obtain 12 two-dimensional gray level views.
4. A sketch three-dimensional model retrieval method according to claim 3, wherein computing category prototypes by clustering and, through contrastive learning, pulling samples toward same-class prototypes and pushing them away from other-class prototypes in step S3 specifically comprises:
S31, designing a sketch intra-domain feature extractor network: based on AlexNet and adapted to the characteristics of sketch data, the first convolution layer uses a 15 x 15 kernel and the dropout rate after the last fully connected layer is 0.5; this network extracts sketch features;
S32, designing a three-dimensional model intra-domain feature extractor network: using the multi-view method MVCNN as the framework and a ResNet-50 pre-trained on the large-scale image dataset ImageNet as the backbone convolutional neural network for processing the gray rendered view of each viewpoint, yielding a three-dimensional model intra-domain feature extractor network that extracts three-dimensional model features;
S33, extracting features of the sketch training samples and the three-dimensional model training samples with the two intra-domain feature extractor networks, computing the cluster prototypes of each set of training samples, gathering every sample near the prototype of its own class while keeping it as far as possible from the prototypes of other classes, computing the prototype contrastive loss, and iteratively updating the network parameters to obtain the sketch intra-domain feature extractor and the three-dimensional model intra-domain feature extractor;
S34, verifying the intra-domain classification accuracy of the sketch intra-domain feature extractor and the three-dimensional model intra-domain feature extractor with the sketch data test samples and the three-dimensional model test samples.
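In the MVCNN framework referenced in S32, the per-view features produced by the backbone are typically aggregated into a single shape descriptor by element-wise max pooling across the views; a numpy sketch of that aggregation step (the feature dimension and random values are illustrative):

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over the view axis: (n_views, d) -> (d,)."""
    return np.max(view_features, axis=0)

# 12 gray views, each already encoded by the backbone into a 2048-d vector
# (2048 matching the output width of a ResNet-50 backbone).
feats = np.random.rand(12, 2048)
shape_descriptor = view_pool(feats)
```

The pooled descriptor is what the intra-domain losses in S33 operate on for the three-dimensional model branch.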
5. The sketch three-dimensional model retrieval method according to claim 4, wherein the loss function used to iteratively update the intra-domain feature extractor network parameters during the training in S33 is the prototype contrastive loss,
calculated as:
L = -log( exp(sim(v, p_{c+})/τ) / Σ_{p_c ∈ P} exp(sim(v, p_c)/τ) )    (1)
where v is the feature vector extracted from a sample; sim is the similarity measurement function; τ is the control (temperature) coefficient; p_c is the cluster prototype of the class-c features in the memory bank; c+ denotes the positive class; and P is the set of all cluster prototypes.
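The prototype contrastive loss of formula (1) can be sketched in numpy, together with the cluster prototypes it needs. Two simplifications are assumptions of this sketch, not the patent's method: each prototype is taken as the normalized class mean (rather than a general clustering result), and cosine similarity stands in for sim:

```python
import numpy as np

def class_prototypes(features, labels):
    """One L2-normalized prototype per class: the mean of that class's features."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def proto_contrastive_loss(v, protos, positive_idx, tau=0.1):
    """Formula (1): -log( exp(sim(v,p+)/tau) / sum_c exp(sim(v,p_c)/tau) )."""
    v = v / np.linalg.norm(v)
    logits = protos @ v / tau          # sim(v, p_c)/tau for every class c
    logits -= logits.max()             # numerical stability before exp
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[positive_idx])

feats = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels)
loss_same = proto_contrastive_loss(np.array([0.95, 0.05]), protos, 0)
loss_other = proto_contrastive_loss(np.array([0.05, 0.95]), protos, 0)
```

A sample near its own prototype yields a small loss; one near a wrong prototype yields a large loss, which is exactly the pull/push behavior described in S33.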
6. The sketch three-dimensional model retrieval method according to claim 5, wherein in S4 the prototype contrastive loss is calculated according to formula (1) in S3, and the cross-domain alignment mapping network is updated iteratively.
7. The sketch three-dimensional model retrieval method according to claim 6, wherein the mapping network is a three-layer fully connected network of scale 2048-1024-256, with one mapping network corresponding to each of the sketch intra-domain feature extractor and the three-dimensional model intra-domain feature extractor.
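A numpy sketch of the claimed 2048-1024-256 mapping network, interpreting "2048-1024-256" as the layer widths (so two weight matrices). The ReLU activation and the random initialization are assumptions of this sketch; the claim fixes only the layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2048, 1024, 256]  # layer widths: 2048 -> 1024 -> 256
weights = [rng.standard_normal((sizes[i], sizes[i + 1])) * 0.01 for i in range(2)]
biases = [np.zeros(s) for s in sizes[1:]]

def mapping_network(x):
    """Project a 2048-d intra-domain feature into the 256-d common space."""
    h = np.maximum(x @ weights[0] + biases[0], 0.0)  # ReLU after the first layer
    return h @ weights[1] + biases[1]

y = mapping_network(rng.standard_normal(2048))
```

One such network per domain maps the sketch features and the three-dimensional model features into the shared 256-dimensional space where the cross-domain loss of claim 6 is computed.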
CN202310296899.5A 2023-03-24 2023-03-24 Sketch three-dimensional model retrieval method based on prototype comparison learning Pending CN116521913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310296899.5A CN116521913A (en) 2023-03-24 2023-03-24 Sketch three-dimensional model retrieval method based on prototype comparison learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310296899.5A CN116521913A (en) 2023-03-24 2023-03-24 Sketch three-dimensional model retrieval method based on prototype comparison learning

Publications (1)

Publication Number Publication Date
CN116521913A true CN116521913A (en) 2023-08-01

Family

ID=87401988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310296899.5A Pending CN116521913A (en) 2023-03-24 2023-03-24 Sketch three-dimensional model retrieval method based on prototype comparison learning

Country Status (1)

Country Link
CN (1) CN116521913A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131214A (en) * 2023-10-26 2023-11-28 北京科技大学 Zero sample sketch retrieval method and system based on feature distribution alignment and clustering
CN117131214B (en) * 2023-10-26 2024-02-09 北京科技大学 Zero sample sketch retrieval method and system based on feature distribution alignment and clustering
CN118227821A (en) * 2024-05-24 2024-06-21 济南大学 Sketch three-dimensional model retrieval method based on anti-noise network

Similar Documents

Publication Publication Date Title
CN111858954B (en) Task-oriented text-generated image network model
CN110750656B (en) Multimedia detection method based on knowledge graph
Torralba et al. Labelme: Online image annotation and applications
CN109710701A (en) A kind of automated construction method for public safety field big data knowledge mapping
CN110472652B (en) Small sample classification method based on semantic guidance
CN110298395B (en) Image-text matching method based on three-modal confrontation network
Qian et al. Landmark summarization with diverse viewpoints
CN111506760B (en) Depth integration measurement image retrieval method based on difficult perception
CN116543269B (en) Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof
CN112036511B (en) Image retrieval method based on attention mechanism graph convolution neural network
CN110866129A (en) Cross-media retrieval method based on cross-media uniform characterization model
CN116521913A (en) Sketch three-dimensional model retrieval method based on prototype comparison learning
CN111241326A (en) Image visual relation referring and positioning method based on attention pyramid network
CN104778272B (en) A kind of picture position method of estimation excavated based on region with space encoding
CN111125396A (en) Image retrieval method of single-model multi-branch structure
CN113886615A (en) Hand-drawn image real-time retrieval method based on multi-granularity association learning
Luo et al. Spatial constraint multiple granularity attention network for clothesretrieval
CN106951501B (en) Three-dimensional model retrieval method based on multi-graph matching
CN106203414B (en) A method of based on the scene picture text detection for differentiating dictionary learning and rarefaction representation
CN109857886B (en) Three-dimensional model retrieval method based on minimum maximum value game theory view approximation
CN104965928B (en) One kind being based on the matched Chinese character image search method of shape
CN111652309A (en) Visual word and phrase co-driven bag-of-words model picture classification method
Tian et al. Research on image classification based on a combination of text and visual features
CN113191381B (en) Image zero-order classification model based on cross knowledge and classification method thereof
CN109062995B (en) Personalized recommendation algorithm for drawing Board (Board) cover on social strategy exhibition network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination