CN113052110B - Three-dimensional interest point extraction method based on multi-view projection and deep learning - Google Patents

Three-dimensional interest point extraction method based on multi-view projection and deep learning

Info

Publication number
CN113052110B
CN113052110B (application CN202110359551.7A)
Authority
CN
China
Prior art keywords
dimensional
interest
probability distribution
training
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110359551.7A
Other languages
Chinese (zh)
Other versions
CN113052110A (en)
Inventor
舒振宇
杨思鹏
辛士庆
庞超逸
金小刚
刘利刚
吴皓钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Science and Technology ZUST filed Critical Zhejiang University of Science and Technology ZUST
Priority to CN202110359551.7A priority Critical patent/CN113052110B/en
Publication of CN113052110A publication Critical patent/CN113052110A/en
Application granted granted Critical
Publication of CN113052110B publication Critical patent/CN113052110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional interest point extraction method based on multi-view projection and deep learning, which comprises the steps of projecting a labeled 3D object into a plurality of 2D views to collect training data, constructing an interest point training probability distribution, training a neural network with the 2D image data and the training probability distribution, obtaining a probability distribution Q from the trained neural network, and extracting the three-dimensional interest points of the 3D object with an improved density peak clustering algorithm. Automatic detection of 3D object interest points is achieved with a small amount of data, and satisfactory results can be obtained without relying on manually designed feature descriptors or a large amount of expensive 3D training data.

Description

Three-dimensional interest point extraction method based on multi-view projection and deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a three-dimensional interest point extraction method based on multi-view projection and deep learning.
Background
Points of interest (POIs), also referred to as feature points, are generally defined as unique features on the surface of a 3D object, and POIs play a crucial role in many geometric processing tasks, such as viewpoint selection, shape enhancement, shape retrieval, mesh registration, and mesh segmentation.
POIs can be easily distinguished from other points on a 3D shape by human visual perception. However, it is not easy to define POIs accurately from a geometric point of view, although they do relate to geometric features, so automatic detection of POIs that conform to human visual perception remains a challenging problem.
It is generally accepted that determining whether a point is a POI is subjective because different people may have different opinions about the POI. Based on the above observations, a data-driven approach is applied to efficiently detect POIs on 3D shapes. With recent advances in relevant research areas, deep learning has been introduced to detect POIs on 3D shapes to achieve satisfactory results by learning complex mappings between geometric features and POI probability values for each point on the surface.
However, the scarcity of 3D training data forces learning-based methods to rely heavily on artificially set geometric features rather than learning features directly from raw data, since acquiring high-quality training data of 3D shapes is much more expensive than acquiring 2D images. This greatly limits the ability of this approach to achieve better detection performance. Furthermore, the difference between the subjective visual perception of a person and the geometric features on the 3D shape also limits its performance.
Therefore, overcoming the scarcity of 3D training data while keeping the approach data-driven is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, the present invention provides a three-dimensional interest point extraction method based on multi-view projection and deep learning, which projects a labeled 3D object into multiple 2D views, learns the required features from the 2D views in an end-to-end manner, and automatically detects interest points on a test 3D object by applying the trained neural network and an improved density peak clustering algorithm, so that satisfactory results can be obtained without depending on manually designed feature descriptors or a large amount of expensive 3D training data.
In order to achieve the purpose, the invention adopts the following technical scheme:
a three-dimensional interest point extraction method based on multi-view projection and deep learning is characterized by comprising the following steps:
S1, acquiring training image data: projecting the 3D object into a plurality of 2D views in different directions, and recording the corresponding relation between each pixel in the 2D views and each vertex on the surface of the 3D object;
S2, constructing the training probability distribution of the artificially marked 3D object interest points: constructing a training probability distribution P of the artificially marked interest points on the surface of each 3D object based on the normal probability density;
S3, training a neural network: training a neural network according to the training image data and the artificially marked training probability distribution P to obtain a neural network model capable of automatically generating the 3D object interest point probability distribution;
S4, obtaining the probability distribution Q of interest points on the surface of the test 3D object: projecting the 3D model whose interest points are to be extracted to obtain 2D views, inputting the images into the trained neural network model to obtain the probability distribution of the interest points on the 2D views, and then back-projecting the probability distribution to the surface of the 3D object to obtain the three-dimensional interest point probability distribution Q on the surface of the test 3D object;
S5, extracting three-dimensional interest points: according to the probability distribution Q of step S4, extracting the three-dimensional interest points on the 3D object using an improved density peak clustering algorithm.
Preferably, the step S1 of projecting the 3D object into the 2D views in a plurality of different directions includes:
S11, constructing a virtual three-dimensional boundary sphere by taking the 3D object as a sphere center;
S12, taking any point on the surface of the virtual three-dimensional boundary sphere as an initial position and selecting any direction to construct longitude and latitude, wherein the initial position point is contained on the equator of the constructed longitude and latitude;
S13, placing a first virtual camera at the initial position of the virtual three-dimensional boundary sphere, and uniformly arranging the virtual cameras along the equator by using the initial position as a starting point and using the same longitude angle;
S14, taking the positions of virtual cameras arranged at different longitudes on the equator of the virtual three-dimensional boundary sphere as a reference, correspondingly and uniformly arranging virtual cameras on latitude lines with the same latitude angle on two sides of the equator, and respectively arranging one virtual camera at each of the two polar positions of the virtual three-dimensional boundary sphere;
and S15, shooting a 3D object located at the sphere center position of the virtual three-dimensional boundary sphere through the virtual camera to acquire a 2D image, wherein the 2D image comprises a shadow image and positive and negative depth images of the 3D object.
Preferably, the same longitude angle comprises a 45° longitude angle and the same latitude angle comprises a 45° latitude angle.
Preferably, the virtual camera is rotated 4 times at angular intervals of 90 degrees to increase the amount of training data when capturing the image of the object.
Preferably, the interest point training probability distribution P constructed on the surface of the 3D object in step S2 is generated according to the attenuation of the geodesic distance to the nearest interest point, and P at any vertex v_i is defined as:

P(v_i) = exp(−d(v_i, p_i)^2 / (2σ^2))

where d(v_i, p_i) represents the geodesic distance from vertex v_i to its nearest interest point p_i, and σ is a parameter used to control the decay rate.
Preferably, training the neural network according to the training image data and the artificially marked training probability distribution P in step S3 comprises the following steps:
S31, using an encoder network composed of Conv+BN+ReLU layers and pooling layers as the convolutional part of the neural network to serve as a feature extractor for the 2D views, extracting a feature value for each pixel of the 2D image, and forming a decoder network from upsampling layers and Conv+BN+ReLU layers;
S32, the upsampling layers receive the corresponding pooling indices from the pooling layers, each pixel feature value is placed at its position according to the pooling indices, and a dense feature map is generated by the convolution layers;
S33, feeding the feature map to the softmax layer to classify each pixel independently.
Preferably, the probability distribution Q of the interest points on the surface of the test 3D object in step S4 is obtained with the following strategy (the explicit formula is given as an image in the original): Q_i, the true probability of whether the surface vertex v_i of the 3D object is an interest point, is computed from the predicted values q_ij of the pixels corresponding to vertex v_i, where Q_i is 0 when no pixel corresponds to the vertex and is determined from the n predicted values when n pixels correspond to the vertex.
Preferably, the step S5 uses an improved density peak clustering algorithm, and specifically includes the following steps:
S51, mapping all the vertices v_i on the 3D object onto a two-dimensional decision graph, with the horizontal and vertical axes representing ρ and δ respectively, and with δ_i defined as the influence radius of each vertex v_i of the 3D object:

δ_i = min_{j: ρ_j > ρ_i} d(v_i, v_j)

where d(v_i, v_j) is the geodesic distance between vertices v_i and v_j;
S52, defining a curve function on the two-dimensional decision graph (the explicit formulas are given as images in the original), which serves as the separation curve distinguishing the three-dimensional interest points from the other ordinary vertices, where C_1 and C_2 are variables controlling the up-and-down movement of the curve, thereby defining the separation curve curve_3;
S53, determining the three-dimensional interest points according to the separation curve curve_3 on the two-dimensional decision graph, and mapping the obtained three-dimensional interest points back to the 3D object to obtain the final three-dimensional interest points.
According to the above technical scheme, compared with the prior art, the three-dimensional interest point extraction method based on multi-view projection and deep learning disclosed by the invention has the following beneficial effects:
1. The method for extracting the three-dimensional interest points is end-to-end and robust, and can achieve better performance than traditional methods: good performance is obtained even if only a small number of training samples are provided.
2. The invention is completely driven by data, and can further improve the extraction performance of the three-dimensional interest points as long as enough labeled samples are provided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a process provided by the present invention;
FIG. 2 is a schematic diagram of the projection position of a 3D object provided by the present invention;
fig. 3 is a schematic diagram of different curves of a two-dimensional decision diagram provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention shown in fig. 1 discloses a three-dimensional interest point extraction method based on multi-view projection and deep learning, which comprises the following steps:
S1, training data acquisition: marking the vertices of the 3D object and projecting it into a plurality of two-dimensional views in different directions, acquiring 2D image data of the 3D object, and recording the correspondence between the pixels of the 2D views and the vertices of the 3D object:
For a 3D object of each shape, a virtual three-dimensional bounding sphere and its longitude and latitude are constructed with the 3D object at the sphere center. As shown in fig. 2, 26 virtual cameras are defined at different longitude and latitude positions of the virtual three-dimensional bounding sphere, placed on the surface of the sphere. An arbitrary position on the sphere is selected and a direction is set at random as the initial azimuth, with the selected position lying on the equator of the constructed longitude and latitude. The first virtual camera (camera 1) is then placed at this position, and another 7 virtual cameras (cameras 2 to 8) are fixed uniformly along the equator at every 45 degrees of longitude. At elevation latitudes of 45 degrees and -45 degrees, another 16 virtual cameras (cameras 9 to 16 and cameras 17 to 24) are placed at the same longitude angles as cameras 1 to 8. The last two cameras are placed at the two poles. Further, each camera is rotated 4 times at intervals of 90 degrees to enlarge the training data set. After obtaining the shadow and depth images of the 3D objects, the single-channel images are converted into a three-channel image used as the input of the projection neural network: the positive depth image is set as the first channel, the shadow image as the second channel, and the negative depth image as the third channel.
In other embodiments, the angle at which the virtual cameras are uniformly arranged along the equator with the initial position as the starting point may be other values, such as 30 ° and 60 °, and the like, and the elevation angle of the latitude may also be other angles, and different angles may be selected according to different 3D objects, so that different numbers of virtual cameras may be set according to different 3D objects.
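To make the camera layout concrete, the following Python sketch (using NumPy) computes the 26 camera positions on a unit bounding sphere and stacks the three projection channels; the unit radius, function names, and the channel-stacking helper are illustrative assumptions, not part of the patented method itself.

```python
import numpy as np

def camera_positions(radius=1.0, lon_step_deg=45, lat_deg=45):
    """26 virtual cameras on the bounding sphere: 8 on the equator,
    8 on each of the +45 and -45 degree latitude circles, 1 at each pole.
    Each camera is additionally rolled 4 times by 90 degrees during capture
    to enlarge the training set (not shown here)."""
    positions = []
    lons = np.deg2rad(np.arange(0, 360, lon_step_deg))          # 8 longitudes
    for lat in np.deg2rad([0.0, lat_deg, -lat_deg]):            # equator, +45, -45
        for lon in lons:
            positions.append([radius * np.cos(lat) * np.cos(lon),
                              radius * np.cos(lat) * np.sin(lon),
                              radius * np.sin(lat)])
    positions.append([0.0, 0.0,  radius])                        # north pole
    positions.append([0.0, 0.0, -radius])                        # south pole
    return np.array(positions)                                    # shape (26, 3)

def stack_channels(pos_depth, shadow, neg_depth):
    """Convert the three single-channel renderings into one 3-channel input image:
    channel 1 = positive depth, channel 2 = shadow, channel 3 = negative depth."""
    return np.stack([pos_depth, shadow, neg_depth], axis=-1)

assert camera_positions().shape == (26, 3)
```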
S2, constructing the training probability distribution of the artificially marked 3D object interest points: constructing an interest point training probability distribution P on the surface of each 3D object based on normal probability density;
The manually labeled POIs are usually only a few vertices on the mesh, so if the projection images of those labeled shapes were used directly in the neural network training process, the distributions of the positive and negative samples would be greatly unbalanced. For this reason, a probability distribution of the POIs of the labeled 3D object is constructed before training starts; that is, the POI training probability distribution P constructed on the 3D object surface in step S2 is generated according to the attenuation of the geodesic distance to the nearest POI, and P at any surface vertex v_i of the 3D object is defined as:

P(v_i) = exp(−d(v_i, p_i)^2 / (2σ^2))

where d(v_i, p_i) represents the geodesic distance from vertex v_i to its nearest interest point p_i, and σ is a parameter used to control the decay rate. In the experiments σ is set to one fifth of the maximum geodesic distance. The probability distribution P is then mapped onto all 2D images projected from the 3D object.
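The construction of P can be sketched as follows, assuming a precomputed all-pairs geodesic distance matrix for the mesh; the Gaussian decay mirrors the formula above, and the default σ of one fifth of the maximum geodesic distance follows the experimental setting. Function and parameter names are illustrative.

```python
import numpy as np

def poi_training_distribution(geo_dist, poi_indices, sigma=None):
    """Training probability P for every vertex: Gaussian decay of the geodesic
    distance to the nearest manually labelled interest point.
    geo_dist:    (V, V) matrix of pairwise geodesic distances (precomputed).
    poi_indices: indices of the labelled interest points on the mesh."""
    d_nearest = geo_dist[:, poi_indices].min(axis=1)   # d(v_i, p_i)
    if sigma is None:
        sigma = geo_dist.max() / 5.0                   # 1/5 of max geodesic distance
    return np.exp(-d_nearest ** 2 / (2.0 * sigma ** 2))
```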
S3, training a neural network: training a neural network according to training image data and training probability distribution P of the artificial marker to obtain a neural network model capable of automatically generating 3D object interest point probability distribution;
The convolutional neural network is trained with the prepared data to predict the probability distribution of the POIs of the 3D object on the 2D views; during training, the projection neural network takes the 2D images of the 3D shape as input and outputs the corresponding labeled images. The neural network attempts to learn a mapping from each pixel of the input image (containing shadow and depth information of the 3D shape) to the corresponding pixel of the output 2D image (containing probability distribution information of whether the vertex is a POI).
The neural network training process specifically comprises the following steps:
S31, designing an encoder network consisting of Conv+BN+ReLU layers and pooling layers as the convolutional part of the neural network to serve as a feature extractor for the 2D views, extracting a feature value for each pixel of the 2D image, and forming a decoder network from upsampling layers and Conv+BN+ReLU layers;
S32, the upsampling layers receive the corresponding pooling indices from the pooling layers, each pixel feature value is placed at its position according to the pooling indices, and a dense feature map is generated by the convolution layers;
S33, feeding the feature map to the softmax layer to classify each pixel independently, and adding a weighted pixel classification layer after the softmax layer to achieve class balancing.
It is worth noting that, even though the probability distribution is constructed instead of directly using the (usually binary) projection images of the marked shapes, an unbalanced distribution is still faced in the training samples: in the experiments the positive samples (pixels) account for only 7.5% of the output image, which may negatively affect the prediction performance of the neural network. Therefore, class weighting is used to balance the samples of the different classes, in which the weight w_k of the k-th class of samples is defined in terms of the class size (the explicit formula is given as an image in the original):
where W_k represents the number of samples (pixels) of the k-th class in the images. The weights w_k are then used to establish a weighted pixel classification layer with a cross-entropy loss, where the loss function is defined as:

L = − Σ_m Σ_k w_k · t_km · log(y_km)

where m is a pixel on the output image, t_km is the ground-truth probability that pixel m belongs to class k, and y_km is the predicted probability that pixel m belongs to class k. The last layer of the neural network, the weighted pixel classification layer, is composed of the class weights w_k of each pixel.
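A minimal PyTorch sketch of this encoder/decoder idea is given below; the layer sizes are illustrative and do not reproduce the exact architecture of the invention. It only shows how pooling indices saved by the encoder are reused by the unpooling decoder and how the class weights w_k enter a weighted cross-entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniEncoderDecoder(nn.Module):
    """Illustrative Conv+BN+ReLU encoder, max-pooling with saved indices,
    and an unpooling decoder that reuses those indices."""
    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                                 nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                 nn.BatchNorm2d(64), nn.ReLU(inplace=True),
                                 nn.Conv2d(64, num_classes, 1))

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)                          # keep pooling indices
        u = self.unpool(p, idx, output_size=f.size())  # upsample with the indices
        return self.dec(u)                             # per-pixel class scores

def weighted_pixel_loss(logits, target, class_weights):
    """Class-weighted cross-entropy over all pixels; class_weights holds w_k,
    larger for the rare positive (POI) class to balance the samples."""
    return F.cross_entropy(logits, target, weight=class_weights)
```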
S4, obtaining the probability distribution Q of interest points on the surface of the test 3D object: according to the correspondence between the pixels of the 2D images and the vertices of the 3D object recorded during projection, the POI probability distribution obtained on the 2D views is mapped directly back to the surface of the 3D object, giving the three-dimensional interest point probability distribution Q of the 3D object surface. Since one vertex may appear in several projected 2D images, step S4 adopts the following strategy to determine the true probability Q_i of whether a vertex v_i on the 3D object is a POI:
(the explicit formula is given as an image in the original)
where q_ij is the predicted value of the j-th pixel corresponding to vertex v_i; the case in which no pixel corresponds to vertex v_i and the case in which n pixels correspond to it are considered separately.
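A sketch of this back-projection step follows, assuming the per-vertex lists of predicted pixel probabilities have already been gathered from the recorded pixel-to-vertex correspondence; averaging the predictions is an assumed fusion rule, since the exact formula appears only as an image in the original.

```python
import numpy as np

def fuse_vertex_probability(pixel_preds_per_vertex):
    """Back-project 2D predictions to the mesh: Q_i is the mean of the predicted
    probabilities q_ij of all pixels that correspond to vertex v_i, and 0 for
    vertices to which no pixel corresponds.
    pixel_preds_per_vertex: list of lists, one list of q_ij values per vertex."""
    Q = np.zeros(len(pixel_preds_per_vertex))
    for i, preds in enumerate(pixel_preds_per_vertex):
        if len(preds) > 0:                 # n pixels correspond to vertex v_i
            Q[i] = float(np.mean(preds))
    return Q
```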
S5, extracting three-dimensional interest points: extracting specific POIs on the 3D shape from the three-dimensional interest point probability distribution Q of the 3D object surface can be understood as extracting the points with locally highest probability values on the surface of the 3D shape. Specifically, a new density peak clustering algorithm is adopted, which remedies the lack of an automatic peak extraction mechanism in the traditional method and establishes a new way of automatically extracting the required peak points. For each vertex v_i, let ρ_i denote the probability value of v_i. After the two-dimensional decision graph is obtained, the POIs are distributed in the upper right corner while the other points are distributed near the coordinate axes; at this point the interest point extraction problem becomes the simplest binary classification problem in machine learning, so the values of the variable parameters C_1 and C_2 are adjusted according to the results on the training data to obtain a better separation curve curve_3. For different interest point extraction tasks, the position of the curve changes with C_1 and C_2.
The method specifically comprises the following steps:
S51, as shown in FIG. 3, mapping all the vertices v_i of the 3D object onto a two-dimensional decision graph, in which the horizontal and vertical axes represent ρ and δ respectively, and δ_i is defined as the influence radius of each vertex v_i of the 3D object:

δ_i = min_{j: ρ_j > ρ_i} d(v_i, v_j)

where d(v_i, v_j) is the geodesic distance between vertices v_i and v_j; on the decision graph, points in the upper right corner with large δ and large ρ are considered good candidates for POIs;
S52, defining a curve function on the two-dimensional decision graph (the explicit formulas are given as images in the original) for finding the separation curve that distinguishes the three-dimensional interest points from the other ordinary vertices (POIs always appear in the upper right corner, while the other ordinary vertices appear around the two axes), where C_1 and C_2 are variables controlling the up-and-down movement of the curve, thereby defining the separation curve curve_3;
S53, according to the separation curve curve_3 on the two-dimensional decision graph of FIG. 3, determining the three-dimensional interest points: the vertices located to the upper right of curve_3 are regarded as POIs, and the obtained three-dimensional interest points are mapped back to the 3D object to give the final three-dimensional interest points.
On the basis of the original algorithm in the step S51, the density peak value clustering algorithm is improved and used for extracting the interest points.
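The decision-graph step can be sketched as follows, assuming ρ_i is taken to be the predicted probability Q_i and δ_i is the standard density-peak quantity (geodesic distance to the nearest vertex of higher ρ); the hyperbolic separation curve δ = C1/ρ + C2 is only an assumed form, since the patent gives the curve as a formula image and states only that C1 and C2 move it up and down.

```python
import numpy as np

def extract_pois(Q, geo_dist, C1=0.05, C2=0.1):
    """Improved density-peak selection on the (rho, delta) decision graph.
    Q:        per-vertex interest-point probabilities (used as rho).
    geo_dist: (V, V) matrix of pairwise geodesic distances.
    Returns the indices of vertices lying above the separation curve."""
    rho = np.asarray(Q, dtype=float)
    V = len(rho)
    delta = np.empty(V)
    for i in range(V):
        higher = np.where(rho > rho[i])[0]            # vertices with larger rho
        delta[i] = geo_dist[i, higher].min() if higher.size else geo_dist[i].max()
    curve = C1 / np.maximum(rho, 1e-12) + C2          # assumed separation curve
    return np.where(delta > curve)[0]                 # POIs sit above the curve
```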
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A three-dimensional interest point extraction method based on multi-view projection and deep learning is characterized by comprising the following steps:
S1, collecting training image data: projecting the 3D object into a plurality of 2D views in different directions, and recording the corresponding relation between each pixel in the 2D views and each vertex on the surface of the 3D object;
S2, constructing the training probability distribution of the artificially marked 3D object interest points: constructing a training probability distribution P of the artificially marked interest points on the surface of each 3D object based on the normal probability density;
S3, training a neural network: training a neural network according to the training image data and the artificially marked training probability distribution P to obtain a neural network model capable of automatically generating the 3D object interest point probability distribution;
The method comprises the following specific steps:
S31, using an encoder network composed of Conv+BN+ReLU layers and pooling layers as the convolutional part of the neural network to serve as a feature extractor for the 2D views, extracting a feature value for each pixel of the 2D image, and forming a decoder network from upsampling layers and Conv+BN+ReLU layers;
S32, the upsampling layers receive the corresponding pooling indices from the pooling layers, each pixel feature value is placed at its position according to the pooling indices, and a dense feature map is generated by the convolution layers;
S33, feeding the feature map to a softmax layer to classify each pixel independently;
S4, obtaining the probability distribution Q of interest points on the surface of the test 3D object: projecting the 3D model whose interest points are to be extracted to obtain 2D views, inputting the images into the trained neural network model to obtain the probability distribution of the interest points on the 2D views, and then back-projecting the probability distribution to the surface of the 3D object to obtain the three-dimensional interest point probability distribution Q on the surface of the test 3D object;
S5, extracting three-dimensional interest points: extracting the three-dimensional interest points on the 3D object by using an improved density peak clustering algorithm according to the probability distribution Q of step S4;
the improved density peak clustering algorithm specifically comprises the following steps:
S51, mapping all the vertices v_i of the 3D object onto a two-dimensional decision graph, in which the horizontal and vertical axes represent ρ and δ respectively, and δ_i is defined as the influence radius of each vertex v_i of the 3D object:

δ_i = min_{j: ρ_j > ρ_i} d(v_i, v_j)

where d(v_i, v_j) is the geodesic distance between vertices v_i and v_j;
S52, defining a curve function on the two-dimensional decision graph (the explicit formulas are given as images in the original), which serves as the separation curve distinguishing the three-dimensional interest points from the other ordinary vertices, where C_1 and C_2 are variables controlling the up-and-down movement of the curve, thereby defining the separation curve curve_3;
S53, determining the three-dimensional interest points according to the separation curve curve_3 on the two-dimensional decision graph, and mapping the obtained three-dimensional interest points back to the 3D object to obtain the final three-dimensional interest points.
2. The multi-view projection and deep learning three-dimensional interest point extraction method according to claim 1, wherein the step S1 of projecting the 3D object into the 2D views in a plurality of different directions comprises:
S11, constructing a virtual three-dimensional boundary sphere by taking the 3D object as a sphere center;
S12, taking any point on the surface of the virtual three-dimensional boundary sphere as an initial position and selecting any direction to construct longitude and latitude, wherein the initial position point is contained on the equator of the constructed longitude and latitude;
S13, placing a first virtual camera at the initial position of the virtual three-dimensional boundary sphere, and uniformly arranging the virtual cameras along the equator by using the initial position as a starting point and using the same longitude angle;
S14, correspondingly and uniformly arranging virtual cameras on latitude lines with the same latitude angle on two sides of the equator by taking the positions of the virtual cameras arranged at different longitudes on the equator of the virtual three-dimensional boundary sphere as a reference, wherein two virtual cameras are respectively arranged at two polar positions of the virtual three-dimensional boundary sphere;
and S15, shooting a 3D object located at the sphere center position of the virtual three-dimensional boundary sphere through the virtual camera to acquire a 2D image, wherein the 2D image comprises a shadow image and a positive and negative depth image of the 3D object.
3. The multi-view projection and deep learning three-dimensional point of interest extraction method of claim 2, wherein the same longitude angle comprises a 45 ° longitude angle and the same latitude angle comprises a 45 ° latitude angle.
4. The multi-view projection and deep learning three-dimensional interest point extracting method according to claim 2, wherein the virtual camera rotates 4 times at an angle interval of 90 degrees to increase the amount of training data when acquiring the object image.
5. The method for extracting three-dimensional interest points through multi-view projection and deep learning according to claim 1, wherein the manually labeled training probability distribution P of the 3D object interest points constructed in step S2 is generated according to the attenuation of the geodesic distance to the nearest interest point, and P at any vertex v_i is defined as:

P(v_i) = exp(−d(v_i, p_i)^2 / (2σ^2))

where d(v_i, p_i) represents the geodesic distance from vertex v_i to its nearest interest point p_i, and σ is a parameter used to control the decay rate.
6. The multi-view projection and deep learning three-dimensional interest point extraction method according to claim 1, wherein the probability distribution Q of the interest points on the surface of the test 3D object obtained in step S4 adopts the following strategy (the explicit formula is given as an image in the original): Q_i, the true probability of whether the surface vertex v_i of the 3D object is an interest point, is computed from the predicted values q_ij of the pixels corresponding to vertex v_i, where Q_i is 0 when no pixel corresponds to the vertex and is determined from the n predicted values when n pixels correspond to the vertex.
CN202110359551.7A 2021-04-02 2021-04-02 Three-dimensional interest point extraction method based on multi-view projection and deep learning Active CN113052110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110359551.7A CN113052110B (en) 2021-04-02 2021-04-02 Three-dimensional interest point extraction method based on multi-view projection and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110359551.7A CN113052110B (en) 2021-04-02 2021-04-02 Three-dimensional interest point extraction method based on multi-view projection and deep learning

Publications (2)

Publication Number Publication Date
CN113052110A CN113052110A (en) 2021-06-29
CN113052110B true CN113052110B (en) 2022-07-29

Family

ID=76517480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110359551.7A Active CN113052110B (en) 2021-04-02 2021-04-02 Three-dimensional interest point extraction method based on multi-view projection and deep learning

Country Status (1)

Country Link
CN (1) CN113052110B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972958B (en) * 2022-07-27 2022-10-04 北京百度网讯科技有限公司 Key point detection method, neural network training method, device and equipment
CN117670911B (en) * 2023-11-23 2024-06-14 中航通飞华南飞机工业有限公司 Quantitative description method of sand paper ice

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257139A (en) * 2018-02-26 2018-07-06 中国科学院大学 RGB-D three-dimension object detection methods based on deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070052700A1 (en) * 2005-09-07 2007-03-08 Wheeler Frederick W System and method for 3D CAD using projection images
CN105005755B (en) * 2014-04-25 2019-03-29 北京邮电大学 Three-dimensional face identification method and system
US10417533B2 (en) * 2016-08-09 2019-09-17 Cognex Corporation Selection of balanced-probe sites for 3-D alignment algorithms
CN110334704B (en) * 2019-06-21 2022-10-21 浙江大学宁波理工学院 Three-dimensional model interest point extraction method and system based on layered learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257139A (en) * 2018-02-26 2018-07-06 中国科学院大学 RGB-D three-dimension object detection methods based on deep learning

Also Published As

Publication number Publication date
CN113052110A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN104484648B (en) Robot variable visual angle obstacle detection method based on outline identification
CN109559310B (en) Power transmission and transformation inspection image quality evaluation method and system based on significance detection
CN104063702B (en) Three-dimensional gait recognition based on shielding recovery and partial similarity matching
CN105279372B (en) A kind of method and apparatus of determining depth of building
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
CN104850850B (en) A kind of binocular stereo vision image characteristic extracting method of combination shape and color
CN104134200B (en) Mobile scene image splicing method based on improved weighted fusion
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN111899172A (en) Vehicle target detection method oriented to remote sensing application scene
CN114758252B (en) Image-based distributed photovoltaic roof resource segmentation and extraction method and system
CN108960404B (en) Image-based crowd counting method and device
CN106778659B (en) License plate recognition method and device
CN105678806B (en) A kind of live pig action trail automatic tracking method differentiated based on Fisher
CN113052110B (en) Three-dimensional interest point extraction method based on multi-view projection and deep learning
CN112801074B (en) Depth map estimation method based on traffic camera
CN105869178A (en) Method for unsupervised segmentation of complex targets from dynamic scene based on multi-scale combination feature convex optimization
CN103632167B (en) Monocular vision space recognition method under class ground gravitational field environment
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN109376641B (en) Moving vehicle detection method based on unmanned aerial vehicle aerial video
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN110263716B (en) Remote sensing image super-resolution land cover mapping method based on street view image
CN107944437B (en) A kind of Face detection method based on neural network and integral image
CN114943893B (en) Feature enhancement method for land coverage classification
CN113901874A (en) Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant