CN115761177A - Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places

Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places

Info

Publication number: CN115761177A
Application number: CN202211638688.7A (filed by Individual)
Authority: CN (China)
Prior art keywords: image, node, rgb, dynamic, dimensional reconstruction
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 孙筱和, 邓建华, 吴春江
Current Assignee: Individual (the listed assignee may be inaccurate)
Original Assignee: Individual
Priority date: 2022-12-20 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2022-12-20
Publication date: 2023-03-07
Landscapes: Image Processing (AREA)

Abstract

The invention discloses a metaverse-oriented three-dimensional reconstruction method for cross-border financial places, which comprises the following steps: constructing a training data set; training a semantic segmentation network; selecting a financial place requiring three-dimensional reconstruction and obtaining a large number of image samples of every angle in the financial place, each image sample comprising an RGB-D image and the pose information recorded when the RGB-D image was acquired, while identifying the dynamic targets in the RGB-D images with the semantic segmentation model and tracking the static scene during acquisition; and performing real-time three-dimensional reconstruction with all image samples of the financial place, processing all image samples in sequence, and finally taking the octree map with the discrete points removed as the three-dimensional reconstruction model. The model reconstructed by the method removes dynamic targets well and is particularly suitable for large-scene environments, and the three-dimensional reconstruction model reconstructed in real time is used to generate a metaverse scene for use in online cross-border financial services.

Description

Metaverse-oriented three-dimensional reconstruction method for cross-border financial places
Technical Field
The invention relates to a dynamic three-dimensional reconstruction method, in particular to a metaverse-oriented three-dimensional reconstruction method for cross-border financial places.
Background
The metaverse is a virtual world that humans construct with digital technology, one that maps onto or goes beyond the real world and can interact with it. As science and technology advance, metaverse application scenarios are becoming more and more extensive; for example, metaverse scenes are generated for use in online cross-border financial services. Metaverse scene generation can be combined with a real scene by reconstructing a three-dimensional model with simultaneous localization and mapping technology.
Simultaneous localization and mapping, abbreviated SLAM, reconstructs the surrounding scene in real time while acquiring the sensor's motion information, and many excellent SLAM algorithms for scene reconstruction have emerged in recent years. However, the current mainstream methods reconstruct static scenes. In practice, to recognize a scene faster and more accurately, photographs of the real scene are mostly collected by robots moving at high speed, and because the real scene contains many moving targets such as people and robots, the environment becomes highly dynamic. Dynamic targets not only introduce errors into the sensor's pose estimation; the geometric model of the environment reconstructed by the robot also contains three-dimensional models of the dynamic targets, making it difficult to use for navigation, condition monitoring and other practical applications.
How to eliminate the models of dynamic targets from the reconstructed scene geometry, and how to use the information of dynamic targets to improve the robot's pose accuracy, are key problems faced when reconstructing a scene.
In addition, the scenes involved in online cross-border financial services are large, and the dense maps obtained by traditional reconstruction methods place extremely high hardware requirements on real-time reconstruction of large scenes; how to achieve real-time reconstruction in such scenes is the other problem addressed herein.
Disclosure of Invention
The invention aims to provide a metaverse-oriented three-dimensional reconstruction method for cross-border financial places that solves the above problems and quickly completes real-time reconstruction in a large scene.
To achieve this purpose, the technical scheme adopted by the invention is as follows: a metaverse-oriented three-dimensional reconstruction method for cross-border financial places, comprising the following steps:
(1) Constructing a training data set;
constructing a training data set: acquiring a large number of RGB-D images of financial place scenes, marking the dynamic targets in the RGB-D images, taking each RGB-D image with its marked dynamic targets as sample data, and letting all the sample data form the training data set;
(2) Training a semantic segmentation network;
classifying the dynamic targets and numbering them by category, then training a semantic segmentation network with the training data set to obtain a semantic segmentation model; the model takes an RGB-D image as input and outputs semantic information that identifies the dynamic targets in the RGB-D image;
(3) Selecting a financial place requiring three-dimensional reconstruction and obtaining a large number of image samples of every angle in the financial place, each image sample comprising an RGB-D image and the pose information recorded when the RGB-D image was acquired; while acquiring, identifying the dynamic targets in the RGB-D image with the semantic segmentation model and tracking the static scene;
(4) Performing three-dimensional real-time reconstruction with the image samples from step (3), the image samples being processed sequentially by frame, each frame corresponding to one moment;
(41) Generating a dense point cloud model from the RGB-D image and the pose information of the frame's image sample (an illustrative back-projection sketch is given after step (5) below);
(42) Removing dynamic targets in the dense point cloud model by using semantic information corresponding to the image sample to obtain a dense point cloud map;
(43) Sparsely representing the dense point cloud map with an octree, one node corresponding to one point cloud of the dense point cloud map, to obtain the octree map; each node is represented by the probability that it is occupied, and the occupancy probability y of a node is calculated by the following formula:

y = 1 / (1 + e^(-x))

where x is the cumulative number of times the point cloud corresponding to the node has been observed from the first frame up to the current frame's image sample; its initial value is 0, and it increases by 1 each time the point cloud is observed in a frame's image sample and decreases by 1 otherwise;
(44) Updating the octree map with the y values obtained in step (43) to obtain an updated octree map;
(45) Analyzing the discrete points in the updated octree map with the distance information and removing them to obtain the octree map with the discrete points removed;
(5) Processing all image samples according to step (4); the finally obtained octree map with the discrete points removed is the three-dimensional reconstruction model.
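To make step (41) concrete, the following is a minimal sketch of the back-projection that turns one RGB-D frame plus its pose into a world-frame point cloud. It is an illustration rather than the patent's implementation: the pinhole intrinsic matrix K, the 4x4 camera-to-world pose T_wc, and the millimeter depth scale are assumptions not fixed by the text.

```python
import numpy as np

def backproject_rgbd(depth, rgb, K, T_wc, depth_scale=1000.0):
    """Back-project an RGB-D frame into a world-frame dense point cloud.

    depth: (H, W) depth image (assumed in millimeters), rgb: (H, W, 3),
    K: 3x3 pinhole intrinsics, T_wc: 4x4 camera-to-world pose.
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) / depth_scale
    valid = z > 0                          # drop pixels with no depth return
    x = (u - cx) * z / fx                  # pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x[valid], y[valid], z[valid]], axis=1)
    # Rotate and translate camera-frame points into the world frame.
    pts_world = pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]
    return pts_world, rgb[valid]
```

Accumulating the per-frame clouds produced this way yields the dense point cloud model of step (41); excluding the pixels flagged as dynamic by the semantic segmentation model from `valid` realizes the removal of step (42).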
Preferably: in step (1), the categories of the dynamic targets include humans and robots.
Preferably: in step (2), the semantic information of the current RGB-D image f is expressed as [Z × W × H]_f,
where Z is the number of dynamic targets in f (Z ≤ 10), W is the category number of each target in f, and H is a probability matrix whose elements are given by

H_{x_f, y_f}(u, v) = p

where x_f is the index of the current target in f, y_f is the index of the current target's category, u and v are the coordinates of a pixel on the x-axis and y-axis respectively, and p is the probability of the category label corresponding to the pixel;
a probability threshold δ is preset; when identifying dynamic targets, the p value of each pixel on every dynamic target in the semantic information [Z × W × H]_f is compared with the preset probability threshold δ: if p > δ, the pixel is considered a dynamic point and retained; if p ≤ δ, the pixel is deleted from the dynamic target.
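A minimal sketch of this per-pixel thresholding follows, assuming the segmentation model exposes its output as a (Z, H, W) array of per-target probabilities; the array layout and the default value of δ are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def refine_dynamic_targets(prob_maps, delta=0.8):
    """Keep only confident dynamic pixels.

    prob_maps: (Z, H, W) float array; prob_maps[i, v, u] is p, the
    probability that pixel (u, v) carries the label of dynamic target i.
    Returns an (H, W) boolean mask, True where a pixel is retained as a
    dynamic point (p > delta for at least one target).
    """
    confident = prob_maps > delta   # pixels with p <= delta are deleted
    return confident.any(axis=0)    # union over the Z dynamic targets
```

The resulting mask is what steps (3) and (42) can use to separate the dynamic targets from the static scene before the dense point cloud is assembled.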
Preferably: in step (3), a robot carrying a depth camera and a pose sensor acquires the image samples; the RGB-D images are obtained by the depth camera and the pose information by the pose sensor.
Preferably: in step (44), a rejection threshold is preset; if the y value of a node obtained in step (43) is smaller than the rejection threshold, the node is rejected from the octree map; the rejection threshold is 0.3-0.4.
Preferably, step (45) is specifically:
(451) For each node in the updated octree map that has neighborhood nodes, calculating the relative three-dimensional distance between the node and each neighborhood node;
wherein the relative three-dimensional distance Dis between the node and one neighborhood node is obtained by the following formula:

Dis = sqrt((N_x - N_x')^2 + (N_y - N_y')^2 + (N_z - N_z')^2) / r

wherein N_x, N_y, N_z are the coordinates of the node on the x-, y- and z-axes, N_x', N_y', N_z' are the coordinates of the neighborhood node on the x-, y- and z-axes, and r is the maximum horizontal distance of the updated octree map;
(452) Calculating a penalty value for each neighborhood node, wherein the penalty value P of one neighborhood node is obtained by the following formula:

P = C·e^(-Dis)

wherein C is a preset parameter and e is the natural constant;
(453) Presetting a distance threshold P0 and judging the penalty value of each neighborhood node: if the penalty value is greater than P0, removing it as a discrete point, otherwise keeping it;
(454) Processing all nodes in the updated octree map according to steps (451)-(453) to obtain the octree map with the discrete points removed.
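A hedged sketch of steps (451) to (454) follows. The parameter C and the threshold value stand in for the patent's unspecified presets, the neighborhood is taken as all other nodes for brevity where a real octree would visit only adjacent cells, and, since the translated comparison in step (453) is ambiguous about what is removed, the sketch keeps a node when at least one neighbor is close enough to produce a penalty above the threshold, which matches the stated goal of discarding isolated discrete points.

```python
import numpy as np

def remove_discrete_points(nodes, r, C=1.0, penalty_threshold=0.5):
    """Remove discrete (isolated) nodes from an octree map.

    nodes: (N, 3) array of node center coordinates; r: maximum horizontal
    distance of the updated octree map, used to normalize Dis.
    """
    keep = np.zeros(len(nodes), dtype=bool)
    for i in range(len(nodes)):
        # (451) relative three-dimensional distance to the other nodes.
        dis = np.linalg.norm(nodes - nodes[i], axis=1) / r
        dis[i] = np.inf                  # a node is not its own neighbor
        # (452) penalty value of each neighborhood node: P = C * e^(-Dis).
        penalty = C * np.exp(-dis)
        # (453) a close neighbor (high penalty) means the node is not
        # isolated; nodes with no such neighbor are discrete points.
        keep[i] = bool((penalty > penalty_threshold).any())
    return nodes[keep]                   # (454) the cleaned octree map
```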
The overall idea of the invention is as follows: in steps (1) and (2), a training data set is first constructed and the dynamic targets are predefined; a semantic segmentation network is trained to output semantic information from which the dynamic targets in an RGB-D image can be identified, so that the static scene and the dynamic targets are segmented and the network acquires a preliminary segmentation capability.
During reconstruction, in step (3), a large number of image samples are obtained from every angle, each comprising an RGB-D image and pose information. A dense point cloud model is first generated from the RGB-D image and the pose information, and the dynamic targets in the dense point cloud model are then removed with the semantic information to obtain a dense point cloud map. At this point one round of rejection has been performed, but because semantic segmentation is still inaccurate in many places, further rejection is needed. In addition, the data volume of the dense point cloud map is too large, so it is expressed sparsely to reduce the data volume, and whether a node belongs to a dynamic target is judged by counting the occupancy probability y of each node in the octree map. For example, if in step (43) the y value of a node is counted as 0.25, then by step (44), since 0.25 is smaller than the rejection threshold of 0.3, the node is judged to belong to a dynamic target and is rejected from the octree map, further rejecting the dynamic target. In step (45), the discrete points are removed with the distance information, and the finally obtained model is a high-precision three-dimensional model of the static scene.
Compared with the prior art, the invention has the following advantages: when a robot is used to acquire image samples of all angles of a financial place in large quantity, the semantic segmentation model first segments the dynamic targets and the static scene in each RGB-D image, and the static scene is tracked; since the mapping quality depends on the localization accuracy, tracking the static scene while collecting improves the robot's localization accuracy.
The method generates a dense point cloud model from the color image and depth image of the RGB-D data combined with the sensor pose information, so that the dense point cloud model contains a three-dimensional model of the static scene. The dynamic targets are first removed once by the semantic segmentation model; after the data are stored as an octree map, the occupancy probability of each node is counted and the dynamic targets are removed a second time; finally, discrete points are judged and removed with the distance information, and a geometric model containing only the static scene is obtained.
The model reconstructed by the method is suitable for large-scene environments, and the three-dimensional reconstruction model reconstructed in real time is used to generate a metaverse scene for use in online cross-border financial services.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
Example 1: referring to fig. 1, a metaverse-oriented three-dimensional reconstruction method for cross-border financial places includes the following steps:
(1) Constructing a training data set;
constructing a training data set: acquiring a large number of RGB-D images of financial place scenes, marking the dynamic targets in the RGB-D images, taking each RGB-D image with its marked dynamic targets as sample data, and letting all the sample data form the training data set;
(2) Training a semantic segmentation network;
classifying the dynamic targets and numbering them by category, then training a semantic segmentation network with the training data set to obtain a semantic segmentation model; the model takes an RGB-D image as input and outputs semantic information that identifies the dynamic targets in the RGB-D image;
(3) Selecting a financial place requiring three-dimensional reconstruction and obtaining a large number of image samples of every angle in the financial place, each image sample comprising an RGB-D image and the pose information recorded when the RGB-D image was acquired; while acquiring, identifying the dynamic targets in the RGB-D image with the semantic segmentation model and tracking the static scene;
(4) Performing three-dimensional real-time reconstruction with the image samples from step (3), the image samples being processed sequentially by frame, each frame corresponding to one moment;
(41) Generating a dense point cloud model from the RGB-D image and the pose information of the frame's image sample;
(42) Removing the dynamic target in the dense point cloud model by using semantic information corresponding to the image sample to obtain a dense point cloud map;
(43) Sparsely representing the dense point cloud map with an octree, one node corresponding to one point cloud of the dense point cloud map, to obtain the octree map; each node is represented by the probability that it is occupied, and the occupancy probability y of a node is calculated by the following formula:

y = 1 / (1 + e^(-x))

where x is the cumulative number of times the point cloud corresponding to the node has been observed from the first frame up to the current frame's image sample; its initial value is 0, and it increases by 1 each time the point cloud is observed in a frame's image sample and decreases by 1 otherwise;
(44) Updating the octree map with the y values obtained in step (43) to obtain an updated octree map;
(45) Analyzing the discrete points in the updated octree map with the distance information and removing them to obtain the octree map with the discrete points removed;
(5) Processing all image samples according to step (4); the finally obtained octree map with the discrete points removed is the three-dimensional reconstruction model.
Example 2: referring to fig. 1, on the basis of embodiment 1, we further define that in step (1) the categories of the dynamic targets include humans and robots. In fact, the dynamic targets can be set according to actual conditions and are not limited to these two categories.
Example 3: referring to fig. 1, on the basis of embodiment 1 or embodiment 2, step (2) is further defined. Since the sample data are input one at a time during training, let the current RGB-D image be f; the expression of its semantic information is [Z × W × H]_f,
where Z is the number of dynamic targets in f (Z ≤ 10), W is the category number of each target in f, and H is a probability matrix whose elements are given by

H_{x_f, y_f}(u, v) = p

where x_f is the index of the current target in f, y_f is the index of the current target's category, u and v are the coordinates of a pixel on the x-axis and y-axis respectively, and p is the probability of the category label corresponding to the pixel;
a probability threshold δ is preset; when identifying dynamic targets, the p value of each pixel on every dynamic target in the semantic information [Z × W × H]_f is compared with the preset probability threshold δ: if p > δ, the pixel is considered a dynamic point and retained; if p ≤ δ, the pixel is deleted from the dynamic target.
In step (3), a robot carrying a depth camera and a pose sensor acquires the image samples; the RGB-D images are obtained by the depth camera and the pose information by the pose sensor. After the RGB-D image and the pose information are obtained, the semantic information can be used to identify the dynamic targets in the RGB-D image, so that the dynamic targets and the static scene are initially segmented; the robot tracks the static scene while collecting, which improves its localization accuracy.
In step (44), a rejection threshold is preset; if the y value of a node obtained in step (43) is smaller than the rejection threshold, the node is rejected from the octree map; the rejection threshold is 0.3-0.4.
Step (45) is specifically:
(451) For each node in the updated octree map that has neighborhood nodes, calculating the relative three-dimensional distance between the node and each neighborhood node;
wherein the relative three-dimensional distance Dis between the node and one neighborhood node is obtained by the following formula:

Dis = sqrt((N_x - N_x')^2 + (N_y - N_y')^2 + (N_z - N_z')^2) / r

wherein N_x, N_y, N_z are the coordinates of the node on the x-, y- and z-axes, N_x', N_y', N_z' are the coordinates of the neighborhood node on the x-, y- and z-axes, and r is the maximum horizontal distance of the updated octree map;
(452) Calculating a penalty value for each neighborhood node, wherein the penalty value P of one neighborhood node is obtained by the following formula:

P = C·e^(-Dis)

wherein C is a preset parameter and e is the natural constant;
(453) Presetting a distance threshold P0 and judging the penalty value of each neighborhood node: if the penalty value is greater than P0, removing it as a discrete point, otherwise keeping it;
(454) Processing all nodes in the updated octree map according to steps (451)-(453) to obtain the octree map with the discrete points removed.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (6)

1. A metaverse-oriented three-dimensional reconstruction method for cross-border financial places, characterized by comprising the following steps:
(1) Constructing a training data set;
constructing a training data set: acquiring a large number of RGB-D images of financial place scenes, marking the dynamic targets in the RGB-D images, taking each RGB-D image with its marked dynamic targets as sample data, and letting all the sample data form the training data set;
(2) Training a semantic segmentation network;
classifying the dynamic targets and numbering them by category, then training a semantic segmentation network with the training data set to obtain a semantic segmentation model; the model takes an RGB-D image as input and outputs semantic information that identifies the dynamic targets in the RGB-D image;
(3) Selecting a financial place requiring three-dimensional reconstruction and obtaining a large number of image samples of every angle in the financial place, each image sample comprising an RGB-D image and the pose information recorded when the RGB-D image was acquired; while acquiring, identifying the dynamic targets in the RGB-D image with the semantic segmentation model and tracking the static scene;
(4) Performing three-dimensional real-time reconstruction with the image samples from step (3), the image samples being processed sequentially by frame, each frame corresponding to one moment, specifically:
(41) Generating a dense point cloud model from the RGB-D image and the pose information of the frame's image sample;
(42) Removing dynamic targets in the dense point cloud model by using semantic information corresponding to the image sample to obtain a dense point cloud map;
(43) Sparsely representing the dense point cloud map with an octree, one node corresponding to one point cloud of the dense point cloud map, to obtain the octree map; each node is represented by the probability that it is occupied, and the occupancy probability y of a node is calculated by the following formula:

y = 1 / (1 + e^(-x))

where x is the cumulative number of times the point cloud corresponding to the node has been observed from the first frame up to the current frame's image sample; its initial value is 0, and it increases by 1 each time the point cloud is observed in a frame's image sample and decreases by 1 otherwise;
(44) Updating the octree map with the y values obtained in step (43) to obtain an updated octree map;
(45) Analyzing the discrete points in the updated octree map with the distance information and removing them to obtain the octree map with the discrete points removed;
(5) Processing all image samples according to step (4); the finally obtained octree map with the discrete points removed is the three-dimensional reconstruction model.
2. The metaverse-oriented three-dimensional reconstruction method for cross-border financial places according to claim 1, characterized in that: in step (1), the categories of the dynamic targets include humans and robots.
3. The metaverse-oriented three-dimensional reconstruction method for cross-border financial places according to claim 1, characterized in that: in step (2), the semantic information of the current RGB-D image f is expressed as [Z × W × H]_f,
where Z is the number of dynamic targets in f (Z ≤ 10), W is the category number of each target in f, and H is a probability matrix whose elements are given by

H_{x_f, y_f}(u, v) = p

where x_f is the index of the current target in f, y_f is the index of the current target's category, u and v are the coordinates of a pixel on the x-axis and y-axis respectively, and p is the probability of the category label corresponding to the pixel;
a probability threshold δ is preset; when identifying dynamic targets, the p value of each pixel on every dynamic target in the semantic information [Z × W × H]_f is compared with the preset probability threshold δ: if p > δ, the pixel is considered a dynamic point and retained; if p ≤ δ, the pixel is deleted from the dynamic target.
4. The metaverse-oriented three-dimensional reconstruction method for cross-border financial places according to claim 1, characterized in that: in step (3), a robot carrying a depth camera and a pose sensor acquires the image samples, the RGB-D images being obtained by the depth camera and the pose information by the pose sensor.
5. The metaverse-oriented three-dimensional reconstruction method for cross-border financial places according to claim 1, characterized in that: in step (44), a rejection threshold is preset; if the y value of a node obtained in step (43) is smaller than the rejection threshold, the node is rejected from the octree map; the rejection threshold is 0.3-0.4.
6. The metaverse-oriented three-dimensional reconstruction method for cross-border financial places according to claim 1, characterized in that: step (45) is specifically:
(451) For each node in the updated octree map that has neighborhood nodes, calculating the relative three-dimensional distance between the node and each neighborhood node;
wherein the relative three-dimensional distance Dis between the node and one neighborhood node is obtained by the following formula:

Dis = sqrt((N_x - N_x')^2 + (N_y - N_y')^2 + (N_z - N_z')^2) / r

wherein N_x, N_y, N_z are the coordinates of the node on the x-, y- and z-axes, N_x', N_y', N_z' are the coordinates of the neighborhood node on the x-, y- and z-axes, and r is the maximum horizontal distance of the updated octree map;
(452) Calculating a penalty value for each neighborhood node, wherein the penalty value P of one neighborhood node is obtained by the following formula:

P = C·e^(-Dis)

wherein C is a preset parameter and e is the natural constant;
(453) Presetting a distance threshold P0 and judging the penalty value of each neighborhood node: if the penalty value is greater than P0, removing it as a discrete point, otherwise keeping it;
(454) Processing all nodes in the updated octree map according to steps (451)-(453) to obtain the octree map with the discrete points removed.
CN202211638688.7A (priority date 2022-12-20, filing date 2022-12-20): Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places. Status: Pending. Publication: CN115761177A.

Priority Applications (1)

CN202211638688.7A (priority date 2022-12-20, filing date 2022-12-20): Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places

Applications Claiming Priority (1)

CN202211638688.7A (priority date 2022-12-20, filing date 2022-12-20): Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places

Publications (1)

CN115761177A (publication date 2023-03-07)

Family

ID: 85346823

Family Applications (1)

CN202211638688.7A (priority date 2022-12-20, filing date 2022-12-20): CN115761177A

Country Status (1)

CN: CN115761177A

Cited By (1)

* Cited by examiner, † Cited by third party

CN116188959A * (priority date 2023-03-14, publication date 2023-05-30): Electronic commerce shopping scene intelligent identification and storage system based on meta universe



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination