CN117911662B - Digital twin scene semantic segmentation method and system based on deep Hough voting - Google Patents
Digital twin scene semantic segmentation method and system based on deep Hough voting
- Publication number: CN117911662B (application CN202410318275.3A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- scene
- semantic
- semantic segmentation
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to the technical field of digital twins and provides a digital twin scene semantic segmentation method and system based on deep Hough voting, which solve the technical problem of low semantic segmentation accuracy caused by insufficient extraction of spatial-point context information in current deep-learning-based three-dimensional point cloud semantic segmentation methods. The proposed method achieves fast, high-accuracy semantic segmentation of the three-dimensional measurement data of any industrial digital twin scene, thereby providing high-accuracy three-dimensional semantic information for digital twin industrial applications, facilitating subsequent rapid localization of key production elements in the industrial scene, enabling intelligent editing of and interaction with objects in the scene, and enhancing user experience.
Description
Technical Field
The invention relates to the technical field of digital twins, and in particular to a digital twin scene semantic segmentation method and system based on deep Hough voting.
Background
In the field of intelligent factory construction in the manufacturing industry, using digital twin technology to realize virtual-real mapping of production scenes, business processes, processing equipment and the like has become a research hotspot at home and abroad. Three-dimensional semantic segmentation technology can effectively extract and analyze geometric structure information in a digital twin scene, divide the objects in the scene into different semantic categories and assign labels, and is widely applied in digital twin modeling and analysis. Based on the segmentation result, high-quality three-dimensional semantic information can be provided for the digital twin workshop, so that the position of an object of interest in the scene can be rapidly located, objects in the scene can be intelligently edited and interacted with, and the intelligence level of workshop management is improved.
In the prior art, three-dimensional point cloud deep learning methods can be divided into three main types: projection-based methods, voxel-based methods, and spatial-point-based methods. For handling irregular three-dimensional point clouds, the most intuitive approach is to convert the irregular representation into a regular one. In a projection-based deep learning framework, the geometric information inside the three-dimensional point cloud is collapsed during the projection phase: when a dense pixel grid is formed on the projection plane, the sparsity of the point cloud may be ignored, and the choice of projection plane can also severely affect three-dimensional point cloud feature extraction. Another way to convert an irregular point cloud into a regular representation is three-dimensional voxelization with three-dimensional convolution for feature extraction, but applying conventional three-dimensional discrete convolution generally incurs large computation and memory overhead. Spatial-point-based deep learning methods design deep network architectures that act directly on point sets embedded in continuous space. Although these approaches have achieved impressive results in three-dimensional object recognition and semantic segmentation, there is still considerable room to improve twin-scene understanding through context-information analysis.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a digital twin scene semantic segmentation method and system based on deep Hough voting, which solve the technical problem of low semantic segmentation accuracy caused by insufficient extraction of spatial-point context information in current deep-learning-based three-dimensional point cloud semantic segmentation methods.
In order to solve the above technical problems, the invention provides the following technical solution: a digital twin scene semantic segmentation method based on deep Hough voting, comprising the following steps:
S1, extracting, via a U-shaped feature encoding module, context information at different scales of each three-dimensional measurement point's neighborhood from the acquired three-dimensional measurement data of an industrial scene to form a first-level feature expression;
S2, capturing object-level context information in the industrial scene through a Hough voting module and a vote aggregation module to form a second-level feature expression;
S3, concatenating, with a semantic prediction module, the first-level feature expression and the corresponding second-level feature expression of each three-dimensional measurement point, and predicting the semantic information of each three-dimensional measurement point;
S4, determining the center regression loss and the three-dimensional semantic segmentation loss of the object objects in the industrial scene, training on an industrial scene dataset, and updating the parameters of all the modules to form a three-dimensional semantic segmentation system for industrial digital twin scenes;
S5, taking the three-dimensional measurement data of the industrial digital twin scene as input, the three-dimensional semantic segmentation system outputs high-quality three-dimensional semantic information of the scene.
Further, in step S1, the specific process includes the following steps:
S11, preprocess the data: voxelize the three-dimensional measurement data P = {p_i | i = 1, …, N} to convert the unordered spatial points into ordered three-dimensional voxels, where each three-dimensional measurement point p_i is represented by its coordinates (x_i, y_i, z_i) and N denotes the number of spatial points;
S12, taking the ordered three-dimensional voxels as input, the U-shaped feature encoding module obtains a first-level feature vector for each spatial point from the mapping between spatial points and three-dimensional voxels, forming the first-level feature set F = {f_i | i = 1, …, N}.
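The voxelization in step S11 can be sketched as follows. This is a minimal numpy illustration under stated assumptions (the voxel size and the array layout are placeholders; the patent does not prescribe an implementation), showing how unordered points are mapped to ordered voxel indices while keeping the point-to-voxel mapping needed in S12:

```python
import numpy as np

def voxelize(points, voxel_size=0.02):
    """Map unordered 3-D points to ordered voxel indices.

    points: (N, 3) array of measurement coordinates (x, y, z).
    Returns the unique occupied voxel coordinates and, for each point,
    the index of its voxel -- the point-to-voxel mapping later used to
    scatter voxel features back onto spatial points.
    """
    voxel_coords = np.floor(points / voxel_size).astype(np.int64)
    # Unique voxels, plus the inverse mapping point -> voxel.
    unique_voxels, point_to_voxel = np.unique(
        voxel_coords, axis=0, return_inverse=True)
    return unique_voxels, point_to_voxel

pts = np.array([[0.010, 0.010, 0.010],
                [0.015, 0.012, 0.011],   # falls into the same voxel as point 0
                [0.050, 0.050, 0.050]])
voxels, p2v = voxelize(pts)
print(len(voxels))       # 2 occupied voxels
print(p2v[0] == p2v[1])  # True: the first two points share a voxel
```

In practice the occupied voxels (not a dense grid) are what a sparse-convolution encoder consumes, which is why only the unique coordinates are kept.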
Further, in step S2, the specific process includes the following steps:
S21, taking the three-dimensional measurement data P = {p_i | i = 1, …, N} and the corresponding first-level feature set F = {f_i | i = 1, …, N} as input, the Hough voting module outputs, for each three-dimensional measurement point p_i, a Euclidean spatial offset Δx_i and a feature offset Δf_i, and generates the vote information v_i = [x_i + Δx_i; f_i + Δf_i] for that point;
S22, the vote information v_i of all three-dimensional measurement points constitutes the vote set V = {v_i | i = 1, …, N};
S23, the farthest point sampling method is used to sample M vote positions from the spatial positions of the vote set V, obtaining the sampled set {c_k | k = 1, …, M};
S24, for each c_k, a cluster C_k is formed by finding the votes adjacent to it in Euclidean space;
S25, the clusters C_k are processed by the three fully connected layers in the vote aggregation module to obtain the cluster feature set G = {g_k | k = 1, …, M}.
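Steps S21-S24 can be sketched end to end in numpy. This is a hedged illustration only: the learned Hough-voting MLP is replaced by random placeholder offsets, and the sizes, radius, and sample count are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, M = 200, 16, 8                 # points, feature channels, samples (illustrative)
x = rng.random((N, 3))               # spatial positions x_i
f = rng.random((N, C))               # first-level features f_i

# S21: the offsets would come from the trained voting MLP; random stand-ins here.
dx = rng.normal(scale=0.05, size=(N, 3))
df = rng.normal(scale=0.05, size=(N, C))
votes = np.concatenate([x + dx, f + df], axis=1)   # S22: vote set V, shape (N, 3 + C)

# S23: farthest point sampling over the vote positions.
def farthest_point_sampling(xyz, m):
    chosen = [0]
    dist = np.linalg.norm(xyz - xyz[0], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))   # farthest from the chosen set so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)

vote_xyz = votes[:, :3]
centers = vote_xyz[farthest_point_sampling(vote_xyz, M)]

# S24: cluster the votes lying within a radius of each sampled position.
radius = 0.2
clusters = [np.where(np.linalg.norm(vote_xyz - c, axis=1) <= radius)[0]
            for c in centers]
print(votes.shape, len(clusters))    # (200, 19) and 8 clusters
```

Because each sampled center is itself a vote, every cluster is non-empty; farthest point sampling spreads the M centers across the scene regardless of point density.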
Further, step S3 specifically includes: for each three-dimensional measurement point p_i, the first-level feature expression f_i output by the U-shaped feature encoding module is concatenated with the second-level feature expression g_k of the cluster in which the point lies, and the concatenated feature is fed into the semantic prediction module to predict the three-dimensional semantic category of that spatial point.
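The concatenate-and-predict step can be sketched as below; random weights stand in for the trained semantic prediction module, and all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, C1, C2, K = 50, 32, 64, 8      # points, feature widths, classes (illustrative)
f_point = rng.random((N, C1))     # first-level feature per point
f_cluster = rng.random((N, C2))   # second-level feature of the point's cluster

h = np.concatenate([f_point, f_cluster], axis=1)   # (N, C1 + C2)

# Two fully connected layers with a ReLU in between (random weights here;
# trained parameters in practice).
W1, b1 = rng.normal(size=(C1 + C2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, K)), np.zeros(K)
logits = np.maximum(h @ W1 + b1, 0) @ W2 + b2

pred = logits.argmax(axis=1)      # predicted semantic category per point
print(pred.shape)                 # (50,)
```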
Further, in step S4, the specific process includes the following steps:
S41, determining the center regression loss L_center of the object objects in the industrial scene;
S42, determining the three-dimensional semantic segmentation loss L_sem of the object objects in the industrial scene;
S43, calculating the joint loss function L from the center regression loss L_center and the three-dimensional semantic segmentation loss L_sem, with the calculation formula:

L = L_sem + λ · L_center

In the above formula, λ is a hyper-parameter used to balance the two different loss function terms;
S44, training on the industrial scene dataset based on the joint loss function L;
S45, updating the parameters of the feature encoding module, the Hough voting module, the vote aggregation module, and the semantic prediction module with a stochastic gradient descent optimization algorithm.
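The loss combination in S43 and the update rule in S45 amount to the following sketch (the ordering of the two loss terms and the learning rate are assumptions for illustration; λ = 0.6 follows the embodiment described later):

```python
import numpy as np

def joint_loss(l_sem, l_center, lam=0.6):
    """Combine segmentation and center-regression losses; lam balances them."""
    return l_sem + lam * l_center

def sgd_step(param, grad, lr=0.01):
    """Plain stochastic gradient descent update for one parameter tensor."""
    return param - lr * grad

w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
print(sgd_step(w, g))        # [ 0.995 -1.995]
print(joint_loss(1.0, 0.5))  # 1.3
```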
Further, in step S41, the expression of the center regression loss L_center is:

L_center = (1/N) Σ_{i=1..N} 1(p_i) · ‖Δx_i − Δx_i*‖

In the above formula, N denotes the number of spatial points; 1(p_i) is an indicator function marking whether the spatial point p_i belongs to a semantic category included in the scene; Δx_i* is the ground-truth Euclidean offset from the initial position x_i of point p_i to the center of the object to which it belongs; Δx_i is the Euclidean offset predicted for p_i by the Hough voting module.
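The center regression loss above computes directly as a masked mean of offset errors. A numpy sketch (the toy offsets are illustrative):

```python
import numpy as np

def center_regression_loss(pred_offset, gt_offset, on_object):
    """L_center: mean Euclidean error between predicted and ground-truth
    offsets, with the indicator masking points that belong to no labeled
    object category.

    pred_offset, gt_offset: (N, 3) arrays; on_object: (N,) boolean mask.
    """
    err = np.linalg.norm(pred_offset - gt_offset, axis=1)
    return (err * on_object).sum() / len(err)

pred = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
gt = np.array([[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]])
mask = np.array([True, True])
print(center_regression_loss(pred, gt, mask))  # 0.1
```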
Further, in step S42, the expression of the three-dimensional semantic segmentation loss L_sem is:

L_sem = −(1/N) Σ_{i=1..N} Σ_{j=1..K} w_j · y_{ij} · log(s_{ij})

In the above formula, K denotes the number of semantic categories in the dataset; y_{ij} is a sign function: y_{ij} = 1 when the true class of spatial point p_i equals j, and y_{ij} = 0 otherwise; s_{ij} is the probability predicted by the semantic prediction module that point p_i belongs to category j; w_j is the weight of the j-th category, determined by the dataset.
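This class-weighted cross-entropy can be sketched in numpy as follows (uniform weights and toy probabilities, purely for illustration):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """L_sem: class-weighted cross-entropy averaged over N points.

    probs: (N, K) predicted probabilities; labels: (N,) true classes;
    class_weights: (K,) per-class weights w_j.
    """
    n = len(labels)
    picked = probs[np.arange(n), labels]          # s_{i, true class of i}
    return -(class_weights[labels] * np.log(picked)).mean()

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
w = np.ones(3)
loss = weighted_cross_entropy(probs, labels, w)
print(round(loss, 4))  # 0.2899
```

Raising w_j for rare categories (e.g. small equipment amid large floor and wall regions) counteracts class imbalance in industrial scenes.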
Further, step S5 is specifically: the three-dimensional measurement data of the industrial digital twin scene are input into the three-dimensional semantic segmentation system, and the U-shaped feature encoding module in the system extracts the first-level feature expression of each three-dimensional measurement point;
on this basis, the Hough voting module and the vote aggregation module in the system extract the second-level feature expression; the feature expressions of the two levels are concatenated and passed to the semantic prediction module to obtain the semantic category of each three-dimensional measurement point, yielding high-quality three-dimensional semantic information for the whole scene.
The present technical solution also provides a system for implementing the above digital twin scene semantic segmentation method. The system integrates several modules into a three-dimensional semantic segmentation framework for industrial digital twin scenes, including:
The U-shaped feature coding module is built based on sub-manifold sparse convolution and is used for extracting context information of different scales of a three-dimensional measurement point neighborhood from three-dimensional measurement data in an industrial scene to form a first-level feature expression;
The Hough voting module and the vote aggregation module are formed based on a Hough voting mechanism and are used for capturing context information of object levels in a scene to form a second-level feature expression;
The three-dimensional measurement point semantic prediction module is used for splicing the first-level feature expression and the second-level feature expression of each three-dimensional measurement point and predicting semantic information of each three-dimensional measurement point.
Further, the U-shaped feature encoding module is a U-shaped architecture built from a combination of conventional sparse convolution layers, conventional sparse deconvolution layers, and submanifold sparse convolution blocks;
The Hough voting module is based on a multi-layer perceptron and consists of a full connection layer FC, an activation function ReLU and batch normalization BN;
the vote aggregation module consists of a full connection layer FC, an activation function ReLU, batch normalization BN and maximum pooling MaxPooling;
the semantic prediction module consists of two fully connected layers FC.
By means of the above technical solution, the digital twin scene semantic segmentation method and system based on deep Hough voting provided by the invention have the following beneficial effects:
1. The digital twin scene semantic segmentation method provided by the invention achieves fast, high-accuracy semantic segmentation of the three-dimensional measurement data of any industrial digital twin scene, thereby providing high-accuracy three-dimensional semantic information for digital twin industrial applications, facilitating subsequent rapid localization of key production elements in the industrial scene, enabling intelligent editing of and interaction with objects in the scene, and enhancing user experience.
2. Compared with existing three-dimensional semantic segmentation methods, the digital twin scene semantic segmentation method provided by the invention effectively captures and fuses context information at different levels and improves the semantic segmentation accuracy for three-dimensional scenes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a digital twin scene semantic segmentation method of the present invention;
FIG. 2 is a network block diagram of a U-shaped feature encoding module of the present invention;
Fig. 3 is a network structure diagram of the Hough voting module according to the present invention;
FIG. 4 is a network block diagram of the ballot aggregation module of the present invention;
FIG. 5 is a schematic representation of three-dimensional measurement data of a typical industrial power plant scenario of the present invention;
FIG. 6 is a schematic diagram of the semantic segmentation result of the digital twin industrial scene of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present application may be more readily understood, the application is described in further detail below with reference to the accompanying drawings and specific embodiments, so that the process of applying the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.
Digital twin technology creates a virtual model of a physical entity in digital form and, through virtual-real interactive feedback, data fusion analysis, iterative decision optimization and the like, makes full use of multidisciplinary model, data, and intelligence technologies to add or extend capabilities for the physical entity and establish its virtual representation on a computer. The digital twin model is a three-dimensional digital model of the physical workshop, serving as the twin carrier and digital base for production process optimization, while the workshop generates twin data in real time during production. The digital twin model is a digital reconstruction of the production site and a virtual mapping of the geometry, physics, behavior, rules and other characteristics of the workshop, realizing an accurate mapping from the physical world to virtual space. In the field of intelligent factory construction in the manufacturing industry, using digital twin technology to realize virtual-real mapping of production scenes, business processes, processing equipment and the like has become a research hotspot at home and abroad. Three-dimensional semantic segmentation technology can effectively extract and analyze geometric structure information in a digital twin scene, divide the objects in the scene into different semantic categories and assign labels, and is widely applied in digital twin modeling and analysis. Based on the segmentation result, high-quality three-dimensional semantic information can be provided for the digital twin workshop, so that the position of an object of interest in the scene can be rapidly located, objects in the scene can be intelligently edited and interacted with, and the intelligence level of workshop management is improved.
In the intelligent factory application based on digital twin, three-dimensional semantic segmentation can be realized through three-dimensional point cloud measurement data, so that a high-fidelity digital twin model is established. Unlike a regular two-dimensional grid of pixels, a three-dimensional point cloud is a discrete set embedded in a continuous space, with irregular and disordered features. Thus, while deep convolutional networks exhibit excellent performance in structured two-dimensional computer vision tasks, they cannot be directly applied to such unstructured data.
To address this challenge, a variety of three-dimensional point cloud deep learning approaches have emerged, which can be divided into three main types: projection-based methods, voxel-based methods, and spatial-point-based methods. For handling irregular three-dimensional point clouds, the most intuitive approach is to convert the irregular representation into a regular one. In a projection-based deep learning framework, the geometric information inside the three-dimensional point cloud is collapsed during the projection phase: when a dense pixel grid is formed on the projection plane, the sparsity of the point cloud may be ignored, and the choice of projection plane can also severely affect three-dimensional point cloud feature extraction. Another way to convert an irregular point cloud into a regular representation is three-dimensional voxelization with three-dimensional convolution for feature extraction, but applying conventional three-dimensional discrete convolution generally incurs large computation and memory overhead. Spatial-point-based deep learning methods design deep network architectures that act directly on point sets embedded in continuous space. Although these approaches have achieved impressive results in three-dimensional object recognition and semantic segmentation, there is still considerable room to improve twin-scene understanding through context-information analysis.
In view of the technical shortcomings of the prior art, referring to figs. 1-6, this embodiment proceeds as follows: the three-dimensional measurement data of an industrial digital twin scene are input into the three-dimensional semantic segmentation framework, and the U-shaped feature encoding module inside the framework extracts the first-level feature expression of each three-dimensional measurement point; on this basis, the Hough voting module and the vote aggregation module in the framework extract the second-level feature expression; the feature expressions of the two levels are concatenated and passed to the semantic prediction module to obtain the semantic category of each three-dimensional measurement point, yielding high-quality three-dimensional semantic information for the whole scene.
Referring to fig. 1, the present embodiment provides a digital twin scene semantic segmentation method based on deep Hough voting, which includes the following steps:
S1, extracting context information of different scales of a neighborhood of a three-dimensional measurement point from three-dimensional measurement data in an obtained industrial scene through a U-shaped feature coding module to form a first-level feature expression; in step S1, the specific process includes the following steps:
S11, preprocess the data: voxelize the three-dimensional measurement data P = {p_i | i = 1, …, N} to convert the unordered spatial points into ordered three-dimensional voxels, where each three-dimensional measurement point p_i is represented by its coordinates (x_i, y_i, z_i), N denotes the number of spatial points, and the voxel size is set to 0.02 m;
S12, taking the ordered three-dimensional voxels as input, the U-shaped feature encoding module obtains a first-level feature vector for each spatial point from the mapping between spatial points and three-dimensional voxels, forming the feature set F = {f_i | i = 1, …, N}, where f_i is the first-level feature vector, i.e., the first-level feature expression, of the i-th spatial point. As shown in fig. 2, the five layers of the U-shaped feature encoding module are designed with the same combination pattern; only the first and fifth layers are drawn in the figure, and the middle three layers are omitted. The U-shaped feature encoding module efficiently processes the ordered three-dimensional voxels and finally uses the mapping between spatial points and voxels to obtain the first-level feature vector of each spatial point, forming the first-level feature set F.
In the step, the U-shaped feature encoding module can effectively extract context information of different scales in the three-dimensional measurement data, so that the local semantics of the measurement data can be understood more deeply, and the robustness to noise data is improved.
S2, capturing context information of object levels in the industrial scene through a Hough voting module and a vote aggregation module to form a second-level feature expression; in step S2, the specific process includes the following steps:
S21, taking the three-dimensional measurement data P and the corresponding first-level feature set F as input, the Hough voting module outputs, for each three-dimensional measurement point p_i, a Euclidean spatial offset Δx_i and a feature offset Δf_i, i.e., (Δx_i, Δf_i) = MLP(x_i, f_i), and generates the vote information v_i = [x_i + Δx_i; f_i + Δf_i] for that point. As shown in fig. 3, the network of the Hough voting module consists of three fully connected layers FC, two batch normalization BN layers, and an activation function ReLU;
S22, the vote information v_i of all three-dimensional measurement points constitutes the vote set V = {v_i | i = 1, …, N};
S23, the farthest point sampling method is used to sample M vote positions from the spatial positions of the vote set V, obtaining {c_k | k = 1, …, M}, a set formed by sampling M elements from the vote set V;
S24, for each c_k, a cluster C_k = {v_i : ‖x_i^v − x_k^c‖ ≤ r} is formed by finding the votes adjacent to it in Euclidean space, where v_i denotes the i-th vote belonging to the k-th cluster, x_i^v denotes the three-dimensional spatial position of vote v_i, x_k^c denotes the three-dimensional spatial position of the cluster center c_k, and r denotes the vote aggregation radius threshold, set by the user and set to 0.2 in this embodiment;
S25, the clusters C_k are processed by the three fully connected layers in the vote aggregation module to obtain the cluster feature set G = {g_k | k = 1, …, M}. As shown in fig. 4, the network of the vote aggregation module obtains the cluster features after sampling and clustering through the fully connected layer FC, batch normalization BN, activation function ReLU, and max pooling MaxPooling, where g_k is the second-level cluster feature, i.e., the second-level feature expression, of the k-th vote position. The value of M is independent of the number of object objects in the industrial scene; M = 128 in this embodiment.
In this step, the Hough voting module and the vote aggregation module capture contextual relationships between object objects, such as their relative positions and correlations, to enhance understanding of the overall scene.
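The max pooling at the end of the vote aggregation (FC, BN, ReLU, then MaxPooling) reduces each variable-size cluster to a single fixed-length feature. A numpy sketch with random per-vote features standing in for the fully-connected-layer outputs:

```python
import numpy as np

rng = np.random.default_rng(3)
# Three clusters of different sizes, each vote carrying an 8-D feature.
clusters = [rng.random((5, 8)), rng.random((12, 8)), rng.random((3, 8))]

# Channel-wise max over the votes in a cluster -> one feature per cluster,
# independent of how many votes the cluster contains.
cluster_feats = np.stack([c.max(axis=0) for c in clusters])
print(cluster_feats.shape)  # (3, 8)
```

The max operation is permutation-invariant, which is why it suits the unordered votes inside a cluster.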
S3, a semantic prediction module is adopted to concatenate the first-level feature expression and the corresponding second-level feature expression of each three-dimensional measurement point and predict the semantic information of each point, yielding high-quality three-dimensional semantic information of the whole scene to support subsequent digital twin applications, such as rapidly locating objects of interest in the scene and intelligently editing and interacting with objects in the scene.
In step S3, specifically: for each three-dimensional measurement point p_i, the first-level feature expression f_i output by the U-shaped feature encoding module is concatenated with the second-level feature expression g_k of the cluster in which the point lies, and the concatenated feature is fed into the semantic prediction module to predict the three-dimensional semantic category of that spatial point.
In the step, the multi-level and multi-scale semantic information fusion of each three-dimensional measurement point is realized by splicing the feature expressions of the first level and the second level. The semantic prediction module predicts semantic information by utilizing the fused feature expression, so that finer and accurate semantic label prediction for each measurement point can be realized. By utilizing the feature expressions of two different layers, the model can more comprehensively utilize information from different modules, so that the understanding and analyzing capability of the whole system to industrial scenes is improved.
S4, determining the center regression loss and the three-dimensional semantic segmentation loss of the object objects in the industrial scene, training on an industrial scene dataset, and updating the parameters of all the modules to form a three-dimensional semantic segmentation framework for industrial digital twin scenes; in step S4, the specific process includes the following steps:
S41, determining the center regression loss L_center of the object objects in the industrial scene. The goal of L_center is to supervise the Hough voting module to learn object-level context features, constraining each three-dimensional spatial point x_i to perceive the center position of the object to which it belongs. The expression of the center regression loss L_center is:

L_center = (1/N) Σ_{i=1..N} 1(p_i) · ‖Δx_i − Δx_i*‖

In the above formula, N denotes the number of spatial points; 1(p_i) is an indicator function marking whether the spatial point p_i belongs to a semantic category included in the scene; Δx_i* is the ground-truth Euclidean offset from the initial position x_i of point p_i to the center of the object to which it belongs; Δx_i is the Euclidean offset predicted for p_i by the Hough voting module.
S42, determining the three-dimensional semantic segmentation loss L_sem of the object objects in the industrial scene. The goal of L_sem is to supervise the whole system to accurately predict the semantic category of each three-dimensional spatial point p_i. The expression of L_sem is:

L_sem = −(1/N) Σ_{i=1..N} Σ_{j=1..K} w_j · y_{ij} · log(s_{ij})

In the above formula, K denotes the number of semantic categories in the dataset; y_{ij} is a sign function: y_{ij} = 1 when the true class of spatial point p_i equals j, and y_{ij} = 0 otherwise; s_{ij} is the probability predicted by the semantic prediction module that point p_i belongs to category j; w_j is the weight of the j-th category, determined by the dataset.
S43, calculating the joint loss function L from the center regression loss L_center and the three-dimensional semantic segmentation loss L_seg; the calculation formula of the joint loss function L is:

L = L_seg + λ · L_center ;

In the above formula, λ is a hyper-parameter used to balance the two loss function terms; λ may be set to 0.6.
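Assuming the joint loss takes the weighted-sum form described above (the original formula image is not available, so this form is a reconstruction), it reduces to:

```python
def joint_loss(l_seg, l_center, lam=0.6):
    """Joint loss: segmentation loss plus lambda-weighted center regression
    loss, with lambda balancing the two terms (0.6 in the example above)."""
    return l_seg + lam * l_center

total = joint_loss(1.0, 0.5)  # -> 1.0 + 0.6 * 0.5 = 1.3
```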
S44, training on the industrial scene dataset based on the joint loss function L. Compared with a traditional semantic segmentation loss function, the joint loss exploits the complementarity of the scene-object center regression loss and the semantic segmentation cross-entropy loss: while making semantic predictions, the model also perceives the center position of the object containing each three-dimensional spatial point, which enhances the overall semantic segmentation performance of the model.
S45, updating the parameters of the feature encoding module (the network layers shown in fig. 2), the Hough voting module (the network layers shown in fig. 3), the vote aggregation module (the network layers shown in fig. 4) and the semantic prediction module (two fully connected layers) by a stochastic gradient descent optimization algorithm.
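The plain stochastic gradient descent update can be sketched as follows (a scalar-parameter toy, not the patent's training loop; a real optimizer would also handle momentum, weight decay, etc.):

```python
def sgd_step(params, grads, lr=0.01):
    """One stochastic gradient descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

updated = sgd_step([1.0, -2.0], [10.0, 10.0], lr=0.1)  # -> [0.0, -3.0]
```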
S5, taking three-dimensional measurement data of the industrial digital twin scene as input, the three-dimensional semantic segmentation system outputs high-quality three-dimensional semantic information of the scene. Specifically: the three-dimensional measurement data of the industrial digital twin scene are input into the three-dimensional semantic segmentation system, and the U-shaped feature encoding module in the system extracts a first-level feature expression for each three-dimensional measurement point; on this basis, the Hough voting module and the vote aggregation module in the system extract a second-level feature expression; the two levels of feature expressions are concatenated and passed to the semantic prediction module to obtain the semantic category of each three-dimensional measurement point, yielding high-quality three-dimensional semantic information for the whole scene. As shown in fig. 5, three-dimensional measurement data of a typical industrial power station scene can be used as input to the three-dimensional semantic segmentation system; fig. 6 shows the visualized output of the three-dimensional semantic segmentation system of the present invention, which contains 8 semantic categories: reactor 1, wire 2, filter 3, pole 4, voltage divider 5, ground 6, overhead 7, miscellaneous 8, with different depths representing different semantic categories.
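The inference pipeline above can be sketched end to end; the four module callables here are hypothetical placeholders standing in for the patent's networks, not the real implementation:

```python
def segment_scene(points, encode, vote_and_aggregate, predict):
    """Encode -> vote/aggregate -> fuse -> predict, one label per point."""
    first = encode(points)                        # first-level features
    second = vote_and_aggregate(points, first)    # second-level features
    fused = [f1 + f2 for f1, f2 in zip(first, second)]
    return [predict(f) for f in fused]            # semantic label per point

# Dummy stand-ins for the real modules, just to exercise the data flow:
labels = segment_scene(
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    encode=lambda pts: [[1.0] for _ in pts],
    vote_and_aggregate=lambda pts, f: [[2.0] for _ in pts],
    predict=lambda f: int(sum(f)),
)  # -> [3, 3]
```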
The digital twin scene semantic segmentation method based on depth Hough voting can realize fast, high-precision semantic segmentation of three-dimensional measurement data of any industrial digital twin scene, thereby providing high-precision three-dimensional semantic information for digital twin industrial applications, facilitating subsequent rapid localization of key production elements in the industrial scene, and enabling intelligent editing of and interaction with objects in the scene. Compared with existing three-dimensional semantic segmentation methods, the method effectively captures and fuses context information at different levels and improves the semantic segmentation precision of three-dimensional scenes.
The system provided by this embodiment corresponds to the digital twin scene semantic segmentation method provided by the foregoing embodiment, so that implementation of the foregoing digital twin scene semantic segmentation method is also applicable to the system provided by this embodiment, and will not be described in detail in this embodiment.
Referring to figs. 2-4, which show the network architecture of the three-dimensional semantic segmentation framework provided by this embodiment: the system is integrated from several modules to form a three-dimensional semantic segmentation framework oriented to industrial digital twin scenes, and the modules integrated in the system include:
As shown in fig. 2, the U-shaped feature encoding module is built based on sub-manifold sparse convolution and is used for extracting context information at different scales of the neighborhood of each three-dimensional measurement point from the three-dimensional measurement data of an industrial scene, forming a first-level feature expression; it is a U-shaped architecture built from a combination of conventional sparse convolution layers, conventional sparse deconvolution layers and sub-manifold sparse convolution blocks.
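Sparse-convolution encoders of this kind consume a point-to-voxel mapping. A minimal illustration of that mapping in plain Python (the voxel size and function names are assumptions, not the patent's):

```python
import math

def voxelize(points, voxel_size=0.05):
    """Group unordered 3D points into ordered voxels: map each integer
    voxel key to the indices of the points falling inside it."""
    voxels = {}
    for i, (x, y, z) in enumerate(points):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels.setdefault(key, []).append(i)
    return voxels

grid = voxelize([(0.01, 0.01, 0.01), (0.02, 0.01, 0.01), (0.30, 0.0, 0.0)])
# points 0 and 1 share a voxel; point 2 falls in a different one -> 2 voxels
```

This mapping is also what lets per-voxel features be scattered back to per-point features after encoding.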
The Hough voting module and the vote aggregation module are built on a Hough voting mechanism and are used for capturing object-level context information in the scene to form a second-level feature expression. Fig. 3 shows the network architecture of the Hough voting module, which is based on a multi-layer perceptron and consists of fully connected layers (FC), ReLU activation functions and batch normalization (BN); fig. 4 shows the network architecture of the vote aggregation module, which consists of fully connected layers (FC), ReLU activation functions, batch normalization (BN) and max pooling (MaxPooling).
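A toy sketch of the voting step this module performs: predicting an offset for a point and shifting it toward its object center. The one-layer "MLP" below is a hypothetical stand-in for the FC/ReLU/BN stack, with made-up weights:

```python
def relu(x):
    return [max(0.0, v) for v in x]

def fc(x, weights, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cast_vote(position, offset):
    """A vote's spatial part is the point position shifted by the offset."""
    return [p + d for p, d in zip(position, offset)]

# Predict a 3D offset from a 3D position with a tiny one-layer "MLP".
pos = [1.0, 2.0, 3.0]
offset = relu(fc(pos, weights=[[0.0, 0.0, 0.0]] * 3, bias=[0.5, -1.0, 0.5]))
vote = cast_vote(pos, offset)  # -> [1.5, 2.0, 3.5]
```

In the actual module the same network also predicts a feature offset, and batch normalization is applied between layers; both are omitted here for brevity.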
The semantic prediction module is used for concatenating the first-level and second-level feature expressions of each three-dimensional measurement point and predicting the semantic information of each three-dimensional measurement point; it consists of two fully connected layers.
In this embodiment, the three-dimensional measurement data of the industrial digital twin scene are input into the three-dimensional semantic segmentation framework, and the U-shaped feature encoding module in the framework extracts a first-level feature expression for each three-dimensional measurement point; on this basis, the Hough voting module and the vote aggregation module in the framework extract a second-level feature expression; the two levels of feature expressions are concatenated and passed to the semantic prediction module to obtain the semantic category of each three-dimensional measurement point, yielding high-quality three-dimensional semantic information for the whole scene.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in a method of implementing an embodiment described above may be implemented by a program to instruct related hardware, and thus, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For each of the above embodiments, since it is substantially similar to the method embodiment, the description is relatively simple, and reference should be made to the description of the method embodiment for relevant points.
The foregoing is a detailed description of the invention; the specific examples herein are used to explain the principles and embodiments of the invention and are intended only to facilitate understanding of the method of the invention and its core concepts. Meanwhile, those skilled in the art may make variations to the specific embodiments and application scope in accordance with the ideas of the present invention; in view of the above, the content of this description should not be construed as limiting the present invention.
Claims (10)
1. The digital twin scene semantic segmentation method based on the depth hough voting is characterized by comprising the following steps of:
S1, extracting context information of different scales of a neighborhood of a three-dimensional measurement point from three-dimensional measurement data in an obtained industrial scene through a U-shaped feature coding module to form a first-level feature expression;
S2, capturing context information of object levels in the industrial scene through a Hough voting module and a vote aggregation module to form a second-level feature expression;
S3, splicing the first-level feature expression and the corresponding second-level feature expression of each three-dimensional measurement point by adopting a semantic prediction module, and predicting semantic information of each three-dimensional measurement point;
S4, determining the center regression loss and the three-dimensional semantic segmentation loss of objects in the industrial scene, training on an industrial scene dataset, and updating the parameters of all the modules to form a three-dimensional semantic segmentation system oriented to the industrial digital twin scene;

S5, taking three-dimensional measurement data of the industrial digital twin scene as input, the three-dimensional semantic segmentation system outputting high-quality three-dimensional semantic information of the scene.
2. The digital twin scene semantic segmentation method according to claim 1, wherein in step S1, the specific process comprises the steps of:
S11, preprocessing the data: voxelizing the obtained three-dimensional measurement data P = {p_i | i = 1, …, N} to convert unordered spatial points into ordered three-dimensional voxels, wherein each three-dimensional measurement point p_i is represented by its coordinates (x_i, y_i, z_i), and N represents the number of spatial points;

S12, taking the ordered three-dimensional voxels as input, obtaining a first-level feature vector for each spatial point through the U-shaped feature encoding module based on the mapping relation between the spatial points and the three-dimensional voxels, and forming a first-level feature set F = {f_i | i = 1, …, N}.
3. The digital twin scene semantic segmentation method according to claim 1, wherein in step S2, the specific process comprises the steps of:
S21, taking the three-dimensional measurement data P and the corresponding first-level feature set F as input, the Hough voting module outputs, for each three-dimensional measurement point p_i, a Euclidean spatial offset Δx_i and a feature offset Δf_i, and generates the vote information v_i = (x_i + Δx_i, f_i + Δf_i) for that point;

S22, the vote information v_i of each three-dimensional measurement point p_i constitutes the vote set V = {v_i | i = 1, …, N};

S23, sampling M vote positions from the spatial positions of the vote set V by the farthest point sampling method to obtain {c_m | m = 1, …, M};

S24, for each c_m, forming a cluster C_m by finding the votes adjacent to it in Euclidean space;

S25, processing each cluster C_m through three fully connected layers in the vote aggregation module to obtain the cluster feature set G = {g_m | m = 1, …, M}.
4. The digital twin scene semantic segmentation method according to claim 1, characterized in that step S3 specifically comprises: for each three-dimensional measurement point p_i, concatenating the first-level feature expression f_i output by the U-shaped feature encoding module with the second-level feature expression g_m of the cluster in which the point is located, and feeding the concatenated features into the semantic prediction module to predict the three-dimensional semantic category corresponding to the spatial point.
5. The digital twin scene semantic segmentation method according to claim 1, wherein in step S4, the specific process comprises the steps of:
S41, determining the center regression loss L_center of objects in the industrial scene;

S42, determining the three-dimensional semantic segmentation loss L_seg of objects in the industrial scene;

S43, calculating the joint loss function L from the center regression loss L_center and the three-dimensional semantic segmentation loss L_seg; the calculation formula of the joint loss function L is:

L = L_seg + λ · L_center ;

In the above formula, λ is a hyper-parameter used to balance the two loss function terms;
S44, training on the industrial scene data set based on the joint loss function L;
S45, updating the parameters of the feature encoding module, the Hough voting module, the vote aggregation module and the semantic prediction module by a stochastic gradient descent optimization algorithm.
6. The digital twin scene semantic segmentation method according to claim 5, wherein in step S41, the expression of the center regression loss L_center is:

L_center = (1/N) · Σ_{i=1}^{N} 1(p_i) · ‖Δx_i − Δx_i*‖ ;

In the above formula, N represents the number of spatial points; 1(p_i) is an indicator function indicating whether the spatial point p_i belongs to a semantic category included in the scene; Δx_i* is the ground-truth Euclidean spatial offset of the spatial point p_i from its initial position x_i to the center of the object to which it belongs; Δx_i is the Euclidean spatial offset predicted by the Hough voting module for the spatial point p_i.
7. The method of claim 5, wherein in step S42, the expression of the three-dimensional semantic segmentation loss L_seg is:

L_seg = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{K} w_j · y_ij · log(s_ij) ;

In the above formula, K represents the number of semantic categories in the dataset; y_ij is a sign function, i.e. y_ij = 1 when the true class of the spatial point p_i equals j, and y_ij = 0 otherwise; s_ij is the probability predicted by the semantic prediction module that the spatial point p_i belongs to category j; w_j is the weight of the j-th category, as determined by the dataset.
8. The digital twin scene semantic segmentation method according to claim 1, characterized in that step S5 specifically comprises: inputting three-dimensional measurement data of the industrial digital twin scene into the three-dimensional semantic segmentation system, and extracting a first-level feature expression for each three-dimensional measurement point by the U-shaped feature encoding module in the system;

on this basis, extracting a second-level feature expression by the Hough voting module and the vote aggregation module in the system; and concatenating the two levels of feature expressions and passing them to the semantic prediction module to obtain the semantic category of each three-dimensional measurement point, thereby obtaining high-quality three-dimensional semantic information for the whole scene.
9. A system for implementing the digital twin scene semantic segmentation method according to any of claims 1-8, characterized in that the system is integrated by several modules to form a three-dimensional semantic segmentation framework for industrial digital twin scenes, the modules integrated by the system comprising:
The U-shaped feature coding module is built based on sub-manifold sparse convolution and is used for extracting context information of different scales of a three-dimensional measurement point neighborhood from three-dimensional measurement data in an industrial scene to form a first-level feature expression;
The Hough voting module and the vote aggregation module are formed based on a Hough voting mechanism and are used for capturing context information of object levels in a scene to form a second-level feature expression;
The semantic prediction module is used for concatenating the first-level feature expression and the second-level feature expression of each three-dimensional measurement point and predicting the semantic information of each three-dimensional measurement point.
10. The system of claim 9, wherein the U-shaped feature encoding module is a U-shaped architecture built from a combination of conventional sparse convolution layers, conventional sparse deconvolution layers and sub-manifold sparse convolution blocks;
The Hough voting module is based on a multi-layer perceptron and consists of a full connection layer FC, an activation function ReLU and batch normalization BN;
the vote aggregation module consists of a full connection layer FC, an activation function ReLU, batch normalization BN and maximum pooling MaxPooling;
the semantic prediction module consists of two fully connected layers FC.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410318275.3A CN117911662B (en) | 2024-03-20 | 2024-03-20 | Digital twin scene semantic segmentation method and system based on depth hough voting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117911662A CN117911662A (en) | 2024-04-19 |
CN117911662B true CN117911662B (en) | 2024-05-14 |
Family
ID=90686284
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117911662B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563446A (en) * | 2020-04-30 | 2020-08-21 | 郑州轻工业大学 | Human-machine interaction safety early warning and control method based on digital twin |
CN111964575A (en) * | 2020-07-06 | 2020-11-20 | 北京卫星制造厂有限公司 | Digital twin modeling method for milling of mobile robot |
WO2022007753A1 (en) * | 2020-07-06 | 2022-01-13 | 北京卫星制造厂有限公司 | Digital twin modeling method oriented to mobile robot milling processing |
WO2023166175A1 (en) * | 2022-03-03 | 2023-09-07 | Samp | Method for generating a digital twin of a facility |
CN115512040A (en) * | 2022-08-26 | 2022-12-23 | 中国人民解放军军事科学院国防工程研究院 | Digital twinning-oriented three-dimensional indoor scene rapid high-precision reconstruction method and system |
Non-Patent Citations (1)
Title |
---|
Research on 3D detection and interaction algorithms for production lines based on digital twin; Chen Moran; Deng Changyi; Zhang Jian; Guo Ruifeng; Journal of Chinese Computer Systems; 2020-05-15 (05); full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||