CN113808253B - Method, system, equipment and medium for processing dynamic object of three-dimensional reconstruction of scene - Google Patents
- Publication number: CN113808253B (application number CN202111010819.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- image
- dimensional model
- frame
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/10 — Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
- G06T19/00 — Manipulating 3D models or images for computer graphics
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06T2210/04 — Architectural design, interior design
- G06T2210/61 — Scene description
- Y02T10/40 — Engine management systems
Abstract
The application relates to a method, a system, a device, and a medium for processing dynamic objects in the three-dimensional reconstruction of a scene. The method comprises: acquiring shooting data of a target shooting scene, the shooting data comprising a color image set and a depth image set; reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set; and projecting, at a preset resolution, the query frame image and the sub-graph at the corresponding position in the target three-dimensional model, respectively, comparing the projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update it. The method can effectively eliminate the influence of dynamic objects and improve the accuracy of the reconstructed model.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, a system, an apparatus, and a medium for processing a dynamic object for three-dimensional reconstruction of a scene.
Background
With the development of modern urban construction, more and more large-scale buildings are being erected, and three-dimensional reconstruction technology is widely applied to scene modeling of such buildings. In the existing three-dimensional reconstruction process, dynamic objects such as visitors are often present, which degrade the reconstruction result, and it cannot be guaranteed that no dynamic object appears in the target scene during each reconstruction. Accordingly, the inventors believe that the problem of dynamic objects in the scene modeling process requires further study.
Disclosure of Invention
In view of the above, the present application provides a method, a system, a device and a medium for processing a dynamic object for three-dimensional reconstruction of a scene, which are used for solving the technical problem of how to eliminate the dynamic object in the existing three-dimensional reconstruction process.
In order to solve the above problems, in a first aspect, the present application provides a method for processing a dynamic object for three-dimensional reconstruction of a scene, the method comprising:
acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
reconstructing a target three-dimensional model of a target shooting scene based on the color image set and the depth image set;
acquiring a query frame image of a target shooting scene;
and projecting, at a preset resolution, the query frame image and the sub-graph at the corresponding position in the target three-dimensional model, respectively; comparing the projection results to identify a target dynamic point set; and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
Optionally, the color image set includes RGB images of each frame, and the depth image set includes depth images of each frame; reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set, including:
converting all pixel coordinates in the depth image of each frame from a camera coordinate system to a scene coordinate system to obtain point cloud data of each frame, and determining a multi-resolution depth image pyramid according to the point cloud data of each frame;
for the point cloud data of the current frame and the point cloud data of the adjacent previous frame, constructing a global space model from the point cloud data of the adjacent previous frame and the corresponding RGB image using a truncated signed distance function;
determining the prediction data of the adjacent previous frame according to the global space model by utilizing a ray casting method;
determining the initial camera pose of the current frame by utilizing an iterative nearest point algorithm according to the point cloud data of the current frame and the prediction data of the adjacent previous frame;
according to the initial camera pose of the current frame, calculating the target camera pose of the current frame step by utilizing a multi-resolution depth map pyramid;
based on a frame-model fusion mode and a target camera pose of a current frame, a depth image of the current frame is fused into the global space model to update the global space model;
and determining the target three-dimensional model according to the updated global space model by utilizing a ray casting method.
Optionally, the acquiring the query frame image of the target shooting scene includes:
a query frame image of the target shooting scene is randomly acquired from a preset query set, wherein the query set represents a set of single-frame images in the camera coordinate system.
Optionally, projecting the query frame image and the sub-graph at the corresponding position in the target three-dimensional model at a preset resolution, respectively, and comparing the projection results to identify a target dynamic point set, includes:
projecting the query frame image and the subgraph corresponding to the query frame image in the target three-dimensional model according to a preset resolution respectively to obtain a depth image of the query frame image and a depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
determining a pixel difference matrix by utilizing pixel subtraction according to the depth image of the query frame image and the depth image of the sub-image at the corresponding position in the target three-dimensional model;
judging whether single pixel differences in the pixel difference matrix exceed a preset threshold value according to the pixel difference matrix, and if so, taking a set of pixel points corresponding to the pixel differences exceeding the preset threshold value as an initial dynamic point set;
and carrying out iterative projection processing on the query frame image and the subgraphs of the target three-dimensional model, which correspond to the query frame image, so as to optimize the initial dynamic point set and obtain a target dynamic point set.
Optionally, performing iterative projection processing on the query frame image and a sub-image of the target three-dimensional model corresponding to the query frame image, so as to optimize the initial dynamic point set, and obtain a target dynamic point set, where the iterative projection processing includes:
and carrying out iterative projection on the query frame image and a subgraph of the target three-dimensional model at a position corresponding to the query frame image according to a mode that the preset resolution is gradually reduced so as to optimize the initial dynamic point set and obtain a target dynamic point set.
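The projection-comparison step above can be sketched in Python with NumPy as follows; the function names, the upsampling of coarse masks by pixel repetition, and the rule of intersecting the per-level masks are illustrative assumptions, since the text only specifies "pixel subtraction, threshold, iterate at gradually decreased resolution":

```python
import numpy as np

def dynamic_point_mask(query_depth, model_depth, threshold):
    """Pixels whose depth difference between the query-frame depth image and
    the depth image rendered from the model sub-graph exceeds the threshold
    are candidate dynamic points."""
    diff = np.abs(query_depth.astype(np.float64) - model_depth.astype(np.float64))
    return diff > threshold

def refine_dynamic_points(query_pyramid, model_pyramid, threshold):
    """Iterate from the full preset resolution down through coarser levels,
    keeping only pixels flagged as dynamic at every level."""
    mask = None
    for q, m in zip(query_pyramid, model_pyramid):
        level_mask = dynamic_point_mask(q, m, threshold)
        # lift the coarse mask back to the finest grid by pixel repetition
        scale = query_pyramid[0].shape[0] // q.shape[0]
        up = np.repeat(np.repeat(level_mask, scale, axis=0), scale, axis=1)
        mask = up if mask is None else (mask & up)
    return mask
```

The intersection across levels acts as the "optimization" of the initial dynamic point set: isolated depth-noise pixels that disappear at coarser resolutions are dropped, while genuinely moving regions survive every level.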
Optionally, the cost function of the iterative closest point algorithm adopts a point-to-plane distance metric; the prediction data comprise the predicted point cloud data, the normal vector of each point of the predicted point cloud, and the camera pose of the adjacent previous frame; determining the initial camera pose of the current frame from the point cloud data of the current frame and the prediction data of the adjacent previous frame using the iterative closest point algorithm comprises the following steps:
calculating, according to the principle of minimizing the cost function, an optimal transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame, and taking it as the camera pose of the current frame, wherein the cost function is:

$$E(T_{g,k}) = \sum_{\Omega_k(u) \neq \mathrm{null}} \left\| \left( T_{g,k}\, \dot{V}_k(u) - \hat{V}^{g}_{k-1}(\hat{u}) \right)^{\mathsf{T}} \hat{N}^{g}_{k-1}(\hat{u}) \right\|_{2}$$

where $T_{g,k}$ is the transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame; $E(T_{g,k})$ is the cost function; $\Omega_k(u) \neq \mathrm{null}$ indicates that pixel $u$ has a valid depth value; $\dot{V}_k(u)$ is the coordinate of the space point corresponding to pixel $u$ in the depth image of the current frame $k$; $\hat{V}^{g}_{k-1}(\hat{u})$ is the spatial coordinate of the predicted point corresponding to pixel $u$ in the depth image of frame $k-1$; $\hat{N}^{g}_{k-1}(\hat{u})$ is the normal vector of that predicted point; and the superscript $\mathsf{T}$ denotes the matrix transpose.
Optionally, the information of each voxel in the constructed global space model includes a truncated signed distance value and RGB information.
In a second aspect, the present application provides a dynamic object processing system for three-dimensional reconstruction of a scene, the system comprising:
the first acquisition module is used for acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
the reconstruction module is used for reconstructing a target three-dimensional model of a target shooting scene based on the color image set and the depth image set;
the second acquisition module is used for acquiring a query frame image of the target shooting scene;
and the dynamic processing module is used for respectively projecting the query frame image and the subgraphs corresponding to the query frame image in the target three-dimensional model according to the prediction resolution, comparing projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
In a third aspect, the present application provides a computer device, which adopts the following technical scheme:
a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the dynamic object processing method for three-dimensional reconstruction of a scene.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the dynamic object processing method for three-dimensional reconstruction of a scene.
The beneficial effects of adopting the above embodiments are as follows: a target three-dimensional model of the target shooting scene is reconstructed from the color image set and the depth image set captured for the target shooting scene, so that both dynamic and static objects of the target shooting scene can be identified comprehensively. The query frame image of the target shooting scene and the corresponding sub-graph in the target three-dimensional model are then projected and compared, so that the target dynamic point set can be conveniently identified and deleted from the target three-dimensional model. The finally obtained target three-dimensional model therefore represents the static scene of the target shooting scene, which effectively eliminates the influence of dynamic objects and improves the accuracy of the model.
Drawings
FIG. 1 is a schematic view of an application scenario of a dynamic object processing system for three-dimensional reconstruction of a scenario provided by the present application;
FIG. 2 is a flowchart illustrating a method for processing a dynamic object for three-dimensional reconstruction of a scene according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method according to an embodiment of step S202;
FIG. 4 is a schematic diagram of a cost function using point-to-plane distance measurement according to the present application;
FIG. 5 is a schematic diagram of the depth image of the current frame blended into a global space model according to the present application;
FIG. 6 is a schematic diagram of a surface solution of a three-dimensional model of interest provided by the present application;
FIG. 7 is a flowchart of a method according to an embodiment of step S204 of the present application;
FIG. 8 is a schematic block diagram of an embodiment of a dynamic object handling system for three-dimensional reconstruction of a scene provided by the present application;
fig. 9 is a schematic block diagram of an embodiment of a computer device provided in the present application.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof and, together with the embodiments of the application, serve to explain the principles of the application; it is not intended to limit the scope of the application.
In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The application provides a method, a system, equipment and a medium for processing a dynamic object for three-dimensional reconstruction of a scene, which are respectively described in detail below.
Fig. 1 is a schematic view of a scene of a dynamic object processing system for three-dimensional reconstruction of a scene according to an embodiment of the present application, where the system may include a server 100, and the server 100 is integrated with the dynamic object processing system for three-dimensional reconstruction of a scene, such as the server in fig. 1.
The server 100 in the embodiment of the present application is mainly used for:
acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set;
acquiring a query frame image of a target shooting scene;
and respectively projecting the query frame image and the subgraphs of the target three-dimensional model at positions corresponding to the query frame image according to the prediction resolution, comparing projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
In the embodiment of the present application, the server 100 may be an independent server, or may be a server network or a server cluster formed by servers, for example, the server 100 described in the embodiment of the present application includes, but is not limited to, a computer, a network host, a single network server, a plurality of network server sets, or a cloud server formed by a plurality of servers. Wherein the Cloud server is composed of a large number of computers or web servers based on Cloud Computing (Cloud Computing).
It will be appreciated that the terminal 200 used in embodiments of the present application may be a device that includes both receiving and transmitting hardware, i.e., a device capable of performing bidirectional communication over a bidirectional communication link. Such a device may include a cellular or other communication device having a single-line or multi-line display, or a cellular or other communication device without a multi-line display. The specific terminal 200 may be a desktop computer, a portable computer, a web server, a palmtop computer (Personal Digital Assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, etc.; this embodiment does not limit the type of the terminal 200.
It will be appreciated by those skilled in the art that the application environment shown in FIG. 1 is merely one application scenario of the present application and does not limit it; other application environments may include more or fewer terminals than shown in FIG. 1. For example, only two terminals are shown in FIG. 1; it will be appreciated that the dynamic object processing system for three-dimensional reconstruction of a scene may further include one or more other terminals, which is not limited herein.
In addition, referring to fig. 1, the dynamic object processing system for three-dimensional reconstruction of a scene may further include a memory 200 for storing data, such as photographed data, inquiry frame images, and the like.
It should be noted that the scene diagram of the dynamic object processing system for three-dimensional reconstruction of a scene shown in FIG. 1 is merely an example. The system and the scene described in the embodiments of the present application are intended to describe the technical solution of the embodiments more clearly and do not constitute a limitation on it; as a person of ordinary skill in the art will appreciate, with the evolution of such systems and the appearance of new service scenarios, the technical solution provided by the embodiments of the present application is equally applicable to similar technical problems.
Referring to fig. 2, a method flowchart of an embodiment of a method for processing a dynamic object for three-dimensional reconstruction of a scene provided by the present application includes the following steps:
s201, acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
s202, reconstructing a target three-dimensional model of a target shooting scene based on a color image set and a depth image set;
s203, acquiring a query frame image of a target shooting scene;
s204, projecting the query frame image and the sub-graph at the corresponding position in the target three-dimensional model at a preset resolution, respectively, comparing the projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
In this embodiment, a target three-dimensional model of the target shooting scene is reconstructed from the color image set and the depth image set captured for the target shooting scene, so that both dynamic and static objects of the target shooting scene can be identified comprehensively. The query frame image of the target shooting scene and the corresponding sub-graph in the target three-dimensional model are then projected and compared, so that the target dynamic point set can be conveniently identified and deleted from the target three-dimensional model; the finally obtained target three-dimensional model is the static scene of the target shooting scene, which effectively eliminates the influence of dynamic objects and improves the accuracy of the model.
In this embodiment, the target shooting scene may be an internal scene of a building, such as an indoor building scene of a large museum, library, etc., and in other embodiments, the target shooting scene may also be an external scene of the building. An image sensor is arranged in a target shooting scene, and specifically, a kinect camera capable of rotating shooting in real time can be arranged. The photographing data includes a color image set including RGB images of each frame and a depth image set including depth images of each frame.
In an embodiment, referring to fig. 3, a flowchart of a method according to an embodiment of step S202 provided by the present application is shown, where the step S202 is to reconstruct a target three-dimensional model of a target shooting scene based on a color image set and a depth image set, and includes the following steps:
s301, converting all pixel coordinates in a depth image of each frame from a camera coordinate system to a scene coordinate system, obtaining point cloud data of each frame, and determining a multi-resolution depth map pyramid according to the point cloud data of each frame;
in this embodiment, the camera coordinate system is the coordinate system established by the camera deployed in the target scene, and the scene coordinate system is the coordinate system established for the target shooting scene; the point cloud data of each frame includes the three-dimensional coordinates of each pixel point in the scene coordinate system and the corresponding normal vectors.
Specifically, the depth image $D_k$ at time $k$ can be obtained by the Kinect camera, and the three-dimensional coordinates of each pixel point are obtained by back-projection through the camera intrinsic matrix $K$:

$$V_k(u) = D_k(u)\, K^{-1}\, \dot{u}$$

where $u$ is a pixel point, expressed in two-dimensional coordinates as $u = (x, y)^{\mathsf{T}}$; $\dot{u} = (x, y, 1)^{\mathsf{T}}$ denotes its homogeneous coordinates; $D_k(u)$ is the depth value at pixel $u$; and the superscript $\mathsf{T}$ denotes the matrix transpose.
Further, vectors are formed from pixel $u = (x, y)$ to its adjacent pixels $(x+1, y)$ and $(x, y+1)$; their cross product yields the normal vector of the space point $V_k(u)$ corresponding to pixel $u$:

$$N_k(u) = \nu\left[ \left( V_k(x+1, y) - V_k(x, y) \right) \times \left( V_k(x, y+1) - V_k(x, y) \right) \right]$$

where $\nu[\cdot]$ denotes normalization of the vector obtained from the cross product of the two vectors.
Further, the coordinates and normal vectors of the space points corresponding to all pixels in the depth image of each frame under the scene coordinate system are calculated in sequence.
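The back-projection and normal computation described above can be sketched with NumPy as follows; the H×W×3 array layout and the function names are illustrative choices, not taken from the patent:

```python
import numpy as np

def back_project(depth, K):
    """V_k(u) = D_k(u) * K^{-1} * u_dot: lift every pixel of a depth image
    into 3-D camera-space points, returned as an H x W x 3 array."""
    H, W = depth.shape
    Kinv = np.linalg.inv(K)
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    homog = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # (homog @ Kinv.T)[y, x] == Kinv @ [x, y, 1]
    return depth[..., None] * (homog @ Kinv.T)

def normals(V):
    """N_k(u) = nu[(V(x+1,y)-V(x,y)) x (V(x,y+1)-V(x,y))], normalized."""
    dx = V[:, 1:, :] - V[:, :-1, :]   # vector to the +x neighbour
    dy = V[1:, :, :] - V[:-1, :, :]   # vector to the +y neighbour
    n = np.cross(dx[:-1, :, :], dy[:, :-1, :])
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.where(norm == 0, 1, norm)
```

For a constant-depth (fronto-parallel) plane, every normal comes out as (0, 0, 1), which is a quick sanity check on the sign convention of the cross product.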
Further, for the depth image of each frame, a depth image pyramid can be constructed by Gaussian filtering and downsampling; the pyramid has three resolution levels, from low to high.
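A minimal sketch of the three-level pyramid construction, substituting a simple 2×2 box average for the Gaussian filter mentioned in the text (an illustrative simplification):

```python
import numpy as np

def build_depth_pyramid(depth, levels=3):
    """Build a depth-image pyramid: each coarser level smooths the previous
    one and halves its resolution. Index 0 is the highest resolution."""
    pyramid = [depth.astype(np.float64)]
    for _ in range(levels - 1):
        d = pyramid[-1]
        h, w = d.shape[0] // 2 * 2, d.shape[1] // 2 * 2
        # 2x2 box-average smoothing and decimation in one reshape
        coarse = d[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid
```

In a full implementation the box filter would be replaced by a Gaussian kernel, but the level structure (three resolutions, coarse to fine) is the same.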
S302, aiming at the point cloud data of the current frame and the point cloud data of the adjacent previous frame, constructing a global space model according to the point cloud data of the adjacent previous frame and the corresponding RGB image by utilizing a truncated symbol distance function;
specifically, the point cloud data of the adjacent previous frame is converted into voxels of the global space model using a truncated signed distance function (TSDF), and the global space model is updated with the corresponding RGB image.
It should be noted that the global space model is composed of L×W×H voxels, and the global space model constructed by the TSDF algorithm is equal in scale to the target three-dimensional model to be created. In this embodiment, the information of each voxel includes the truncated signed distance value of the voxel and an RGB value, where the truncated signed distance value is used to determine the surface of the target three-dimensional model to be created, and the RGB value is used to assign a color texture to that surface.
S303, determining prediction data of the adjacent previous frame according to the global space model by utilizing a ray casting method;
in this embodiment, the global space model is obtained by converting the point cloud data of the adjacent previous frame; therefore, according to the global space model, the prediction data of the global space model, i.e., the prediction data of the adjacent previous frame with respect to the current frame, is calculated by the ray casting method combined with an initial point cloud registration matrix. In this embodiment, the initial point cloud registration matrix is the identity matrix. In addition, it should be noted that the prediction data of the adjacent previous frame includes the predicted point cloud data, the normal vector of each point of the predicted point cloud, and the camera pose of the adjacent previous frame, where the predicted point cloud data refers to the predicted spatial coordinates of each point.
S304, determining the initial camera pose of the current frame by utilizing an iterative nearest point algorithm according to the point cloud data of the current frame and the prediction data of the adjacent previous frame;
in one embodiment, the cost function of the iterative closest point algorithm uses a point-to-plane (Point-to-plane) distance metric, as shown in FIG. 4.
Specifically, according to the principle of minimizing the cost function, the optimal transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame is calculated and taken as the camera pose of the current frame, wherein the cost function is:

$$E(T_{g,k}) = \sum_{\Omega_k(u) \neq \mathrm{null}} \left\| \left( T_{g,k}\, \dot{V}_k(u) - \hat{V}^{g}_{k-1}(\hat{u}) \right)^{\mathsf{T}} \hat{N}^{g}_{k-1}(\hat{u}) \right\|_{2}$$

where $T_{g,k}$ is the transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame; $E(T_{g,k})$ is the cost function; $\Omega_k(u) \neq \mathrm{null}$ indicates that pixel $u$ has a valid depth value; $\dot{V}_k(u)$ is the coordinate of the space point corresponding to pixel $u$ in the depth image of the current frame $k$; $\hat{V}^{g}_{k-1}(\hat{u})$ is the spatial coordinate of the predicted point corresponding to pixel $u$ in the depth image of frame $k-1$; $\hat{N}^{g}_{k-1}(\hat{u})$ is the normal vector of that predicted point; and the superscript $\mathsf{T}$ denotes the matrix transpose.
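The point-to-plane cost can be evaluated numerically as sketched below; this only computes E(T) for a given transformation and a given set of correspondences — the iterative search for the minimizing T (the actual ICP loop) is omitted, and the function name is an illustrative choice:

```python
import numpy as np

def point_to_plane_cost(T, src_pts, dst_pts, dst_normals):
    """Sum over correspondences of the squared distance from each
    transformed source point to the tangent plane (point + normal)
    of its predicted counterpart."""
    src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])  # homogeneous
    moved = (src_h @ T.T)[:, :3]                              # apply T
    # residual_i = (moved_i - dst_i) . n_i  (signed plane distance)
    residual = np.einsum('ij,ij->i', moved - dst_pts, dst_normals)
    return float(np.sum(residual ** 2))
```

With perfectly aligned clouds and the identity transform the cost is zero; translating the source along the normals makes it grow quadratically, which is the behaviour the minimization exploits.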
It should be noted that, in this embodiment, the initial camera pose of the current frame is determined according to the point cloud data of the current frame and the prediction data of the adjacent previous frame, where a frame-to-model fusion mode is adopted, so that the initial camera pose of the current frame is conveniently and accurately solved.
S305, calculating the target camera pose of the current frame step by step according to the initial camera pose of the current frame by utilizing a multi-resolution depth map pyramid;
in this embodiment, according to the initial camera pose of the current frame, and using the multi-resolution depth map pyramid, the target camera pose of the current frame is obtained by step-by-step calculation from low resolution to high resolution, thereby improving the accuracy of the target camera pose of the current frame.
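A minimal sketch of building the multi-resolution depth map pyramid used for the coarse-to-fine computation above; the patent does not specify the downsampling kernel, so 2x2 average pooling is an assumption:

```python
import numpy as np

def build_depth_pyramid(depth, levels=3):
    """Return depth maps ordered from coarsest to finest for coarse-to-fine ICP."""
    pyramid = [np.asarray(depth, dtype=float)]
    for _ in range(levels - 1):
        d = pyramid[-1]
        h, w = (d.shape[0] // 2) * 2, (d.shape[1] // 2) * 2  # crop to even size
        # 2x2 average pooling halves the resolution at each level
        d = d[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(d)
    return pyramid[::-1]  # coarsest first, original resolution last
```

The camera pose estimated at a coarse level then seeds the optimization at the next finer level, which is what "step by step from low resolution to high resolution" refers to.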
S306, based on a frame and model fusion mode and a target camera pose of the current frame, the depth image of the current frame is fused into the global space model to update the global space model;
in this embodiment, according to the target camera pose of the current frame, the depth image of the current frame is merged into the global space model. Specifically, as shown in fig. 5, each voxel in the global space model is projected onto the camera according to the target camera pose of the current frame; if the projection falls within the camera's view frustum, the voxel has a corresponding pixel, and then the TSDF value of that voxel = the depth value d s of the corresponding pixel − the distance d v from the voxel to the camera coordinate origin. The TSDF value of each voxel is then compared with a preset threshold: voxels whose value exceeds the threshold are set as invalid voxels, voxels whose value is less than or equal to the threshold undergo normalized truncation, and the TSDF value of each voxel is further updated by weighted averaging. The TSDF value of a voxel represents the truncated signed distance value of the voxel.
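The per-voxel TSDF update described above can be sketched as follows. Note this follows the common KinectFusion-style convention of discarding observations far behind the surface; the exact invalidation rule and sign convention are assumptions, not quoted from the patent:

```python
def update_tsdf_voxel(tsdf_old, w_old, d_s, d_v, trunc):
    """Update one voxel's truncated signed distance with a new observation.

    d_s   : depth value of the pixel the voxel projects to
    d_v   : distance from the voxel to the camera coordinate origin
    trunc : truncation threshold
    """
    sdf = d_s - d_v                    # signed distance, as in the text
    if sdf < -trunc:
        return tsdf_old, w_old         # far behind the surface: skip update
    tsdf_obs = min(1.0, sdf / trunc)   # normalized truncation into [-1, 1]
    w_new = w_old + 1.0
    tsdf_new = (tsdf_old * w_old + tsdf_obs) / w_new  # weighted average
    return tsdf_new, w_new
```

Running this for every visible voxel of every frame accumulates a smoothed implicit surface at the TSDF zero level set.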
S307, determining a target three-dimensional model according to the updated global space model by utilizing a ray casting method.
Specifically, as shown in fig. 6, according to the updated global space model, a ray casting method is adopted: taking the target camera pose of the current frame as the starting point, rays are cast according to the camera intrinsic parameters K; the rays traverse the voxels of the global space model, the voxel positions where the TSDF value equals 0 are identified as the surface of the target three-dimensional model, and the surface of the target three-dimensional model is further refined by linear interpolation.
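A one-dimensional sketch of the surface extraction along a single ray, including the linear interpolation refinement mentioned above (the marching loop over image rays and trilinear TSDF sampling are omitted):

```python
def surface_distance_along_ray(tsdf_samples, step):
    """Find the distance along a ray where the TSDF crosses zero.

    tsdf_samples : TSDF values sampled at successive steps along the ray
    step         : distance between consecutive samples
    Returns the interpolated surface distance, or None if no crossing.
    """
    for i in range(len(tsdf_samples) - 1):
        a, b = tsdf_samples[i], tsdf_samples[i + 1]
        if a > 0.0 >= b:          # sign change: surface lies between samples
            t = a / (a - b)       # linear interpolation of the zero crossing
            return (i + t) * step
    return None
```

The positive-to-nonpositive sign change corresponds to the ray passing from free space into the reconstructed surface.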
In one embodiment, step S203, that is, acquiring a query frame image of the target shooting scene, includes:
query frame images of a target shooting scene are randomly acquired from a preset query set, wherein the query set represents a set of single frame images under a camera coordinate system.
In this embodiment, regarding the query set, the local camera coordinate system may be denoted Q, and the target three-dimensional model coordinate system M; P^Q denotes single query frame data in the camera coordinate system, and P^M denotes the map point set in the target three-dimensional model coordinate system; P^Q_k denotes the k-th query frame data in camera coordinates, and P^M_k denotes the subgraph, under the target three-dimensional model coordinates, corresponding to the k-th query frame; all the query frames constitute the query set {P^Q_k}. It should be noted that the query frame data includes the pixel coordinates of the query frame in the camera coordinate system.
In an embodiment, referring to fig. 7, which is a flowchart of a method of an embodiment of step S204 provided by the present application, in step S204, the query frame image and the subgraph at the position corresponding to the query frame image in the target three-dimensional model are projected according to the preset resolution, and the projection results are compared to identify the target dynamic point set, including the following steps:
S701, projecting the query frame image and the subgraph corresponding to the query frame image in the target three-dimensional model according to a preset resolution respectively, so as to obtain a depth image of the query frame image and a depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
specifically, the subgraph P^M_k corresponding to the query frame image in the target three-dimensional model is projected into a depth map with the same resolution as the query frame image, and the projected depth image is denoted D^M_k ∈ R^{m×n}, where m and n are determined by the preset resolution and the horizontal and vertical field-of-view ranges, and (i, j) denotes pixel coordinates; the specific projection process is as follows:

$D^M_k(i,j) = \min_{p \in S^M_k(i,j)} r(p)$

where D^M_k(i,j) is the value of the depth image at pixel (i, j), r(p) denotes the range of a point p under the query frame viewpoint, and S^M_k(i,j) is the set of points of P^M_k whose spherical coordinates (i.e. elevation and azimuth) fall within pixel (i, j). Through the projection process of this embodiment, the depth map D^M_k and point cloud of the subgraph in the target three-dimensional model corresponding to the k-th query frame are obtained; similarly, the depth image D^Q_k of the query frame and its corresponding point cloud can be determined in the same way.
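The spherical-coordinate projection described here can be sketched as below. The field-of-view ranges and the nearest-range-per-pixel rule are assumptions for illustration; the patent only states that points are binned by elevation and azimuth:

```python
import numpy as np

def project_to_range_image(points, m, n,
                           elev_range=(-np.pi / 4, np.pi / 4),
                           azim_range=(-np.pi, np.pi)):
    """Bin 3-D points by (elevation, azimuth); keep the nearest range per pixel."""
    pts = np.asarray(points, dtype=float)
    r = np.linalg.norm(pts, axis=1)
    elev = np.arcsin(np.clip(pts[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    azim = np.arctan2(pts[:, 1], pts[:, 0])
    # map angles to integer pixel indices in an m x n image
    i = np.floor((elev - elev_range[0]) / (elev_range[1] - elev_range[0]) * m).astype(int)
    j = np.floor((azim - azim_range[0]) / (azim_range[1] - azim_range[0]) * n).astype(int)
    depth = np.full((m, n), np.inf)  # inf marks pixels with no projected point
    ok = (i >= 0) & (i < m) & (j >= 0) & (j < n)
    for ii, jj, rr in zip(i[ok], j[ok], r[ok]):
        depth[ii, jj] = min(depth[ii, jj], rr)  # nearest point wins the pixel
    return depth
```

Projecting both the query frame's point cloud and the model subgraph through the same function yields two depth images of identical resolution, ready for pixel-wise comparison.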
S702, determining a pixel difference matrix by utilizing pixel subtraction according to the depth image of the query frame image and the depth image of the sub-image at the corresponding position in the target three-dimensional model;
specifically, the pixel difference matrix is denoted E_k and is obtained by pixel-wise subtraction, namely: E_k(i, j) = D^Q_k(i, j) − D^M_k(i, j).
S703, judging, according to the pixel difference matrix, whether a single pixel difference in the pixel difference matrix exceeds a preset threshold, and if so, taking the set of pixel points corresponding to the pixel differences exceeding the preset threshold as the initial dynamic point set;
specifically, the initial dynamic point set is denoted Φ_D, namely: Φ_D = { (i, j) : |E_k(i, j)| > τ_D }
where τ_D is the preset threshold; in this embodiment, the preset threshold may be determined according to actual conditions.
Optionally, if the single pixel difference in the pixel difference matrix does not exceed the preset threshold, taking the set of pixel points corresponding to the pixel difference which does not exceed the preset threshold as the initial static point set;
the initial dynamic point set and the initial static point set are mutually exclusive subsets, each being the complement of the other; the initial static point set is denoted Φ_S, with the expression: Φ_S = { (i, j) : |E_k(i, j)| ≤ τ_D }
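The comparison in steps S702–S703 can be sketched in a few lines. Using the absolute difference is an assumption, since the patent only says "pixel subtraction"; the function name is hypothetical:

```python
import numpy as np

def split_dynamic_static(depth_query, depth_submap, tau_d):
    """Classify pixels as dynamic or static from two depth images.

    Returns (dynamic_mask, static_mask); the two masks are complements.
    """
    # pixel difference matrix: query depth minus model-subgraph depth
    e = np.asarray(depth_query, dtype=float) - np.asarray(depth_submap, dtype=float)
    dynamic = np.abs(e) > tau_d   # initial dynamic point set
    return dynamic, ~dynamic      # initial static point set is the complement
```

Pixels where the query frame sees a markedly different depth than the reconstructed model are the candidates for belonging to a moving object.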
S704, performing iterative projection processing on the query frame image and the subgraph of the target three-dimensional model at the position corresponding to the query frame image, so as to optimize the initial dynamic point set and obtain the target dynamic point set.
In an embodiment, according to a mode that the preset resolution is gradually reduced, iterative projection is performed on the query frame image and a sub-image corresponding to the query frame image in the target three-dimensional model, so as to optimize an initial dynamic point set and obtain a target dynamic point set.
Specifically, by gradually reducing the projection resolution and iterating, dynamic points remaining in the initial static point set are gradually reduced and static points mistakenly placed in the initial dynamic point set are restored, so that dynamic objects are eliminated from the target three-dimensional model more thoroughly. In specific embodiments, the lower bound for progressively reducing the projection resolution may be determined according to actual conditions.
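The patent does not spell out how results from different resolutions are combined, so the driver below is a hypothetical sketch that keeps only points flagged dynamic at every resolution level; `classify_at` stands in for the projection-and-comparison steps above:

```python
def refine_dynamic_points(classify_at, resolutions):
    """Intersect dynamic-point sets obtained at progressively lower resolutions.

    classify_at : callback mapping a resolution to the set of point indices
                  flagged dynamic at that resolution (hypothetical wrapper
                  around the projection + comparison steps)
    resolutions : e.g. [256, 128, 64], the preset resolution first
    """
    dynamic = set(classify_at(resolutions[0]))
    for res in resolutions[1:]:
        # points flagged at only one scale are restored as static
        dynamic &= set(classify_at(res))
    return dynamic
```

Intersecting across scales is one plausible reading of "restoring static points from the initial dynamic point set"; other combination rules (e.g. voting) would fit the text equally well.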
Different from the prior art, the present application reconstructs a target three-dimensional model of the target shooting scene from the color image set and the depth image set captured of that scene, so that the dynamic and static objects of the target shooting scene can be identified comprehensively; projection comparison between a query frame image of the target shooting scene and the corresponding subgraph of the target three-dimensional model makes it convenient to identify the target dynamic point set, which is then deleted from the target three-dimensional model, so that the finally obtained target three-dimensional model contains only the static scene of the target shooting scene, thereby effectively eliminating the influence of dynamic objects and improving the accuracy of the model.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation process of the embodiments of the present application.
The embodiment also provides a system for processing the dynamic object of the three-dimensional reconstruction of the scene, and the system for processing the dynamic object of the three-dimensional reconstruction of the scene corresponds to the method for processing the dynamic object of the three-dimensional reconstruction of the scene in the embodiment one by one. As shown in fig. 8, the dynamic object processing system for three-dimensional reconstruction of a scene includes a first acquisition module 801, a reconstruction module 802, a second acquisition module 803, and a dynamic processing module 804. The functional modules are described in detail as follows:
a first obtaining module 801, configured to obtain shooting data of a target shooting scene, where the shooting data includes a color image set and a depth image set;
a reconstruction module 802, configured to reconstruct a target three-dimensional model of a target shooting scene based on the color image set and the depth image set;
a second obtaining module 803, configured to obtain a query frame image of the target shooting scene;
the dynamic processing module 804 is configured to project the query frame image and the sub-image corresponding to the query frame image in the target three-dimensional model according to the prediction resolution, compare the projection results to identify a target dynamic point set, and delete the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
For specific limitation of each module of the dynamic object processing system for three-dimensional reconstruction of a scene, reference may be made to the limitation of the dynamic object processing method for three-dimensional reconstruction of a scene hereinabove, and details thereof are not repeated herein. The modules in the dynamic object processing system for three-dimensional reconstruction of a scene can be implemented in whole or in part by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Referring to fig. 9, the present embodiment further provides a computer device, which may be a mobile terminal, a desktop computer, a notebook computer, a palm computer, a server, or another computing device. The computer device includes a processor 10, a memory 20, and a display 30. Fig. 9 shows only some of the components of the computer device, but it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented.
The memory 20 may in some embodiments be an internal storage unit of a computer device, such as a hard disk or memory of a computer device. The memory 20 may also be an external storage device of the computer device in other embodiments, such as a plug-in hard disk provided on the computer device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. Further, the memory 20 may also include both internal storage units and external storage devices of the computer device. The memory 20 is used for storing application software installed on the computer device and various types of data, such as program codes for installing the computer device. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 has a computer program 40 stored thereon.
The processor 10 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 20, such as a dynamic object processing method for performing three-dimensional reconstruction of a scene, etc.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like in some embodiments. The display 30 is used for displaying information on the computer device and for displaying a visual user interface. The components 10-30 of the computer device communicate with each other via a system bus.
In one embodiment, the following steps are implemented when the processor 10 executes the computer program 40 in the memory 20:
acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set;
acquiring a query frame image of a target shooting scene;
and carrying out projection comparison on the query frame image and the subgraph of the corresponding position in the target three-dimensional model according to the preset projection resolution to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set;
acquiring a query frame image of a target shooting scene;
and carrying out projection comparison on the query frame image and the subgraph of the corresponding position in the target three-dimensional model according to the preset projection resolution to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above.
Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.
Claims (8)
1. A method for processing a dynamic object for three-dimensional reconstruction of a scene, the method comprising:
acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
reconstructing a target three-dimensional model of a target shooting scene based on the color image set and the depth image set;
acquiring a query frame image of a target shooting scene;
projecting the query frame image and the subgraphs corresponding to the query frame image in the target three-dimensional model according to the prediction resolution respectively, comparing projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model;
projecting the query frame image and the subgraphs corresponding to the query frame image in the target three-dimensional model according to the prediction resolution respectively, and comparing projection results to identify a target dynamic point set, wherein the method comprises the following steps:
projecting the query frame image and the subgraph corresponding to the query frame image in the target three-dimensional model according to a preset resolution respectively to obtain a depth image of the query frame image and a depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
determining a pixel difference matrix by utilizing pixel subtraction according to the depth image of the query frame image and the depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
judging whether single pixel differences in the pixel difference matrix exceed a preset threshold value according to the pixel difference matrix, and if so, taking a set of pixel points corresponding to the pixel differences exceeding the preset threshold value as an initial dynamic point set;
performing iterative projection processing on the query frame image and a subgraph of the target three-dimensional model at a position corresponding to the query frame image so as to optimize the initial dynamic point set and obtain a target dynamic point set;
performing iterative projection processing on the query frame image and a sub-image of the target three-dimensional model at a position corresponding to the query frame image to optimize the initial dynamic point set to obtain a target dynamic point set, wherein the method comprises the following steps:
and carrying out iterative projection on the query frame image and a subgraph of the target three-dimensional model at a position corresponding to the query frame image according to a mode that the preset resolution is gradually reduced so as to optimize the initial dynamic point set and obtain a target dynamic point set.
2. The method according to claim 1, wherein the color image set includes RGB images of each frame, and the depth image set includes depth images of each frame; reconstructing a target three-dimensional model of the target shooting scene based on the color image set and the depth image set, including:
converting all pixel coordinates in the depth image of each frame from a camera coordinate system to a scene coordinate system to obtain point cloud data of each frame, and determining a multi-resolution depth image pyramid according to the point cloud data of each frame;
aiming at the point cloud data of the current frame and the point cloud data of the adjacent previous frame, constructing a global space model according to the point cloud data of the adjacent previous frame and the corresponding RGB image by utilizing a truncated symbol distance function;
determining the prediction data of the adjacent previous frame according to the global space model by utilizing a ray casting method;
determining the initial camera pose of the current frame by utilizing an iterative nearest point algorithm according to the point cloud data of the current frame and the prediction data of the adjacent previous frame;
according to the initial camera pose of the current frame, calculating the target camera pose of the current frame step by utilizing a multi-resolution depth map pyramid;
based on a frame-model fusion mode and a target camera pose of a current frame, a depth image of the current frame is fused into the global space model to update the global space model;
and determining the target three-dimensional model according to the updated global space model by utilizing a ray casting method.
3. The method for processing a dynamic object for three-dimensional reconstruction of a scene as set forth in claim 1, wherein the acquiring a query frame image of a target shooting scene comprises:
query frame images of a target shooting scene are randomly acquired from a preset query set, wherein the query set represents a set of single frame images under a camera coordinate system.
4. The method for processing the dynamic object of the three-dimensional reconstruction of the scene according to claim 2, wherein the cost function of the iterative closest point algorithm adopts a point-to-plane distance measurement mode; the prediction data comprise prediction point cloud data, normal vectors of each point of the prediction point cloud and camera pose of the adjacent previous frame; according to the point cloud data of the current frame and the prediction data of the adjacent previous frame, and by utilizing an iterative nearest point algorithm, determining the initial camera pose of the current frame comprises the following steps:
according to the minimum principle of a cost function, calculating an optimal transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame and taking the optimal transformation matrix as the camera pose of the current frame, wherein the cost function is specifically as follows:
$E(T_{g,k}) = \sum_{\Omega_k(u)\neq \mathrm{null}} \left\| \left( T_{g,k}\,\dot{V}_k(u) - \hat{V}_{k-1}(u) \right)^{\mathsf T} \hat{N}_{k-1}(u) \right\|_2^2$

wherein $T_{g,k}$ is the transformation matrix between the point cloud of the current frame and the point cloud of the adjacent previous frame; $E(T_{g,k})$ is the cost function; $\Omega_k(u)\neq \mathrm{null}$ indicates that the pixel point $u$ has a valid depth; $\dot{V}_k(u)$ is the spatial coordinate corresponding to pixel point $u$ on the current k-th frame depth image; $\hat{V}_{k-1}(u)$ is the spatial coordinate of the predicted point corresponding to pixel point $u$ on the (k−1)-th frame depth image; $\hat{N}_{k-1}(u)$ is the normal vector of the predicted point corresponding to pixel point $u$ on the (k−1)-th frame depth image; and the upper right label T represents the transpose of the matrix.
5. The method according to claim 2, wherein the information of each voxel in the constructed global space model includes truncated symbol distance value and RGB information.
6. A dynamic object processing system for three-dimensional reconstruction of a scene, the system comprising:
the first acquisition module is used for acquiring shooting data of a target shooting scene, wherein the shooting data comprises a color image set and a depth image set;
the reconstruction module is used for reconstructing a target three-dimensional model of a target shooting scene based on the color image set and the depth image set;
the second acquisition module is used for acquiring a query frame image of the target shooting scene;
the dynamic processing module is used for respectively projecting the query frame image and the subgraphs corresponding to the query frame image in the target three-dimensional model according to the prediction resolution, comparing projection results to identify a target dynamic point set, and deleting the target dynamic point set from the target three-dimensional model to update the target three-dimensional model;
projecting the query frame image and the subgraphs corresponding to the query frame image in the target three-dimensional model according to the prediction resolution respectively, and comparing projection results to identify a target dynamic point set, wherein the method comprises the following steps:
projecting the query frame image and the subgraph corresponding to the query frame image in the target three-dimensional model according to a preset resolution respectively to obtain a depth image of the query frame image and a depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
determining a pixel difference matrix by utilizing pixel subtraction according to the depth image of the query frame image and the depth image of the subgraph corresponding to the query frame image in the target three-dimensional model;
judging whether single pixel differences in the pixel difference matrix exceed a preset threshold value according to the pixel difference matrix, and if so, taking a set of pixel points corresponding to the pixel differences exceeding the preset threshold value as an initial dynamic point set;
performing iterative projection processing on the query frame image and a subgraph of the target three-dimensional model at a position corresponding to the query frame image so as to optimize the initial dynamic point set and obtain a target dynamic point set;
performing iterative projection processing on the query frame image and a sub-image of the target three-dimensional model at a position corresponding to the query frame image to optimize the initial dynamic point set to obtain a target dynamic point set, wherein the method comprises the following steps:
and carrying out iterative projection on the query frame image and a subgraph of the target three-dimensional model at a position corresponding to the query frame image according to a mode that the preset resolution is gradually reduced so as to optimize the initial dynamic point set and obtain a target dynamic point set.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for processing a dynamic object for three-dimensional reconstruction of a scene according to any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of a method for processing dynamic objects for three-dimensional reconstruction of a scene according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111010819.2A CN113808253B (en) | 2021-08-31 | 2021-08-31 | Method, system, equipment and medium for processing dynamic object of three-dimensional reconstruction of scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111010819.2A CN113808253B (en) | 2021-08-31 | 2021-08-31 | Method, system, equipment and medium for processing dynamic object of three-dimensional reconstruction of scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113808253A CN113808253A (en) | 2021-12-17 |
CN113808253B true CN113808253B (en) | 2023-08-15 |
Family
ID=78942114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111010819.2A Active CN113808253B (en) | 2021-08-31 | 2021-08-31 | Method, system, equipment and medium for processing dynamic object of three-dimensional reconstruction of scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113808253B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114494905A (en) * | 2022-01-26 | 2022-05-13 | 中科星图股份有限公司 | Building identification and modeling method and device based on satellite remote sensing image |
CN114332415B (en) * | 2022-03-09 | 2022-07-29 | 南方电网数字电网研究院有限公司 | Three-dimensional reconstruction method and device of power transmission line corridor based on multi-view technology |
CN114693551A (en) * | 2022-03-25 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and readable storage medium |
CN115439528B (en) * | 2022-04-26 | 2023-07-11 | 亮风台(上海)信息科技有限公司 | Method and equipment for acquiring image position information of target object |
CN114782470B (en) * | 2022-06-22 | 2022-09-13 | 浙江鸿禾医疗科技有限责任公司 | Three-dimensional panoramic recognition positioning method of alimentary canal, storage medium and equipment |
CN115311424B (en) * | 2022-08-02 | 2023-04-07 | 深圳市华赛睿飞智能科技有限公司 | Three-dimensional reconstruction method and device of target scene, unmanned aerial vehicle and storage medium |
CN115457202B (en) * | 2022-09-07 | 2023-05-16 | 北京四维远见信息技术有限公司 | Method, device and storage medium for updating three-dimensional model |
CN116402967B (en) * | 2023-05-31 | 2024-03-29 | 深圳大学 | Scene building rapid singulation method, device, computer equipment and storage medium |
CN116993924B (en) * | 2023-09-25 | 2023-12-15 | 北京渲光科技有限公司 | Three-dimensional scene modeling method and device, storage medium and computer equipment |
CN117272758B (en) * | 2023-11-20 | 2024-03-15 | 埃洛克航空科技(北京)有限公司 | Depth estimation method, device, computer equipment and medium based on triangular grid |
CN118037963A (en) * | 2024-04-09 | 2024-05-14 | 广州思德医疗科技有限公司 | Reconstruction method, device, equipment and medium of digestive cavity inner wall three-dimensional model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019161813A1 (en) * | 2018-02-23 | 2019-08-29 | 清华-伯克利深圳学院筹备办公室 | Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium |
CN110310362A (en) * | 2019-06-24 | 2019-10-08 | 中国科学院自动化研究所 | High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU |
CN111369678A (en) * | 2018-12-25 | 2020-07-03 | 浙江舜宇智能光学技术有限公司 | Three-dimensional scene reconstruction method and system |
CN111709982A (en) * | 2020-05-22 | 2020-09-25 | 浙江四点灵机器人股份有限公司 | Three-dimensional reconstruction method for dynamic environment |
CN112509115A (en) * | 2020-11-26 | 2021-03-16 | 中国人民解放军战略支援部队信息工程大学 | Three-dimensional time-varying unconstrained reconstruction method and system for dynamic scene of sequence image |
WO2021077720A1 (en) * | 2019-10-25 | 2021-04-29 | 深圳奥比中光科技有限公司 | Method, apparatus, and system for acquiring three-dimensional model of object, and electronic device |
- 2021-08-31 CN CN202111010819.2A patent/CN113808253B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019161813A1 (en) * | 2018-02-23 | 2019-08-29 | 清华-伯克利深圳学院筹备办公室 | Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium |
CN111369678A (en) * | 2018-12-25 | 2020-07-03 | 浙江舜宇智能光学技术有限公司 | Three-dimensional scene reconstruction method and system |
CN110310362A (en) * | 2019-06-24 | 2019-10-08 | 中国科学院自动化研究所 | High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU |
WO2021077720A1 (en) * | 2019-10-25 | 2021-04-29 | 深圳奥比中光科技有限公司 | Method, apparatus, and system for acquiring three-dimensional model of object, and electronic device |
CN111709982A (en) * | 2020-05-22 | 2020-09-25 | 浙江四点灵机器人股份有限公司 | Three-dimensional reconstruction method for dynamic environment |
CN112509115A (en) * | 2020-11-26 | 2021-03-16 | 中国人民解放军战略支援部队信息工程大学 | Three-dimensional time-varying unconstrained reconstruction method and system for dynamic scene of sequence image |
Non-Patent Citations (1)
Title |
---|
Research on Key Technologies of 3D Dynamic Scene Reconstruction in Virtual Reality; Biao Wan et al.; 《CISAT 2019》; entire document *
Also Published As
Publication number | Publication date |
---|---|
CN113808253A (en) | 2021-12-17 |
Similar Documents
Publication | Title |
---|---|
CN113808253B (en) | Method, system, equipment and medium for processing dynamic object of three-dimensional reconstruction of scene |
CN108062784B (en) | Three-dimensional model texture mapping conversion method and device |
CN109683699B (en) | Method and device for realizing augmented reality based on deep learning and mobile terminal |
US10726599B2 (en) | Realistic augmentation of images and videos with graphics |
CN109493417B (en) | Three-dimensional object reconstruction method, device, equipment and storage medium |
US20150243080A1 (en) | Visual localisation |
KR20210013150A (en) | Lighting estimation |
CN111583381B (en) | Game resource map rendering method and device and electronic equipment |
CN113160420A (en) | Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium |
CN110111364B (en) | Motion detection method and device, electronic equipment and storage medium |
CN111639147B (en) | Map compression method, system and computer readable storage medium |
CN113744144B (en) | Remote sensing image building boundary optimization method, system, equipment and storage medium |
CN111459269A (en) | Augmented reality display method, system and computer readable storage medium |
CN113689578A (en) | Human body data set generation method and device |
CN113112542A (en) | Visual positioning method and device, electronic equipment and storage medium |
CN115457179A (en) | Method, apparatus, device and medium for rendering virtual object |
CN113902848A (en) | Object reconstruction method and device, electronic equipment and storage medium |
CN114202554A (en) | Mark generation method, model training method, mark generation device, model training device, mark method, mark device, storage medium and equipment |
CN113436256B (en) | Shooting device state identification method, shooting device state identification device, computer equipment and storage medium |
CN115393423A (en) | Target detection method and device |
CN113570649B (en) | Gravity direction determination method and device based on three-dimensional model and computer equipment |
CN115564639A (en) | Background blurring method and device, computer equipment and storage medium |
CN111028323B (en) | Method, device, equipment and readable storage medium for simulating water wave in map |
CN113744410A (en) | Grid generation method and device, electronic equipment and computer readable storage medium |
CN112419459A (en) | Method, apparatus, computer device and storage medium for baked model AO mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |