CN114359474A - Three-dimensional reconstruction method and device, computer equipment and storage medium - Google Patents

Three-dimensional reconstruction method and device, computer equipment and storage medium

Info

Publication number
CN114359474A
CN114359474A CN202111471006.3A
Authority
CN
China
Prior art keywords
frame images
images
determining
neighborhood
memory capacity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111471006.3A
Other languages
Chinese (zh)
Inventor
池鹏可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xaircraft Technology Co Ltd
Original Assignee
Guangzhou Xaircraft Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xaircraft Technology Co Ltd filed Critical Guangzhou Xaircraft Technology Co Ltd
Priority to CN202111471006.3A priority Critical patent/CN114359474A/en
Publication of CN114359474A publication Critical patent/CN114359474A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

A three-dimensional reconstruction method and apparatus, a computer device, and a storage medium solve the problem of long waiting time in a three-dimensional reconstruction process caused by off-line computation of dense three-dimensional point clouds. The method comprises the following steps: determining the estimated memory capacity required by the depth map fusion of the multiple frames of images based on the respective depth maps of the multiple frames of images; grouping all the images based on the estimated memory capacity, the obtained current memory free capacity, the camera external parameters of each of the multiple frames of images and the sparse point cloud; respectively carrying out depth map fusion on at least one group to obtain at least one dense point cloud; and fusing at least one dense point cloud to obtain the dense point cloud of the preset scene.

Description

Three-dimensional reconstruction method and device, computer equipment and storage medium
Technical Field
The present application relates to the technical fields of computer image processing, computer vision, and remote sensing mapping, and in particular to a three-dimensional reconstruction method and device, a computer device, and a storage medium.
Background
In general, when ground three-dimensional reconstruction is performed using unmanned aerial vehicle imagery, the unmanned aerial vehicle is required to capture the complete scene with a camera, and the dense three-dimensional point cloud is then computed off line at the ground control end based on all the images. This three-dimensional reconstruction approach occupies a large memory capacity, which leads to high hardware cost.
Disclosure of Invention
In view of this, embodiments of the present application provide a three-dimensional reconstruction method and device, a computer device, and a storage medium, which solve the prior-art problem that the three-dimensional reconstruction process occupies a large memory capacity.
A first aspect of the present application provides a three-dimensional reconstruction method, including: determining the estimated memory capacity required by the depth map fusion of the multiple frames of images based on the respective depth maps of the multiple frames of images; grouping the multi-frame images based on the estimated memory capacity, the obtained current memory free capacity, the respective camera external parameters of the multi-frame images and the sparse point cloud to obtain at least one group; respectively carrying out depth map fusion on at least one group to obtain at least one dense point cloud; and fusing at least one dense point cloud to obtain the dense point cloud of the preset scene.
In one embodiment, determining the estimated memory capacity required for the depth map fusion of the multiple frames of images based on the respective depth maps of the multiple frames of images includes: determining the estimated memory capacity based on the number of the multi-frame images, the size and coverage area of a single-frame image, and the down-sampling multiple and sampling interval used during depth map fusion.
In one embodiment, the clustering the multiple frames of images based on the estimated memory capacity, the obtained current memory free capacity, the obtained camera external parameters of the multiple frames of images, and the obtained sparse point cloud to obtain at least one group comprises: determining a maximum value of the number of images in the group based on the current free memory capacity and the estimated memory capacity; based on the maximum value of the number of images in the group, clustering and grouping are carried out on the multi-frame images by combining the camera external parameters of the multi-frame images and the sparse point cloud, so as to obtain at least one group.
In one embodiment, determining the maximum number of images in the group based on the current free memory capacity and the estimated memory capacity comprises: when the current free memory capacity is larger than the estimated memory capacity, determining the frame number of the multi-frame images as the maximum value of the number of the images in the group; and when the current free memory capacity is smaller than or equal to the estimated memory capacity, determining the maximum value of the number of the images in the group based on the current free memory capacity, the estimated memory capacity and the number of the multi-frame images.
In one embodiment, before determining, based on the respective depth maps of the multiple frames of images, an estimated memory capacity required for depth map fusion of the multiple frames of images, the method further includes: determining a plurality of neighborhood frame images among a plurality of frame images based on GPS information carried by the plurality of frame images acquired by the unmanned aerial vehicle, wherein the plurality of frame images are a partial, continuous subset of the multiple frames of images; and determining respective depth maps of the plurality of neighborhood frame images.
In one embodiment, determining a plurality of neighborhood frame images in the plurality of frame images based on GPS information carried by each of the plurality of frame images acquired by the drone comprises: selecting any frame image in the plurality of frame images as a target frame image; determining a circular area which takes the GPS information of the target frame image as an origin and takes the preset length as a radius; and determining the multi-frame image corresponding to the GPS information in the circular area as a plurality of neighborhood frame images.
In one embodiment, before determining the depth map of each of the plurality of neighborhood frame images, the method further includes: respectively extracting feature points of the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images. Determining the respective depth maps of the plurality of neighborhood frame images includes: performing feature point matching on the plurality of neighborhood frame images based on the descriptors to obtain respective feature point matching relations of the plurality of neighborhood frame images; determining sparse point clouds corresponding to the plurality of neighborhood frame images and respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images by adopting a local bundle adjustment algorithm based on the feature point matching relations and GPS information carried by the plurality of neighborhood frame images; and determining respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters and the camera external parameters.
In one embodiment, determining the depth map of each of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters, and the camera external parameters specifically includes: determining a first frame image and a second frame image that are adjacent in time sequence based on time sequence information carried by the multiple frames of neighborhood images; correcting the first frame image and the second frame image based on the camera external parameters by adopting an epipolar geometry algorithm, so that the corrected first frame image and second frame image satisfy a binocular stereo matching condition; determining respective disparity maps based on the corrected first frame image and second frame image; and determining a depth map based on the disparity maps.
In one embodiment, the three-dimensional reconstruction method is implemented as a computer program stored on a storage medium, the computer program comprising a depth map determination module that performs the step of determining a depth map for each of the plurality of neighborhood frame images, the depth map determination module comprising a feature extraction sub-module, a feature matching sub-module, a local bundle adjustment sub-module, a state detection sub-module, and a depth calculation sub-module: the feature extraction sub-module respectively extracts feature points of the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images; the feature matching sub-module performs feature point matching on the plurality of neighborhood frame images based on the descriptors to obtain respective feature point matching relations of the plurality of neighborhood frame images; the local bundle adjustment sub-module determines sparse point clouds corresponding to the plurality of neighborhood frame images and respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images based on the feature point matching relations and GPS information carried by the plurality of neighborhood frame images; the state detection sub-module determines the running states of the feature extraction sub-module, the feature matching sub-module and the local bundle adjustment sub-module, determines the available modules from the modules whose running state is finished, and determines the number of depth calculation sub-modules according to the number of available modules; the depth calculation sub-module determines the respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters and the camera external parameters.
A second aspect of the present application provides a three-dimensional reconstruction apparatus, including: the determining module is used for determining the estimated memory capacity required by the depth map fusion of the multi-frame images based on the respective depth maps of the multi-frame images, wherein the multi-frame images are acquired in the process of aerial photography of the preset scene by the unmanned aerial vehicle; the clustering module is used for clustering the multi-frame images based on the estimated memory capacity, the obtained current memory free capacity, the respective camera external parameters of the multi-frame images and the sparse point cloud to obtain at least one group; the fusion module is used for respectively carrying out depth map fusion on at least one group to obtain at least one dense point cloud; and fusing at least one dense point cloud to obtain the dense point cloud of the preset scene.
A third aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executed by the processor, wherein the processor, when executing the computer program, implements the steps of the three-dimensional reconstruction method provided in any of the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the three-dimensional reconstruction method provided in any of the above embodiments.
According to the three-dimensional reconstruction method and device, the computer equipment and the storage medium, all the frame images are grouped according to the current free memory capacity, the dense point clouds of all the groups are respectively calculated, namely only one group of dense point clouds is calculated at the same time, then the dense point clouds of all the groups are fused to obtain the dense point clouds of the whole scene, the full utilization of the memory capacity is realized, the requirement on the memory capacity is reduced, and the hardware cost is reduced.
Drawings
Fig. 1 shows an architectural schematic of a drone system.
Fig. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of a three-dimensional reconstruction method according to a second embodiment of the present application.
Fig. 4 is a schematic diagram of an implementation process of a three-dimensional reconstruction method according to a third embodiment of the present application.
Fig. 5 is a block diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application.
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without any creative effort belong to the protection scope of the present application.
Summary of the application
Most conventional aerial survey methods perform ground three-dimensional reconstruction at a ground control end based on SfM (Structure from Motion) technology, specifically: first, the camera external parameters and the sparse three-dimensional point cloud of each image are recovered; second, the depth map corresponding to each frame image is calculated based on the camera external parameters of each image and the sparse three-dimensional point cloud; then, the depth maps are fused to calculate the dense three-dimensional point cloud of the whole scene. This three-dimensional reconstruction method cannot calculate the dense point cloud online in real time, so the waiting time of the three-dimensional reconstruction process is long and the efficiency of map generation is affected.
In view of this, embodiments of the present application provide a three-dimensional reconstruction method and device, a computer device, and a storage medium, which group all frame images according to the current free memory capacity, compute the dense point cloud of each group separately, and then fuse the dense point clouds of all groups to obtain the dense point cloud of the whole scene, thereby making full use of the memory capacity, reducing the requirement on memory capacity, and reducing hardware cost.
Exemplary System
Fig. 1 shows an architectural schematic of a drone system. The unmanned aerial vehicle system can apply the three-dimensional reconstruction method or the three-dimensional reconstruction device provided by the embodiment of the application. As shown in fig. 1, the drone system 100 includes an onboard end 110, a ground control end 120, and a network 130.
The network 130 is used to provide a medium for communication links between the airborne terminal 110 and the ground control terminal 120, and the network 130 is typically a wireless communication link. The ground control end 120 may be, for example, a control handle. The onboard end 110 is a drone configured with a camera or video camera. In this way, the user may use the ground control terminal 120 to interact with the onboard terminal 110 through the network 130 to receive or send messages.
For example, the user uses the ground control terminal 120 to control the onboard terminal 110 to traverse the predetermined scene for flying, and the onboard terminal 110 records video or takes pictures of the predetermined scene during flying. In this case, the onboard terminal 110 executes the three-dimensional reconstruction method provided in the embodiment of the present application based on the video or the photograph, and calculates the dense point cloud of the predetermined scene. Accordingly, a three-dimensional reconstruction device is disposed at the onboard end 110.
For another example, the user uses the ground control terminal 120 to control the onboard terminal 110 to traverse a predetermined scene for flying, the onboard terminal 110 records a video or takes pictures of the predetermined scene during the flight, and the onboard terminal 110 transmits the video or the pictures to the ground control terminal 120 in real time. The ground control end 120 executes the three-dimensional reconstruction method provided by the embodiment of the present application based on the video or the pictures, and calculates the dense point cloud of the predetermined scene. Accordingly, a three-dimensional reconstruction device is disposed at the ground control end 120.
Exemplary method
In order to create a three-dimensional map of a predetermined scene, an unmanned aerial vehicle is used to capture footage of the whole scene; frames are then extracted from the footage at a predetermined frequency to obtain all the continuous frame images describing the whole scene; the three-dimensional dense point cloud of the predetermined scene can then be calculated based on all the continuous frame images, and the three-dimensional map is created from the three-dimensional dense point cloud.
Fig. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present application. As shown in fig. 2, the three-dimensional reconstruction method 200 includes:
step S210, determining an estimated memory capacity required for performing depth map fusion on multiple frames of images based on respective depth maps of the multiple frames of images.
In one embodiment, the estimated memory capacity is determined based on the number of multi-frame images, the size and coverage area of a single-frame image, and the down-sampling multiple and sampling interval during depth map fusion.
Specifically, assume the number of images is N and the image size is image_size (width × height), so that the maximum image coverage is N × image_size; the images are down-sampled by a factor of M, and a depth value is computed every S points. On this basis, the memory consumed when the depth map fusion thread performs depth map fusion on all the frame images is estimated as pre_memory_cost, which grows with N × image_size and is reduced by the down-sampling multiple M and the sampling interval S.
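As a concrete illustration of this estimate, the following Python sketch computes it from the quantities named above. The squared down-sampling factor, the per-point byte cost, and all names are illustrative assumptions, since the patent gives only the quantities involved, not the exact expression.

```python
def estimate_fusion_memory(num_images, width, height,
                           downsample=2, sample_interval=4,
                           bytes_per_point=32):
    """Rough upper bound on the memory needed to fuse all depth maps.

    The full coverage (num_images * width * height pixels) is assumed to be
    reduced by the squared down-sampling factor and by the sampling interval,
    and every remaining point is assumed to cost bytes_per_point bytes.
    """
    coverage = num_images * width * height
    fused_points = coverage / (downsample ** 2 * sample_interval)
    return fused_points * bytes_per_point  # bytes


# e.g. 200 frames of 4000 x 3000 pixels
pre_memory_cost = estimate_fusion_memory(200, 4000, 3000)
```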
step S220, based on the estimated memory capacity, the obtained current memory free capacity, the camera external parameters of the multi-frame images and the sparse point cloud, the multi-frame images are grouped to obtain at least one group.
First, a maximum number of images in the group is determined based on the current free memory capacity and the estimated memory capacity.
Continuing the example above, the current free memory capacity free_memory of the onboard terminal is obtained. When pre_memory_cost < free_memory, the depth map fusion thread can directly calculate the dense point cloud of all the frame images; in this case, the maximum number of images in a group is N and the number of groups is 1, that is, all frame images are divided into one group. When pre_memory_cost ≥ free_memory, the average memory consumed per frame image is pre_memory_cost / N, and the maximum number of image frames that the depth map fusion thread can currently compute is determined as image_numbers = free_memory / (pre_memory_cost / N), rounded down.
In a specific application process, the number of images in each group can be adaptively adjusted within a predetermined range set based on this maximum value, for example [0.9 × image_numbers, image_numbers].
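Putting the two branches above together, a minimal sketch of the group-size decision might look as follows; the function and variable names are illustrative, not taken from the patent.

```python
def max_images_per_group(num_images, pre_memory_cost, free_memory):
    """Decide the maximum number of images per group.

    If all frames fit into the free memory, a single group holds everything;
    otherwise the free memory is divided by the average per-frame cost.
    """
    if pre_memory_cost < free_memory:
        return num_images
    per_frame_memory = pre_memory_cost / num_images
    return int(free_memory // per_frame_memory)
```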
Secondly, clustering all the frame images based on the maximum value of the number of the images in the group by combining the camera external parameters and the sparse point cloud of all the frame images to obtain at least one group. In one example, an AP clustering algorithm is employed to cluster all frame images. All the frame images are grouped based on a clustering algorithm, and continuous adjacent frame images and corresponding sparse point clouds can be divided into a group.
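One assumed realization of this clustering step is to run affinity propagation on the camera centers taken from the camera external parameters and then cap each cluster at the group-size maximum. The patent does not prescribe the library or the similarity measure, so the scikit-learn call below is only a sketch.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def group_frames(camera_centers, max_group_size):
    """Cluster frames by camera position and split any cluster that is
    larger than max_group_size into consecutive chunks."""
    centers = np.asarray(camera_centers, dtype=float)   # (N, 3) from extrinsics
    labels = AffinityPropagation(random_state=0).fit_predict(centers)
    groups = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        for start in range(0, len(members), max_group_size):
            groups.append(members[start:start + max_group_size].tolist())
    return groups
```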
Step S230, respectively performing depth map fusion on at least one group to obtain at least one dense point cloud. The step of depth map fusion can be implemented using conventional schemes, which are not described in detail herein.
And S240, fusing at least one dense point cloud to obtain the dense point cloud of the preset scene. For example, at least one dense point cloud is stitched to obtain a dense point cloud of the predetermined scene. Each dense point cloud corresponds to a geographic area, and overlapping areas may or may not exist between different dense point clouds.
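A minimal sketch of the stitching step is given below, assuming each per-group dense point cloud is an N×3 array already expressed in a common world frame; the voxel-based thinning of overlapping areas and the voxel size are illustrative choices, not something the patent specifies.

```python
import numpy as np

def merge_dense_clouds(clouds, voxel_size=0.05):
    """Concatenate per-group point clouds and keep one point per voxel
    so that overlapping areas are not duplicated."""
    merged = np.vstack(clouds)                          # (N, 3) world coordinates
    keys = np.floor(merged / voxel_size).astype(np.int64)
    _, unique_idx = np.unique(keys, axis=0, return_index=True)
    return merged[np.sort(unique_idx)]
```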
According to the three-dimensional reconstruction method provided by the embodiment, dense point clouds are calculated by grouping all the frame images according to the current free memory capacity, and then the dense point clouds of all the groups are fused to obtain the dense point clouds of the whole scene, so that the memory capacity is fully utilized, the requirement on the memory capacity is reduced, and the hardware cost is reduced.
Fig. 3 is a schematic flow chart of a three-dimensional reconstruction method according to a second embodiment of the present application. As shown in fig. 3, the three-dimensional reconstruction method 300 further includes, on the basis of the three-dimensional reconstruction method 200 shown in fig. 2, the following steps performed before step S210:
step S310, determining a plurality of neighborhood frame images in a plurality of frame images based on GPS information carried by the plurality of frame images acquired by the unmanned aerial vehicle, wherein the plurality of frame images are partial continuous frame images in a plurality of frame images.
For example, firstly, any one frame image in a plurality of frame images is selected as a target frame image; secondly, determining a circular area which takes the GPS information of the target frame image as an origin and takes the preset length as a radius; and then, determining that the multi-frame image corresponding to the GPS information in the circular area is a multi-frame neighborhood image. It should be understood that the circular regions referred to herein may also be rectangular regions, polygonal regions, etc.
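A simple sketch of the circular-neighborhood test is shown below; it assumes the GPS positions have already been converted to a local metric frame (e.g. ENU), which the text does not spell out, and the 50 m radius is only an example value.

```python
import numpy as np

def neighborhood_frames(positions, target_index, radius_m=50.0):
    """Return the indices of frames whose position lies within radius_m
    of the target frame's position."""
    pts = np.asarray(positions, dtype=float)    # (N, 2) or (N, 3), metric units
    dists = np.linalg.norm(pts - pts[target_index], axis=1)
    return np.flatnonzero(dists <= radius_m).tolist()
```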
In step S320, a depth map of each of the plurality of neighborhood frame images is determined.
In this embodiment, step S320 is specifically executed as:
step S321, feature point extraction is performed on the plurality of frame images, respectively, to obtain respective key points and descriptors of the plurality of frame images.
Representative pixel points, namely key points, are selected from the image. The key point information and the descriptors in each frame image can be obtained through the feature point extraction process. The key point information includes the position of the key point in the image, and may further include the direction and scale of the key point. A descriptor is typically a manually designed vector that describes the pixel information around a key point.
In one embodiment, the Scale-Invariant Feature Transform (SIFT) algorithm or the Speeded-Up Robust Features (SURF) algorithm is used for feature point extraction.
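With OpenCV, the extraction step can be sketched as below; SIFT is chosen because the text names it, while the grayscale conversion and BGR input are assumptions about the image format.

```python
import cv2

def extract_features(image_bgr):
    """Detect SIFT keypoints and compute their descriptors for one frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
```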
The execution sequence of step S321 and step S322 may be interchanged.
Step S322, feature point matching is carried out on the multiple neighborhood frame images based on the descriptors, and respective feature point matching relations of the multiple neighborhood frame images are obtained.
Feature point matching is used to identify the same physical entity point in images from different viewing angles, that is, the same object in images from different viewing angles can be matched through the feature point matching process. This embodiment uses the similarity of the descriptors to determine whether two key points indicate the same physical entity point: the higher the similarity of the descriptors of the two key points, the higher the probability that they indicate the same physical entity point; conversely, the lower.
The process of calculating the similarity of the descriptors of two key points, i.e., the feature point matching process, may adopt a Brute Force Matching (BFM) algorithm.
In one embodiment, the feature point matching thread further includes a process of filtering out false matches, corresponding to a process of checking whether the matched points match correctly. The filtering of mismatches may use a geometric constraint, for example requiring the Hamming distance to be less than twice the minimum distance, and may also use a cross-matching algorithm, a K Nearest Neighbors (KNN) matching algorithm, and the like.
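The matching and filtering described above might be sketched as follows. The L2 norm fits SIFT descriptors (the Hamming distance mentioned in the text applies to binary descriptors), and combining cross-checking with the twice-the-minimum-distance rule is an illustrative choice rather than the patent's prescribed pipeline.

```python
import cv2

def match_features(desc1, desc2):
    """Brute-force match two descriptor sets and drop likely mismatches:
    cross-check plus a distance threshold of twice the minimum distance."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    if not matches:
        return []
    min_dist = min(m.distance for m in matches)
    return [m for m in matches if m.distance <= 2 * min_dist]
```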
Step S323, determining sparse point clouds corresponding to the plurality of neighborhood frame images and camera internal parameters and camera external parameters of the plurality of neighborhood frame images by using a local bundle adjustment algorithm based on the feature point matching relations and GPS information carried by the plurality of neighborhood frame images.
The sparse point cloud refers to the physical entity points corresponding to the key points in the multi-frame neighborhood images. The camera parameters include camera internal parameters and camera external parameters. The camera internal parameters include the focal length and the principal point position of the camera, and can be calibrated in advance or estimated when there are enough redundant observations. The camera external parameters comprise the world coordinates and rotation angles of the camera center, and uniquely describe the position and attitude of the camera in the world coordinate system.
Step S324, determining respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters and the camera external parameters.
Specifically, first, two temporally adjacent frame images are determined based on the timing information carried by each of the multiple frames of neighborhood images; for convenience of description, they are referred to as the first frame image M1 and the second frame image M2. Next, the first frame image M1 and the second frame image M2 are corrected based on their respective camera external parameters by using an epipolar geometry algorithm, so that the corrected first frame image M1 and second frame image M2 satisfy the binocular stereo matching condition. Then, the respective disparity maps are determined based on the corrected first frame image M1 and second frame image M2, and the depth map is determined based on the disparity maps.
For example, the poses of the first frame image M1 and the second frame image M2 are denoted pose1(R1|C1) and pose2(R2|C2), where Ri (i = 1 or 2) is the initial projection matrix, which defines the relationship between three-dimensional world point coordinates and the corresponding pixel coordinates, and Ci (i = 1 or 2) are the world coordinates of the camera center.
First, a rotation matrix R is calculated based on the initial projection matrix R1 of the first frame image M1 and the initial projection matrix R2 of the second frame image M2, so that the rotated first frame image M1 and second frame image M2 are coplanar and parallel to the baseline. The baseline is the line connecting the camera centers of the first frame image M1 and the second frame image M2, and its length is denoted L = |C2 − C1|.
Next, new projection matrices of the first frame image M1 and the second frame image M2 are determined based on the rotation matrix R: Pn1 = K[R | −R·C1] and Pn2 = K[R | −R·C2], where K is the camera internal parameter matrix. Rectification transformation matrices are then determined from the new projection matrices as T1 = Pn1(1:3,1:3)·(K·R1)⁻¹ and T2 = Pn2(1:3,1:3)·(K·R2)⁻¹. The first frame image and the second frame image are resampled and warped according to the rectification matrices T1 and T2 to obtain the corrected first frame image m1 and second frame image m2.
Then, a binocular stereo matching algorithm is used to calculate the respective disparity maps of the corrected first frame image m1 and second frame image m2, and the depth map is calculated according to the formula

D = f × L / S

where D denotes the depth value, f denotes the focal length value from the camera internal parameters, L denotes the baseline length, and S denotes the disparity value. In one embodiment, the binocular stereo matching algorithm may be a local stereo matching algorithm, a global stereo matching algorithm, or the like.
In this way, the respective depth maps of the multi-frame neighborhood images are obtained.
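On a rectified pair, the disparity and depth computation can be sketched with OpenCV's semi-global block matcher; the SGBM parameters are illustrative assumptions, and the conversion uses the D = f × L / S relation above.

```python
import cv2
import numpy as np

def depth_from_rectified_pair(left_gray, right_gray, focal_px, baseline):
    """Compute a disparity map with SGBM and convert it to depth D = f * L / S."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth
```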
According to the three-dimensional reconstruction method provided by this embodiment, the dense point cloud can be calculated as soon as the unmanned aerial vehicle has acquired part of the images of the whole scene, namely the plurality of frame images, thereby achieving online, real-time calculation of the dense point cloud, that is, generating the map while flying.
Fig. 4 is a schematic diagram of an implementation process of a three-dimensional reconstruction method according to a third embodiment of the present application. The three-dimensional reconstruction method shown in fig. 4 differs from the three-dimensional reconstruction method shown in fig. 3 in the implementation of step S320. Specifically, in this embodiment, the three-dimensional reconstruction method is implemented as a computer program stored on a storage medium, the computer program including a depth map determination module that performs step S320, the depth map determination module including a feature extraction sub-module 421, a feature matching sub-module 422, a local bundle adjustment sub-module 423, a state detection sub-module 424, and a depth calculation sub-module 425. These modules may run on the CPU, or some of them, such as the feature extraction sub-module 421 and the feature matching sub-module 422, may run on the GPU.
The feature extraction sub-module 421 extracts feature points from the acquired plurality of frame images to obtain their respective key points and descriptors; the plurality of frame images are a partial, continuous subset of all the frames extracted from the unmanned aerial vehicle footage.
The feature matching sub-module 422 performs feature point matching on the plurality of neighborhood frame images based on the descriptor to obtain respective feature point matching relationships of the plurality of neighborhood frame images.
The local bundle adjustment sub-module 423 determines the sparse point clouds corresponding to the plurality of neighborhood frame images and the respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images based on the feature point matching relations and the GPS information carried by the plurality of neighborhood frame images.
The state detection sub-module 424 determines the running states of the feature extraction sub-module 421, the feature matching sub-module 422 and the local bundle adjustment sub-module 423, determines the available modules from the modules whose running state is finished, and determines the number of depth calculation sub-modules 425 according to the number of available modules.
For example, the feature extraction sub-module 421, the feature matching sub-module 422, and the local bundle adjustment sub-module 423 each correspond to one thread. In this case, first, the total number of device threads thread_sum = 3 is acquired, and the currently available thread number is initialized as thread_current = thread_sum − 3. Next, whether the threads corresponding to the feature extraction sub-module 421, the feature matching sub-module 422 and the local bundle adjustment sub-module 423 have finished is detected respectively; each time one of these threads finishes, thread_current is increased by 1. Then, thread_current threads are started, each executing the step of the depth calculation sub-module 425, that is, calculating the depth map of each frame image.
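A minimal sketch of this bookkeeping, using Python threads for illustration; the polling of finished pipeline threads and the per-batch depth workers are assumptions about how the counting could be realized.

```python
import threading

def launch_depth_workers(pipeline_threads, depth_task, frame_batches):
    """Count the finished pipeline threads and start that many depth workers."""
    available = sum(1 for t in pipeline_threads if not t.is_alive())
    workers = []
    for batch in frame_batches[:available]:
        worker = threading.Thread(target=depth_task, args=(batch,))
        worker.start()
        workers.append(worker)
    return workers
```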
The depth calculation sub-module 425 determines a depth map for each of the plurality of neighborhood frame images based on the sparse point cloud, the intra-camera parameters, and the extra-camera parameters.
It should be noted that, in this embodiment, implementation details of the feature extraction sub-module 421, the feature matching sub-module 422, the local bundle adjustment sub-module 423, and the depth calculation sub-module 425 may refer to the embodiment shown in fig. 3 and are not repeated here.
According to the three-dimensional reconstruction method provided by this embodiment, the number of depth calculation sub-modules 425 executed in parallel can be determined according to the running states of the feature extraction sub-module 421, the feature matching sub-module 422 and the local bundle adjustment sub-module 423, thereby making full use of the computing resources.
Exemplary devices
The application also provides a three-dimensional reconstruction device. Fig. 5 is a block diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application. As shown in fig. 5, the three-dimensional reconstruction apparatus 50 includes: the determining module 51 is configured to determine, based on respective depth maps of multiple frames of images, an estimated memory capacity required for depth map fusion of the multiple frames of images, where the multiple frames of images are acquired during an aerial photography process of a predetermined scene by an unmanned aerial vehicle. The clustering module 52 is configured to cluster the multiple frames of images based on the estimated memory capacity, the obtained current memory free capacity, the obtained camera external parameters of the multiple frames of images, and the obtained sparse point cloud, so as to obtain at least one group. A fusion module 53, configured to perform depth map fusion on at least one group respectively to obtain at least one dense point cloud; and fusing at least one dense point cloud to obtain the dense point cloud of the preset scene.
In one embodiment, the determining module 51 is specifically configured to determine the estimated memory capacity based on the number of the multiple frames of images, the size and the coverage area of a single frame of image, and the down-sampling multiple and the sampling interval during depth map fusion.
In one embodiment, the clustering module 52 is specifically configured to determine a maximum value of the number of images in the cluster based on the current free memory capacity and the estimated memory capacity; based on the maximum value of the number of images in the group, clustering and grouping are carried out on the multi-frame images by combining the camera external parameters of the multi-frame images and the sparse point cloud, so as to obtain at least one group.
In one example, when the current free memory capacity is larger than the estimated memory capacity, the number of frames of the multi-frame image is determined as the maximum value of the number of images in the group. And when the current free memory capacity is smaller than or equal to the estimated memory capacity, determining the maximum value of the number of the images in the group based on the current free memory capacity, the estimated memory capacity and the number of the multi-frame images.
In one embodiment, the determining module 51 is further configured to determine, based on GPS information carried by each of a plurality of frame images acquired by the drone, a plurality of neighborhood frame images among the plurality of frame images, where the plurality of frame images are a partial, continuous subset of the multi-frame images. The three-dimensional reconstruction apparatus 50 further includes: a depth map determination module, configured to determine the respective depth maps of the plurality of neighborhood frame images.
In one embodiment, the determining module 51 is specifically configured to select any one of the plurality of frame images as a target frame image; determining a circular area which takes the GPS information of the target frame image as an origin and takes the preset length as a radius; and determining the multi-frame image corresponding to the GPS information in the circular area as a plurality of neighborhood frame images.
In one embodiment, the depth map determination module includes a feature extraction sub-module, a feature matching sub-module, a local bundle adjustment sub-module, and a depth calculation sub-module. The feature extraction sub-module is configured to respectively extract feature points of the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images. The feature matching sub-module is configured to perform feature point matching on the plurality of neighborhood frame images based on the descriptors to obtain respective feature point matching relations of the plurality of neighborhood frame images. The local bundle adjustment sub-module is configured to determine sparse point clouds corresponding to the neighborhood frame images and camera internal parameters and camera external parameters of the neighborhood frame images by adopting a local bundle adjustment algorithm based on the feature point matching relations and GPS information carried by the neighborhood frame images. The depth calculation sub-module is configured to determine respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters and the camera external parameters.
In another embodiment, the depth map determination module includes a feature extraction sub-module, a feature matching sub-module, a local bundle adjustment sub-module, a state detection sub-module, and a depth calculation sub-module. The feature extraction sub-module respectively extracts feature points of the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images. The feature matching sub-module performs feature point matching on the plurality of neighborhood frame images based on the descriptors to obtain respective feature point matching relations of the plurality of neighborhood frame images. The local bundle adjustment sub-module determines sparse point clouds corresponding to the plurality of neighborhood frame images and respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images based on the feature point matching relations and the GPS information carried by the plurality of neighborhood frame images. The state detection sub-module determines the running states of the feature extraction sub-module, the feature matching sub-module and the local bundle adjustment sub-module, determines the available modules from the modules whose running state is finished, and determines the number of depth calculation sub-modules according to the number of available modules. The depth calculation sub-module determines the respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters and the camera external parameters.
The three-dimensional reconstruction device provided by this embodiment belongs to the same inventive concept as the three-dimensional reconstruction method provided by the embodiments of the present application, can execute the three-dimensional reconstruction method provided by any embodiment of the present application, and has the corresponding functional modules and beneficial effects. For technical details not described in detail in this embodiment, reference may be made to the three-dimensional reconstruction method provided by the embodiments of the present application, which are not repeated here.
Exemplary electronic device
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 6, the electronic device 6 may be the onboard terminal 110 or the ground control terminal 120 shown in fig. 1. The electronic device 6 includes one or more processors 61 and a memory 62.
The processor 61 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 6 to perform desired functions.
The memory 62 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory, and the like. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 61 to implement the three-dimensional reconstruction methods of the various embodiments of the present application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 6 may further include an input device 63 and an output device 64, which are interconnected by a bus system and/or another form of connection mechanism (not shown).
For example, the input device 63 may be a camera for capturing images of a predetermined scene. Where the electronic device is a stand-alone device, the input means 63 may be a communication network connector for receiving signals from the network 130 shown in fig. 1. The input device 63 may also include, for example, a keyboard, a mouse, and the like.
The output device 64 may output various information including the collected image, the three-dimensional dense point cloud, and the like to the outside. Output devices 64 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components of the electronic device 6 relevant to the present application are shown in fig. 6, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 6 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the three-dimensional reconstruction method according to various embodiments of the present application described in the above-mentioned "exemplary methods" section of this specification.
The computer program product may include program code for carrying out operations of embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps of the three-dimensional reconstruction method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", "having", and the like are open-ended words meaning "including, but not limited to", and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or", unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of three-dimensional reconstruction, comprising:
determining the estimated memory capacity required by the depth map fusion of the multiple frames of images based on the respective depth maps of the multiple frames of images;
grouping the multi-frame images based on the estimated memory capacity, the obtained current memory free capacity, the respective camera external parameters of the multi-frame images and the sparse point cloud to obtain at least one group;
respectively carrying out depth map fusion on the at least one group to obtain at least one dense point cloud;
and fusing the at least one dense point cloud to obtain the dense point cloud of the preset scene.
2. The three-dimensional reconstruction method according to claim 1, wherein the determining, based on the respective depth maps of the plurality of frames of images, an estimated memory capacity required for depth map fusion of the plurality of frames of images comprises:
and determining the estimated memory capacity based on the number of the multi-frame images, the size and the coverage area of the single-frame images, and the down-sampling multiple and the sampling interval during the depth map fusion.
3. The three-dimensional reconstruction method according to claim 1, wherein the clustering the plurality of frames of images based on the estimated memory capacity, the obtained current memory free capacity, the respective camera external parameters of the plurality of frames of images, and the sparse point cloud to obtain at least one group comprises:
determining a maximum number of images in the group based on the current free memory capacity and the estimated memory capacity;
and based on the maximum value of the number of the images in the group, carrying out clustering and grouping on the multi-frame images by combining the camera external parameters and the sparse point cloud of the multi-frame images to obtain at least one group.
4. The three-dimensional reconstruction method of claim 3, wherein said determining a maximum number of images within a group based on a current free memory capacity and said estimated memory capacity comprises:
when the current free memory capacity is larger than the estimated memory capacity, determining the frame number of the multi-frame images as the maximum value of the number of the images in the group;
when the current free memory capacity is smaller than or equal to the estimated memory capacity, determining the maximum value of the number of the images in the group based on the current free memory capacity, the estimated memory capacity and the number of the multi-frame images.
5. The three-dimensional reconstruction method according to any one of claims 1 to 4, further comprising, before determining an estimated memory capacity required for depth map fusion of the plurality of frames of images based on respective depth maps of the plurality of frames of images:
determining a plurality of neighborhood frame images among a plurality of frame images based on GPS information carried by each of the plurality of frame images acquired by the unmanned aerial vehicle, wherein the plurality of frame images are partial continuous frame images in the multiple frames of images;
determining respective depth maps of the plurality of neighborhood frame images.
6. The three-dimensional reconstruction method of claim 5, wherein said determining a plurality of neighborhood frame images of the plurality of frame images based on GPS information carried by each of the plurality of frame images acquired by the UAV comprises:
selecting any frame image in the plurality of frame images as a target frame image;
determining a circular area which takes the GPS information of the target frame image as an origin point and takes a preset length as a radius;
and determining the multi-frame image corresponding to the GPS information in the circular area as the plurality of neighborhood frame images.
7. The three-dimensional reconstruction method according to claim 5, further comprising, before said determining the depth map of each of said plurality of neighborhood frame images:
respectively extracting characteristic points of the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images;
the determining the respective depth maps of the plurality of neighborhood frame images comprises:
performing feature point matching on the plurality of neighborhood frame images based on the descriptor to obtain respective feature point matching relations of the plurality of neighborhood frame images;
determining sparse point clouds corresponding to the plurality of neighborhood frame images and respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images by adopting a local bundle adjustment algorithm based on the feature point matching relations and GPS information carried by the plurality of neighborhood frame images;
determining respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the intra-camera parameters, and the extra-camera parameters.
8. The three-dimensional reconstruction method of claim 7, wherein the determining the respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the intra-camera parameters, and the extra-camera parameters specifically comprises:
determining a first frame image and a second frame image which are adjacent in time sequence based on the time sequence information carried by the multiple frames of neighborhood images;
correcting the first frame image and the second frame image based on the camera external parameters by adopting an epipolar geometry algorithm, so that the corrected first frame image and the corrected second frame image satisfy a binocular stereo matching condition;
determining respective disparity maps based on the corrected first frame image and the second frame image;
determining the depth map based on the disparity map.
9. The three-dimensional reconstruction method according to claim 5, wherein the three-dimensional reconstruction method is implemented as a computer program stored on a storage medium, the computer program comprising a depth map determination module that performs the step of determining the respective depth maps of the plurality of neighborhood frame images, the depth map determination module comprising a feature extraction sub-module, a feature matching sub-module, a local adjustment sub-module, a state detection sub-module, and a depth calculation sub-module, wherein:
the feature extraction sub-module respectively extracts feature points from the plurality of frame images to obtain respective key points and descriptors of the plurality of frame images;
the feature matching sub-module performs feature point matching on the plurality of neighborhood frame images based on the descriptors to obtain respective feature point matching relationships of the plurality of neighborhood frame images;
the local adjustment sub-module determines sparse point clouds corresponding to the plurality of neighborhood frame images, and respective camera internal parameters and camera external parameters of the plurality of neighborhood frame images, based on the feature point matching relationships and the GPS information carried by the plurality of neighborhood frame images;
the state detection sub-module determines the running states of the feature extraction sub-module, the feature matching sub-module and the local adjustment sub-module, determines available modules from those whose running state is finished, and determines the number of depth calculation sub-modules according to the number of available modules;
the depth calculation sub-module determines the respective depth maps of the plurality of neighborhood frame images based on the sparse point cloud, the camera internal parameters, and the camera external parameters.
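One possible reading of the sub-module layout in claim 9, sketched below with illustrative names only: a state-detection step counts the upstream sub-modules whose running state is finished and sizes the pool of depth-calculation workers accordingly. The `state` attribute and the `compute_depth_map` callable are assumptions, not part of the claims:

```python
from concurrent.futures import ThreadPoolExecutor

class DepthMapDeterminationModule:
    """Illustrative coordinator for the sub-modules named in claim 9."""

    def __init__(self, feature_extraction, feature_matching, local_adjustment):
        # Each stage object is assumed to expose a `state` string attribute.
        self.stages = {
            "feature_extraction": feature_extraction,
            "feature_matching": feature_matching,
            "local_adjustment": local_adjustment,
        }

    def detect_state(self):
        # A stage counts as "available" once its running state is finished.
        return [name for name, stage in self.stages.items() if stage.state == "finished"]

    def run_depth_calculation(self, compute_depth_map, neighborhood_frames,
                              sparse_cloud, intrinsics, extrinsics):
        available = self.detect_state()
        num_workers = max(1, len(available))  # one depth worker per available sub-module
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            futures = [
                pool.submit(compute_depth_map, frame, sparse_cloud, intrinsics, extrinsics)
                for frame in neighborhood_frames
            ]
            return [f.result() for f in futures]
```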
10. A three-dimensional reconstruction apparatus, comprising:
the determining module is used for determining the estimated memory capacity required for depth map fusion of the multiple frames of images based on the respective depth maps of the multiple frames of images, wherein the multiple frames of images are acquired while an unmanned aerial vehicle performs aerial photography of a predetermined scene;
the grouping module is used for grouping the multiple frames of images based on the estimated memory capacity, the obtained current free memory capacity, the respective camera external parameters of the multiple frames of images and the sparse point cloud, to obtain at least one group;
the fusion module is used for respectively carrying out depth map fusion on the at least one group to obtain at least one dense point cloud, and for fusing the at least one dense point cloud to obtain the dense point cloud of the predetermined scene.
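A rough sketch of the memory-aware grouping idea in claim 10, under stated assumptions: free memory is queried via the third-party `psutil` package, depth maps are numpy arrays, and the per-pixel fusion cost is a made-up constant. The claim also uses the camera external parameters and the sparse point cloud (i.e. spatial adjacency) when forming groups, which this simplified budget-only sketch omits:

```python
import psutil  # assumed available for querying current free memory

BYTES_PER_DEPTH_PIXEL = 16  # rough per-pixel fusion cost; an assumption, not from the patent

def estimate_fusion_memory(depth_maps):
    """Very rough estimate of the memory needed to fuse the given depth maps."""
    return sum(d.shape[0] * d.shape[1] * BYTES_PER_DEPTH_PIXEL for d in depth_maps)

def group_frames_by_memory(frames, depth_maps, safety_margin=0.8):
    """Split frames into groups whose estimated fusion cost fits in free RAM."""
    budget = psutil.virtual_memory().available * safety_margin
    groups, current, current_cost = [], [], 0
    for frame, depth in zip(frames, depth_maps):
        cost = estimate_fusion_memory([depth])
        if current and current_cost + cost > budget:
            groups.append(current)  # current group is full; start a new one
            current, current_cost = [], 0
        current.append(frame)
        current_cost += cost
    if current:
        groups.append(current)
    return groups
```

Each group can then be fused into a dense point cloud independently, and the partial dense point clouds merged afterwards, so peak memory stays within the budget.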
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, characterized in that the processor implements the steps of the three-dimensional reconstruction method according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the three-dimensional reconstruction method according to any one of claims 1 to 9.
CN202111471006.3A 2021-12-03 2021-12-03 Three-dimensional reconstruction method and device, computer equipment and storage medium Pending CN114359474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111471006.3A CN114359474A (en) 2021-12-03 2021-12-03 Three-dimensional reconstruction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114359474A true CN114359474A (en) 2022-04-15

Family

ID=81097995

Country Status (1)

Country Link
CN (1) CN114359474A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190197659A1 (en) * 2017-12-27 2019-06-27 Fujitsu Limited Optimized memory access for reconstructing a three dimensional shape of an object by visual hull
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN113496138A (en) * 2020-03-18 2021-10-12 广州极飞科技股份有限公司 Dense point cloud data generation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
JP6702453B2 (en) Positioning method, positioning device, and readable storage medium
EP3621034B1 (en) Method and apparatus for calibrating relative parameters of collector, and storage medium
WO2019170164A1 (en) Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
WO2019161813A1 (en) Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium
JP6902122B2 (en) Double viewing angle Image calibration and image processing methods, equipment, storage media and electronics
EP3709266A1 (en) Human-tracking methods, apparatuses, systems, and storage media
WO2019015158A1 (en) Obstacle avoidance method for unmanned aerial vehicle, and unmanned aerial vehicle
CN112508865B (en) Unmanned aerial vehicle inspection obstacle avoidance method, unmanned aerial vehicle inspection obstacle avoidance device, computer equipment and storage medium
EP3152738B1 (en) Constructing a 3d structure
CN110969145B (en) Remote sensing image matching optimization method and device, electronic equipment and storage medium
CN112116655A (en) Method and device for determining position information of image of target object
CN113095228B (en) Method and device for detecting target in image and computer readable storage medium
CN113192646A (en) Target detection model construction method and different target distance monitoring method and device
WO2023231435A1 (en) Visual perception method and apparatus, and storage medium and electronic device
US20220375134A1 (en) Method, device and system of point cloud compression for intelligent cooperative perception system
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN116563384A (en) Image acquisition device calibration method, device and computer device
CN113252045B (en) Device positioning method and device, electronic device and readable storage medium
CN114981845A (en) Image scanning method and device, equipment and storage medium
CN114821506A (en) Multi-view semantic segmentation method and device, electronic equipment and storage medium
WO2022147655A1 (en) Positioning method and apparatus, spatial information acquisition method and apparatus, and photographing device
CN114066999A (en) Target positioning system and method based on three-dimensional modeling
CN113496138A (en) Dense point cloud data generation method and device, computer equipment and storage medium
US9392146B2 (en) Apparatus and method for extracting object
CN117152218A (en) Image registration method, image registration device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220415