CN110706329A - Three-dimensional scene reconstruction method and device - Google Patents

Three-dimensional scene reconstruction method and device

Info

Publication number
CN110706329A
Authority
CN
China
Prior art keywords
video image
characteristic factors
image
texture
cameras
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910839704.0A
Other languages
Chinese (zh)
Inventor
陈松
袁训明
井方伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Asian Union Development Technology Co Ltd
Original Assignee
Shenzhen Asian Union Development Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Asian Union Development Technology Co Ltd filed Critical Shenzhen Asian Union Development Technology Co Ltd
Priority to CN201910839704.0A priority Critical patent/CN110706329A/en
Publication of CN110706329A publication Critical patent/CN110706329A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00 - Image analysis
    • G06T7/40 - Analysis of texture
    • G06T7/50 - Depth or shape recovery
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the application belong to the field of automatic navigation of unmanned aerial vehicles and relate to a three-dimensional scene reconstruction method, which includes the following steps: calibrating the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane; continuously acquiring a first video image and a second video image with the binocular camera; matching characteristic factors of the first video image and the second video image to generate a disparity map based on the two images; calculating image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map; performing point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface; and rendering a three-dimensional space to reconstruct the three-dimensional scene. The application also relates to a three-dimensional scene reconstruction device. The technical solution provided by the application can reconstruct a three-dimensional scene quickly and effectively, at a cost far lower than that of conventional three-dimensional scene reconstruction schemes.

Description

Three-dimensional scene reconstruction method and device
Technical Field
The application relates to the field of unmanned aerial vehicle automatic navigation, in particular to a three-dimensional scene reconstruction method and device.
Background
Unmanned aerial vehicles (UAVs) have been widely adopted because of their convenience and low operating cost. As the number of UAVs in service keeps growing, many of them operate in densely populated cities, industrial areas crowded with tall buildings, and forest farms or field environments with complex terrain. Navigating a UAV in these environments by manual piloting alone is clearly too difficult and demands a high level of flying skill from the operator, so a UAV platform capable of automatic navigation is needed. During automatic navigation the UAV must judge its surroundings accurately; specifically, the surrounding environment needs to be digitized through three-dimensional reconstruction for use by the navigation system. Existing three-dimensional reconstruction approaches include: performing tomographic scanning of a three-dimensional object and connecting and triangulating the extracted slices to obtain its internal structure and form; acquiring a lattice of points on the object surface with a probe, a laser, or another means with definite directivity and triangulating the whole to obtain an accurate external form; obtaining point cloud data of the object by three-dimensional scanning and reconstructing the point cloud to generate three-dimensional data of the object; and performing three-dimensional reconstruction by binocular vision.
Disclosure of Invention
The technical problem to be solved by the embodiments of the application is to provide a scheme that can reconstruct a three-dimensional scene quickly and at low cost.
In order to solve the above technical problem, an embodiment of the present application provides a three-dimensional scene reconstruction method, which adopts the following technical solutions:
a method of three-dimensional scene reconstruction, the method comprising: calibrating the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane; continuously acquiring a first video image and a second video image with the binocular camera; matching characteristic factors of the first video image and the second video image to generate a disparity map based on the two images; calculating image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map; performing point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface; and rendering a three-dimensional space to reconstruct the three-dimensional scene.
Further, after point cloud alignment and stitching of the images are performed according to the image depth data contained in the depth map to reconstruct the three-dimensional surface, and before the three-dimensional space is rendered to reconstruct the three-dimensional scene, the method further includes: generating a texture matrix from the texture information of the first video image and the second video image for texture mapping, which specifically includes: determining texture coordinates of the characteristic factors under the light-source viewpoint according to the texture information of the characteristic factors in the first video image and the second video image; generating texture coordinates under the viewing position from the texture coordinates under the light-source viewpoint through an OpenGL algorithm; generating a texture shadow map from the texture coordinates and generating a texture matrix; and associating the texture matrix with the three-dimensional surface data.
Further, matching the characteristic factors of the first video image and the second video image to generate a disparity map based on the two images specifically includes: extracting the characteristic factors in the first video image and the second video image; matching the characteristic factors in the first video image and the second video image to correlate the physical positions of the two images; and aggregating the position differences of the characteristic factors in the first video image and the second video image to generate a disparity map.
Further, the specific method for matching the characteristic factors in the first video image and the second video image is as follows: matching the characteristic factors in the first video image and the second video image through a SURF feature matching algorithm; confirming that the characteristic factors are correctly matched when the difference between the characteristic factors in the first video image and the second video image reaches a preset value; determining a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm; mapping the coordinates of the characteristic factors in the first video image onto the second video image according to the homography matrix, and confirming that the characteristic factors are correctly matched when the mapped characteristic factors of the first video image coincide with the characteristic factors of the second video image; and discarding some of the characteristic factors whenever the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between any two groups of characteristic factors is larger than the preset value.
Further, calculating the image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map specifically includes: separating a target area and an auxiliary area in the disparity map by a k-means method; determining the disparity of each pixel of the target area in the disparity map through a local sliding window so as to determine the depth data of each pixel in the target area; determining the disparity of each pixel in the auxiliary area through a binary-feature approximate matching algorithm based on priority search of a multi-level clustering tree so as to determine the depth data of each pixel in the auxiliary area; fusing the depth data of the target area and the auxiliary area into a whole image; and filtering the whole image with a median filtering algorithm to generate the depth map.
Further, calibrating the cameras of the at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane specifically includes: respectively acquiring the internal parameters and external parameters of the two cameras, wherein the internal parameters include the length and width of the camera photosensitive element, the offset range of the center point of the photosensitive element, and the focal length, and the external parameters include the baseline distance; adjusting the camera parameters of the two cameras to be consistent so that the two cameras have the same focal length and approximately share a retinal plane; and shooting a group of images with the two cameras, and adjusting the two cameras according to the coordinates of the characteristic factors within the two images and the correspondence among object distance, focal length and baseline distance, so that the coordinates of the characteristic factors in the two images conform to that correspondence.
In order to solve the above technical problem, the present application further discloses a three-dimensional scene reconstruction device:
a three-dimensional scene reconstruction apparatus, comprising: a camera module, configured to calibrate the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane, and to continuously acquire a first video image and a second video image with the binocular camera; a processing module, configured to match characteristic factors of the first video image and the second video image to generate a disparity map based on the two images, and to calculate image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map; and a rendering module, configured to perform point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface, and to render a three-dimensional space to reconstruct the three-dimensional scene.
Further, the rendering module is also configured to generate a texture matrix for texture mapping; specifically, the rendering module is configured to determine texture coordinates of the characteristic factors under the light-source viewpoint according to the texture information of the characteristic factors in the first video image and the second video image; generate texture coordinates under the viewing position from the texture coordinates under the light-source viewpoint through an OpenGL algorithm; generate a texture shadow map from the texture coordinates and generate a texture matrix; and associate the texture matrix with the three-dimensional surface data.
Further, the processing module is also configured to: extract the characteristic factors in the first video image and the second video image; match the characteristic factors in the first video image and the second video image to correlate the physical positions of the two images; and aggregate the position differences of the characteristic factors in the first video image and the second video image to generate a disparity map.
Further, the processing module is also configured to match the characteristic factors in the first video image and the second video image through a SURF feature matching algorithm; confirm that the characteristic factors are correctly matched when the difference between the characteristic factors in the first video image and the second video image reaches a preset value; determine a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm; map the coordinates of the characteristic factors in the first video image onto the second video image according to the homography matrix, and confirm that the characteristic factors are correctly matched when the mapped characteristic factors of the first video image coincide with the characteristic factors of the second video image; and discard some of the characteristic factors whenever the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between any two groups of characteristic factors is larger than the preset value.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the method comprises the steps of calibrating a camera of a double camera to obtain a first stable video image and a second stable video image, matching characteristic factors to generate a disparity map, determining depth data of the images according to the disparity map, performing point cloud alignment and splicing according to the depth data of the images to reconstruct a three-dimensional curved surface and draw a three-dimensional space, and achieving reconstruction of a three-dimensional scene.
Drawings
In order to illustrate the solution of the present application more clearly, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a three-dimensional scene reconstruction method of the present invention;
FIG. 2 is a flowchart of a three-dimensional scene reconstruction method step S600 according to the present invention;
FIG. 3 is a flowchart of step S300 of FIG. 1;
FIG. 4 is a flowchart of step S302 of FIG. 3;
FIG. 5 is a flowchart of step S400 of FIG. 1;
FIG. 6 is a flowchart of step S100 of FIG. 1;
fig. 7 is a schematic structural diagram of a three-dimensional scene reconstruction apparatus according to the present invention.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
Embodiment one of the three-dimensional scene reconstruction method
A method of three-dimensional scene reconstruction, the method comprising:
step S100: calibrating the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane;
step S200: continuously acquiring a first video image and a second video image with the binocular camera;
step S300: matching characteristic factors of the first video image and the second video image to generate a disparity map based on the two images;
step S400: calculating image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map;
step S500: performing point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface;
step S700: rendering a three-dimensional space to reconstruct the three-dimensional scene. A minimal end-to-end sketch of how these steps fit together is given below.
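As a concrete illustration of the overall flow of steps S100 to S700, the following Python sketch strings the stages into one pipeline with OpenCV. It is a sketch under stated assumptions, not the patent's implementation: rectify_binocular() is a hypothetical helper (see the calibration sketch later in this description), SGBM block matching stands in for the characteristic-factor matching of step S300, the focal length and baseline live in the Q matrix, and Open3D is assumed to be available for the point cloud and surface.

```python
# A minimal end-to-end sketch of steps S100-S700, under stated assumptions.
# maps_left / maps_right / Q are produced by a calibration step such as the
# hypothetical rectify_binocular() sketched with step S103 below.
import cv2
import numpy as np
import open3d as o3d

def reconstruct_scene(left_gray, right_gray, maps_left, maps_right, Q):
    # S100/S200: remap the continuously captured frames onto a common
    # retinal plane using the rectification maps from calibration.
    l = cv2.remap(left_gray, *maps_left, cv2.INTER_LINEAR)
    r = cv2.remap(right_gray, *maps_right, cv2.INTER_LINEAR)

    # S300/S400: disparity map, then per-pixel depth via reprojection with Q.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5)
    disparity = sgbm.compute(l, r).astype(np.float32) / 16.0
    points = cv2.reprojectImageTo3D(disparity, Q)   # per-pixel (X, Y, Z)

    # S500: build a point cloud from valid depths; alignment/stitching with
    # clouds from earlier frames (e.g. ICP) is omitted here.
    mask = disparity > 0
    cloud = o3d.geometry.PointCloud(
        o3d.utility.Vector3dVector(points[mask].reshape(-1, 3)))

    # S700: reconstruct a renderable surface, e.g. by Poisson reconstruction.
    cloud.estimate_normals()
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(cloud)
    return mesh
```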
Further, after point cloud alignment and stitching of the images are performed according to the image depth data contained in the depth map to reconstruct the three-dimensional surface, and before the three-dimensional space is rendered to reconstruct the three-dimensional scene, the method further includes:
step S600: generating a texture matrix from the texture information of the first video image and the second video image for texture mapping, which specifically includes:
step S601: determining texture coordinates of the characteristic factors under the light-source viewpoint according to the texture information of the characteristic factors in the first video image and the second video image;
step S602: generating texture coordinates under the viewing position from the texture coordinates under the light-source viewpoint through an OpenGL algorithm;
step S603: generating a texture shadow map from the texture coordinates and generating a texture matrix;
step S604: associating the texture matrix with the three-dimensional surface data.
This scheme helps improve the precision of the generated three-dimensional space and thereby the accuracy of the three-dimensional scene.
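One compact way to realize the texture matrix of steps S601 to S604 is the classic OpenGL projective-texturing construction, in which a bias matrix, the light-source projection, and the light-source view transform are concatenated so that a world-space point of the three-dimensional surface maps directly to a coordinate in the texture shadow map. The numpy sketch below shows only that matrix assembly; the look-at/perspective helpers and all numeric parameters are illustrative assumptions, not the patent's implementation.

```python
# A minimal sketch of the texture-matrix construction behind steps S601-S604,
# in the style of OpenGL projective texturing / shadow mapping.
import numpy as np

def look_at(eye, target, up):
    # Right-handed look-at matrix (gluLookAt-style).
    f = target - eye; f /= np.linalg.norm(f)
    s = np.cross(f, up); s /= np.linalg.norm(s)
    u = np.cross(s, f)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def perspective(fov_y_deg, aspect, near, far):
    t = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    proj = np.zeros((4, 4))
    proj[0, 0], proj[1, 1] = t / aspect, t
    proj[2, 2] = (far + near) / (near - far)
    proj[2, 3] = 2.0 * far * near / (near - far)
    proj[3, 2] = -1.0
    return proj

def texture_matrix(light_pos, scene_center):
    # S601/S602: coordinates under the light-source viewpoint...
    light_view = look_at(np.asarray(light_pos, float),
                         np.asarray(scene_center, float),
                         np.array([0.0, 1.0, 0.0]))
    light_proj = perspective(60.0, 1.0, 0.1, 100.0)
    # ...remapped from [-1, 1] to [0, 1] so they index the texture shadow map.
    bias = np.array([[0.5, 0.0, 0.0, 0.5],
                     [0.0, 0.5, 0.0, 0.5],
                     [0.0, 0.0, 0.5, 0.5],
                     [0.0, 0.0, 0.0, 1.0]])
    # S603/S604: the texture matrix that is associated with the surface data.
    return bias @ light_proj @ light_view
```

Multiplying a homogeneous world-space vertex (x, y, z, 1) of the reconstructed surface by this matrix and dividing by w yields its lookup coordinate in the texture shadow map.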
Further, matching the characteristic factors of the first video image and the second video image to generate a disparity map based on the two images specifically includes:
step S301: extracting the characteristic factors in the first video image and the second video image;
step S302: matching the characteristic factors in the first video image and the second video image to correlate the physical positions of the two images;
step S303: aggregating the position differences of the characteristic factors in the first video image and the second video image to generate a disparity map.
This scheme helps improve the accuracy of the disparity, and hence the depth, encoded in the disparity map.
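The aggregation of step S303 can be pictured with the short sketch below: on rectified images the disparity of a matched pair of characteristic factors is simply the horizontal position difference, and the sparse values are scattered into a disparity image. The input format (N x 2 arrays of matched pixel coordinates) and the rounding choice are illustrative assumptions.

```python
# A minimal sketch of step S303: aggregate the position differences of matched
# characteristic factors into a sparse disparity map. pts_left / pts_right are
# N x 2 arrays of matched (x, y) coordinates on rectified images.
import numpy as np

def sparse_disparity_map(pts_left, pts_right, image_shape):
    disp = np.zeros(image_shape, dtype=np.float32)
    for (xl, yl), (xr, yr) in zip(pts_left, pts_right):
        # On rectified images matched factors lie on the same row, so the
        # disparity is the difference of their column coordinates.
        d = float(xl) - float(xr)
        if d > 0:
            disp[int(round(yl)), int(round(xl))] = d
    return disp
```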
Further, the specific method for matching the characteristic factors in the first video image and the second video image is as follows:
step S3021: matching the characteristic factors in the first video image and the second video image through a SURF feature matching algorithm;
step S3022: confirming that the characteristic factors are correctly matched when the difference between the characteristic factors in the first video image and the second video image reaches a preset value;
step S3023: determining a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm;
step S3024: mapping the coordinates of the characteristic factors in the first video image onto the second video image according to the homography matrix, and confirming that the characteristic factors are correctly matched when the mapped characteristic factors of the first video image coincide with the characteristic factors of the second video image;
step S3025: discarding some of the characteristic factors whenever the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between any two groups of characteristic factors is larger than the preset value.
By screening the characteristic factors in this way, the scheme ensures that the characteristic factors used in the disparity map are determined accurately, and controls their spatial density so that the disparity map can accurately reveal the heights of the characteristic factors in the first video image and the second video image without redundancy.
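The following OpenCV (Python) sketch walks through steps S3021 to S3025 under stated assumptions: SURF requires the opencv-contrib build (an alternative such as ORB could be substituted if it is unavailable), Lowe's ratio test stands in for the "difference reaches a preset value" criterion of S3022, and the ratio, reprojection error, and minimum spacing thresholds are illustrative.

```python
# A minimal sketch of the matching procedure in steps S3021-S3025 with OpenCV.
import cv2
import numpy as np

def match_characteristic_factors(img_left, img_right,
                                 ratio=0.7, reproj_err=3.0, min_spacing=8.0):
    # S3021: detect and describe characteristic factors with SURF.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = surf.detectAndCompute(img_left, None)
    kp2, des2 = surf.detectAndCompute(img_right, None)

    # S3022: keep a match only when the descriptor difference satisfies the
    # preset criterion (here: Lowe's ratio test as a stand-in).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    raw = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in raw if m.distance < ratio * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # S3023: homography between the two views estimated with RANSAC.
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC, reproj_err)

    # S3024: map left-image coordinates through H and keep only matches whose
    # projection coincides (within reproj_err) with the right-image factor.
    proj = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H).reshape(-1, 2)
    ok = np.linalg.norm(proj - pts2, axis=1) < reproj_err
    pts1, pts2 = pts1[ok], pts2[ok]

    # S3025: discard factors that lie too close together, so the minimum
    # spacing between any two retained factors exceeds min_spacing.
    keep = []
    for i, p in enumerate(pts1):
        if all(np.linalg.norm(p - pts1[j]) > min_spacing for j in keep):
            keep.append(i)
    return pts1[keep], pts2[keep]
```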
Further, calculating the image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map specifically includes:
step S401: separating a target area and an auxiliary area in the disparity map by a k-means method;
step S402: determining the disparity of each pixel of the target area in the disparity map through a local sliding window so as to determine the depth data of each pixel in the target area;
step S403: determining the disparity of each pixel in the auxiliary area through a binary-feature approximate matching algorithm based on priority search of a multi-level clustering tree so as to determine the depth data of each pixel in the auxiliary area;
step S404: fusing the depth data of the target area and the auxiliary area into a whole image;
step S405: filtering the whole image with a median filtering algorithm to generate the depth map.
By dividing the acquisition area of the depth map into a target area that is critical for distance judgment and an auxiliary area of lower importance and processing them separately, the scheme greatly improves the efficiency of depth map generation while keeping the depth map accurate.
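A simplified Python sketch of steps S401 to S405 follows: the disparity map is split into a target area and an auxiliary area with k-means, disparity is converted to depth with Z = f * B / d, and the fused result is median-filtered. The focal length, baseline, and cluster count are illustrative assumptions, and the sliding-window and clustering-tree refinements of steps S402/S403 are not reproduced; both areas are converted with the same formula here.

```python
# A minimal sketch of steps S401-S405, under stated assumptions.
import cv2
import numpy as np

def disparity_to_depth_map(disparity, focal_px=700.0, baseline_m=0.12, k=2):
    disp = disparity.astype(np.float32)

    # S401: k-means on pixel disparities; the cluster with the larger mean
    # disparity (nearer objects) is taken as the target area.
    samples = disp.reshape(-1, 1)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, k, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    target_mask = labels.reshape(disp.shape) == int(np.argmax(centers))

    # S402/S403 (simplified): depth = focal * baseline / disparity; invalid
    # disparities are left at zero.
    depth = np.divide(focal_px * baseline_m, disp,
                      out=np.zeros_like(disp), where=disp > 0)
    depth_target = np.where(target_mask, depth, 0.0)
    depth_aux = np.where(~target_mask, depth, 0.0)

    # S404: fuse the two areas back into one whole image.
    fused = depth_target + depth_aux

    # S405: median filtering to suppress speckle before output.
    return cv2.medianBlur(fused.astype(np.float32), 5)
```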
Further, calibrating the cameras of the at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane specifically includes:
step S101: respectively acquiring the internal parameters and external parameters of the two cameras, wherein the internal parameters include the length and width of the camera photosensitive element, the offset range of the center point of the photosensitive element, and the focal length, and the external parameters include the baseline distance;
step S102: adjusting the camera parameters of the two cameras to be consistent so that the two cameras have the same focal length and approximately share a retinal plane;
step S103: shooting a group of images with the two cameras, and adjusting the two cameras according to the coordinates of the characteristic factors within the two images and the correspondence among object distance, focal length and baseline distance, so that the coordinates of the characteristic factors in the two images conform to that correspondence.
By applying reference coordinates and adjusting the positional relationship of the characteristic factors within the images of the two cameras, the scheme brings the two cameras onto a shared retinal plane, which safeguards the accuracy of subsequent image recognition and of the three-dimensional model that is built on it.
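For reference, the sketch below shows a standard binocular calibration and rectification analogue of steps S101 to S103 with OpenCV. It is an assumption-laden stand-in, not the patent's procedure: it assumes grayscale checkerboard image pairs (pattern size and square size are illustrative), and the adjustment via the object-distance/focal-length/baseline correspondence is replaced by stereoCalibrate/stereoRectify, which likewise yields a common retinal (image) plane.

```python
# A minimal sketch of binocular calibration and rectification with OpenCV,
# given as an analogue of steps S101-S103. left_imgs / right_imgs are lists of
# grayscale checkerboard views captured by the two cameras.
import cv2
import numpy as np

def rectify_binocular(left_imgs, right_imgs, pattern=(9, 6), square=0.025):
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_pts, pts_l, pts_r = [], [], []
    for gl, gr in zip(left_imgs, right_imgs):
        okl, cl = cv2.findChessboardCorners(gl, pattern)
        okr, cr = cv2.findChessboardCorners(gr, pattern)
        if okl and okr:
            obj_pts.append(objp); pts_l.append(cl); pts_r.append(cr)

    size = left_imgs[0].shape[::-1]
    # S101: internal parameters (focal length, principal point) per camera.
    _, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, pts_l, size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, pts_r, size, None, None)

    # S101/S102: external parameters (rotation R, translation T; the norm of T
    # is the baseline distance).
    _, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, pts_l, pts_r, K1, d1, K2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # S103: rectification so that both cameras share the same retinal plane.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size,
                                                 cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size,
                                                 cv2.CV_32FC1)
    return (map_lx, map_ly), (map_rx, map_ry), Q
```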
An embodiment of a three-dimensional scene reconstruction device is as follows:
a three-dimensional scene reconstruction apparatus comprising:
the camera module is used for calibrating the cameras of at least one group of binocular cameras so as to correct the two groups of cameras of the binocular cameras to have the same retinal plane; continuously acquiring a first video image and a second video image through a binocular camera;
the processing module is used for matching the characteristic factors of the first video image and the second video image to generate a disparity map based on the first video image and the second video image; calculating the depth data of the image according to the parallaxes extracted from the continuous multiple groups of parallax images to generate a depth image;
the drawing module is used for carrying out point cloud alignment and splicing on the images according to the image depth data disclosed in the depth map so as to reconstruct a three-dimensional curved surface; and drawing a three-dimensional space to realize the reconstruction of the three-dimensional scene.
Further, the rendering module is further configured to: generating a texture matrix for texture mapping, specifically, the rendering module is further configured to generate a texture matrix for texture mapping, and is configured to determine texture coordinates of the characteristic factors under the light source viewpoint according to texture information of the characteristic factors in the first video image and the second video image; generating texture coordinates under the viewpoint position through an opengl algorithm and texture coordinates under the light source viewpoint; generating a texture shadow map according to the texture coordinates and generating a texture matrix; the texture matrix is associated with the three-dimensional surface data.
Further, the processing module is further configured to: extracting characteristic factors in the first video image and the second video image; matching the characteristic factors in the first video image and the second video image to correlate the physical positions of the first video image and the second video image; and converging to generate a disparity map according to the position difference of the characteristic factors in the first video image and the second video image.
Further, the processing module is further configured to match feature factors in the first video image and the second video image through a SURF feature matching algorithm; determining that the characteristic factors are matched without errors according to the difference between the characteristic factors in the first video image and the second video image reaching a preset value; determining a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm; mapping the coordinates of the characteristic factors in the first video image to the second video image according to the homography matrix, and determining that the characteristic factors are matched without errors according to the matching of the characteristic factors in the mapped first video image and the characteristic factors in the second video image; and discarding part of the characteristic factors according to the condition that the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between at least two groups of characteristic factors is larger than the preset value.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. The application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thorough. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method for reconstructing a three-dimensional scene, the method comprising:
calibrating the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane;
continuously acquiring a first video image and a second video image with the binocular camera;
matching characteristic factors of the first video image and the second video image to generate a disparity map based on the two images;
calculating image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map;
performing point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface;
and rendering a three-dimensional space to reconstruct the three-dimensional scene.
2. The method of claim 1, wherein after point cloud alignment and stitching of the images are performed according to the image depth data contained in the depth map to reconstruct the three-dimensional surface, and before the three-dimensional space is rendered to reconstruct the three-dimensional scene, the method further comprises: generating a texture matrix from the texture information of the first video image and the second video image for texture mapping, which specifically comprises:
determining texture coordinates of the characteristic factors under the light-source viewpoint according to the texture information of the characteristic factors in the first video image and the second video image;
generating texture coordinates under the viewing position from the texture coordinates under the light-source viewpoint through an OpenGL algorithm;
generating a texture shadow map from the texture coordinates and generating a texture matrix;
associating the texture matrix with the three-dimensional surface data.
3. The method according to claim 1, wherein matching the characteristic factors of the first video image and the second video image to generate the disparity map based on the two images specifically comprises:
extracting the characteristic factors in the first video image and the second video image;
matching the characteristic factors in the first video image and the second video image to correlate the physical positions of the two images;
and aggregating the position differences of the characteristic factors in the first video image and the second video image to generate a disparity map.
4. The method according to claim 3, wherein the specific method for matching the characteristic factors in the first video image and the second video image is:
matching the characteristic factors in the first video image and the second video image through a SURF feature matching algorithm;
confirming that the characteristic factors are correctly matched when the difference between the characteristic factors in the first video image and the second video image reaches a preset value;
determining a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm;
mapping the coordinates of the characteristic factors in the first video image onto the second video image according to the homography matrix, and confirming that the characteristic factors are correctly matched when the mapped characteristic factors of the first video image coincide with the characteristic factors of the second video image;
and discarding some of the characteristic factors whenever the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between any two groups of characteristic factors is larger than the preset value.
5. The method of claim 1, wherein calculating the image depth data from the disparities extracted from multiple consecutive disparity maps to generate the depth map specifically comprises:
separating a target area and an auxiliary area in the disparity map by a k-means method;
determining the disparity of each pixel of the target area in the disparity map through a local sliding window so as to determine the depth data of each pixel in the target area;
determining the disparity of each pixel in the auxiliary area through a binary-feature approximate matching algorithm based on priority search of a multi-level clustering tree so as to determine the depth data of each pixel in the auxiliary area;
fusing the depth data of the target area and the auxiliary area into a whole image;
and filtering the whole image with a median filtering algorithm to generate the depth map.
6. The method of claim 1, wherein calibrating the cameras of the at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane specifically comprises:
respectively acquiring the internal parameters and external parameters of the two cameras, wherein the internal parameters comprise the length and width of the camera photosensitive element, the offset range of the center point of the photosensitive element, and the focal length, and the external parameters comprise the baseline distance;
adjusting the camera parameters of the two cameras to be consistent so that the two cameras have the same focal length and approximately share a retinal plane;
and shooting a group of images with the two cameras, and adjusting the two cameras according to the coordinates of the characteristic factors within the two images and the correspondence among object distance, focal length and baseline distance, so that the coordinates of the characteristic factors in the two images conform to that correspondence.
7. A three-dimensional scene reconstruction apparatus, comprising:
a camera module, configured to calibrate the cameras of at least one binocular camera so that the two cameras of the binocular camera share the same retinal plane, and to continuously acquire a first video image and a second video image with the binocular camera;
a processing module, configured to match characteristic factors of the first video image and the second video image to generate a disparity map based on the two images, and to calculate image depth data from the disparities extracted from multiple consecutive disparity maps to generate a depth map;
and a rendering module, configured to perform point cloud alignment and stitching of the images according to the image depth data contained in the depth map to reconstruct a three-dimensional surface, and to render a three-dimensional space to reconstruct the three-dimensional scene.
8. The apparatus of claim 7, wherein the rendering module is further configured to generate a texture matrix for texture mapping; specifically, the rendering module is configured to determine texture coordinates of the characteristic factors under the light-source viewpoint according to the texture information of the characteristic factors in the first video image and the second video image; generate texture coordinates under the viewing position from the texture coordinates under the light-source viewpoint through an OpenGL algorithm; generate a texture shadow map from the texture coordinates and generate a texture matrix; and associate the texture matrix with the three-dimensional surface data.
9. The apparatus of claim 7, wherein the processing module is further configured to: extract the characteristic factors in the first video image and the second video image; match the characteristic factors in the first video image and the second video image to correlate the physical positions of the two images; and aggregate the position differences of the characteristic factors in the first video image and the second video image to generate a disparity map.
10. The apparatus according to claim 9, wherein the processing module is further configured to match the characteristic factors in the first video image and the second video image through a SURF feature matching algorithm; confirm that the characteristic factors are correctly matched when the difference between the characteristic factors in the first video image and the second video image reaches a preset value; determine a homography matrix of the characteristic factors between the first video image and the second video image through a RANSAC algorithm; map the coordinates of the characteristic factors in the first video image onto the second video image according to the homography matrix, and confirm that the characteristic factors are correctly matched when the mapped characteristic factors of the first video image coincide with the characteristic factors of the second video image; and discard some of the characteristic factors whenever the distance between at least two groups of characteristic factors is smaller than a preset value, so that the minimum distance between any two groups of characteristic factors is larger than the preset value.
CN201910839704.0A 2019-09-06 2019-09-06 Three-dimensional scene reconstruction method and device Pending CN110706329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910839704.0A CN110706329A (en) 2019-09-06 2019-09-06 Three-dimensional scene reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910839704.0A CN110706329A (en) 2019-09-06 2019-09-06 Three-dimensional scene reconstruction method and device

Publications (1)

Publication Number Publication Date
CN110706329A true CN110706329A (en) 2020-01-17

Family

ID=69194356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910839704.0A Pending CN110706329A (en) 2019-09-06 2019-09-06 Three-dimensional scene reconstruction method and device

Country Status (1)

Country Link
CN (1) CN110706329A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180115766A1 (en) * 2016-10-20 2018-04-26 Alcatel-Lucent Usa Inc. 3d image reconstruction based on lensless compressive image acquisition
CN106780494A (en) * 2017-02-10 2017-05-31 云南电网有限责任公司电力科学研究院 A kind of electrical verification hanging ground-wire robot visual orientation method
US20180307922A1 (en) * 2017-04-20 2018-10-25 Hyundai Motor Company Method of detecting obstacle around vehicle
CN109765241A (en) * 2019-01-09 2019-05-17 中国科学院上海微***与信息技术研究所 The monitoring device of bow net state
CN109785377A (en) * 2019-01-09 2019-05-21 中国科学院上海微***与信息技术研究所 The detection method of bow net state

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196941A1 (en) * 2020-04-03 2021-10-07 浙江商汤科技开发有限公司 Method and apparatus for detecting three-dimensional target
CN111879354A (en) * 2020-06-29 2020-11-03 广州中科智云科技有限公司 Unmanned aerial vehicle measurement system that becomes more meticulous
CN112263837A (en) * 2020-11-16 2021-01-26 腾讯科技(深圳)有限公司 Weather rendering method, device, equipment and storage medium in virtual environment
CN112263837B (en) * 2020-11-16 2021-12-21 腾讯科技(深圳)有限公司 Weather rendering method, device, equipment and storage medium in virtual environment
CN114898068A (en) * 2022-05-19 2022-08-12 海尔数字科技(上海)有限公司 Three-dimensional modeling method, device, equipment and storage medium
CN114898068B (en) * 2022-05-19 2023-10-20 海尔数字科技(上海)有限公司 Three-dimensional modeling method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20220207776A1 (en) Disparity image fusion method for multiband stereo cameras
CN110706329A (en) Three-dimensional scene reconstruction method and device
CN106780590B (en) Method and system for acquiring depth map
CN108257161B (en) Multi-camera-based vehicle environment three-dimensional reconstruction and motion estimation system and method
CN107977997B (en) Camera self-calibration method combined with laser radar three-dimensional point cloud data
CN107316325B (en) Airborne laser point cloud and image registration fusion method based on image registration
US11908152B2 (en) Acceleration method of depth estimation for multiband stereo cameras
KR100513055B1 (en) 3D scene model generation apparatus and method through the fusion of disparity map and depth map
CN101887589B (en) Stereoscopic vision-based real low-texture image reconstruction method
CN110487216A (en) A kind of fringe projection 3-D scanning method based on convolutional neural networks
CN110517303B (en) Binocular camera and millimeter wave radar based SLAM fusion method and system
CN107886477A (en) Unmanned neutral body vision merges antidote with low line beam laser radar
CN110189400B (en) Three-dimensional reconstruction method, three-dimensional reconstruction system, mobile terminal and storage device
CN105029691B (en) A kind of cigarette void-end detection method based on three-dimensional reconstruction
CN109242898B (en) Three-dimensional modeling method and system based on image sequence
CN111091076B (en) Tunnel limit data measuring method based on stereoscopic vision
CN108399631B (en) Scale invariance oblique image multi-view dense matching method
CN109520480B (en) Distance measurement method and distance measurement system based on binocular stereo vision
CN110889873A (en) Target positioning method and device, electronic equipment and storage medium
CN111435539A (en) Multi-camera system external parameter calibration method based on joint optimization
CN110728745B (en) Underwater binocular stereoscopic vision three-dimensional reconstruction method based on multilayer refraction image model
CN111105467A (en) Image calibration method and device and electronic equipment
KR101841750B1 (en) Apparatus and Method for correcting 3D contents by using matching information among images
CN116433760A (en) Underwater navigation positioning system and method
CN112686937B (en) Depth image generation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200117