CN115222878A - Scene reconstruction method applied to lung bronchoscope surgical robot - Google Patents

Scene reconstruction method applied to lung bronchoscope surgical robot

Info

Publication number
CN115222878A
CN115222878A
Authority
CN
China
Prior art keywords
pose
points
image
frame
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210691216.1A
Other languages
Chinese (zh)
Inventor
王越
陆豪健
熊蓉
李雲霜
张敬禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210691216.1A priority Critical patent/CN115222878A/en
Publication of CN115222878A publication Critical patent/CN115222878A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10068 Endoscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Endoscopes (AREA)

Abstract

The invention discloses a scene reconstruction method applied to a lung bronchoscope surgical robot, and belongs to the field of lung bronchoscope surgical robots. An endoscopic image sequence of the lung bronchus is acquired and filtered; feature points of the filtered image sequence are extracted and matched between adjacent frames; the viewing angle of the first-frame image is taken as the initialization pose, and the pose of the second-frame image is calculated from the feature points and the matching relation of the first two frames; the depths of the matched feature points are estimated by triangulation from the feature points, the matching relation and the poses to obtain an initial point cloud; the pose of the t-th frame image (t ≥ 3) is obtained from the positions of its feature points in the point cloud; the depths of the matched feature points are estimated by triangulation from the feature points, the matching relation and the poses of the t-th frame and the previous frame, and the point cloud is updated; finally, a three-dimensional reconstruction map of the local lung bronchus is obtained. The method requires no assistance from additional sensor information, and has high reconstruction accuracy and good robustness.

Description

Scene reconstruction method applied to lung bronchoscope surgical robot
Technical Field
The invention relates to the field of lung bronchoscope surgical robots, in particular to the realization of the perception functions of such a robot, and specifically to a scene reconstruction method applied to a lung bronchoscope surgical robot.
Background
Conventional bronchoscope navigation techniques mainly fall into the following categories: radial EBUS (radial endobronchial ultrasound), EMN (electromagnetic navigation) and VBN (virtual bronchoscopic navigation), each with its own advantages and disadvantages.
Radial EBUS: the radial EBUS probe can provide an ultrasound image of the tissue surrounding the probe tip, and the system can determine the location of a peripheral lung lesion by comparing EBUS images of normal lung tissue and cancerous tissue. However, EBUS-GS (EBUS with a guide sheath) cannot perform biopsy under visual control, and the EBUS probe must be removed before the biopsy tool is introduced through the guide sheath into the working channel, which also adds some difficulty to the biopsy.
EMN: superDimension (Medtronic) is an EMN system that can be used in combination with an EBUS probe to improve the diagnostic rate for lung cancer. However, the cost of the EMN navigation system itself limits the development of this technology and the frequency of its use in daily bronchoscopy.
Image-based bronchoscope navigation (VBN): three-dimensional reconstruction of the thoracic cavity and lung nodules can be completed; before bronchoscopic biopsy, the position of a peripheral lung lesion can be marked, the moving trajectory of the bronchoscope can be computed in advance, and the trajectory is displayed on the image in real time during biopsy. Although the reconstructed information is easy to obtain and the underlying method is simple, VBN is not widely applied. This is because the accuracy of CT image reconstruction still needs to be improved, and the CT slice thickness should be less than 1 mm. A thicker CT slice, respiratory airway deformation, and secretions in the lumen all shorten the visually reconstructed bronchial tree, leading to loss of peripheral lung information and affecting the diagnostic rate. On the other hand, three-dimensional reconstruction from real-time biopsy probe images also faces many difficulties: the lung bronchus is small, so a binocular depth camera can rarely be used and environmental depth information is difficult to obtain, while a monocular camera faces the difficulty that lung bronchial features are few and monotonous. Therefore, research on the three-dimensional reconstruction problem centered on extraction and matching of lung bronchial scene feature points is valuable.
In the lung bronchial scene, visual information is very poor, with few and monotonous features. Below the first-level bronchus there is usually a long lumen, and as the endoscope at the tip of the bronchoscope rotates, the acquired image is often only a monochromatic tube wall with no obvious features; even an image containing the elliptical lumen contour has only that single contour feature, so the features are few and monotonous and their extraction is very difficult.
Under ordinary conditions, a three-dimensional reconstruction method can use a binocular camera to obtain the environmental depth directly, and three-dimensional reconstruction can be carried out once the pose is obtained. However, the working scene of the lung bronchoscope robot is narrow, and an oversized sensor would increase the harm to the human body, so a binocular camera cannot be used. Depth can only be estimated from monocular images, i.e. RGB images, and it is difficult to guarantee both accuracy and computational efficiency.
Currently, some researchers are actively exploring deep-learning-based monocular depth estimation for medical scenes, in which the depth of a single RGB image can be estimated using the idea of style transfer. However, this kind of method lacks a strong mathematical basis and resembles a black-box approach. At present there is no open-source, validated learning-based monocular depth estimation method for the lung bronchus; such a learning-based method can be counted as an innovative attempt, but its application value is not high.
Disclosure of Invention
Aiming at the difficulty of three-dimensional reconstruction of the lung bronchus for bronchoscope surgical robots in the prior art, the invention provides a scene reconstruction method applied to a lung bronchoscope surgical robot and offers a complete three-dimensional reconstruction solution for this scene, closely tied to the positioning and navigation functions of the bronchoscope.
The invention adopts the following technical scheme:
a scene reconstruction method applied to a lung bronchoscope surgical robot comprises the following steps:
step 1, acquiring an endoscopic image sequence of the lung bronchus, filtering the endoscopic images, and enhancing the bronchial lumen features in the endoscopic images to obtain an enhanced image sequence;
step 2, extracting feature points from the enhanced image sequence and matching the feature points of adjacent frame images to obtain the matching relation of adjacent frame images;
step 3, taking the viewing angle of the first-frame endoscopic image as the initialization pose, and calculating the pose of the second-frame endoscopic image relative to the first-frame endoscopic image from the feature points and the matching relation of the first two frames; estimating the depths of the matched feature points by triangulation from the feature points, the matching relation and the poses to obtain an initial point cloud;
step 4, obtaining the pose of the t-th frame endoscopic image from the positions of its feature points in the point cloud, where t ≥ 3; estimating the depths of the matched feature points by triangulation from the feature points, the matching relation and the poses of the t-th and (t-1)-th frame endoscopic images, and updating the point cloud;
and step 5, obtaining a three-dimensional reconstruction map of the local lung bronchus from the final point cloud.
Further, in step 1, a U-Net network is used to filter the endoscopic images, and the U-Net network comprises a down-sampling part and an up-sampling part;
the down-sampling specifically comprises: after every two 3 × 3 convolution operations, performing a max-pooling operation with stride 2 for dimensionality reduction, these operations together constituting one down-sampling step, with a ReLU activation function used in each down-sampling step;
the up-sampling specifically comprises: doubling the number of channels by a 2 × 2 convolution operation, concatenating with the feature map generated in the corresponding down-sampling step, and then performing two 3 × 3 convolution operations, these operations together constituting one up-sampling step, with a ReLU activation function used in each up-sampling step;
the output of the last up-sampling step is passed through a 1 × 1 convolution layer to obtain the filtered endoscopic image, which enhances the contour and internal features of the airway lumen in the endoscopic image.
Further, the feature extraction and matching algorithm adopted in the step 2 is a HAPCG algorithm.
Further, the step 3 specifically includes:
step 3.1, solving the 2D-2D pose by the eight-point method
For a pair of matching points of the first two frames of endoscopic images, defined on the normalized coordinate plane, there are two pixel points X = [u, v, 1]^T and X_1 = [u_1, v_1, 1]^T:
according to the epipolar geometric constraint:
X_1^T E X = 0
wherein
E = [e_1 e_2 e_3; e_4 e_5 e_6; e_7 e_8 e_9]
represents the 3 × 3 essential matrix, u_1, v_1 are the coordinates of pixel point X_1, and u, v are the coordinates of pixel point X;
written in linear form:
[u_1 u, u_1 v, u_1, v_1 u, v_1 v, v_1, u, v, 1] · e = E_1 · e = 0
wherein e is the 9 × 1 vector formed by the elements of E, and E_1 is the 1 × 9 coefficient matrix constructed from this pair of matching points;
similarly, the other seven pairs of matching points are expressed in the same way; the expressions of the eight pairs of matching points are stacked into one linear system, which is solved to obtain the essential matrix E.
Performing singular value decomposition on the essential matrix E:
E = UΣV^T
and calculating the candidate solutions:
t_1^ = U R_Z(π/2) Σ U^T,  R_1 = U R_Z^T(π/2) V^T
t_2^ = U R_Z(-π/2) Σ U^T,  R_2 = U R_Z^T(-π/2) V^T
wherein U and V are orthogonal matrices and Σ is a diagonal matrix; R_1, R_2 are rotation matrices and t_1, t_2 are translations (^ denotes the skew-symmetric matrix of a vector); R_Z(π/2) denotes the rotation matrix of a rotation about the z-axis by π/2, and R_Z(-π/2) denotes the rotation matrix of a rotation about the z-axis by -π/2;
from the above four solutions (R_1, t_1), (R_1, t_2), (R_2, t_1), (R_2, t_2), the unique solution that conforms to the real situation is screened out as the pose estimation result (R, t);
step 3.2, estimating depth by triangulation
From the pose estimation result (R, t), the relationship between pixel points X and X_1 is obtained:
s_1 X_1 = s R X + t
which is converted into:
s_1 X_1^ X_1 = 0 = s X_1^ R X + X_1^ t
wherein s_1 is the depth information of pixel point X_1 and s is the depth information of pixel point X; treating this as an equation in the depths, s_1 and s can be solved directly;
step 3.3, repeating step 3.1 and step 3.2 to traverse all matching points, and obtaining the initial point cloud from the depths of the matched feature points.
Further, in step 4, the pose of the t-th frame endoscopic image is calculated by the PnP algorithm, where t ≥ 3.
Further, in step 4, before each update of the point cloud, the estimated points are projected onto the different imaging planes, and the Euclidean distance to the feature points originally corresponding to them on each imaging plane is calculated; if the distance is greater than a threshold, the point is considered poorly estimated and is removed from the point cloud.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a feature point extraction and matching method for lung bronchoscope images, namely an HAPCG-based feature point extraction and matching algorithm assisted by U-Net filtering: HAPCG strengthens the invariance of the image to illumination and contrast in terms of texture, and the U-Net filtering strengthens the lumen content features of the bronchoscope image in terms of content. Compared with the traditional ORB and SIFT algorithms, this method is better suited to the lung bronchoscope scene and provides a good basis for three-dimensional reconstruction of the lung bronchus.
2. Given a sequence of images, the three-dimensional reconstruction method for the lung bronchoscope robot obtains a three-dimensional scene model directly from the two-dimensional images, without assistance from additional sensor information, with high reconstruction accuracy and good robustness.
3. The invention directly uses RGB images for pose estimation to realize three-dimensional reconstruction, without expensive sensors such as NDI trackers, thereby saving cost.
Drawings
Fig. 1 is a schematic flowchart of a scene reconstruction method applied to a bronchopulmonary surgery robot according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a three-dimensional scene reconstruction effect according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples. The figures are only schematic illustrations of the invention, some of the block diagrams shown in the figures are functional entities, which do not necessarily have to correspond to physically or logically separate entities, which may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Unless defined otherwise, technical or scientific terms used herein shall have the same general meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a" and "an" and "the" and similar referents in the context of this application do not denote a limitation of quantity, either in the singular or the plural. The terms "comprises," "comprising," "has," "having" and any variations thereof, as referred to in this application, are intended to cover non-exclusive inclusions; reference in this application to "connected," "coupled," and the like is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Essentially, the lung bronchoscope three-dimensional reconstruction problem can also be regarded as an SfM (Structure from Motion) problem. The invention acquires sequential images of the motion of the bronchoscope robot tip from the monocular endoscope, designs a dedicated method to perform feature extraction and matching between adjacent images, solves the relative poses, estimates depth by triangulation, and completes the three-dimensional scene reconstruction.
As shown in fig. 1, the scene reconstruction method applied to the bronchopulmonary surgery robot of the invention mainly includes two parts, namely, (i) feature point extraction and matching, (ii) pose estimation and scene reconstruction.
In the present embodiment, the feature point extraction and matching includes:
(1.1) Histogram of absolute phase consistency gradients (HAPCG).
The HAPCG feature descriptor is adopted mainly to enhance feature points with respect to the texture characteristics of endoscopic images. The HAPCG algorithm was originally designed for feature point extraction and matching between heterogeneous remote sensing images, a scene with obvious illumination differences, large contrast differences and nonlinear radiation distortion, which the HAPCG algorithm handles well. Similarly, analysis of data acquired with a bronchoscope shows that bronchoscope images are affected by the light provided by the endoscope, so the contrast and illumination differences between images are large and a feature extraction algorithm with illumination and contrast invariance is likewise needed. Therefore, the HAPCG algorithm is applicable to the bronchoscope scene.
When the HAPCG algorithm is executed, the two adjacent frames are first smoothed by nonlinear diffusion using anisotropic filtering, the maximum and minimum moment maps of the phase consistency direction are obtained, an anisotropic weighted moment equation is established from them, and an anisotropic weighted moment map is calculated. Then an absolute phase consistency direction gradient is obtained by extending the absolute phase consistency model, and the histogram of absolute phase consistency gradients (HAPCG) descriptor is defined on a log-polar template. Finally, corresponding feature points are identified using the Euclidean distance as the matching measure, and mismatches are removed by fast sample consensus, thereby realizing detection and registration of the feature points.
Compared with the traditional scale-invariant feature transform (SIFT), position-scale-orientation SIFT (PSO-SIFT) and Log-Gabor histogram descriptor (LGHD), this algorithm achieves better results. In the HAPCG algorithm, the anisotropic filtering is a nonlinear filter that preserves the edge information of the image well, which favors subsequent feature point extraction and makes up for the shortage of feature points in the image. Features are extracted in the frequency domain by the phase consistency method, which locates boundaries and corner points at the positions where the Fourier harmonic components of the image superpose with maximal phase and is not affected by the amplitude spectrum of the image, thereby achieving good invariance to illumination and contrast. The HAPCG algorithm therefore gives good feature extraction and matching results in the lung bronchoscope scene and provides a solid foundation for three-dimensional scene reconstruction.
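As a concrete illustration of the matching-measure stage just described, the following Python sketch matches the descriptors of two adjacent frames by Euclidean distance and removes mismatches geometrically. The HAPCG front end itself is left abstract, RANSAC on the fundamental matrix is used only as a stand-in for the fast-sample-consensus step, and the function and parameter names are illustrative assumptions rather than part of the patent.

# Minimal matching sketch: Euclidean-distance matching of adjacent-frame
# descriptors followed by geometric rejection of mismatches.
import numpy as np
import cv2

def match_adjacent_frames(kp0, desc0, kp1, desc1, ratio=0.8):
    """kp*: lists of cv2.KeyPoint, desc*: float32 descriptor arrays (N x D)."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)                # Euclidean distance as matching measure
    knn = matcher.knnMatch(desc0, desc1, k=2)
    good = [m[0] for m in knn
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]  # ratio test

    if len(good) < 8:                                    # at least 8 pairs needed downstream
        return good, None

    pts0 = np.float32([kp0[m.queryIdx].pt for m in good])
    pts1 = np.float32([kp1[m.trainIdx].pt for m in good])

    # Geometric verification: keep only matches consistent with one epipolar geometry
    # (RANSAC here stands in for the fast-sample-consensus step of the method).
    F, mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 1.0, 0.999)
    if F is None or mask is None:
        return good, None
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
    return inliers, F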
(1.2) content segmentation based on U-Net network;
The image provided by a bronchoscope is monotonous, typically consisting only of the tube wall and the circular or elliptical contour of the airway. Although the tube wall occupies most of the area in a single frame, the feature points to be extracted and matched are often concentrated on the lumen contour and the interior of the airway. Therefore, the invention focuses the attention of the whole image on the lumen contour and the interior of the airway by adding a filter, i.e. by enhancing the main content of the image; this can also be regarded as applying a filtering operation to the image.
In this embodiment, U-Net is selected as the filtering method. It is a network architecture based on convolutional neural networks, commonly used for medical image segmentation tasks, and is suitable for the bronchoscope scene. Data sets in the medical imaging field are difficult to obtain and are usually small. This network makes good use of data augmentation and achieves satisfactory results with only a limited amount of labeled data. The U-Net network includes a series of down-sampling steps to extract the image content and a symmetric up-sampling process that localizes the content information at the size of the original image. In the experiments, this network greatly reduces the workload of manual labeling while accurately extracting the content of the image, namely the position of the tube wall, providing good input for the subsequent work.
The U-Net network structure adopted in this embodiment comprises a down-sampling part and an up-sampling part;
the down-sampling specifically comprises: after every two 3 × 3 convolution operations, performing a max-pooling operation with stride 2 for dimensionality reduction, these operations together constituting one down-sampling step, with a ReLU activation function used in each down-sampling step;
the up-sampling specifically comprises: doubling the number of channels by a 2 × 2 convolution operation, concatenating with the feature map generated in the corresponding down-sampling step, and then performing two 3 × 3 convolution operations, these operations together constituting one up-sampling step, with a ReLU activation function used in each up-sampling step;
the output of the last up-sampling step is passed through a 1 × 1 convolution layer to obtain the filtered endoscopic image, which enhances the contour and internal features of the airway lumen in the endoscopic image.
In this embodiment, pose estimation and scene reconstruction include:
and (2.1) solving the 2D-2D pose by an eight-point pairing method.
Through the previous steps, a set of matched feature points is obtained, from which the relative motion between adjacent bronchoscope frames can be calculated. Taking one pair of matching points as an example, and assuming the match is error-free, the matched points x and x_1 are projections of the same spatial point p onto two different imaging planes whose optical centres are c_0 and c_1, i.e. the rays from c_0 through x and from c_1 through x_1 intersect at p. For convenience of description, the geometric quantities involved are defined as shown in Table 1 below:
TABLE 1 Description of the geometric quantities involved in the epipolar constraint
The epipolar geometric constraint is then introduced, with the following formula:
x_1^T K^{-T} t^ R K^{-1} x = 0
wherein K is the camera intrinsic matrix, t is the translation, and R is the rotation matrix;
its geometric meaning is that p, c_0 and c_1 are coplanar. The middle part of the expression is recorded as two matrices, the fundamental matrix F and the essential matrix E; E and F differ only by the camera intrinsic matrix K, so the epipolar constraint is further simplified:
E = t^R, F = K^{-T} E K^{-1}
X_1^T E X = x_1^T F x = 0
where X = K^{-1} x and X_1 = K^{-1} x_1 are the matched points in normalized coordinates.
then, the camera pose estimation is divided into the following two steps:
in the first step, E or F is found from the pixel position of the alignment point.
And secondly, resolving R and t according to E or F.
E = t ^ R has six degrees of freedom because the translation matrix and the rotation matrix are analyzed to have three degrees of freedom each according to the spatial relationship. In consideration of the scale uncertainty in the actual case, excluding one degree of freedom also preserves five degrees of freedom. This means that a minimum of five pairs of points can uniquely determine the matrix E. However, it is worth noting that since the intrinsic property of E is a non-linear property, it is relatively difficult to estimate, and 8 point pairs are more used to solve E in practical research and application.
Considering a pair of matching points defined on the normalized coordinate plane, there are two pixel points X = [u, v, 1]^T and X_1 = [u_1, v_1, 1]^T:
according to the epipolar geometric constraint,
X_1^T E X = 0
wherein
E = [e_1 e_2 e_3; e_4 e_5 e_6; e_7 e_8 e_9]
is the 3 × 3 essential matrix, u_1, v_1 are the coordinates of pixel point X_1, and u, v are the coordinates of pixel point X.
Written in linear form with respect to e:
[u_1 u, u_1 v, u_1, v_1 u, v_1 v, v_1, u, v, 1] · e = E_1 · e = 0
wherein e is the 9 × 1 vector formed by the elements of E, and E_1 is the 1 × 9 coefficient matrix constructed from this pair of matching points.
Likewise, the remaining seven pairs of points are expressed in the same way. Stacking all eight equations into one linear system and solving it yields the essential matrix E.
Then E is decomposed by SVD, which yields a non-unique pose estimate:
E = UΣV^T
t_1^ = U R_Z(π/2) Σ U^T,  R_1 = U R_Z^T(π/2) V^T
t_2^ = U R_Z(-π/2) Σ U^T,  R_2 = U R_Z^T(-π/2) V^T
wherein U and V are orthogonal matrices and Σ is a diagonal matrix; R_1, R_2 are rotation matrices and t_1, t_2 are translations; R_Z(π/2) denotes the rotation matrix of a rotation about the z-axis by π/2, and R_Z(-π/2) the rotation matrix of a rotation about the z-axis by -π/2. Among the four groups of solutions (R_1, t_1), (R_1, t_2), (R_2, t_1), (R_2, t_2), only one satisfies the constraint that the depth of the points is positive in the camera coordinate systems. Substituting all solutions into the original formula and checking the sign of the point depths yields the correct group of solutions.
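The eight-point solve and the SVD-based decomposition above can be sketched in NumPy as follows. In this sketch the translation direction is taken directly as the last column of U (equivalent, up to scale, to the U R_Z Σ U^T form above), a projection of E onto the essential-matrix manifold is added for numerical robustness, and the positive-depth (cheirality) check is left to the caller; the function names are illustrative.

# Eight-point estimation of E and its decomposition into pose candidates (sketch).
import numpy as np

def estimate_essential_eight_point(X0, X1):
    """X0, X1: (N, 3) matched points in normalized homogeneous coordinates, N >= 8."""
    u, v = X0[:, 0], X0[:, 1]
    u1, v1 = X1[:, 0], X1[:, 1]
    # Each row is [u1*u, u1*v, u1, v1*u, v1*v, v1, u, v, 1] for one pair.
    A = np.stack([u1*u, u1*v, u1, v1*u, v1*v, v1, u, v, np.ones_like(u)], axis=1)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)                    # right singular vector of smallest singular value
    # Project onto the essential-matrix manifold (two equal singular values, one zero).
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt

def decompose_essential(E):
    U, S, Vt = np.linalg.svd(E)
    Rz = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])           # rotation about z by +pi/2
    R1 = U @ Rz.T @ Vt
    R2 = U @ Rz @ Vt                            # since Rz(-pi/2)^T = Rz(+pi/2)
    if np.linalg.det(R1) < 0: R1 = -R1          # keep proper rotations
    if np.linalg.det(R2) < 0: R2 = -R2
    t = U[:, 2]                                 # translation direction, up to scale and sign
    # Four candidates; the one giving positive depths for triangulated points is kept.
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]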
(2.2) Estimating depth by triangulation
The three-dimensional depth of a pixel point cannot be obtained directly from the two-dimensional image of a monocular camera. For such problems, triangulation is usually used to solve for the depths of the pixel points.
According to the epipolar geometry described above, the rays from c_0 through x and from c_1 through x_1 intersect at point p. In general, however, noise in feature point extraction and matching means that this ideal situation rarely occurs, and the two rays usually do not meet in three-dimensional space. To solve this problem, the intersection is commonly approximated by least squares.
Suppose X and X_1 are the normalized coordinates of the two feature points, satisfying
s_1 X_1 = s R X + t
Knowing R and t, it is desirable to solve for the depths s_1 and s of the two feature points. Geometrically, this amounts to searching along the observation ray for a three-dimensional point whose projection is close to x_1. For example, to solve for s, both sides of the above equation are left-multiplied by X_1^ (the skew-symmetric matrix of X_1), giving:
s_1 X_1^ X_1 = 0 = s X_1^ R X + X_1^ t
The equation can be viewed as an equation in the depth with one side being zero, from which one of the depths can be solved directly.
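A small NumPy sketch of this depth solve follows, assuming the relative pose (R, t) and a matched pair of normalized points are already available; the function names are illustrative.

# Triangulation of one matched pair: recover depths s and s1 from s1*X1 = s*R*X + t.
import numpy as np

def skew(v):
    # skew-symmetric (cross-product) matrix, i.e. the "^" operator in the text
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate_depths(R, t, X, X1):
    """X, X1: normalized homogeneous points [u, v, 1]; returns depths and the 3D point."""
    X1_hat = skew(X1)
    a = X1_hat @ R @ X                     # vector multiplying s in  s*X1^*R*X + X1^*t = 0
    b = -X1_hat @ t                        # right-hand side
    s = float(a @ b) / float(a @ a)        # 1-D least squares for s (handles noisy matches)
    p = s * (R @ X) + t                    # the point expressed in the second camera frame
    s1 = p[2] / X1[2]                      # depth along the second camera's optical axis
    return s, s1, p

# usage sketch:
# s, s1, p_cam1 = triangulate_depths(R, t, np.array([u, v, 1.0]), np.array([u1, v1, 1.0]))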
(2.3) PnP algorithm for solving 3D-2D pose
In the initialization step, an initial 3D point cloud is obtained, and the subsequent 2D images then need to be registered to this initial point cloud; PnP is a method for solving this problem. Compared with the eight-point method used in 2D-2D pose estimation, PnP needs as few as three point pairs to estimate the 3D-2D pose. Since the endoscope is a monocular camera, the 3D positions of the feature points at this stage are obtained by triangulation.
In this way, the feature points of each frame of image and their matching relations with the feature points of adjacent frames are obtained. The initial pose of the first two frames is solved with the eight-point method, and the depths of the feature points are estimated by triangulation. For each subsequent frame, the pose of the frame is solved from the three-dimensional points and their corresponding 2D projection points in that frame. A pose sequence is thus obtained, and the multi-frame feature points can be reconstructed in three dimensions according to this sequence.
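A sketch of this 3D-2D step using OpenCV's PnP solver is shown below; solvePnPRansac is used here (which needs at least four correspondences rather than the three of a minimal PnP solve), and the reprojection threshold and function name are illustrative assumptions.

# Pose of frame t (t >= 3) from 3D points already in the reconstruction and their
# matched 2D feature locations in that frame.
import numpy as np
import cv2

def estimate_frame_pose(points_3d, points_2d, K, dist=None):
    """points_3d: (N, 3) float32, points_2d: (N, 2) float32, K: 3x3 intrinsic matrix."""
    if dist is None:
        dist = np.zeros(5, dtype=np.float32)              # assume negligible lens distortion
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d, points_2d, K, dist, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP failed: not enough consistent 3D-2D matches")
    R, _ = cv2.Rodrigues(rvec)                             # rotation vector -> rotation matrix
    return R, tvec.reshape(3), inliers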
In the process of reconstruction, the idea of Bundle Adjustment (BA) is used to optimize the three-dimensional structure. BA extracts the optimal 3D model and camera parameters from the visual images: adjusting the computed results and related parameters so that the bundles of rays through the feature points converge at the camera optical centres is what is called BA. In this embodiment, the specific method is to project the estimated three-dimensional feature points onto the different imaging planes and compute the Euclidean distance to the feature points originally corresponding to them on each plane. If the distance is greater than a threshold, the point is considered poorly estimated and is removed from the three-dimensional point cloud.
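The reprojection-based filtering described above can be sketched as follows; the data layout for the observations and poses and the pixel threshold are illustrative assumptions.

# Drop 3D points whose reprojection error exceeds a threshold in any observing frame.
import numpy as np

def project(K, R, t, P):
    """Project 3D point P (world frame) into a camera with pose (R, t)."""
    p_cam = R @ P + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]                  # pixel coordinates

def filter_point_cloud(points, observations, poses, K, threshold=2.0):
    """
    points:       (N, 3) estimated 3D points
    observations: dict point_index -> list of (frame_index, observed_uv)
    poses:        dict frame_index -> (R, t)
    Returns indices of points whose reprojection error stays below threshold everywhere.
    """
    keep = []
    for i, P in enumerate(points):
        ok = True
        for frame, uv in observations.get(i, []):
            R, t = poses[frame]
            err = np.linalg.norm(project(K, R, t, P) - np.asarray(uv))
            if err > threshold:                  # poorly estimated point -> discard
                ok = False
                break
        if ok:
            keep.append(i)
    return keep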
(2.4) scene reconstruction
In this embodiment, MATLAB is used for scene reconstruction, and the steps are as follows:
input the sequence of endoscopically acquired images {I_t}, t = 1, 2, ..., n;
calculate the feature descriptors {D_t} of each image, t = 1, 2, ..., n;
calculate the matching relations {M_t} of adjacent frame images, t = 1, 2, ..., n-1;
calculate the pose of I_2 relative to I_1, and estimate the feature point depths by triangulation to obtain the initial point cloud P_1;
using M_t, calculate the pose of I_{t+1} with respect to the point cloud P_{t-1}, and estimate the three-dimensional coordinates of the feature points to obtain P_{t+1} (t = 2, ..., n);
perform BA optimization on all P_t (t = 1, 2, ..., n-1) to obtain the final three-dimensional reconstruction map.
In this embodiment, Fig. 2 shows the reconstruction result on 7 endoscopic images; the reconstruction error is small, the accuracy is high, and the robustness is good.
The above-mentioned embodiments only express several embodiments of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of patent protection. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (6)

1. A scene reconstruction method applied to a lung bronchoscope surgery robot is characterized by comprising the following steps:
step 1, acquiring an endoscopic image sequence of the lung bronchus, filtering the endoscopic images, and enhancing the bronchial lumen features in the endoscopic images to obtain an enhanced image sequence;
step 2, extracting feature points from the enhanced image sequence and matching the feature points of adjacent frame images to obtain the matching relation of adjacent frame images;
step 3, taking the viewing angle of the first-frame endoscopic image as the initialization pose, and calculating the pose of the second-frame endoscopic image relative to the first-frame endoscopic image from the feature points and the matching relation of the first two frames; estimating the depths of the matched feature points by triangulation from the feature points, the matching relation and the poses to obtain an initial point cloud;
step 4, obtaining the pose of the t-th frame endoscopic image from the positions of its feature points in the point cloud, where t ≥ 3; estimating the depths of the matched feature points by triangulation from the feature points, the matching relation and the poses of the t-th and (t-1)-th frame endoscopic images, and updating the point cloud;
and step 5, obtaining a three-dimensional reconstruction map of the local lung bronchus from the final point cloud.
2. The scene reconstruction method applied to the lung bronchoscope surgical robot according to claim 1, wherein in step 1 the endoscopic images are filtered by a U-Net network, and the U-Net network comprises a down-sampling part and an up-sampling part;
the down-sampling specifically comprises: after every two 3 × 3 convolution operations, performing a max-pooling operation with stride 2 for dimensionality reduction, these operations together constituting one down-sampling step, with a ReLU activation function used in each down-sampling step;
the up-sampling specifically comprises: doubling the number of channels by a 2 × 2 convolution operation, concatenating with the feature map generated in the corresponding down-sampling step, and then performing two 3 × 3 convolution operations, these operations together constituting one up-sampling step, with a ReLU activation function used in each up-sampling step;
the output of the last up-sampling step is passed through a 1 × 1 convolution layer to obtain the filtered endoscopic image, which enhances the contour and internal features of the airway lumen in the endoscopic image.
3. The scene reconstruction method applied to the bronchopulmonary surgery robot as claimed in claim 1, wherein the feature extraction and matching algorithm adopted in step 2 is a HAPCG algorithm.
4. The scene reconstruction method applied to the bronchopulmonary surgery robot according to claim 1, wherein the step 3 is specifically:
step 3.1, solving the 2D-2D pose by the eight-point method
for a pair of matching points of the first two frames of endoscopic images, defined on the normalized coordinate plane, there are two pixel points X = [u, v, 1]^T and X_1 = [u_1, v_1, 1]^T:
according to the epipolar geometric constraint:
X_1^T E X = 0
wherein
E = [e_1 e_2 e_3; e_4 e_5 e_6; e_7 e_8 e_9]
represents the 3 × 3 essential matrix, u_1, v_1 are the coordinates of pixel point X_1, and u, v are the coordinates of pixel point X;
written in linear form:
[u_1 u, u_1 v, u_1, v_1 u, v_1 v, v_1, u, v, 1] · e = E_1 · e = 0
wherein e is the 9 × 1 vector formed by the elements of E, and E_1 is the 1 × 9 coefficient matrix constructed from this pair of matching points;
similarly, the other seven pairs of matching points are expressed in the same way; the expressions of the eight pairs of matching points are stacked into one linear system, which is solved to obtain the essential matrix E;
performing singular value decomposition on the essential matrix E:
E = UΣV^T
and calculating the candidate solutions:
t_1^ = U R_Z(π/2) Σ U^T,  R_1 = U R_Z^T(π/2) V^T
t_2^ = U R_Z(-π/2) Σ U^T,  R_2 = U R_Z^T(-π/2) V^T
wherein U and V are orthogonal matrices and Σ is a diagonal matrix; R_1, R_2 are rotation matrices and t_1, t_2 are translations (^ denotes the skew-symmetric matrix of a vector); R_Z(π/2) denotes the rotation matrix of a rotation about the z-axis by π/2, and R_Z(-π/2) denotes the rotation matrix of a rotation about the z-axis by -π/2;
from the above four solutions (R_1, t_1), (R_1, t_2), (R_2, t_1), (R_2, t_2), the unique solution that conforms to the real situation is screened out as the pose estimation result (R, t);
step 3.2, estimating depth by triangulation
from the pose estimation result (R, t), the relationship between pixel points X and X_1 is obtained:
s_1 X_1 = s R X + t
which is converted into:
s_1 X_1^ X_1 = 0 = s X_1^ R X + X_1^ t
wherein s_1 is the depth information of pixel point X_1 and s is the depth information of pixel point X; treating this as an equation in the depths, s_1 and s can be solved directly;
and step 3.3, repeating step 3.1 and step 3.2 to traverse all matching points, and obtaining the initial point cloud from the depths of the matched feature points.
5. The scene reconstruction method applied to the lung bronchoscope surgical robot according to claim 1, wherein in step 4, the pose of the t-th frame endoscopic image is calculated by the PnP algorithm, where t ≥ 3.
6. The scene reconstruction method applied to the lung bronchoscope surgical robot according to claim 1, wherein in step 4, before each update of the point cloud, the estimated point cloud is projected onto the different imaging planes, the Euclidean distance to the feature points originally corresponding to them on each imaging plane is calculated, and if the distance is greater than a threshold, the point is considered poorly estimated and is removed from the point cloud.
CN202210691216.1A 2022-06-17 2022-06-17 Scene reconstruction method applied to lung bronchoscope surgical robot Pending CN115222878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210691216.1A CN115222878A (en) 2022-06-17 2022-06-17 Scene reconstruction method applied to lung bronchoscope surgical robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210691216.1A CN115222878A (en) 2022-06-17 2022-06-17 Scene reconstruction method applied to lung bronchoscope surgical robot

Publications (1)

Publication Number Publication Date
CN115222878A true CN115222878A (en) 2022-10-21

Family

ID=83607282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210691216.1A Pending CN115222878A (en) 2022-06-17 2022-06-17 Scene reconstruction method applied to lung bronchoscope surgical robot

Country Status (1)

Country Link
CN (1) CN115222878A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908366A (en) * 2022-12-13 2023-04-04 北京柏惠维康科技股份有限公司 Data processing method and device, electronic equipment and storage medium
CN117671012A (en) * 2024-01-31 2024-03-08 临沂大学 Method, device and equipment for calculating absolute and relative pose of endoscope in operation
CN117671012B (en) * 2024-01-31 2024-04-30 临沂大学 Method, device and equipment for calculating absolute and relative pose of endoscope in operation

Similar Documents

Publication Publication Date Title
US10198872B2 (en) 3D reconstruction and registration of endoscopic data
Widya et al. Whole stomach 3D reconstruction and frame localization from monocular endoscope video
Shen et al. Context-aware depth and pose estimation for bronchoscopic navigation
CN115222878A (en) Scene reconstruction method applied to lung bronchoscope surgical robot
US20090010507A1 (en) System and method for generating a 3d model of anatomical structure using a plurality of 2d images
Mahmoud et al. SLAM based quasi dense reconstruction for minimally invasive surgery scenes
CN113112609A (en) Navigation method and system for lung biopsy bronchoscope
CN112258514B (en) Segmentation method of pulmonary blood vessels of CT (computed tomography) image
CN110503626B (en) CT image modality alignment method based on space-semantic significance constraint
EP2961324A1 (en) Systems and methods for ultrasound imaging
CN103942772A (en) Multimodal multi-dimensional blood vessel fusion method and system
CN106236264B (en) Gastrointestinal surgery navigation method and system based on optical tracking and image matching
Su et al. Comparison of 3d surgical tool segmentation procedures with robot kinematics prior
CN116580068B (en) Multi-mode medical registration method based on point cloud registration
Wang et al. Robust motion estimation and structure recovery from endoscopic image sequences with an adaptive scale kernel consensus estimator
CN111260669A (en) Lung lobe segmentation method and device based on CT image
CN114399527A (en) Method and device for unsupervised depth and motion estimation of monocular endoscope
Hacihaliloglu et al. Statistical shape model to 3D ultrasound registration for spine interventions using enhanced local phase features
CN112150564A (en) Medical image fusion algorithm based on deep convolutional neural network
CN115311258A (en) Method and system for automatically segmenting organs in SPECT (single photon emission computed tomography) plane image
CN117218127B (en) Ultrasonic endoscope auxiliary monitoring system and method
Kanawong et al. An automatic tongue detection and segmentation framework for computer–aided tongue image analysis
Bauer et al. Airway tree reconstruction based on tube detection
CN109087357A (en) Scan orientation method, apparatus, computer equipment and computer readable storage medium
CN112102327B (en) Image processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination