CN117100393A - Method, system and device for video-assisted surgical target positioning

Info

Publication number
CN117100393A
Authority
CN
China
Prior art keywords
video
instrument
real
patient
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311073455.1A
Other languages
Chinese (zh)
Inventor
刘继敏
王雯贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Singapore Liulian Technology Co ltd
Singapore Health Services Pte Ltd
Original Assignee
Singapore Liulian Technology Co ltd
Singapore Health Services Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Singapore Liulian Technology Co ltd and Singapore Health Services Pte Ltd
Priority to CN202311073455.1A
Publication of CN117100393A
Legal status: Pending

Classifications

    • A61B 34/10: Computer-aided planning, simulation or modelling of surgical operations
    • A61B 34/20: Surgical navigation systems; devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 40/113: Recognition of static hand signs
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • A61B 2034/101: Computer-aided simulation of surgical operations
    • A61B 2034/105: Modelling of the patient, e.g. for ligaments or bones
    • A61B 2034/107: Visualisation of planned trajectories or target regions
    • A61B 2034/108: Computer-aided selection or customisation of medical implants or cutting guides
    • A61B 2034/2046: Tracking techniques
    • A61B 2034/2065: Tracking using image or pattern recognition


Abstract

The method for video-assisted surgical target localization provided by the invention comprises the following steps: receiving basic data, wherein the basic data comprise an endoscopic video, a three-dimensional preoperative model of a patient and a real-time camera video; segmenting a region of interest of the endoscopic video by semantic segmentation to obtain segmented targets; tracking the pose of the surgical instruments and the gestures of the operator in the real-time camera video by a pose tracking method; initializing the semantically segmented targets, the three-dimensional preoperative model of the patient and the operator pose information; applying large physical deformation to the target organs and/or tissues of the three-dimensional preoperative model of the patient; and performing intraoperative real-time fusion and matching, so that the preoperative three-dimensional model of the patient's organs and/or tissues fits and tracks the intraoperative real-time endoscopic video, thereby helping the operator understand the organs and anatomical structures in the video. A system, a device and a readable storage medium for video-assisted surgical target localization are also provided.

Description

Method, system and device for video-assisted surgical target positioning
Technical Field
The present invention relates to the medical and computer arts, and more particularly to a method, system and apparatus for video-assisted surgical target localization.
Background Art
The gold standard for treating early lung cancer is surgical resection. In contemporary surgical practice this is typically performed by video-assisted thoracoscopic surgery (VATS) because of its numerous advantages over open surgery. However, VATS lung resection presents several challenges. During VATS, small, deep or predominantly ground-glass intraparenchymal lesions are difficult to locate, because these lesions are typically neither visible nor palpable on the pleural surface. Up to 63% of patients undergoing VATS need to be converted to open thoracotomy for lesions smaller than 10 mm or more than 5 mm from the pleural surface. Thus, various invasive localization techniques (such as preoperative CT-guided hook-wire placement) are needed to assist the surgeon in accurately resecting these lesions through VATS.
Identifying and dissecting the interlobar pulmonary artery is a critical step in an anatomical lung resection to avoid catastrophic intraoperative bleeding, but for some patients, such as those with incomplete fissures or severe adhesions, doing so under VATS can be challenging.
Performing a segmental lung resection requires precise knowledge of the branching pattern and anatomical variations of the bronchovascular structures supplying the target pulmonary segment, and this is also more difficult under VATS than in open surgery because of the limited field of view.
3D modeling software (e.g., chemicals, hexa 3D) can be used for preoperative planning so that the surgeon can better understand each patient's anatomy and create a personalized surgical plan. Electromagnetic navigation bronchoscopy (ENB) and CT-guided hook wires can be used for preoperative nodule localization. Surgeons have used 3D modeling software to identify the location of lung nodules during surgery, but these methods rely on visual markers and additional rulers to relate what is seen intraoperatively to the 3D model, which adds unnecessary steps for the surgeon and may also introduce unnecessary surgical risk.
The same problems also exist in laparoscopic surgery, so there is a need for a technique that solves the above problems and facilitates the clinical application of laparoscopic surgery.
Disclosure of Invention
In view of the above, the present invention provides a method, system and device for video-assisted surgical target localization that address the following problem: achieving video-assisted target localization without changing existing surgical standards and workflows as far as possible, so that the lesion can be found quickly and surgical risk is reduced.
According to some embodiments of the present disclosure, a method for video-assisted surgical target localization is provided, comprising the steps of: receiving basic data, wherein the basic data comprise an endoscopic video, a three-dimensional preoperative model of a patient and a real-time camera video; segmenting a region of interest of the endoscopic video using a semantic segmentation method to obtain segmented targets, wherein the segmented targets comprise surgical instruments and the anatomical structures of target organs and/or tissues; tracking the pose of the surgical instruments and the gestures of the operator in the real-time camera video using a pose tracking method to obtain surgical instrument position information and operator pose information, wherein the surgical instrument position information is used to locate the surgical instruments and the operator pose information is used to capture the operator's gestures; initialization, namely initializing the semantically segmented targets, the three-dimensional preoperative model of the patient and the operator pose information, wherein initialization places this information in the same spatial coordinate system; applying large physical deformation to the target organs and/or tissues of the three-dimensional preoperative model of the patient by combining the surgical instrument position information with the initialized data, so that the three-dimensional preoperative model more accurately fits the patient's initial endoscopic video; and intraoperative real-time fusion and matching, so that the preoperative three-dimensional model of the patient's organs and/or tissues fits and tracks the intraoperative real-time endoscopic video, thereby helping the operator understand the organs and anatomical structures in the video.
According to some embodiments of the disclosure, the endoscopic video is a thoracoscopic video.
According to some embodiments of the disclosure, the endoscopic video is a laparoscopic video.
According to some embodiments of the present disclosure, the endoscopic video may also be an arthroscopic video, a gastroscopic video, or the like.
According to some embodiments of the present disclosure, the three-dimensional preoperative model of the patient is a three-dimensional model containing anatomical structures obtained by three-dimensional reconstruction of the patient's preoperative CT, MRI or ultrasound images; further, the three-dimensional model is a digital three-dimensional model.
According to some embodiments of the present disclosure, the semantic segmentation method is a conventional deep learning method such as the UNet method, comprising the steps of:
1) Preparing an initial training data set, wherein the initial training data set is an initial set of endoscopic video frames segmented using a depth seed region growing method, denoted D1;
2) Constructing a set of three-dimensional models: segmenting the target objects (including surgery-related human organs, tissues, etc.) in each patient's set of CT images and constructing a three-dimensional model for that patient, the set of all patient models being denoted V1;
3) Generating a set of simulated labeled 2D endoscopic images, the method for generating the 2D endoscopic images comprising the steps of:
A. calculating statistical information, such as the RGB color distribution, of each labeled object from the data set D1;
B. generating a set of 2D labeled image data using a 3D volume rendering method: that is, rendering each model in V1 from different endoscopic views to produce a series of 2D images, all generated image data being denoted D2;
C. filling each labeled region of each image in D2 with the corresponding object statistics from D1, adding a certain amount of Gaussian noise, and then applying a random B-spline function to deform the objects in each labeled image, the resulting data set being denoted D3;
4) Learning a segmentation model M1 from the data set D3, the learning method being the UNet method (a training sketch is given below);
5) Performing semantic segmentation using the segmentation model M1.
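The description names the UNet method for learning the segmentation model M1 but gives no implementation. The following is a minimal training sketch in Python/PyTorch, assuming the synthetic data set D3 has already been converted into image and label tensors; the tiny encoder-decoder, the tensor shapes and the hyperparameters are illustrative assumptions rather than the actual network of the invention.

```python
# Minimal UNet-style training sketch for the segmentation model M1 (illustrative only).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinyUNet(nn.Module):
    """A drastically reduced encoder-decoder standing in for a full UNet."""
    def __init__(self, in_ch=3, n_classes=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.out = nn.Conv2d(32, n_classes, 1)   # 32 = 16 (skip) + 16 (upsampled)

    def forward(self, x):
        e = self.enc(x)                              # full-resolution features
        m = self.mid(self.down(e))                   # half-resolution features
        u = self.up(m)                               # back to full resolution
        return self.out(torch.cat([e, u], dim=1))    # UNet-style skip connection

def train_m1(images, labels, n_classes=4, epochs=10):
    """images: (N,3,H,W) float tensor from D3; labels: (N,H,W) long tensor of class ids."""
    model = TinyUNet(n_classes=n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model  # the learned segmentation model M1
```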
According to some embodiments of the present disclosure, the depth seed region growing method places a number of relevant seed points on key frames of the endoscopic video and segments automatically from those seed points. Further, when the automatic segmentation is inaccurate, the segmentation can be refined by adding seed points.
According to some embodiments of the present disclosure, the pose tracking method is a deep-learning-based stereo vision tracking method, and the following steps are used to build a deep network that automatically identifies each instrument and detects its marker points from the real-time camera video (a marker-point selection sketch is given after the list):
1) Creating a three-dimensional model of each instrument, wherein an existing three-dimensional model file of the instrument can be used directly, or a 3D scanner can be used to scan the instrument and create the model file;
2) Extracting the centerline of each three-dimensional model using any available centerline extraction method;
3) Determining the intersections of the local curvature extrema of the surface with the centerline as the marker points of each instrument;
4) Generating an instrument training data set Di, the method for generating the instrument training data set Di comprising the steps of:
A. attaching an electromagnetic tracker to each instrument to track its movement;
B. before the instrument is inserted into the human body, having the operator hold the instrument in the various possible ways and capturing the video and the corresponding movements of the electromagnetic markers;
C. inserting the instrument into a human body model, simulating operations at different positions and under different views, and capturing the video and the corresponding electromagnetic marker movements;
D. the videos and the electromagnetic marker recordings form a data set denoted Di, consisting for each instrument of the instrument id, a label file Lid and a video Vid: Di = {id, Lid, Vid};
5) Constructing and training an instrument identification network YOLOStereo3D using the instrument data set Di;
6) For each instrument id, building a landmark identification network LMid using the sub-data set of Di with that id, wherein the identification network may be based on point clouds or on real-time camera images.
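As an illustration of step 3, the sketch below selects instrument marker points as the high-curvature surface vertices that lie close to the extracted centerline. The curvature values are assumed to be given, and the percentile and distance threshold are illustrative parameters not prescribed by the description.

```python
# Illustrative marker-point selection: surface curvature extrema near the centerline.
import numpy as np
from scipy.spatial import cKDTree

def select_marker_points(vertices, curvatures, centerline, dist_thresh=2.0, top_percent=5.0):
    """vertices: (N,3) surface points of the instrument model;
    curvatures: (N,) per-vertex curvature magnitude;
    centerline: (M,3) points of the extracted centerline."""
    # Keep only the vertices whose curvature is in the top few percent ("curvature extrema").
    cutoff = np.percentile(curvatures, 100.0 - top_percent)
    extrema = vertices[curvatures >= cutoff]
    # Of those, keep the ones lying within dist_thresh of the centerline
    # (a simple stand-in for intersecting the curvature extrema with the centerline).
    dists, _ = cKDTree(centerline).query(extrema)
    return extrema[dists <= dist_thresh]
```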
According to some embodiments of the disclosure, the real-time camera images are images converted from the video captured by the real-time camera.
According to some embodiments of the present disclosure, the initialization setting includes the steps of:
1) Establishing a coordinate system, wherein the coordinate system of the endoscope camera is used as the reference coordinate system, and the coordinate systems of the real-time cameras and of the three-dimensional model of the patient are initialized into this reference coordinate system;
2) Determining the insertion point and direction of each endoscopic instrument according to the three-dimensional anatomical model of the patient;
3) Inserting the endoscope camera at the approximately planned position and orientation;
4) Adjusting the position and angle of the simulated endoscope in the surgical plan so that the simulated image fits the image output by the real endoscope as closely as possible.
According to some embodiments of the present disclosure, the large physical deformation of the organs and/or tissues comprises the steps of:
1) Tracking the instrument tip using the pose tracking method;
2) Automatically segmenting the organs and/or tissues of interest in the endoscopic video;
3) Inputting the instrument tip position and the segmented endoscopic video into a large-deformation computation model to calculate the organ and/or tissue deformation;
4) Superimposing the deformed three-dimensional anatomical model of the patient onto the initial endoscopic video.
According to some embodiments of the present disclosure, the large physical deformation method combines the Particle Finite Element Method (PFEM), artificial intelligence and 3D surface interpolation to handle large deformations of organs and/or tissues.
According to some embodiments of the present disclosure, the initial endoscopic video is video captured while the endoscope is stationary, and it includes the segmented, identified and highlighted objects.
According to further embodiments of the present disclosure, the system for video-assisted surgical target localization includes:
an input module: for receiving data such as the endoscopic video, the three-dimensional preoperative model of the patient and the real-time camera video;
a segmentation module: for segmenting the surgical instruments and the anatomical regions of interest (e.g., arteries, veins, bronchi, lung lobes, intersegmental boundaries) from the endoscopic video;
a tracking module: for tracking the surgical instruments, the operator pose, the patient pose and the anatomical regions of interest in the endoscopic video;
an identification module: for identifying the surgical instruments outside the endoscopic view from the real-time camera images and/or 3D point clouds;
an initialization module: for initializing the received data;
a deformation module: for calculating large physical deformations of the organs and/or tissues of the three-dimensional preoperative model;
a registration module: for overlaying the patient-specific three-dimensional preoperative model onto the intraoperative video image to achieve noninvasive localization of diseased tissue.
According to further embodiments of the present disclosure, the workflow of the system for video-assisted surgical target localization is as follows: the input module receives data such as the endoscopic video, the three-dimensional preoperative model of the patient and the real-time camera video; the received endoscopic video data are passed to the segmentation module, which segments the surgical instruments and the anatomical regions of interest, such as arteries, veins, bronchi, lung lobes and intersegmental boundaries, from the endoscopic video; the identification module identifies the surgical instruments outside the endoscopic view, and the segmented surgical instruments inside the endoscopic video are combined with those outside it and restored into complete surgical instruments, thereby unifying the endoscopic information and the real-time camera information; the initialization module then unifies the coordinate systems of the preoperative three-dimensional model of the patient and of the real-time camera video into the endoscopic video coordinate system; after initialization is completed, the deformation module applies large physical deformation to the organs and/or tissues of the three-dimensional preoperative model so that they fit the corresponding organs and/or tissues in the endoscopic video image; finally, the registration module superimposes the patient-specific three-dimensional preoperative model onto the intraoperative video image to achieve noninvasive localization of the diseased tissue.
According to still further embodiments of the present disclosure, the device for video-assisted surgical target localization includes:
a control system: the control system consists of a control computer with a screen and the system for video-assisted surgical target localization;
a frame grabber: the frame grabber is used to capture the video output by the endoscope system and convert it into images;
a real-time camera group: the real-time camera group consists of two or more real-time cameras, wherein one real-time camera is used to capture the patient's gestures, which are used to issue instructions for manipulating the three-dimensional preoperative model of the patient, and one or more further real-time cameras are used to identify and track the surgical instruments, more than one real-time camera being usable to avoid occlusion;
an auxiliary system: the auxiliary system consists of an electromagnetic or optical tracker and the associated electromagnetic sensors or markers, and is used when training the system for video-assisted surgical target localization to track poses and surgical instruments with the real-time cameras.
According to still further embodiments of the present disclosure, the workflow of the device for video-assisted surgical target localization is as follows: the control system consists of a control computer with a screen and the system for video-assisted surgical target localization, wherein the screen is used to display data such as the endoscopic video (e.g., the VATS video), the three-dimensional preoperative model of the patient, the images captured by the frame grabber and the video (or images) captured by the real-time camera group, and the system for video-assisted surgical target localization processes the data displayed on the screen according to the aforementioned method for video-assisted surgical target localization; the frame grabber captures the video output by the endoscope system, converts it into images and inputs them into the control system; the real-time camera group consists of two or more real-time cameras, wherein one real-time camera is used to capture the patient's gestures, which are used to issue instructions for manipulating the three-dimensional preoperative model of the patient, and one or more further real-time cameras are used to identify and track the surgical instruments, more than one real-time camera being usable to avoid occlusion; the video (or images) captured by the real-time camera group are input into the control system; the auxiliary system consists of an electromagnetic or optical tracker and the associated electromagnetic sensors or markers and is used, when training the system for video-assisted surgical target localization, to track poses and surgical instruments together with the real-time cameras, and the trained system can then automatically track the user's gestures and the surgical instruments through the real-time cameras alone.
According to further embodiments of the present disclosure, the computer readable storage medium for video assisted surgical target localization has stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method for video assisted surgical target localization.
The beneficial effects of the invention are as follows:
1) Target localization is achieved without interfering with the existing surgical workflow, which helps the surgeon locate the target quickly and improves surgical efficiency;
2) Superimposing the three-dimensional preoperative model on the endoscopic video provides a global view, so that preoperative planning and intraoperative localization are matched, the operator can accurately find the target, surgical risk is reduced and the surgical success rate is improved.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
Fig. 1 illustrates a flow diagram of a method for video-assisted surgical target localization in accordance with some embodiments of the present disclosure.
Fig. 2 illustrates a schematic of a system for video-assisted surgical target localization in accordance with further embodiments of the present disclosure.
Fig. 3 illustrates a schematic of a device for video-assisted surgical target localization in accordance with further embodiments of the present disclosure.
Description of the embodiments
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The present disclosure provides a method for video-assisted surgical target localization, described in connection with fig. 1 using video-assisted thoracoscopic surgery (VATS) as an example.
Fig. 1 shows a flow diagram of a method for video-assisted surgical target localization in accordance with some embodiments. The method comprises steps S101-S106:
s101, receiving basic data:
In some embodiments, the basic data include a thoracoscopic video, a three-dimensional preoperative model of the patient and a real-time camera video;
In some embodiments, the basic data include a laparoscopic video, a three-dimensional preoperative model of the patient and a real-time camera video;
In some embodiments, the basic data include an arthroscopic video, a three-dimensional preoperative model of the patient and a real-time camera video;
In some embodiments, the basic data include a bronchoscope video, a three-dimensional preoperative model of the patient and a real-time camera video.
S102, semantic segmentation:
In some embodiments, a semantic segmentation method is employed to segment the region of interest of an endoscopic video to obtain segmented targets, the segmented targets comprising the surgical instruments and the anatomical structures of the target organs and/or tissues.
In some embodiments, the semantic segmentation method is a conventional deep learning method such as the UNet method, comprising the steps of:
1) Preparing an initial training data set, wherein the initial training data set is an initial set of endoscopic video frames segmented using a depth seed region growing method, denoted D1;
2) Constructing a set of three-dimensional models: segmenting the target objects (including surgery-related human organs, tissues, etc.) in each patient's set of CT images and constructing a three-dimensional model for that patient, the set of all patient models being denoted V1;
3) Generating a set of simulated labeled 2D endoscopic images (a data-generation sketch is given after the list), the method for generating the 2D endoscopic images comprising the steps of:
A. calculating statistical information, such as the RGB color distribution, of each labeled object from the data set D1;
B. generating a set of 2D labeled image data using a 3D volume rendering method: that is, rendering each model in V1 from different endoscopic views to produce a series of 2D images, all generated image data being denoted D2;
C. filling each labeled region of each image in D2 with the corresponding object statistics from D1, adding a certain amount of Gaussian noise, and then applying a random B-spline function to deform the objects in each labeled image, the resulting data set being denoted D3;
4) Learning a segmentation model M1 from the data set D3, the learning method being the UNet method;
5) Performing semantic segmentation using the segmentation model M1.
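A minimal sketch of step 3.C follows, assuming the rendered label masks from D2 are integer label maps and the per-object color statistics from D1 are (mean, std) RGB tuples. A smoothed random displacement field stands in for the random B-spline deformation, and the noise and warp parameters are illustrative.

```python
# Illustrative generation of a simulated labeled 2D endoscopic image (step 3.C).
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def synthesize_image(label_map, color_stats, noise_sigma=8.0, warp_alpha=15.0, warp_sigma=6.0):
    """label_map: (H,W) int array rendered from a model in V1;
    color_stats: {label: (mean_rgb, std_rgb)} measured on the endoscopic data set D1."""
    h, w = label_map.shape
    img = np.zeros((h, w, 3), dtype=np.float32)
    for label, (mean, std) in color_stats.items():
        mask = label_map == label
        # Fill each labeled region with colors drawn from that object's statistics.
        img[mask] = np.random.normal(mean, std, size=(int(mask.sum()), 3))
    img += np.random.normal(0.0, noise_sigma, img.shape)          # extra Gaussian noise

    # Smooth random displacement field as a stand-in for a random B-spline deformation.
    dx = gaussian_filter(np.random.randn(h, w), warp_sigma) * warp_alpha
    dy = gaussian_filter(np.random.randn(h, w), warp_sigma) * warp_alpha
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = [yy + dy, xx + dx]
    warped = np.stack([map_coordinates(img[..., c], coords, order=1) for c in range(3)], axis=-1)
    warped_labels = map_coordinates(label_map, coords, order=0)    # nearest-neighbour for labels
    return np.clip(warped, 0, 255).astype(np.uint8), warped_labels
```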
In some embodiments, the depth seed region growing method places a number of relevant seed points on key frames of the endoscopic video and segments automatically from those seed points (a minimal sketch is given below). Further, when the automatic segmentation is inaccurate, the segmentation can be refined by adding seed points.
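The depth seed region growing method is described only at a high level here; the breadth-first growth below is a simplified stand-in that grows each seed over a grayscale key frame while the intensity stays within a tolerance of the seed's value. The tolerance and 4-connectivity are illustrative assumptions.

```python
# Simplified seed region growing on a grayscale key frame (illustrative stand-in).
import numpy as np
from collections import deque

def grow_region(gray, seed, tol=12.0):
    """gray: (H,W) float array; seed: (row, col); returns a boolean mask of the grown region."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    ref = float(gray[seed])                 # intensity at the seed point
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connectivity
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc] \
                    and abs(float(gray[rr, cc]) - ref) <= tol:
                mask[rr, cc] = True
                queue.append((rr, cc))
    return mask

# Several seeds can be combined, and more seeds can be added where the result is inaccurate:
# combined = np.logical_or.reduce([grow_region(frame, s) for s in seed_points])
```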
In some embodiments, the pose tracking method is a deep-learning-based stereo vision tracking method, and the following steps are used to build a deep network that automatically identifies each instrument and detects its marker points from the real-time camera video:
1) Creating a three-dimensional model of each instrument, wherein an existing three-dimensional model file of the instrument can be used directly, or a 3D scanner can be used to scan the instrument and create the model file;
2) Extracting the centerline of each three-dimensional model using any available centerline extraction method;
3) Determining the intersections of the local curvature extrema of the surface with the centerline as the marker points of each instrument;
4) Generating an instrument training data set Di (a sketch of this data organization is given after the list), the method for generating the instrument training data set Di comprising the steps of:
A. attaching an electromagnetic tracker to each instrument to track its movement;
B. before the instrument is inserted into the human body, having the operator hold the instrument in the various possible ways and capturing the video and the corresponding movements of the electromagnetic markers;
C. inserting the instrument into a human body model, simulating operations at different positions and under different views, and capturing the video and the corresponding electromagnetic marker movements;
D. the videos and the electromagnetic marker recordings form a data set denoted Di, consisting for each instrument of the instrument id, a label file Lid and a video Vid: Di = {id, Lid, Vid};
5) Constructing and training an instrument identification network YOLOStereo3D using the instrument data set Di;
6) For each instrument id, building a landmark identification network LMid using the sub-data set of Di with that id, wherein the identification network may be based on point clouds or on real-time camera images.
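A small sketch of how the data set Di = {id, Lid, Vid} and the per-instrument landmark networks LMid might be organized in code is given below. The class and field names are hypothetical; the description only specifies the triplet of instrument id, label file and video.

```python
# Illustrative organization of the instrument data set Di and per-instrument networks LMid.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InstrumentSample:
    instrument_id: str   # id
    label_path: str      # Lid: electromagnetic-marker label file
    video_path: str      # Vid: captured video

def split_by_instrument(samples: List[InstrumentSample]) -> Dict[str, List[InstrumentSample]]:
    """Group Di into the per-instrument sub-data sets used to train each LMid."""
    subsets: Dict[str, List[InstrumentSample]] = {}
    for s in samples:
        subsets.setdefault(s.instrument_id, []).append(s)
    return subsets

# After training, a dictionary could map each instrument id, as predicted by the detection
# network (YOLOStereo3D in the text), to its landmark network LMid, e.g. (hypothetical):
# landmark_nets = {inst_id: train_landmark_net(subset) for inst_id, subset in subsets.items()}
```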
In some embodiments, the real-time camera images are images converted from the video captured by the real-time camera (see the sketch below).
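The conversion of the real-time camera video into individual images can be sketched with OpenCV as follows; the frame-skipping interval is an illustrative parameter, not something specified by the description.

```python
# Illustrative conversion of real-time camera video into individual images.
import cv2

def video_to_images(video_path, every_n=1):
    """Yield every n-th frame of the captured video as a BGR image array."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:              # end of stream
            break
        if idx % every_n == 0:
            yield frame
        idx += 1
    cap.release()
```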
S103, pose tracking:
In some embodiments, a pose tracking method is used to track the pose of the surgical instruments and the gestures of the operator in the real-time camera video, obtaining surgical instrument position information and operator pose information, wherein the surgical instrument position information is used to locate the surgical instruments and the operator pose information is used to capture the operator's gestures.
S104, initializing and setting:
In some embodiments, the initialization setting initializes the semantically segmented targets, the three-dimensional preoperative model of the patient and the operator pose information, placing this information in the same spatial coordinate system.
In some embodiments, the initialization setting includes the steps of (a coordinate-initialization sketch is given after the list):
1) Establishing a coordinate system, wherein the coordinate system of the endoscope camera is used as the reference coordinate system, and the coordinate systems of the real-time cameras and of the three-dimensional model of the patient are initialized into this reference coordinate system;
2) Determining the insertion point and direction of each endoscopic instrument according to the three-dimensional anatomical model of the patient;
3) Inserting the endoscope camera at the approximately planned position and orientation;
4) Adjusting the position and angle of the simulated endoscope in the surgical plan so that the simulated image fits the image output by the real endoscope as closely as possible.
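Step 1 amounts to expressing everything in the endoscope camera's reference frame. The sketch below composes 4x4 homogeneous transforms for that purpose; the transform names are hypothetical and would in practice come from calibration and from the adjustment in steps 3 and 4.

```python
# Illustrative unification of coordinate systems into the endoscope camera frame.
import numpy as np

def to_homogeneous(points):
    """(N,3) points -> (N,4) homogeneous coordinates."""
    return np.hstack([points, np.ones((points.shape[0], 1))])

def express_in_endoscope_frame(points_model, T_endo_from_world, T_world_from_model):
    """Map patient-model points into the endoscope camera reference frame.
    T_endo_from_world, T_world_from_model: 4x4 homogeneous transforms (assumed known
    from calibration and from the simulated-endoscope adjustment)."""
    T_endo_from_model = T_endo_from_world @ T_world_from_model
    pts = to_homogeneous(points_model) @ T_endo_from_model.T
    return pts[:, :3]

# The real-time camera frame is handled the same way with its own calibrated transform:
# points_endo = express_in_endoscope_frame(points_cam, T_endo_from_world, T_world_from_cam)
```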
S105, large physical deformation:
In some embodiments, the target organs and/or tissues of the three-dimensional preoperative model of the patient are subjected to large physical deformation by combining the surgical instrument position information with the initialized data, so that the three-dimensional preoperative model more accurately fits the patient's initial endoscopic video.
In some embodiments, the large physical deformation of the organs and/or tissues comprises the steps of:
1) Tracking the instrument tip using the pose tracking method;
2) Automatically segmenting the organs and/or tissues of interest in the endoscopic video;
3) Inputting the instrument tip position and the segmented endoscopic video into a large-deformation computation model to calculate the organ and/or tissue deformation;
4) Superimposing the deformed three-dimensional anatomical model of the patient onto the initial endoscopic video.
In some embodiments, the large physical deformation method combines the Particle Finite Element Method (PFEM), artificial intelligence and 3D surface interpolation to handle large deformations of organs and/or tissues (a sketch of the surface-interpolation component is given below).
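The PFEM and learned components are not detailed here; as an illustration of the 3D surface interpolation component only, the sketch below propagates sparse displacements (for example, measured where the tracked instrument tip contacts the organ surface) to all vertices of the preoperative surface model using SciPy's RBF interpolator. The kernel and smoothing values are illustrative assumptions.

```python
# Illustrative 3D surface interpolation of sparse displacements (one ingredient of the
# large-deformation step; PFEM and the learned component are not shown here).
import numpy as np
from scipy.interpolate import RBFInterpolator

def deform_surface(vertices, control_points, control_displacements):
    """vertices: (N,3) preoperative surface vertices;
    control_points: (K,3) points where displacement is known (e.g., instrument contacts);
    control_displacements: (K,3) measured displacement vectors at those points."""
    interp = RBFInterpolator(control_points, control_displacements,
                             kernel="thin_plate_spline", smoothing=1e-3)
    return vertices + interp(vertices)     # displaced vertex positions
```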
In some embodiments, the initial endoscopic video is video captured while the endoscope is stationary, and it includes the segmented, identified and highlighted objects.
S106, fusion matching in real time in operation:
In some embodiments, intraoperative real-time fusion and matching make the preoperative three-dimensional model of the patient's organs and/or tissues fit and track the intraoperative real-time endoscopic video, thereby helping the operator understand the organs and their anatomy in the video (a simple overlay sketch is given below).
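A minimal sketch of the overlay itself follows: the deformed model is assumed to have been rendered from the current endoscope viewpoint into a label image aligned with the endoscopic frame, and is alpha-blended onto that frame. The colors and blending weight are illustrative.

```python
# Illustrative overlay of a rendered model label image onto an endoscopic frame.
import numpy as np
import cv2

def overlay_model(frame_bgr, rendered_labels, colors, alpha=0.4):
    """frame_bgr: (H,W,3) endoscopic frame; rendered_labels: (H,W) int label image
    rendered from the deformed preoperative model at the current endoscope pose;
    colors: {label: (B,G,R)} display color per structure."""
    color_layer = frame_bgr.copy()
    for label, bgr in colors.items():
        color_layer[rendered_labels == label] = bgr
    # Blend the colored structures over the live frame so the anatomy stays visible.
    return cv2.addWeighted(frame_bgr, 1.0 - alpha, color_layer, alpha, 0.0)
```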
The present disclosure also provides a system for video-assisted surgical target localization, described in connection with fig. 2.
Fig. 2 is a block diagram of some embodiments of the system for video-assisted surgical target localization of the present disclosure. As shown in fig. 2, the system includes:
an input module: for receiving data such as the endoscopic video, the three-dimensional preoperative model of the patient and the real-time camera video;
a segmentation module: for segmenting the surgical instruments and the anatomical regions of interest, such as arteries, veins, bronchi, lung lobes and intersegmental boundaries, from the endoscopic video;
a tracking module: for tracking the surgical instruments, the operator pose, the patient pose and the anatomical regions of interest in the endoscopic video;
an identification module: for identifying the surgical instruments outside the endoscopic view from the real-time camera video and/or 3D point clouds;
an initialization module: for initializing the received data;
a deformation module: for calculating large physical deformations of the organs and/or tissues of the three-dimensional preoperative model;
a registration module: for overlaying the patient-specific three-dimensional preoperative model onto the intraoperative video image to achieve noninvasive localization of diseased tissue.
In other embodiments, the workflow of the system for video-assisted surgical target localization is as follows (a module-wiring sketch is given below): the input module receives data such as the endoscopic video, the three-dimensional preoperative model of the patient and the real-time camera video; the received endoscopic video data are passed to the segmentation module, which segments the surgical instruments and the anatomical regions of interest, such as arteries, veins, bronchi, lung lobes and intersegmental boundaries, from the endoscopic video; the identification module identifies the surgical instruments outside the endoscopic view, and the segmented surgical instruments inside the endoscopic video are combined with those outside it and restored into complete surgical instruments, thereby unifying the endoscopic information and the real-time camera information; the initialization module then unifies the coordinate systems of the preoperative three-dimensional model of the patient and of the real-time camera video into the endoscopic video coordinate system; after initialization is completed, the deformation module applies large physical deformation to the organs and/or tissues of the three-dimensional preoperative model so that they fit the corresponding organs and/or tissues in the endoscopic video image; finally, the registration module superimposes the patient-specific three-dimensional preoperative model onto the intraoperative video image to achieve noninvasive localization of the diseased tissue.
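The workflow above can be summarized as a simple processing loop; the sketch below wires hypothetical module objects together in the order described. All class and method names are assumptions made for illustration, not the actual interfaces of the system.

```python
# Illustrative wiring of the described modules into one processing loop (hypothetical API).
def run_pipeline(input_module, segmentation, identification, initialization,
                 deformation, registration):
    data = input_module.receive()                          # endoscopic video, 3D model, camera video
    targets = segmentation.segment(data.endoscope_frame)   # instruments + anatomical regions
    outside = identification.identify(data.camera_frame, data.point_cloud)
    instruments = targets.instruments.merge(outside)       # restore complete surgical instruments
    state = initialization.unify(data.model_3d, data.camera_frame, data.endoscope_frame)
    deformed = deformation.apply(data.model_3d, instruments, state)
    return registration.overlay(deformed, data.endoscope_frame)   # noninvasive localization
```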
The present disclosure also provides an apparatus for video-assisted surgical target localization, described in connection with fig. 3.
Fig. 3 is a block diagram of some embodiments of the device for video-assisted surgical target localization of the present disclosure. As shown in fig. 3, the device includes:
a control system: the control system consists of a control computer with a screen and the system for video-assisted surgical target localization;
a frame grabber: the frame grabber is used to capture the video output by the endoscope system and convert it into images;
a real-time camera group: the real-time camera group consists of two or more real-time cameras, wherein one real-time camera is used to capture the patient's gestures, which are used to issue instructions for manipulating the three-dimensional preoperative model of the patient, and one or more further real-time cameras are used to identify and track the surgical instruments, more than one real-time camera being usable to avoid occlusion;
an auxiliary system: the auxiliary system consists of an electromagnetic or optical tracker and the associated electromagnetic sensors or markers, and is used when training the system for video-assisted surgical target localization to track poses and surgical instruments with the real-time cameras.
In still other embodiments, the workflow of the device for video-assisted surgical target localization is as follows: the control system consists of a control computer with a screen and the system for video-assisted surgical target localization, wherein the screen is used to display data such as the endoscopic video (e.g., the VATS video), the three-dimensional preoperative model of the patient, the images captured by the frame grabber and the video (or images) captured by the real-time camera group, and the system for video-assisted surgical target localization processes the data displayed on the screen according to the aforementioned method for video-assisted surgical target localization; the frame grabber captures the video output by the endoscope system, converts it into images and inputs them into the control system; the real-time camera group consists of two or more real-time cameras, wherein one real-time camera is used to capture the patient's gestures, which are used to issue instructions for manipulating the three-dimensional preoperative model of the patient, and one or more further real-time cameras are used to identify and track the surgical instruments, more than one real-time camera being usable to avoid occlusion; the video (or images) captured by the real-time camera group are input into the control system; the auxiliary system consists of an electromagnetic or optical tracker and the associated electromagnetic sensors or markers and is used, when training the system for video-assisted surgical target localization, to track poses and surgical instruments together with the real-time cameras, and the trained system can then automatically track the user's gestures and the surgical instruments through the real-time cameras alone.
In still other embodiments, a computer readable storage medium has stored thereon a computer program which when executed by a processor implements the steps of the method for video assisted surgical target localization of any of the foregoing embodiments.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure, but rather to cover any and all modifications, equivalents, improvements or alternatives falling within the spirit and principles of the present disclosure.

Claims (12)

1. A method for video-assisted surgical target localization, the method comprising:
receiving basic data, wherein the basic data comprise an endoscopic video, a three-dimensional preoperative model of a patient and a real-time camera video;
segmenting a region of interest of the endoscopic video using a semantic segmentation method to obtain segmented targets, wherein the segmented targets comprise surgical instruments and the anatomical structures of target organs and/or tissues;
tracking the pose of the surgical instruments and the gestures of the operator in the real-time camera video using a pose tracking method to obtain surgical instrument position information and operator pose information, wherein the surgical instrument position information is used to locate the surgical instruments and the operator pose information is used to capture the operator's gestures;
initialization, namely initializing the semantically segmented targets, the three-dimensional preoperative model of the patient and the operator pose information, wherein initialization places this information in the same spatial coordinate system;
applying large physical deformation to the target organs and/or tissues of the three-dimensional preoperative model of the patient by combining the surgical instrument position information with the initialized data, so that the three-dimensional preoperative model more accurately fits the patient's initial endoscopic video; and
intraoperative real-time fusion and matching, so that the preoperative three-dimensional model of the patient's organs and/or tissues fits and tracks the intraoperative real-time endoscopic video, thereby helping the operator understand the organs and anatomical structures in the video.
2. The method for video-assisted surgical target positioning of claim 1, wherein the endoscopic video comprises a thoracoscopic video and a laparoscopic video.
3. The method for video-assisted surgical target localization of claim 1, wherein the patient three-dimensional preoperative model is a three-dimensional model containing anatomical structures obtained by three-dimensional reconstruction of a patient preoperative CT, MRI or ultrasound image.
4. The method for video-assisted surgical target localization according to claim 1, wherein the semantic segmentation method is a conventional deep learning method such as the UNet method, comprising the steps of:
preparing an initial training data set, wherein the initial training data set is an initial set of endoscopic video frames segmented using a depth seed region growing method, denoted D1;
constructing a set of three-dimensional models: segmenting the target objects in each patient's set of CT images, the target objects comprising one or a combination of surgery-related human organs and tissues, and constructing a three-dimensional model for that patient, the set of three-dimensional models of all patients being denoted V1;
generating a set of simulated labeled 2D endoscopic images, the method for generating the 2D endoscopic images comprising the steps of:
calculating statistical information, such as the RGB color distribution, of each labeled object from the data set D1;
generating a set of 2D labeled image data using a 3D volume rendering method: that is, rendering each model in V1 from different endoscopic views to produce a series of 2D images, all generated image data being denoted D2;
filling each labeled region of each image in D2 with the corresponding object statistics from D1, adding a certain amount of Gaussian noise, and then applying a random B-spline function to deform the objects in each labeled image, the resulting data set being denoted D3;
learning a segmentation model M1 from the data set D3, the learning method being the UNet method; and
performing semantic segmentation using the segmentation model M1.
5. The method for video-assisted surgical target localization according to claim 4, wherein the depth seed region growing method places a number of relevant seed points on a key frame of the endoscopic video and segments automatically from those seed points, and if the segmentation is found to be inaccurate, seed points are added to refine the segmentation.
6. The method for video-assisted surgical target localization of claim 1, wherein the pose tracking method is a deep-learning-based stereo vision tracking method employing the following steps to build a deep network for automatically identifying each instrument and detecting its marker points from the real-time camera video:
1) creating a three-dimensional model of each instrument, wherein an existing three-dimensional model file of the instrument can be used directly, or a 3D scanner can be used to scan the instrument and create the model file;
2) extracting the centerline of each three-dimensional model using any available centerline extraction method;
3) determining the intersections of the local curvature extrema of the surface with the centerline as the marker points of each instrument;
4) generating an instrument training data set Di, the method for generating the instrument training data set Di comprising the steps of:
A. attaching an electromagnetic tracker to each instrument to track its movement;
B. before the instrument is inserted into the human body, having the operator hold the instrument in the various possible ways and capturing the video and the corresponding movements of the electromagnetic markers;
C. inserting the instrument into a human body model, simulating operations at different positions and under different views, and capturing the video and the corresponding electromagnetic marker movements;
D. the videos and the electromagnetic marker recordings form a data set denoted Di, consisting for each instrument of the instrument id, a label file Lid and a video Vid: Di = {id, Lid, Vid};
5) constructing and training an instrument identification network YOLOStereo3D using the instrument training data set Di;
6) for each instrument id, building a landmark identification network LMid using the sub-data set of Di with that id, wherein the identification network may be based on point clouds or on real-time camera images.
7. The method for video-assisted surgical target localization of claim 1, wherein the initialization setting comprises the steps of:
establishing a coordinate system, wherein the coordinate system of the endoscope camera is used as the reference coordinate system, and the coordinate systems of the real-time cameras and of the three-dimensional model of the patient are initialized into this reference coordinate system;
determining the insertion point and direction of each endoscopic instrument according to the three-dimensional anatomical model of the patient;
inserting the endoscope camera at the approximately planned position and orientation; and
adjusting the position and angle of the simulated endoscope in the surgical plan so that the simulated image fits the image output by the real endoscope as closely as possible.
The method for video-assisted surgical target localization of claim 1, wherein the large physical deformation of the organs and/or tissues comprises the steps of:
tracking the instrument tip using the pose tracking method;
automatically segmenting the organs and/or tissues of interest in the endoscopic video;
inputting the instrument tip position and the segmented endoscopic video into a large-deformation computation model to calculate the organ and/or tissue deformation; and
superimposing the deformed three-dimensional anatomical model of the patient onto the initial endoscopic video.
8. The method for video-assisted surgical target localization of claim 6, wherein the large physical deformation method combines the Particle Finite Element Method (PFEM), artificial intelligence and 3D surface interpolation to handle large deformations of organs and/or tissues.
9. The method for video-assisted surgical target localization of claim 6, wherein the initial endoscopic video is video captured while the endoscope is stationary, the initial endoscopic video including the segmented, identified and highlighted objects.
10. A system for video-assisted surgical target localization, the system comprising:
an input module: for receiving data such as the endoscopic video, the three-dimensional preoperative model of the patient and the real-time camera video;
a segmentation module: for segmenting the surgical instruments and the anatomical regions of interest (e.g., arteries, veins, bronchi, lung lobes, intersegmental boundaries) from the endoscopic video;
a tracking module: for tracking the surgical instruments, the operator pose, the patient pose and the anatomical regions of interest in the endoscopic video;
an identification module: for identifying the surgical instruments outside the endoscopic view from the real-time camera images and/or 3D point clouds;
an initialization module: for initializing the received data;
a deformation module: for calculating large physical deformations of the organs and/or tissues of the three-dimensional preoperative model;
a registration module: for overlaying the patient-specific three-dimensional preoperative model onto the intraoperative video image to achieve noninvasive localization of diseased tissue.
11. A device for video-assisted surgical target localization, the device comprising:
a control system: the control system consists of a control computer with a screen and the system for video-assisted surgical target localization;
a frame grabber: the frame grabber is used to capture the video output by the endoscope system and convert it into images;
a real-time camera group: the real-time camera group consists of two or more real-time cameras, wherein one real-time camera is used to capture the patient's gestures, which are used to issue instructions for manipulating the three-dimensional preoperative model of the patient, and one or more further real-time cameras are used to identify and track the surgical instruments, more than one real-time camera being usable to avoid occlusion;
an auxiliary system: the auxiliary system consists of an electromagnetic or optical tracker and the associated electromagnetic sensors or markers, and is used when training the system for video-assisted surgical target localization to track poses and surgical instruments with the real-time cameras.
12. A computer readable storage medium for video-assisted surgical target localization, having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of the method of any of claims 1-10.
CN202311073455.1A 2023-08-23 2023-08-23 Method, system and device for video-assisted surgical target positioning Pending CN117100393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311073455.1A CN117100393A (en) 2023-08-23 2023-08-23 Method, system and device for video-assisted surgical target positioning


Publications (1)

Publication Number Publication Date
CN117100393A true CN117100393A (en) 2023-11-24

Family

ID=88807052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311073455.1A Pending CN117100393A (en) 2023-08-23 2023-08-23 Method, system and device for video-assisted surgical target positioning

Country Status (1)

Country Link
CN (1) CN117100393A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118079256A (en) * 2024-04-26 2024-05-28 四川省肿瘤医院 Automatic tracking method for tumor target area of magnetic resonance guided radiation therapy


Similar Documents

Publication Publication Date Title
Chen et al. SLAM-based dense surface reconstruction in monocular minimally invasive surgery and its application to augmented reality
US11883118B2 (en) Using augmented reality in surgical navigation
Collins et al. Augmented reality guided laparoscopic surgery of the uterus
US9646423B1 (en) Systems and methods for providing augmented reality in minimally invasive surgery
CN110010249B (en) Augmented reality operation navigation method and system based on video superposition and electronic equipment
US20180158201A1 (en) Apparatus and method for registering pre-operative image data with intra-operative laparoscopic ultrasound images
CN108369736B (en) Method and apparatus for calculating the volume of resected tissue from an intra-operative image stream
Puerto-Souza et al. Toward long-term and accurate augmented-reality for monocular endoscopic videos
Zhou et al. Real-time dense reconstruction of tissue surface from stereo optical video
JP2016511049A (en) Re-identifying anatomical locations using dual data synchronization
KR20210051141A (en) Method, apparatus and computer program for providing augmented reality based medical information of patient
Nosrati et al. Simultaneous multi-structure segmentation and 3D nonrigid pose estimation in image-guided robotic surgery
Wang et al. 3-D tracking for augmented reality using combined region and dense cues in endoscopic surgery
Kumar et al. Stereoscopic visualization of laparoscope image using depth information from 3D model
CN117100393A (en) Method, system and device for video-assisted surgical target positioning
EP3110335B1 (en) Zone visualization for ultrasound-guided procedures
Marques et al. Framework for augmented reality in Minimally Invasive laparoscopic surgery
Penza et al. Enhanced vision to improve safety in robotic surgery
Singh et al. A novel enhanced hybrid recursive algorithm: image processing based augmented reality for gallbladder and uterus visualisation
CN114334096A (en) Intraoperative auxiliary display method and device based on medical image and storage medium
Dagon et al. A framework for intraoperative update of 3D deformable models in liver surgery
US11657547B2 (en) Endoscopic surgery support apparatus, endoscopic surgery support method, and endoscopic surgery support system
Chandelon et al. Kidney tracking for live augmented reality in stereoscopic mini-invasive partial nephrectomy
Serna-Morales et al. Acquisition of three-dimensional information of brain structures using endoneurosonography
Drechsler et al. Simulation of portal vein clamping and the impact of safety margins for liver resection planning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination