CN117635833A - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium

Info

Publication number
CN117635833A
Authority
CN
China
Prior art keywords
dimensional
focus
frame
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311619509.XA
Other languages
Chinese (zh)
Inventor
蔡成作
林鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Langxin Medical Technology Wuxi Co ltd
Original Assignee
Langxin Medical Technology Wuxi Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langxin Medical Technology Wuxi Co ltd filed Critical Langxin Medical Technology Wuxi Co ltd
Priority to CN202311619509.XA priority Critical patent/CN117635833A/en
Publication of CN117635833A publication Critical patent/CN117635833A/en
Pending legal-status Critical Current


Landscapes

  • Image Processing (AREA)

Abstract

The application discloses an image processing method, apparatus, device, and storage medium, applied to a controller of an endoscope: a two-dimensional image set of a region to be detected acquired by a camera, and state parameters acquired by a sensor, are obtained in real time; based on the state parameters and pose transformation information extracted from the two-dimensional image set, the two-dimensional image frames are stitched to obtain a three-dimensional model corresponding to the region to be detected; the two-dimensional images are identified to determine the two-dimensional position information of the focus, and the position of the focus is marked in the three-dimensional model for display, based on the mapping relation between the two-dimensional coordinate information and the three-dimensional model. The three-dimensional model so constructed can accurately display the specific position of the focus in the region to be detected, and through it the operator can observe, as far as possible, all regions captured by the camera, which improves the accuracy of marking the focus position in the region to be detected.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing, and more particularly, to an image processing method, apparatus, device, and storage medium.
Background
An endoscope is a detection instrument (gastroscope, laryngoscope, enteroscope, and the like) that enters the body through the oral cavity or another natural orifice. By observing the pictures acquired and returned in real time by the endoscope's camera, it is possible to check whether pathological changes have appeared inside the patient, which plays an important role in clinical examination and disease treatment.
However, during an examination, if a specific area, or a focus previously identified in a certain area, needs to be re-examined, the operator must move the camera of the endoscope and review the images it acquires and returns. Moreover, because the operating maneuvers differ each time, the position and size of the same focus differ in each returned picture. This mode of collecting and returning images in real time prevents a doctor from accurately locating the focus within the region to be detected when examining it with an endoscope.
Disclosure of Invention
In view of this, the present application provides an image processing method, apparatus, device, and storage medium, to solve the problem that the existing mode of collecting and returning images in real time cannot accurately locate the focus position in the region to be detected.
To achieve the above object, the following solutions are proposed:
An image processing method, applied to a controller of an endoscope, the endoscope further comprising at least a camera, a sensor, and a display, the image processing method comprising the following steps:
acquiring a two-dimensional image set of a region to be detected acquired by the camera at the current moment and a state parameter acquired by the sensor, wherein the two-dimensional image set comprises each frame of two-dimensional image acquired according to time sequence when the camera works under the control of an operator, and the state parameter is the state parameter of the camera acquired by the sensor when the camera works;
identifying a focus in the two-dimensional image of each frame, and determining two-dimensional coordinate information of the focus in the two-dimensional image;
based on the state parameters and pose transformation information extracted from the two-dimensional image set, performing splicing processing on the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected;
marking the position of the focus in the three-dimensional model based on the mapping relation between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model to obtain a target display model;
and displaying the target display model on the display.
Optionally, the identifying the focus in the two-dimensional image of each frame and determining two-dimensional coordinate information of the focus in the two-dimensional image include:
extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame;
invoking a preset focus detection model to identify the key feature information corresponding to the two-dimensional image of each frame, and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training the feature information of a plurality of image samples as feature values and focus features of the image samples as target values;
and determining two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
Optionally, the stitching processing is performed on the two-dimensional images of each frame based on pose transformation information and the state parameters extracted from the two-dimensional image set to obtain a three-dimensional model corresponding to the region to be detected, including:
extracting characteristic points of the two-dimensional image of each frame;
tracking the characteristic points of the two-dimensional image of each frame, and determining pose transformation information of the camera, wherein the pose transformation information is used for representing the motion state of the camera relative to the region to be detected;
fusing the pose transformation information with the state parameters to obtain target pose transformation information of the camera;
performing depth estimation processing based on the gradient of the brightness of the two-dimensional image and the target pose transformation information of each frame to obtain depth information of the feature points of the two-dimensional image of each frame;
and splicing the two-dimensional images of each frame according to the time sequence based on the depth information of the characteristic points of the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected.
Optionally, the marking the position of the focus in the three-dimensional model based on the mapping relationship between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model to obtain a target display model includes:
projecting the two-dimensional coordinate information of the focus in each frame of the two-dimensional image to the three-dimensional model to obtain three-dimensional coordinate information corresponding to the focus;
and performing distinguishing marking in the three-dimensional model according to the three-dimensional coordinate information corresponding to the focus to obtain a three-dimensional model marked with the focus.
Optionally, the image processing method further includes:
determining a size of the lesion based on the three-dimensional coordinate information corresponding to the lesion.
Optionally, the image processing method further includes:
and optimizing the three-dimensional model based on a loop detection technology to obtain the optimized three-dimensional model.
An image processing apparatus, applied to a controller of an endoscope, the endoscope further comprising at least a camera, a sensor, and a display, the image processing apparatus comprising:
the information acquisition unit is used for acquiring a two-dimensional image set of a region to be detected acquired by the camera at the current moment and a state parameter acquired by the sensor, wherein the two-dimensional image set comprises each frame of two-dimensional image acquired and stored according to a time sequence when the camera works under the control of an operator, and the state parameter is the state parameter of the camera acquired by the sensor when the camera works;
a focus identifying unit, configured to identify a focus in the two-dimensional image of each frame, and determine two-dimensional coordinate information of the focus in the two-dimensional image;
the model construction unit is used for carrying out splicing processing on the two-dimensional images of each frame based on pose transformation information and the state parameters extracted from the two-dimensional image set to obtain a three-dimensional model corresponding to the region to be detected;
the position marking unit is used for marking the position of the focus in the three-dimensional model based on the mapping relation between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model to obtain a target display model;
and the model display unit is used for displaying the target display model on the display.
Optionally, the lesion recognition unit includes:
the feature extraction subunit is used for extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame;
the feature recognition subunit is used for calling a preset focus detection model to recognize the key feature information corresponding to the two-dimensional image of each frame and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training by taking the feature information of a plurality of image samples as feature values and the focus features of the image samples as target values;
and the coordinate determination subunit is used for determining the two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
An image processing apparatus includes a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of any one of the image processing methods.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the image processing methods.
According to the method and the device, a three-dimensional model corresponding to the area to be detected can be constructed according to the two-dimensional image set acquired by the camera and the state parameters of the camera acquired by the sensor during operation, and the position of the focus determined through the two-dimensional image set is marked in the three-dimensional model. Therefore, the three-dimensional model constructed by the method and the device can accurately display the specific position of the focus in the region to be detected, so that an operator can observe all the regions shot by the camera through the three-dimensional model as far as possible at the same time, and the accuracy of marking the focus position in the region to be detected by the operator is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic view of an alternative endoscope provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of an implementation method of image processing according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Most current endoscopes adopt two-dimensional imaging: the two-dimensional images acquired in real time are returned to a display device, which meets the operator's basic inspection needs. However, owing to the special structure of human organs, the images acquired in real time often cannot completely cover the region to be inspected while the endoscope is in use, so some focuses in the region escape inspection altogether, causing missed detection.
Likewise, after an inspection is finished, if the operator wants to re-check an inspected area or a previously determined focus, the endoscope must be operated again, and it is not easy to re-locate that area or focus from the two-dimensional pictures the endoscope returns. Moreover, since the operator cannot guarantee that the maneuvers are exactly the same each time, the camera may deviate, so the size, distance, and so on of the two-dimensional images returned for the same area differ between passes.
The embodiments of the present application solve the problem that the images returned by the endoscope cannot accurately locate the focus position in the region to be detected. Besides improving accuracy, a three-dimensional model corresponding to the region to be detected is constructed, from which the operator can judge, using his or her own knowledge of the region to be detected (i.e., the structure of the human organ), whether anything has been missed, and can then operate the endoscope further to complete the structure of the model and improve the accuracy of detection. In addition, the embodiments add depth estimation to the computation of focus position and size on the three-dimensional model, further improving the accuracy of focus size estimation.
It will be appreciated that the image processing method may be applied to a controller of an endoscope, replacing the endoscope's existing image processing flow. Referring to fig. 1, a schematic diagram of an alternative endoscope provided in an embodiment of the present application, the endoscope includes: a controller 01, a camera 02, a sensor 03, a display 04, and the connecting wires between these components.
It should be understood that the controller 01 may be a functional module having an image processing function, or may be a separate device such as a server or a host having an image processing function. The camera 02 can be controlled by an operator, and is used for acquiring a two-dimensional picture of a shooting area of the camera 02, the sensor 03 is used for acquiring state parameters of the camera in a working state, the state parameters can comprise angular speed, attitude information and the like, and the display 04 is used for displaying the two-dimensional picture returned by the camera or displaying a three-dimensional model obtained through processing the two-dimensional picture so as to be checked by the operator. In the embodiment of the application, the hardware types and software programs of the controller 01, the camera 02, the sensor 03 and the display 04 in the endoscope are not particularly limited, and components, devices and the like which can realize the same functions can be equivalently replaced.
The specific execution steps of the image processing method applied to the controller 01 of the endoscope may refer to a schematic flowchart of implementing the image processing method provided in the embodiment of the present application shown in fig. 2, where the specific flowchart steps may include:
step S110, a two-dimensional image set of the area to be detected, which is acquired by the camera at the current moment, and state parameters which are acquired by the sensor are acquired.
The two-dimensional image set comprises each frame of two-dimensional image which is acquired and stored according to time sequence when the camera works under the control of an operator, and the state parameters are acquired by the sensor when the camera works.
In the embodiments of the present application, the moment at which the operator starts the endoscope is taken as the starting time. Under the operator's control, the camera acquires a series of two-dimensional images of the region to be detected from different viewing angles and transmits each frame to the controller in real time. The controller receives each frame acquired by the camera and thus holds all two-dimensional images acquired from the starting time up to the current moment, which form the two-dimensional image set.
Optionally, the camera may be a monocular camera, so that subsequent depth estimation of the two-dimensional images, or estimation of the distance between a focus and the camera, can be carried out according to the monocular ranging principle, improving the accuracy of three-dimensional model construction and focus size estimation in the embodiments of the present application.
Optionally, the sensor may be a gyroscope sensor that collects the angular velocity and attitude information of the camera in real time as the state parameters; like the images from the camera, the state parameters are returned to the controller in real time.
Step S120, identifying a focus in the two-dimensional image of each frame, and determining two-dimensional coordinate information of the focus in the two-dimensional image.
In the prior art, focus identification is usually manual: the operator identifies the focus from the two-dimensional images returned in real time, and the specific position of the focus within the region to be inspected cannot be pinned down. In the embodiments of the present application, the focus can be identified automatically from the feature information in the two-dimensional images, and the coordinate information of the focus in the corresponding two-dimensional image is determined.
A focus (lesion) is a localized area of diseased tissue, possibly bearing pathogenic microorganisms; it is the part of the body where pathological changes occur. In the embodiments of the present application, automatic focus identification may be performed through a focus recognition model trained in advance, through a dedicated feature recognition algorithm, or by analyzing characteristics of the two-dimensional image such as layers, resolution, and color patches to determine areas where pathological changes may exist; the embodiments do not restrict automatic focus identification to any single approach.
Optionally, the process of automatically identifying a focus through a pre-trained focus identification model and determining focus coordinate information according to the embodiment of the present application may include: extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame; invoking a preset focus detection model to identify the key feature information corresponding to the two-dimensional image of each frame, and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training the feature information of a plurality of image samples as feature values and focus features of the image samples as target values; and determining two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
The preset focus detection model, trained in advance, is deployed in the controller. For training, a large number of two-dimensional images used in past focus examinations can be obtained from previous cases as training samples, and the feature information of the focus in each sample image is extracted.
During training, the feature information of a two-dimensional image is taken as the model input and the feature information of the focus in that image as the output, so that the trained preset focus detection model determines, from the input feature information, whether a focus exists and, when it does, which of the input feature information corresponds to the focus.
On this basis, key feature information is extracted from each frame of the region to be detected and fed into the preset focus detection model for feature recognition, which determines whether a focus exists in each frame and which of the frame's key feature information belongs to focus features. The position information of the focus is then obtained from the position of that key feature information in the two-dimensional image. In the embodiments of the present application, position information is expressed in coordinates; since every frame acquired by the same camera has the same format, all two-dimensional images can share a single two-dimensional coordinate system, in which the two-dimensional coordinate information of the focus in each frame is determined.
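For concreteness, the following is a minimal sketch of this per-frame recognition step in Python. The patent does not fix a network architecture or framework, so the TorchScript file name, the detector's (boxes, scores) output convention, and the score threshold are illustrative assumptions rather than the claimed model.

```python
import cv2
import numpy as np
import torch

# Hypothetical pre-trained "preset focus detection model"; the file name and
# the (boxes, scores) output convention are assumptions for illustration.
model = torch.jit.load("focus_detector.pt")
model.eval()

def detect_focus(frame_bgr: np.ndarray, score_thresh: float = 0.5):
    """Identify focus regions in one frame; return 2D boxes (x1, y1, x2, y2, score)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        boxes, scores = model(tensor)          # assumed output: Nx4 boxes, N scores
    keep = scores > score_thresh
    return [(*map(float, b), float(s)) for b, s in zip(boxes[keep], scores[keep])]
```

Since every frame from the same camera shares one pixel coordinate system, the returned box coordinates serve directly as the two-dimensional coordinate information described above.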
And step S130, based on the state parameters and pose transformation information extracted from the two-dimensional image set, performing stitching processing on the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected.
In the embodiments of the present application, the three-dimensional model of the region to be detected can be stitched together frame by frame according to the time corresponding to each two-dimensional image and the attitude information provided by the sensor. It will be appreciated that the two-dimensional image set contains multiple images of the same region from different viewing angles, so the basic structure of the region can be approximately recovered from those views; on this basic idea, a three-dimensional model corresponding to the region to be detected can be constructed from the two-dimensional image set.
Specifically, the process of establishing the three-dimensional model according to the embodiment of the application may include: extracting characteristic points of the two-dimensional image of each frame; tracking the characteristic points of the two-dimensional image of each frame, and determining pose transformation information of the camera, wherein the pose transformation information is used for representing the motion state of the camera relative to the region to be detected; fusing the pose transformation information with the state parameters to obtain target pose transformation information of the camera; performing depth estimation processing based on the gradient of the brightness of the two-dimensional image and the target pose transformation information of each frame to obtain depth information of the feature points of the two-dimensional image of each frame; and splicing the two-dimensional images of each frame according to the time sequence based on the depth information of the characteristic points of the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected.
Optionally, the embodiments of the present application may use the LSD-SLAM algorithm (Large-Scale Direct monocular SLAM, a semi-dense monocular SLAM method) to build the three-dimensional model of the region to be detected in real time. First, computer vision techniques are used to extract feature points from each two-dimensional image. In image processing, feature points are points where the gray value of the image changes drastically, or points of high curvature on image edges (i.e., the intersection of two edges), such as corner points and edge points. Feature points reflect the essential characteristics of the corresponding image, can identify target objects in it, and allow two-dimensional images to be matched against one another by matching their feature points.
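As an illustration of this step, the sketch below extracts and matches corner-like feature points with ORB from OpenCV; the patent does not mandate a particular detector, so ORB and the matching parameters are assumptions.

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)

def extract_features(frame_gray):
    """Detect corner-like feature points and compute binary descriptors."""
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    return keypoints, descriptors

def match_features(desc_a, desc_b, ratio=0.75):
    """Match descriptors between two frames, filtered by Lowe's ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_a, desc_b, k=2)
    return [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < ratio * n.distance]
```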
Further, using the image pose tracking of LSD-SLAM, the motion of the camera is tracked through the feature-point matches between two-dimensional images, and the camera's pose transformation information, such as translation and rotation, is determined.
The angular velocity information of the camera acquired by the gyroscope sensor is fused with the camera's pose transformation information; the resulting target pose transformation information is more stable and accurate than the visual pose transformation information alone. In the embodiments of the present application, the angular velocity information and the pose transformation information may be fused by complementary filtering, which determines the target pose transformation information more stably and accurately.
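A minimal sketch of the complementary-filter fusion, assuming for brevity that orientations are expressed as roll/pitch/yaw angles (a real system would normally fuse quaternions); the blend factor alpha is an illustrative choice.

```python
import numpy as np

def fuse_orientation(vision_rpy, gyro_rate_rps, prev_rpy, dt, alpha=0.98):
    """
    Complementary filter: the gyro-integrated orientation is smooth but
    drifts over time; the vision-derived orientation is drift-free but
    noisy. Blending the two keeps the best of each.
    All angles are in radians; gyro_rate_rps is angular velocity in rad/s.
    """
    gyro_prediction = prev_rpy + gyro_rate_rps * dt  # integrate angular velocity
    return alpha * gyro_prediction + (1.0 - alpha) * vision_rpy

# e.g., at 30 fps: fused = fuse_orientation(vis, gyro, prev, dt=1.0 / 30.0)
```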
To obtain the additional dimension of data needed to construct the three-dimensional model, depth estimation must be performed on the two-dimensional images. It can be realized through a depth estimation algorithm (such as monocular or binocular depth estimation), by building a depth estimation model, by Structure from Motion (SfM), and so on.
The embodiments of the present application use the direct method of LSD-SLAM to estimate the depth of each frame: the depth information of the feature points in each frame is estimated by comparing the brightness gradients of the frames together with the relation between the camera's pose transformation information and the feature points.
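LSD-SLAM's semi-dense direct method minimizes photometric error along high-gradient pixels and is too involved to reproduce here; as a simpler illustrative stand-in for obtaining feature-point depths from two views and a known relative pose, the sketch below uses classical two-view triangulation.

```python
import cv2
import numpy as np

def triangulate_depths(K, R, t, pts_a, pts_b):
    """
    Stand-in for direct depth estimation: triangulate matched feature
    points from two views given the relative pose (R, t).
    K: 3x3 intrinsics; pts_a, pts_b: Nx2 matched pixel coordinates.
    Returns per-point depths (in the first camera's frame) and 3D points.
    """
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at origin
    P1 = K @ np.hstack([R, t.reshape(3, 1)])            # second camera pose
    pts4d = cv2.triangulatePoints(P0, P1,
                                  pts_a.T.astype(float), pts_b.T.astype(float))
    pts3d = (pts4d[:3] / pts4d[3]).T                    # dehomogenize to Nx3
    return pts3d[:, 2], pts3d
```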
Then, using the two-dimensional image set ordered in time sequence, the feature points of every two adjacent frames are matched, and each two-dimensional image in the set is aligned and stitched with its preceding and following frames, combined with the depth information of the feature points, to obtain the three-dimensional model.
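The stitching itself can be pictured as transforming each frame's points into one world frame with that frame's fused camera pose and accumulating them; a minimal sketch, with point fusion and surface meshing omitted:

```python
import numpy as np

def stitch_point_clouds(frames):
    """
    frames: list of (R_wc, t_wc, pts_cam), where R_wc and t_wc map camera
    coordinates into the world frame and pts_cam is an Nx3 array of
    triangulated points in that camera's frame.
    Returns one accumulated point cloud in the world frame; a real pipeline
    would additionally fuse overlapping points and mesh the result.
    """
    clouds = [pts @ R.T + t.reshape(1, 3) for R, t, pts in frames]
    return np.vstack(clouds)
```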
In addition, in the embodiments of the present application, two-dimensional images keep arriving in real time after the endoscope is started. Once the three-dimensional model has been built, the controller continues to receive images from the camera and can determine the relation between the camera pose and the three-dimensional model by matching the images' feature points against feature points in the model. The model can therefore be optimized in real time from the incoming images: refining the camera's motion trajectory, updating the model rendering, and so on, which may be realized with global optimization techniques such as bundle adjustment, so as to minimize error and keep the three-dimensional model consistent before and after optimization.
Also, during construction of the three-dimensional model, the model can be optimized based on loop (closure) detection, yielding an optimized three-dimensional model with reduced accumulated error and improved stability and consistency.
Loop detection checks whether closed-loop information exists in the three-dimensional model; when it does, the model is corrected or optimized with a preset algorithm, so that after closed-loop correction the accumulated error is markedly reduced. Optionally, the embodiments of the present application determine the need for correction through similarity computation between two-dimensional images of a revisited area: when the similarity between the images does not meet a preset threshold, accumulated drift is indicated, and the images are transformed correspondingly until the similarity meets the threshold, completing the correction of the images or of the model.
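A minimal sketch of the similarity test driving this loop correction; real systems use bag-of-words image descriptors, so the normalized-histogram comparison here is an illustrative simplification.

```python
import cv2
import numpy as np

def frame_similarity(img_a, img_b, bins=32):
    """Similarity in [0, 1] between two grayscale (uint8) frames of the
    same area, via normalized gray-level histogram intersection; an
    illustrative stand-in for bag-of-words loop-detection descriptors."""
    ha = cv2.calcHist([img_a], [0], None, [bins], [0, 256]).ravel()
    hb = cv2.calcHist([img_b], [0], None, [bins], [0, 256]).ravel()
    ha /= ha.sum() or 1.0
    hb /= hb.sum() or 1.0
    return float(np.minimum(ha, hb).sum())

def needs_loop_correction(img_new, img_revisited, sim_thresh=0.9):
    """Per the scheme above: a revisited view whose similarity falls below
    the threshold indicates accumulated drift that loop closure corrects."""
    return frame_similarity(img_new, img_revisited) < sim_thresh
```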
And step S140, marking the position of the focus in the three-dimensional model based on the mapping relation between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model, and obtaining a target display model.
And step S150, displaying the target display model on a display.
The two-dimensional coordinates of the focus are mapped to three-dimensional coordinates by projection, and the positions of those three-dimensional coordinates are marked in the three-dimensional model. The resulting model, with the focus marked, is convenient for the operator's visual inspection: the focus need not be picked out of the three-dimensional model by eye, and its position in the region to be detected is shown intuitively.
Optionally, the process of marking the focus according to the projection mode according to the embodiment of the application may include: projecting the two-dimensional coordinate information of the focus in each frame of the two-dimensional image to the three-dimensional model to obtain three-dimensional coordinate information corresponding to the focus; and performing distinguishing marking in the three-dimensional model according to the three-dimensional coordinate information corresponding to the focus to obtain a three-dimensional model marked with the focus.
It will be appreciated that the three-dimensional model is obtained by stitching all the two-dimensional images together with their depth information; therefore, marking in the model the positions that correspond to the two-dimensional coordinate information of the focus in the focus-bearing frames achieves the effect of projecting the two-dimensional coordinate information into the three-dimensional model, and yields the three-dimensional coordinate information of the focus.
Following the same idea, the positions corresponding to the two-dimensional coordinate information of the focus can instead be distinctively marked in the two-dimensional images first, so that the focus is marked directly in the three-dimensional model produced by the stitching, without an explicit projection step.
Further, in the three-dimensional model, the positions corresponding to the three-dimensional coordinate information are marked distinctively, for example by color or brightness, so that the diseased part is distinguished from normal tissue and easily recognized by the operator. The three-dimensional model with the focus positions marked is fed back to the display, which, on receiving the model information, shows it for the operator to observe.
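As an illustration of the projection described above, the following sketch back-projects a focus pixel to model coordinates with a standard pinhole model, assuming the camera intrinsic matrix K and the frame's world pose (R_wc, t_wc) are available from the reconstruction:

```python
import numpy as np

def focus_to_model(K, R_wc, t_wc, pixel_xy, depth):
    """
    Lift a focus pixel to a camera-frame 3D point using its estimated
    depth and the intrinsics K, then transform it into the world (model)
    frame with the camera pose.
    """
    u, v = pixel_xy
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])
    return R_wc @ p_cam + t_wc  # 3D coordinates to mark in the model
```

Marking then amounts to distinctively recoloring the model points within a small radius of the returned coordinates.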
From the three-dimensional model shown on the display, the operator can determine whether any part of the region to be detected has been missed. Suppose the current endoscope is a gastroscope and the stomach is to be examined: after starting the gastroscope, the operator controls the camera to collect two-dimensional images of the stomach; the controller builds a three-dimensional model from the images it obtains in real time and displays the model on the display. Comparing the model displayed in real time with the known structure of the stomach, the operator judges whether the two are consistent; if not, an area may have been missed, so the gastroscope camera is controlled again, new two-dimensional images are acquired, and the current three-dimensional model is completed, until the comprehensive detection of the region to be detected is finished.
The embodiments of the present application can also determine the size of the focus from the three-dimensional coordinate information corresponding to the focus. Compared with two-dimensional coordinates, three-dimensional coordinates carry feature information in the depth dimension, so the volume of the focus can be computed from them; and when the focus lies on a curved surface, the focus area determined from the three-dimensional coordinates is more accurate than the planar area determined from the two-dimensional coordinates.
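A minimal sketch of such a size estimate from the focus's marked three-dimensional points; it reports only the bounding-box extent, while a volume or curved-surface area estimate would build on the same coordinates:

```python
import numpy as np

def focus_extent(marked_pts: np.ndarray) -> np.ndarray:
    """Rough focus size from its marked 3D points: the axis-aligned
    bounding-box extent (dx, dy, dz) in model units."""
    return marked_pts.max(axis=0) - marked_pts.min(axis=0)
```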
In summary, according to the embodiment of the present application, a three-dimensional model corresponding to a region to be detected may be constructed according to a two-dimensional image set acquired by a camera and a state parameter acquired by a sensor during operation of the camera, and a position of a focus determined by the two-dimensional image set is marked in the three-dimensional model. Therefore, the three-dimensional model constructed by the method and the device can accurately display the specific position of the focus in the region to be detected, so that an operator can observe all the regions shot by the camera through the three-dimensional model as far as possible at the same time, and the accuracy of marking the focus position in the region to be detected by the operator is improved.
Next, taking a gastroscope scenario as an example, a practical application of the image processing method provided in the embodiments of the present application is described. The method runs in the controller or host software of the gastroscope. After the operator starts the gastroscope, its camera is controlled to acquire two-dimensional images of the patient's stomach and returns them to the controller in real time, and the controller builds the three-dimensional model from the acquired images in the manner described above. In the embodiments of the present application, model construction does not wait for a certain number of two-dimensional images to accumulate; it can begin as soon as the first image is received. Because few images are available at first, the initial model is not a complete stomach structure, so the three-dimensional model is refined continuously from the images returned in real time, until the operator shuts down the gastroscope.
The steps of building the three-dimensional model from the two-dimensional images and of identifying the focus in them proceed essentially simultaneously and do not conflict. After a frame returned by the camera is received, its feature information is recognized by the pre-deployed preset focus detection model to determine whether a focus exists in the image; if one does, the two-dimensional coordinate information of the focus is determined at the same time.
The two-dimensional coordinate information is then mapped to the position of that image in the three-dimensional model to determine the three-dimensional coordinate information of the focus, which is displayed on the display for the operator to check.
The image processing apparatus provided in the embodiments of the present application is described below; the apparatus described below and the image processing method described above may be referred to in correspondence with each other.
First, referring to fig. 3, an image processing apparatus applied to a controller or host software of an endoscope is described, as shown in fig. 3, the image processing apparatus may include:
the information acquisition unit 100 is configured to acquire a two-dimensional image set of a region to be detected acquired by the camera at a current moment and a state parameter acquired by the sensor, where the two-dimensional image set includes two-dimensional images of each frame acquired and stored in time sequence when the camera operates under the control of an operator, and the state parameter is a state parameter of the camera acquired by the sensor when the camera operates;
a focus identifying unit 200, configured to identify a focus in the two-dimensional image for each frame, and determine two-dimensional coordinate information of the focus in the two-dimensional image;
a model construction unit 300, configured to perform stitching processing on the two-dimensional images of each frame based on pose transformation information and the state parameters extracted from the two-dimensional image set, so as to obtain a three-dimensional model corresponding to the region to be detected;
a position marking unit 400, configured to mark a position of the focus in the three-dimensional model based on a mapping relationship between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model, so as to obtain a target display model;
and a model display unit 500, configured to display the target display model on the display.
Optionally, the lesion recognition unit 200 includes:
the feature extraction subunit is used for extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame;
the feature recognition subunit is used for calling a preset focus detection model to recognize the key feature information corresponding to the two-dimensional image of each frame and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training by taking the feature information of a plurality of image samples as feature values and the focus features of the image samples as target values;
and the coordinate determination subunit is used for determining the two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
Optionally, the model building unit 300 includes:
a feature point extraction subunit, configured to extract feature points of the two-dimensional image of each frame;
the tracking processing subunit is used for carrying out tracking processing on the characteristic points of the two-dimensional image of each frame and determining pose transformation information of the camera, wherein the pose transformation information is used for representing the motion state of the camera relative to the region to be detected;
the information fusion subunit is used for fusing the pose transformation information with the state parameters to obtain target pose transformation information of the camera;
the depth estimation subunit is used for carrying out depth estimation processing based on the gradient of the brightness of the two-dimensional image and the target pose transformation information of each frame to obtain the depth information of the characteristic points of the two-dimensional image of each frame;
and the image splicing subunit is used for splicing the two-dimensional images of each frame according to the time sequence based on the depth information of the characteristic points of the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected.
Optionally, the position marking unit 400 includes:
the coordinate projection subunit is used for projecting the two-dimensional coordinate information of the focus in each frame of the two-dimensional image to the three-dimensional model to obtain three-dimensional coordinate information corresponding to the focus;
and the focus marking subunit is used for carrying out distinguishing marking in the three-dimensional model according to the three-dimensional coordinate information corresponding to the focus to obtain a three-dimensional model marked with the focus.
Optionally, the image processing apparatus may further include:
a size determining subunit, configured to determine a size of the lesion based on the three-dimensional coordinate information corresponding to the lesion.
Optionally, the image processing apparatus may further include:
the model optimization unit is used for optimizing the three-dimensional model based on a loop detection technology to obtain the optimized three-dimensional model.
In summary, according to the embodiment of the present application, a three-dimensional model corresponding to a region to be detected may be constructed according to a two-dimensional image set acquired by a camera and a state parameter acquired by a sensor during operation of the camera, and a position of a focus determined by the two-dimensional image set is marked in the three-dimensional model. Therefore, the three-dimensional model constructed by the embodiment of the invention can accurately display the specific position of the focus in the region to be detected, so that an operator can observe all the regions shot by the camera through the three-dimensional model as far as possible at the same time, thereby helping to improve the accuracy of marking the focus position in the region to be detected.
The image processing device provided by the embodiment of the application can be applied to an image processing device.
Fig. 4 shows a schematic diagram of the structure of an image processing apparatus, and referring to fig. 4, the structure of the image processing apparatus may include: at least one processor 10, at least one memory 20, at least one communication bus 30, and at least one communication interface 40.
In the embodiment of the present application, the number of the processor 10, the memory 20, the communication bus 30 and the communication interface 40 is at least one, and the processor 10, the memory 20 and the communication interface 40 complete communication with each other through the communication bus 30.
The processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present invention, or the like.
The memory 20 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The memory stores a program, and the processor can call the program stored in the memory, wherein the program is used for realizing each processing flow in the image processing scheme.
The embodiment of the application also provides a storage medium, which may store a program suitable for being executed by a processor, where the program is used to implement each processing flow in the foregoing image processing scheme.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An image processing method, characterized in that the method is applied to a controller of an endoscope, the endoscope further comprising at least a camera, a sensor, and a display, the image processing method comprising the following steps:
acquiring a two-dimensional image set of a region to be detected acquired by the camera at the current moment and a state parameter acquired by the sensor, wherein the two-dimensional image set comprises each frame of two-dimensional image acquired according to time sequence when the camera works under the control of an operator, and the state parameter is the state parameter of the camera acquired by the sensor when the camera works;
identifying a focus in the two-dimensional image of each frame, and determining two-dimensional coordinate information of the focus in the two-dimensional image;
based on the state parameters and pose transformation information extracted from the two-dimensional image set, performing splicing processing on the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected;
marking the position of the focus in the three-dimensional model based on the mapping relation between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model to obtain a target display model;
and displaying the target display model on the display.
2. The image processing method according to claim 1, wherein the identifying a lesion in the two-dimensional image for each frame and determining two-dimensional coordinate information of the lesion in the two-dimensional image include:
extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame;
invoking a preset focus detection model to identify the key feature information corresponding to the two-dimensional image of each frame, and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training the feature information of a plurality of image samples as feature values and focus features of the image samples as target values;
and determining two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
3. The image processing method according to claim 1, wherein the performing a stitching process on the two-dimensional images for each frame based on pose transformation information and the state parameters extracted from the two-dimensional image set to obtain a three-dimensional model corresponding to the region to be detected, comprises:
extracting characteristic points of the two-dimensional image of each frame;
tracking the characteristic points of the two-dimensional image of each frame, and determining pose transformation information of the camera, wherein the pose transformation information is used for representing the motion state of the camera relative to the region to be detected;
fusing the pose transformation information with the state parameters to obtain target pose transformation information of the camera;
performing depth estimation processing based on the gradient of the brightness of the two-dimensional image and the target pose transformation information of each frame to obtain depth information of the feature points of the two-dimensional image of each frame;
and splicing the two-dimensional images of each frame according to the time sequence based on the depth information of the characteristic points of the two-dimensional images of each frame to obtain a three-dimensional model corresponding to the region to be detected.
4. The image processing method according to claim 1, wherein the marking the position of the lesion in the three-dimensional model based on the mapping relationship between the two-dimensional coordinate information corresponding to the lesion and the three-dimensional model, to obtain a target display model, includes:
projecting the two-dimensional coordinate information of the focus in each frame of the two-dimensional image to the three-dimensional model to obtain three-dimensional coordinate information corresponding to the focus;
and performing distinguishing marking in the three-dimensional model according to the three-dimensional coordinate information corresponding to the focus to obtain a three-dimensional model marked with the focus.
5. The image processing method according to claim 4, characterized by further comprising:
determining a size of the lesion based on the three-dimensional coordinate information corresponding to the lesion.
6. The image processing method according to claim 1, characterized by further comprising:
and optimizing the three-dimensional model based on a loop detection technology to obtain the optimized three-dimensional model.
7. An image processing apparatus, characterized in that the apparatus is applied to a controller of an endoscope, the endoscope further comprising at least a camera, a sensor, and a display, the image processing apparatus comprising:
the information acquisition unit is used for acquiring a two-dimensional image set of a region to be detected acquired by the camera at the current moment and a state parameter acquired by the sensor, wherein the two-dimensional image set comprises each frame of two-dimensional image acquired and stored according to a time sequence when the camera works under the control of an operator, and the state parameter is the state parameter of the camera acquired by the sensor when the camera works;
a focus identifying unit, configured to identify a focus in the two-dimensional image of each frame, and determine two-dimensional coordinate information of the focus in the two-dimensional image;
the model construction unit is used for carrying out splicing processing on the two-dimensional images of each frame based on pose transformation information and the state parameters extracted from the two-dimensional image set to obtain a three-dimensional model corresponding to the region to be detected;
the position marking unit is used for marking the position of the focus in the three-dimensional model based on the mapping relation between the two-dimensional coordinate information corresponding to the focus and the three-dimensional model to obtain a target display model;
and the model display unit is used for displaying the target display model on the display.
8. The image processing apparatus according to claim 7, wherein the lesion recognition unit includes:
the feature extraction subunit is used for extracting features of the two-dimensional images of each frame to obtain key feature information corresponding to the two-dimensional images of each frame;
the feature recognition subunit is used for calling a preset focus detection model to recognize the key feature information corresponding to the two-dimensional image of each frame and determining the key feature information with focus features, wherein the preset focus detection model is a model obtained by training by taking the feature information of a plurality of image samples as feature values and the focus features of the image samples as target values;
and the coordinate determination subunit is used for determining the two-dimensional coordinate information of the focus in each frame of the two-dimensional image based on the position of the key feature information of the focus feature in the two-dimensional image.
9. An image processing apparatus, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the respective steps of the image processing method according to any one of claims 1 to 6.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image processing method according to any of claims 1-6.
CN202311619509.XA 2023-11-27 2023-11-27 Image processing method, device, equipment and storage medium Pending CN117635833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311619509.XA CN117635833A (en) 2023-11-27 2023-11-27 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311619509.XA CN117635833A (en) 2023-11-27 2023-11-27 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117635833A true CN117635833A (en) 2024-03-01

Family

ID=90022939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311619509.XA Pending CN117635833A (en) 2023-11-27 2023-11-27 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117635833A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118037963A (en) * 2024-04-09 2024-05-14 广州思德医疗科技有限公司 Reconstruction method, device, equipment and medium of digestive cavity inner wall three-dimensional model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination