CN114587416A

CN114587416A - Gastrointestinal tract submucosal tumor diagnosis system based on deep learning multi-target detection

Info

Publication number: CN114587416A
Application number: CN202210238095.5A
Authority: CN
Inventors: 李�真; 钟宁; 娄煜; 邵学军; 赖永航; 倪杰锟; 王鹏; 王立梅; 左秀丽; 李延青
Original assignee: Qilu Hospital of Shandong University
Current assignee: Qilu Hospital of Shandong University
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2022-06-07

Abstract

The invention belongs to the field of deep learning and provides a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection. The system comprises an ultrasound video clip acquisition module, a tumor type prediction module, and a tumor diagnosis and position determination module. The ultrasound video clip acquisition module acquires gastrointestinal submucosal ultrasound video clips, each consisting of multiple time-ordered frames of gastrointestinal submucosal ultrasound images. The tumor type prediction module uses a trained diagnosis model to predict the tumor type in the gastrointestinal submucosal ultrasound images frame by frame, in time order. The tumor diagnosis and position determination module selects the tumor type with the largest number of predicted lesions in the gastrointestinal submucosal ultrasound video clip as the gastrointestinal submucosal tumor diagnosis type and marks the position of that tumor type.

Description

Gastrointestinal tract submucosal tumor diagnosis system based on deep learning multi-target detection

Technical Field

The invention belongs to the field of deep learning, and particularly relates to a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

A gastrointestinal submucosal tumor (SMT) is a raised lesion originating from the layers beneath the gastrointestinal mucosa (mainly the muscularis mucosae, the submucosa and the muscularis propria). SMTs have complex pathological and histological types, and different pathological types call for different treatments, so early and correct identification of the SMT type is very important. Endoscopic ultrasonography (EUS) is currently the most accurate imaging method for evaluating gastrointestinal SMTs and plays an important role in their differential diagnosis, localization and choice of treatment. Studies show that EUS distinguishes benign from malignant SMTs with a sensitivity of 64% and a specificity of 80%, and that for lesions smaller than 2 cm in diameter it outperforms computed tomography (CT), magnetic resonance imaging (MRI) and similar examinations. However, EUS has the following limitations: (1) there are no clear differential diagnosis criteria for the ultrasound appearance of the various SMT types, so identifying lesions and classifying them depends entirely on the operator's experience; operators of different skill levels, and even the same operator in two separate examinations, may reach different classifications, so diagnostic consistency is poor; (2) ultrasound imaging is subject to many interfering factors and is prone to artifacts, which hinder the operator's observation; (3) during EUS scanning the ultrasound image changes dynamically and very quickly, which makes it difficult to identify and locate an SMT quickly and accurately during the examination.

The prior art provides a binary-classification artificial intelligence diagnosis system that assists in distinguishing stromal tumors from leiomyomas. However, the histopathology of SMTs in clinical practice is complex and the tumor types are numerous. The inventors found that existing binary-classification diagnosis systems cannot address the multi-class classification of SMTs in clinical practice and are therefore difficult to apply in clinical work.

Disclosure of Invention

To solve the technical problems described in the background, the invention provides a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection. The system can address both the localization and the multi-class classification of submucosal tumors in clinical practice, improve the sensitivity and specificity of diagnosing gastrointestinal submucosal tumors under endoscopic ultrasound, and assist the endoscopic ultrasound operator in identifying, locating and classifying submucosal tumors during the examination.

To achieve this purpose, the invention adopts the following technical solutions:

A first aspect of the invention provides a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection, comprising:

an ultrasound video clip acquisition module, configured to acquire gastrointestinal submucosal ultrasound video clips, wherein each video clip consists of multiple time-ordered frames of gastrointestinal submucosal ultrasound images;

a tumor type prediction module, configured to predict, based on a trained diagnosis model, the tumor type in the gastrointestinal submucosal ultrasound images frame by frame in time order;

and a tumor diagnosis and position determination module, configured to select the tumor type with the largest number of predicted lesions in the gastrointestinal submucosal ultrasound video clip as the gastrointestinal submucosal tumor diagnosis type, and to mark the position of that tumor type.

A second aspect of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:

acquiring gastrointestinal submucosal ultrasound video clips, wherein each video clip consists of multiple time-ordered frames of gastrointestinal submucosal ultrasound images;

predicting, based on a trained diagnosis model, the tumor type in the gastrointestinal submucosal ultrasound images frame by frame in time order;

and selecting the tumor type with the largest number of predicted lesions in the gastrointestinal submucosal ultrasound video clip as the gastrointestinal submucosal tumor diagnosis type, and marking the position of that tumor type.

A third aspect of the invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the following steps:

acquiring gastrointestinal submucosal ultrasound video clips, wherein each video clip consists of multiple time-ordered frames of gastrointestinal submucosal ultrasound images;

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection. Based on a diagnosis model, it predicts the tumor types in gastrointestinal submucosal ultrasound images frame by frame in time order, takes the tumor type with the largest number of predicted lesions in a gastrointestinal submucosal ultrasound video clip as the gastrointestinal submucosal tumor diagnosis type, and accurately marks the position of that type. This achieves accurate multi-class classification of gastrointestinal submucosal tumors, overcomes the inability of existing binary-classification diagnosis systems to handle the multi-class classification of submucosal tumors in clinical practice, improves the sensitivity and specificity of diagnosing gastrointestinal submucosal tumors under endoscopic ultrasound, and assists the endoscopic ultrasound operator in identifying, locating and classifying gastrointestinal submucosal tumors during the examination.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.

FIG. 1 is a block diagram of a gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection according to an embodiment of the invention;

FIGS. 2(a) and 2(b) are schematic diagrams of ultrasound images of gastrointestinal submucosal tumors according to an embodiment of the invention;

FIG. 3 is an example of annotation of an ultrasound image of a gastrointestinal submucosal tumor according to an embodiment of the invention;

FIG. 4 shows the distribution of the annotated training set of gastrointestinal submucosal tumor ultrasound images according to an embodiment of the invention;

FIG. 5 shows examples of data augmentation applied to gastrointestinal submucosal tumor ultrasound images according to an embodiment of the invention;

FIG. 6 is an architecture diagram of the CSP-block according to an embodiment of the invention;

FIG. 7 is a structural diagram of the Backbone according to an embodiment of the invention;

FIGS. 8(a) to 8(f) are schematic structural diagrams of FPN-series structures according to an embodiment of the invention;

FIG. 9 is a structural diagram of the Head part according to an embodiment of the invention;

FIG. 10 shows the predicted tumor types and their locations in gastrointestinal submucosal ultrasound images according to an embodiment of the invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

Interpretation of terms:

Imaging principle of ultrasound images:

Ultrasound imaging involves three steps: transmitting sound waves, receiving the reflected waves, and analyzing and processing the signals to form an image. The ultrasound probe transmits ultrasonic waves through a piezoelectric ceramic transducer, and different probes transmit at different frequencies. Ultrasound frequencies are generally 2-13 MHz. The higher the frequency, the weaker the diffraction and the higher the imaging resolution; however, the higher the frequency, the faster the wave is attenuated and the shallower its penetration. Therefore, the heart must be examined with lower-frequency waves, otherwise the penetration depth is insufficient and the image is poor, whereas superficial blood vessels such as the carotid and femoral arteries are examined with high-frequency waves, which give clear images. Cardiac probes in clinical use operate at 2-4 MHz and vascular probes at 10 MHz. The probe also receives the reflected waves: the piezoelectric ceramic transducer converts the acoustic signal into an electrical signal, which a computer system then processes into an image. B-mode ultrasound displays a two-dimensional grayscale image of the tissue section facing the probe.
Each point of a two-dimensional grayscale image is determined by three pieces of information (x, y, gray). Ultrasonic waves are reflected when they meet tissue in the body, and different tissues have different acoustic impedances; the acoustic impedance calculated from the received echo reflectivity is mapped to a gray level in the image (for example, tissue of similar acoustic impedance, such as the vessel wall, shows similar gray values, so the shape of the vessel can be seen under ultrasound). Assuming a one-dimensional probe, the position of each element along the probe gives the abscissa x, and the ordinate is determined by the time between transmitting the wave and receiving its echo: assuming the speed of sound is the same everywhere in the body, a longer time indicates a deeper reflecting tissue. Tissue contours can then be seen in the resulting grayscale image and measured, e.g., the diameter and area of a blood vessel. EUS imaging follows the same process; example images are shown in FIGS. 2(a) and 2(b).
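The two relations above (frequency versus resolution, and echo time versus depth) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a constant speed of sound of 1540 m/s in soft tissue (the conventional value; the text only assumes the speed is uniform), a one-dimensional transducer, and a linear gray-level mapping, and the function names are hypothetical.

```python
import math

# Assumption: uniform speed of sound in soft tissue (conventional clinical value).
SPEED_OF_SOUND_M_PER_S = 1540.0


def wavelength_mm(frequency_mhz: float) -> float:
    """Wavelength lambda = c / f in millimetres; a shorter wavelength resolves finer detail."""
    return SPEED_OF_SOUND_M_PER_S / (frequency_mhz * 1e6) * 1e3


def echo_depth_mm(round_trip_time_us: float) -> float:
    """Depth of a reflector from the transmit-to-receive time.

    The pulse travels to the reflector and back, so depth = c * t / 2.
    """
    return SPEED_OF_SOUND_M_PER_S * (round_trip_time_us * 1e-6) / 2 * 1e3


def b_mode_pixel(element_index: int, round_trip_time_us: float,
                 echo_amplitude: float, max_amplitude: float) -> tuple:
    """Map one received echo to an (x, y, gray) triple of a B-mode image.

    x is the index of the transducer element along the 1-D probe, y is the depth
    in mm, and gray is the echo amplitude scaled linearly to 0..255 (real scanners
    apply log compression; linear scaling keeps the sketch simple).
    """
    gray = round(255 * min(echo_amplitude / max_amplitude, 1.0))
    return element_index, echo_depth_mm(round_trip_time_us), gray


if __name__ == "__main__":
    for f in (2.0, 4.0, 10.0):
        print(f"{f:>4} MHz -> wavelength {wavelength_mm(f):.3f} mm")
    print(b_mode_pixel(element_index=12, round_trip_time_us=13.0,
                       echo_amplitude=0.5, max_amplitude=1.0))
```

Under these assumptions a 2 MHz wave has a wavelength of 0.77 mm and a 10 MHz wave 0.154 mm, which is why higher-frequency probes resolve finer detail, and an echo returning after 13 µs comes from a depth of about 10 mm.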

Example one

As shown in FIG. 1, the gastrointestinal submucosal tumor diagnosis system based on deep learning multi-target detection of this embodiment specifically includes the following modules:

(1) The ultrasound video clip acquisition module is used to acquire gastrointestinal submucosal ultrasound video clips, wherein each video clip consists of multiple time-ordered frames of gastrointestinal submucosal ultrasound images.

(2) The tumor type prediction module is used to predict, based on the trained diagnosis model, the tumor type in the gastrointestinal submucosal ultrasound images frame by frame in time order.

In a specific implementation, the diagnosis model consists of a Backbone part, a Neck part and a Head part. The Backbone extracts features from each frame of the gastrointestinal submucosal ultrasound images, the Neck fuses the features extracted by the Backbone, and the Head predicts, from the features fused by the Neck, the tumor type label probabilities and the corresponding bounding box positions (bounding box coordinates). The Head part is shown in FIG. 9.
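As a hedged illustration of what the Head produces for a single frame, the sketch below decodes raw Head rows into labelled bounding boxes. The row layout `[cx, cy, w, h, p_1, ..., p_K]`, the class-name strings, the `Detection` type and the 0.5 score threshold are all assumptions made for this example; the text does not specify the output format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical class-name strings for the seven annotated SMT categories.
CLASS_NAMES = ["Leiomyoma", "Lipoma", "Pancreatic Rest", "GIST", "Cyst", "NET", "Cancer"]


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def decode_head_output(rows: Sequence[Sequence[float]],
                       score_threshold: float = 0.5) -> List[Detection]:
    """Convert raw Head rows for one frame into labelled bounding boxes.

    Each row is assumed to be [cx, cy, w, h, p_1, ..., p_K]: box centre and size
    followed by one probability per tumor type, in CLASS_NAMES order.
    """
    detections = []
    for row in rows:
        cx, cy, w, h = row[:4]
        probs = list(row[4:])
        if len(probs) != len(CLASS_NAMES):
            raise ValueError("expected one probability per tumor type")
        best = max(range(len(probs)), key=probs.__getitem__)  # most probable type
        if probs[best] < score_threshold:
            continue  # not confident enough to report a lesion
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)  # centre/size -> corners
        detections.append(Detection(CLASS_NAMES[best], probs[best], box))
    return detections


if __name__ == "__main__":
    head_rows = [
        [170, 135, 100, 90, 0.05, 0.02, 0.01, 0.85, 0.03, 0.02, 0.02],
        [40, 50, 20, 20, 0.20, 0.10, 0.10, 0.20, 0.20, 0.10, 0.10],
    ]
    for det in decode_head_output(head_rows):
        print(det)
```

Only the first row passes the threshold here; the per-frame detections produced this way are what the later clip-level post-processing counts.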

The Backbone consists of multiple repeated CSP-block structures connected in series, each made up of CBW and RES structures, as shown in FIG. 6. The CSP-block of FIG. 6 can also be applied to other network structures such as DenseNet, ResNet and SE-Net. Its two basic structures are CBW and RES: CBW is a basic unit consisting of a convolution (Conv), batch normalization (BN) and the Swish activation function; RES is essentially a residual structure in which one branch consists of two CBW modules in series and is added element-wise (add) to an identity branch to form the output of the structure. From these two basic structures a CSP-block_x is defined, where x indicates that certain basic structures in the CSP-block are repeated x times in series; as shown in FIG. 6, one main branch of the CSP-block_x consists of x CBW structures and x RES structures.

Note that x provides a convenient way to control the complexity and depth of the network and eases deployment on different hardware; the embodiment of the invention proposes x = 2 or 3.
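A minimal NumPy sketch of the CBW, RES and CSP-block_x units follows. It assumes 1x1 convolutions (to keep the code short; real backbones mostly use 3x3), batch normalization computed from batch statistics with unit scale and zero shift, and a CSPNet-style layout in which the main branch holds x (CBW, RES) pairs, a one-CBW shortcut branch bypasses it, and the two are concatenated and passed through a transition CBW. This layout and all class names are assumptions modelled on common CSP designs, not the exact structure of FIG. 6.

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights; a trained model would load learned ones


def swish(x):
    """Swish activation x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


class CBW:
    """Conv + Batch Normalization + Swish (1x1 convolution assumed for brevity)."""

    def __init__(self, c_in, c_out):
        self.weight = rng.normal(0.0, 1.0 / np.sqrt(c_in), size=(c_out, c_in))
        self.gamma = np.ones((1, c_out, 1, 1))   # BN scale
        self.beta = np.zeros((1, c_out, 1, 1))   # BN shift

    def __call__(self, x, eps=1e-5):
        y = np.einsum("oc,nchw->nohw", self.weight, x)  # 1x1 conv = per-pixel channel mixing
        mean = y.mean(axis=(0, 2, 3), keepdims=True)    # batch statistics per channel
        var = y.var(axis=(0, 2, 3), keepdims=True)
        y = self.gamma * (y - mean) / np.sqrt(var + eps) + self.beta
        return swish(y)


class RES:
    """Residual unit: two CBW in series, added element-wise to the identity branch."""

    def __init__(self, channels):
        self.cbw1 = CBW(channels, channels)
        self.cbw2 = CBW(channels, channels)

    def __call__(self, x):
        return x + self.cbw2(self.cbw1(x))


class CSPBlock:
    """CSP-block_x (assumed layout): main branch of x (CBW, RES) pairs, a CBW
    shortcut branch, channel concatenation, then a transition CBW."""

    def __init__(self, c_in, c_out, x=2):
        half = c_out // 2
        self.main = []
        channels = c_in
        for _ in range(x):
            self.main += [CBW(channels, half), RES(half)]
            channels = half
        self.shortcut = CBW(c_in, half)
        self.transition = CBW(2 * half, c_out)

    def __call__(self, inp):
        y = inp
        for layer in self.main:
            y = layer(y)
        merged = np.concatenate([y, self.shortcut(inp)], axis=1)  # join along channels
        return self.transition(merged)

    def num_cbw(self):
        """Total CBW units: 1 per CBW and 2 per RES in the main branch, plus shortcut and transition."""
        return 2 + sum(1 if isinstance(layer, CBW) else 2 for layer in self.main)


if __name__ == "__main__":
    block = CSPBlock(c_in=8, c_out=16, x=2)
    out = block(rng.normal(size=(2, 8, 16, 16)))  # (batch, channels, height, width)
    print(out.shape, block.num_cbw())
```

Counting the CBW units shows how x sets the depth: with this assumed layout a CSP-block_x contains 3x + 2 CBW units, i.e. 8 for x = 2 and 11 for x = 3.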

As shown in FIG. 7, the Backbone is built from CSP-block_x units.

FIGS. 8(a) to 8(f) are schematic views of FPN-series structures: FIG. 8(a) is an FPN structure, FIG. 8(b) a PANet structure, FIG. 8(c) a NAS-FPN structure, FIG. 8(d) a fully-connected FPN structure, FIG. 8(e) a simplified FPN structure, and FIG. 8(f) a BiFPN structure. To achieve good detection accuracy for both large and small targets, the Neck adopts the BiFPN structure shown in FIG. 8(f). The feature extraction stages output nodes P1-P5. As the network gets deeper, the receptive field grows: shallow layers (e.g., output node P1) have small receptive fields and high-resolution feature maps and detect small targets relatively well, while deep layers (e.g., output node P5) have large receptive fields and detect large targets well.
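BiFPN fuses the features arriving at each node with learnable, non-negative weights. The sketch below implements the fast normalized fusion rule from EfficientDet, O = Σ w_i I_i / (ε + Σ w_j), together with simple 2x resizing so that a deeper (lower-resolution) level can be fused with a shallower one. The function names, nearest-neighbour upsampling, max-pool downsampling and fixed weights are assumptions for illustration; in a trained network the weights are learned and each fusion is followed by a convolution, which is omitted here.

```python
import numpy as np


def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN node fusion: O = sum_i w_i * I_i / (eps + sum_j w_j).

    Weights pass through ReLU so they stay non-negative; all feature maps must
    already share the same (C, H, W) shape.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))


def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map (deep -> shallow path)."""
    return f.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(f):
    """2x2 max pooling of a (C, H, W) map with even H and W (shallow -> deep path)."""
    c, h, w = f.shape
    return f.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p3 = rng.normal(size=(4, 32, 32))  # shallower level: higher resolution
    p4 = rng.normal(size=(4, 16, 16))
    p5 = rng.normal(size=(4, 8, 8))    # deeper level: lower resolution

    # Top-down path: deep features are upsampled and fused into shallower levels.
    p4_td = fast_normalized_fusion([p4, upsample2x(p5)], [1.0, 1.0])
    p3_out = fast_normalized_fusion([p3, upsample2x(p4_td)], [1.0, 1.0])
    # Bottom-up path: the P4 output node also takes the original P4 input (same-level skip).
    p4_out = fast_normalized_fusion([p4, p4_td, downsample2x(p3_out)], [1.0, 1.0, 1.0])
    print(p4_td.shape, p3_out.shape, p4_out.shape)
```

The bidirectional (top-down then bottom-up) flow lets each output level combine fine detail from shallow layers with the larger context of deep layers.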

The Neck serves to better fuse and extract the features provided by the Backbone, thereby improving network performance and accounting for the detection and recognition of small, medium, large and very large targets. Besides the FPN-type structures used in the embodiments of the invention, modules such as ASFF, RFB and SPP can also be used in this part.

In a specific implementation, a training sample set is constructed before the diagnosis model is trained; it consists of multiple annotated gastrointestinal submucosal ultrasound images.

Under endoscopic ultrasound, gastrointestinal submucosal tumors (SMTs) look broadly similar, but their disease spectrum differs; they can be divided into two broad categories:

a: benign SMTs, such as leiomyoma, lipoma, granular cell tumor, schwannoma (neurilemmoma), ectopic pancreas, Brunner's gland adenoma and cyst;

b: malignant SMTs, such as stromal tumors (low risk and high risk), neuroendocrine tumors (low risk and high risk), hemangioblastoma, leiomyosarcoma, and metastatic digestive tract cancer or primary cancer presenting as an SMT.

The general patterns are:

1) Brunner's gland adenoma occurs only in the duodenum;

2) leiomyoma is the most common SMT of the esophagus;

3) neuroendocrine tumors are mostly found in the rectum and the stomach, appear yellowish under white light and have capillaries on the surface;

4) stromal tumors occur most often in the stomach, rarely in the colorectum and least often in the esophagus;

5) most ectopic pancreas lesions are found in the gastric antrum, and a navel-like central depression of the lesion is visible under white light;

6) granular cell tumors are usually found in the esophagus and appear pale yellow under white light;

7) lipomas are most often found in the colon, followed by the gastric antrum, and appear yellow under white-light endoscopy;

8) schwannomas are mostly found in the stomach, followed by the colon and rectum.

Using the open-source annotation tool LabelImg, the following categories are annotated: leiomyoma (Leiomyoma), lipoma (Lipoma), ectopic pancreas (Pancreatic Rest), stromal tumor (GIST), cyst (Cyst), neuroendocrine tumor (NET), and metastatic digestive tract cancer or primary cancer presenting as an SMT (Cancer). An annotation example is shown in FIG. 3, and the distribution of the final annotated data is shown in FIG. 4.
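LabelImg saves annotations by default as Pascal VOC XML files, one per image, with one `<object>` element (a `<name>` label and a `<bndbox>`) per annotated lesion. Assuming that format was used here, the sketch below parses such files and counts lesions per category, the kind of tally summarized in FIG. 4. The example file name, coordinates and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up example of one LabelImg (Pascal VOC) annotation file.
SAMPLE_XML = """<annotation>
  <folder>eus</folder>
  <filename>clip012_frame0034.jpg</filename>
  <size><width>720</width><height>576</height><depth>3</depth></size>
  <object>
    <name>GIST</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>241</xmin><ymin>188</ymin><xmax>402</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, [(label, (xmin, ymin, xmax, ymax)), ...]) for one annotation file."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return root.findtext("filename"), objects


def class_distribution(xml_texts):
    """Number of annotated lesions per category across many annotation files."""
    counts = Counter()
    for text in xml_texts:
        _, objects = parse_voc(text)
        counts.update(label for label, _ in objects)
    return counts


if __name__ == "__main__":
    print(parse_voc(SAMPLE_XML))
    print(class_distribution([SAMPLE_XML, SAMPLE_XML]))
```

A per-category count like this also makes it easy to spot class imbalance or misspelled labels before training.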

In one or more embodiments, after the training sample set is constructed, data augmentation is performed on the sample data in the training sample set.

The purpose of data augmentation is to improve the robustness of the features and the generalization of the model. Data augmentation is generally intended to improve model performance, but not every method pays off once cost-effectiveness is considered. The effect of data augmentation depends to some extent on the dataset and the task: color augmentation is effective for classification problems, while flipping and similar transformations are effective for object detection.

Specifically, the data enhancement may be performed as follows:

A. Pixel-based data augmentation:

Brightness: reflected by the image mean; i.e., the mean of the image is changed.

Contrast: deviations from the mean are scaled while the mean stays unchanged; i.e., the image mean is unchanged and the variance of each color channel increases.

Color saturation: the image is converted from RGB to HSV and the saturation is increased.

B. Position-based (geometric) data augmentation: e.g., horizontal or vertical flipping, translation, rotation, scaling and cropping.

C. Other data augmentation methods: Mixup blends two images (and their targets) into one; Cutout randomly discards part of the image, which amounts to augmentation for the occlusion problem: a region is chosen at random in the image and all of its pixels are set to 0; CutMix and Mosaic may also be used.
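The sketch below illustrates several of the augmentations listed above on an RGB frame stored as a NumPy array of shape (H, W, 3) with values in [0, 1]. The storage format, function names and parameter values are assumptions for illustration. Saturation adjustment has no effect on pure gray pixels, and geometric transforms must also update the bounding boxes.

```python
import colorsys

import numpy as np


def adjust_brightness(img, delta):
    """Add delta to every pixel, which shifts the image mean (then clip to [0, 1])."""
    return np.clip(img + delta, 0.0, 1.0)


def adjust_contrast(img, alpha):
    """Scale each channel's deviation from its mean by alpha.

    Before clipping, the per-channel mean is unchanged and the variance is
    multiplied by alpha ** 2 (alpha > 1 increases contrast).
    """
    mean = img.mean(axis=(0, 1), keepdims=True)
    return np.clip(mean + alpha * (img - mean), 0.0, 1.0)


def adjust_saturation(img, factor):
    """Convert each pixel RGB -> HSV, scale S by factor (capped at 1), convert back."""
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            h, s, v = colorsys.rgb_to_hsv(*img[i, j])
            out[i, j] = colorsys.hsv_to_rgb(h, min(s * factor, 1.0), v)
    return out


def hflip(img, boxes):
    """Horizontal flip; (x1, y1, x2, y2) pixel boxes are mirrored accordingly."""
    width = img.shape[1]
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return img[:, ::-1].copy(), new_boxes


def cutout(img, size, rng):
    """Set a randomly placed size x size square to 0 to simulate occlusion."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = img.copy()
    out[top:top + size, left:left + size] = 0.0
    return out


def mixup(img_a, img_b, lam):
    """Blend two equally sized images; for detection, the boxes of both images are kept."""
    return lam * img_a + (1.0 - lam) * img_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(0.2, 0.8, size=(64, 64, 3))
    augmented = cutout(adjust_contrast(adjust_brightness(frame, 0.05), 1.2), 16, rng)
    flipped, boxes = hflip(augmented, [(10, 12, 30, 40)])
    print(flipped.shape, boxes)
```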

During training, each data augmentation method is applied in a stratified, random manner with a given probability, and in each training batch only 30% of the images undergo data augmentation. Examples of the resulting random augmentation are shown in FIG. 5.
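One possible reading of this batch-level scheme is sketched below: in each batch, 30% of the images are chosen at random and each chosen image receives one augmentation drawn according to per-method probabilities. The per-method weights and the function names are assumptions; the text gives only the 30% fraction.

```python
import random

AUG_FRACTION = 0.3  # from the text: 30% of the images in each batch are augmented


def select_for_augmentation(batch_size, fraction=AUG_FRACTION, rng=random):
    """Return sorted indices of the images in a batch that will be augmented."""
    k = round(batch_size * fraction)
    return sorted(rng.sample(range(batch_size), k))


def augment_batch(batch, methods, weights, fraction=AUG_FRACTION, rng=random):
    """Augment a random `fraction` of the batch, one randomly drawn method per image.

    `methods` are callables image -> image and `weights` their relative
    probabilities (hypothetical values; the text does not give them).
    """
    chosen = set(select_for_augmentation(len(batch), fraction, rng))
    out = []
    for i, img in enumerate(batch):
        if i in chosen:
            method = rng.choices(methods, weights=weights, k=1)[0]
            img = method(img)
        out.append(img)
    return out


if __name__ == "__main__":
    r = random.Random(42)
    batch = list(range(10))  # stand-ins for 10 images
    methods = [lambda x: ("flip", x), lambda x: ("brightness", x), lambda x: ("cutout", x)]
    print(augment_batch(batch, methods, weights=[0.5, 0.3, 0.2], rng=r))
```

Keeping 70% of each batch unaugmented means the model still sees mostly original-looking EUS frames while gaining some robustness from the perturbed ones.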

In this embodiment, the model is trained with standard deep learning procedures such as backpropagation, using SGD as the optimizer. To avoid overfitting, label smoothing, L2 regularization and early stopping are used. In the loss function, Focal Loss is used for the classification loss and CIoU Loss for the regression loss. The model is trained on a V100 GPU for 300 epochs with an initial learning rate of 0.001, "word" set to 1, and cosine learning-rate decay.
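The loss terms and learning-rate schedule named above can be written out for single predictions as follows. This is a plain-Python sketch: the Focal Loss defaults alpha = 0.25 and gamma = 2 come from the original Focal Loss paper and are assumptions here, the label-smoothing factor 0.1 is hypothetical, and only the 300 epochs and the 0.001 initial learning rate are taken from the text. The CIoU loss follows its published definition, 1 - IoU + rho^2 / c^2 + alpha * v.

```python
import math

EPS = 1e-7


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one class score p in (0, 1) and target y in {0, 1}.

    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t); confident correct predictions
    are down-weighted so training concentrates on hard examples.
    """
    p = min(max(p, EPS), 1.0 - EPS)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: spread epsilon of the probability mass uniformly over all K classes."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]


def ciou_loss(pred, target):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = target
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + EPS)
    # Squared centre distance (rho^2) over squared diagonal of the enclosing box (c^2).
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + EPS
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + EPS)) - math.atan(pw / (ph + EPS))) ** 2
    alpha = v / ((1.0 - iou) + v + EPS)
    return 1.0 - iou + rho2 / c2 + alpha * v


def cosine_lr(epoch, total_epochs=300, lr0=1e-3, lr_min=0.0):
    """Cosine decay from lr0 at epoch 0 to lr_min at total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * epoch / total_epochs))


if __name__ == "__main__":
    print(focal_loss(0.9, 1), focal_loss(0.6, 1))
    print(ciou_loss((10, 10, 50, 40), (12, 8, 52, 44)))
    print([round(cosine_lr(e), 6) for e in (0, 150, 300)])
```

Unlike a plain IoU loss, the CIoU centre-distance term still gives a useful signal when a predicted box does not overlap the ground truth at all.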

(3) The tumor diagnosis and position determination module is used to select the tumor type with the largest number of predicted lesions in the gastrointestinal submucosal ultrasound video clip as the gastrointestinal submucosal tumor diagnosis type, and to mark the position of that tumor type. The final prediction and inference results are shown in FIG. 10.

In this embodiment, artificial-intelligence-assisted multi-target detection is performed on multiple types of SMT under EUS; the detection accuracy is shown in Table 1:

TABLE 1 Detection accuracy

As can be seen from Table 1, the diagnosis model of this embodiment achieves high recognition accuracy with low hardware requirements. Considering recognition accuracy, recognition speed and hardware cost together, the embodiment of the invention can meet the real-time requirements of AI-assisted multi-target SMT detection and diagnosis during EUS examination.

To mitigate a common problem of current object detection methods, namely false-positive detections, the embodiment of the invention adds a video-frame-based temporal judgment in the post-processing stage. This approach is specific to SMT (gastrointestinal submucosal tumor) diagnosis during EUS (endoscopic ultrasound) examination, and it does not require large amounts of frame-level video annotation in the training stage. The main logic is as follows: based on the detection result of each frame in a video clip, the number of SMT lesions of each category detected in the clip is counted, and the final prediction of the diagnosis model for the clip is the SMT category with the largest number of predicted lesions, together with the positions of that category. This video-frame-based post-processing is simple and effectively reduces false-positive identifications caused by limitations of feature representation in object detection.
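The clip-level counting described above can be sketched as follows. Each frame's detections are given as (tumor type, confidence, box) tuples in time order; the function counts lesions per type over the clip and returns the most frequent type together with the frames and boxes where it was predicted. The data layout, the function name and the tie-breaking rule (the type that first reached the top count wins) are assumptions not specified in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]        # (tumor type, confidence, box)


def diagnose_clip(frames: Sequence[Sequence[Detection]]) -> Tuple[Optional[str], List[Tuple[int, Box]]]:
    """Clip-level SMT diagnosis by counting per-frame lesion detections.

    `frames` holds the detections of each frame in time order. Returns the tumor
    type with the most detected lesions in the clip and the (frame index, box)
    positions of that type, or (None, []) if nothing was detected.
    """
    counts: Counter = Counter()
    positions: Dict[str, List[Tuple[int, Box]]] = defaultdict(list)
    for t, detections in enumerate(frames):
        for label, _confidence, box in detections:
            counts[label] += 1
            positions[label].append((t, box))
    if not counts:
        return None, []
    # Counter.most_common orders equal counts by first appearance, so ties go to
    # the type detected earliest; the text does not specify a tie-breaking rule.
    diagnosis, _ = counts.most_common(1)[0]
    return diagnosis, positions[diagnosis]


if __name__ == "__main__":
    clip = [
        [("GIST", 0.81, (120, 90, 220, 180))],
        [("GIST", 0.77, (118, 92, 221, 183)), ("Leiomyoma", 0.55, (30, 40, 60, 70))],
        [("Leiomyoma", 0.52, (32, 41, 61, 72))],
        [("GIST", 0.84, (121, 91, 219, 181))],
    ]
    print(diagnose_clip(clip))
```

In this usage example the two sporadic leiomyoma detections are outvoted by three stromal tumor detections, which is how counting over the clip suppresses isolated false positives.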

Example two

The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

The diagnosis model consists of a Backbone part, a Neck part and a Head part: the Backbone extracts features from each frame of the gastrointestinal submucosal ultrasound images, the Neck fuses the features extracted by the Backbone, and the Head predicts the tumor type label probabilities and the corresponding bounding box positions from the features fused by the Neck.

It should be noted that, the steps in this embodiment are the same as the modules in the first embodiment in terms of implementation, and are not described again here.

Example three

The embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps:

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A gastrointestinal submucosal neoplasm diagnosis system based on deep learning multi-target detection, comprising:

2. The deep learning multi-target detection-based gastrointestinal submucosal tumor diagnosis system according to claim 1, wherein the diagnosis model consists of a Backbone part for extracting features from each frame of the gastrointestinal submucosal ultrasound images, a Neck part for fusing the features extracted by the Backbone part, and a Head part for predicting the tumor type label probabilities and the corresponding bounding box positions from the features fused by the Neck part.

3. The deep learning multi-target detection-based gastrointestinal submucosal tumor diagnostic system of claim 2, wherein the Backbone section is composed of a plurality of repetitive CSP-block structures connected in series, each CSP-block structure being composed of a CBW structure and a RES structure.

4. The deep learning multi-target detection-based gastrointestinal submucosal tumor diagnosis system of claim 2, wherein the Neck part employs a BiFPN structure.

5. The deep learning multi-target detection-based gastrointestinal submucosal neoplasm diagnosis system of claim 1, further comprising, before the diagnosis model is trained, constructing a training sample set, wherein the training sample set is composed of a plurality of labeled gastrointestinal submucosal ultrasonic images.

6. The deep learning multi-target detection-based gastrointestinal submucosal tumor diagnosis system of claim 5, further comprising, after the training sample set is constructed, performing data augmentation on the sample data in the training sample set.

7. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, performing the steps of:

8. The computer-readable storage medium of claim 7, wherein the diagnosis model consists of a Backbone section for extracting features from the frames of the gastrointestinal submucosal ultrasound images, a Neck section for fusing the features extracted by the Backbone section, and a Head section for predicting the tumor type label probabilities and their corresponding bounding box locations from the features fused by the Neck section.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of:

10. The electronic device of claim 9, wherein the diagnosis model consists of a Backbone section for extracting features from the frames of the gastrointestinal submucosal ultrasound images, a Neck section for fusing the features extracted by the Backbone section, and a Head section for predicting the tumor type label probabilities and their corresponding bounding box positions from the features fused by the Neck section.