CN113744203A - Method and device for determining upper digestive tract lesion area based on multitask assistance - Google Patents
- Publication number
- CN113744203A (application CN202110930193.0A)
- Authority
- CN
- China
- Prior art keywords
- lesion
- image
- processed
- data set
- upper gastrointestinal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012—Biomedical image inspection
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/24—Classification techniques
- G06N3/04—Neural networks; Architecture, e.g. interconnection topology
- G06N3/084—Backpropagation, e.g. using gradient descent
- G16H30/20—ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
- G16H30/40—ICT specially adapted for processing medical images, e.g. editing
- G06T2207/10068—Endoscopic image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30092—Stomach; Gastric
Abstract
The application belongs to the technical field of image processing and provides a method and device for determining an upper gastrointestinal lesion area based on multitask assistance. The method for determining the upper gastrointestinal lesion area comprises the following steps: inputting the upper gastrointestinal endoscope image to be processed into a trained classification model to obtain an initial lesion category of the image; retrieving the image with a retrieval model and selecting, from the training data set, a first sub data set with features similar to the image; determining a final lesion category of the image according to the lesion categories of the endoscope sample images in the first sub data set and the initial lesion category; and, if the final lesion category belongs to a preset lesion category, segmenting the lesion area in the image using a trained segmentation model.
Description
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method and a device for determining an upper gastrointestinal lesion area based on multitask assistance.
Background
According to the 2020 Global Cancer Statistics published by the World Health Organization, two of the ten tumors with the highest morbidity and two of the ten with the highest mortality arise in the upper digestive tract, namely gastric cancer and esophageal cancer. Diseases of the upper digestive tract not only seriously threaten patients' quality of life and safety but also impose a huge health burden worldwide. The cure rate of upper digestive tract cancer is much higher in the early stage than in the late stage, so it is important that endoscopists can diagnose early-stage disorders from endoscopic images.
However, when an endoscopist diagnoses early diseases from endoscopic images, identifying the lesion region in the image is essential, and current practice still faces many problems: each endoscopic examination produces a large number of images; manual review is slow and inefficient, so determining the lesion region takes a long time; and manual detection depends heavily on the endoscopist's experience and mental state, which not only slows diagnosis but also leads to high rates of missed diagnosis and misdiagnosis.
Therefore, determining the lesion region of the upper digestive tract more efficiently has become an important problem to be solved urgently.
Disclosure of Invention
The embodiments of the present application provide a method and device for determining an upper gastrointestinal lesion area based on multitask assistance, with which the lesion area can be determined more efficiently.
A first aspect of embodiments of the present application provides a method for determining a lesion region of an upper gastrointestinal tract based on multitask assistance, where the method includes:
inputting an upper gastrointestinal endoscope image to be processed into a trained classification model to obtain an initial lesion category of the upper gastrointestinal endoscope image to be processed, wherein the classification model is obtained by training through a training data set;
retrieving the upper gastrointestinal endoscope image to be processed using a retrieval model, and selecting from the training data set a first sub data set with features similar to the upper gastrointestinal endoscope image to be processed;
determining a final lesion category of the to-be-processed upper gastrointestinal endoscope image according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set;
and if the final lesion category belongs to a preset lesion category, performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using a trained segmentation model to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
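The four steps of the first aspect can be sketched as a control-flow outline. This is a minimal sketch: the model objects are placeholders, and the `fuse` majority-vote rule is an illustrative assumption, not the patent's exact method for combining the retrieved labels with the initial category.

```python
from collections import Counter

def fuse(initial_category, retrieved_labels):
    # Illustrative fusion rule (an assumption, NOT the patent's exact rule):
    # majority vote over the retrieved labels plus the classifier's output.
    votes = Counter(retrieved_labels)
    votes[initial_category] += 1
    return votes.most_common(1)[0][0]

def determine_lesion_area(image, classifier, retriever, segmenter,
                          training_set, preset_categories):
    # Control flow of the four steps: classify, retrieve similar labeled
    # samples, fuse into a final category, and segment only when the
    # final category belongs to the preset lesion categories.
    initial_category = classifier(image)                      # step 1
    similar_labels = retriever(image, training_set)           # step 2
    final_category = fuse(initial_category, similar_labels)   # step 3
    if final_category in preset_categories:                   # step 4
        return final_category, segmenter(image)
    return final_category, None
```

The key design point the sketch captures is that segmentation runs only when the fused category is a preset lesion category, so normal images skip the most expensive task.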
A second aspect of embodiments of the present application provides an upper gastrointestinal lesion region determining apparatus based on multitask assistance, the lesion region determining apparatus including:
the classification module is used for inputting the upper gastrointestinal endoscope image to be processed into a trained classification model to obtain an initial lesion category of the upper gastrointestinal endoscope image to be processed, wherein the classification model is obtained by training through a training data set;
the retrieval module is used for retrieving the upper gastrointestinal endoscope image to be processed by using a retrieval model and selecting a first sub data set with similar characteristics to the upper gastrointestinal endoscope image to be processed from the training data set;
a category determination module, configured to determine a final lesion category of the to-be-processed upper gastrointestinal endoscope image according to the lesion category of the endoscope sample image in the first sub data set and the initial lesion category;
and the segmentation module is used for performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using the trained segmentation model if the final lesion category belongs to a preset lesion category, so as to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
A third aspect of an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor when executing the computer program implementing the method for determining an upper gastrointestinal lesion region based on multitask assistance according to the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores a computer program, which when executed by a processor, implements the multitask assist based upper gastrointestinal lesion region determining method according to the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the method for determining an upper gastrointestinal lesion region based on multitask assistance according to the first aspect.
In the embodiments of the present application, an upper gastrointestinal endoscope image to be processed is input into a trained classification model to obtain an initial lesion category. The image is then retrieved with a retrieval model, and a first sub data set with features similar to the image is selected from the training data set. Because the endoscope sample images in the first sub data set are correctly labeled with lesion categories, the final lesion category of the image can be determined from those labels together with the initial lesion category, which helps the endoscopist make a preliminary diagnosis. If the lesion category obtained by the preliminary diagnosis belongs to a preset lesion category, the trained segmentation model performs semantic segmentation on the image to obtain the lesion area. By performing the three tasks of classification, image retrieval, and segmentation on the upper gastrointestinal endoscope image to be processed, the need for endoscopists to manually inspect large numbers of endoscope images is reduced, saving time and effort and thereby determining the lesion area more efficiently.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a method for determining an upper gastrointestinal lesion area based on multitask assistance according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of different tasks performed by the training data set and the testing data set;
FIG. 3 is a diagram showing the results of classification and retrieval of an endoscopic image of the upper digestive tract to be processed;
FIG. 4 is a view showing a result of segmentation of an endoscopic image of the upper digestive tract to be processed;
FIG. 5 is a diagram showing the result of the method for determining lesion region in upper digestive tract based on multitask assistance according to the first embodiment of the present application;
fig. 6 is a schematic flowchart of a method for determining an upper gastrointestinal lesion area based on multitask assistance according to a second embodiment of the present application;
FIG. 7 is a diagram of a confusion matrix for a method of upper gastrointestinal lesion area determination based on multitask assistance;
FIG. 8 is an ROC (receiver operating characteristic) curve of the method for determining an upper digestive tract lesion region based on multitask assistance;
fig. 9 is a schematic structural diagram of an upper gastrointestinal lesion region determination device based on multitask assistance according to a third embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiment of the present application.
In order to explain the technical solution of the present application, the following description is given by way of specific examples.
Referring to fig. 1, a flowchart of a method for determining an upper gastrointestinal lesion area based on multitask assistance according to an embodiment of the present application is shown, where the method for determining an upper gastrointestinal lesion area is applied to a terminal device, and the method for determining an upper gastrointestinal lesion area as shown in fig. 1 may include the following steps:
The to-be-processed upper gastrointestinal endoscope image may be any upper gastrointestinal endoscope image captured by an endoscope. After capturing such an image, the endoscope may send it to the terminal device, and the terminal device acquires the to-be-processed image by receiving it from the endoscope. The upper digestive tract may refer to any part of the upper human digestive tract, such as the esophagus, stomach, or pharynx.
In the embodiment of the application, the classification model is obtained by training with a training data set and is used to obtain the initial lesion category of the upper gastrointestinal endoscope image to be processed.
The training data set comprises N endoscope sample images, each carrying a first label indicating the lesion category of the image; the first labels are obtained by manual annotation. To ensure the correctness of this annotation, the lesion category of each endoscope sample image is first determined from the preliminary upper gastrointestinal endoscopy report and the pathological result, and finally manually marked by several endoscopists with many years (for example, more than 10 years) of endoscopy experience.
For example, if the above-mentioned upper gastrointestinal endoscope image to be processed is an esophageal endoscope image, the N endoscope sample images in the training data set are N esophageal endoscope images. 1003 esophageal endoscope images of 319 patients, collected at a hospital between 2016 and 2019, can be randomly divided into a training data set and a test data set at a ratio of 4:1, i.e., 805 manually labeled esophageal endoscope images serve as the training data set and 194 as the test data set. The test data set is used to test the classification accuracy of the trained classification model; when no image appears in both the training and test data sets, the accuracy measured on the test data set has higher confidence.
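A random 4:1 split with disjoint subsets can be sketched as follows. This is a generic illustration, not the patent's actual tooling; the file names and seed are placeholders, and the patent's reported counts (805/194) come from its own split procedure.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    # Randomly split labeled samples into a training and a test subset,
    # with no image appearing in both.
    items = list(samples)
    random.Random(seed).shuffle(items)       # deterministic, reproducible split
    cut = int(len(items) * train_ratio)      # 4:1 -> 80% for training
    return items[:cut], items[cut:]

# Placeholder names standing in for the manually labeled endoscope images.
images = ["esophagus_%04d.png" % i for i in range(1003)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))         # 802 201
```

Seeding a dedicated `random.Random` instance keeps the split reproducible without disturbing the global random state.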
Since esophageal endoscopy is used in clinical practice for routine screening or preoperative examination, all esophageal endoscope images are acquired with conventional endoscopes using standard white-light imaging and narrow-band imaging, via single-accessory-channel endoscopes of different models (for example: GIF-Q240Z, GIF-RQ260Z, GIF-FQ260Z, GIF-H260Z, GIF-Q260J, GIF-H290Z, GIF-HQ290, GIF-XP290N; Olympus, Tokyo, Japan).
It should be understood that the upper gastrointestinal endoscope image to be processed corresponds to the training data set used to train the classification model; that is, to classify a given type of endoscope image with the classification model, the model must be trained with a training data set of that same type. If the to-be-processed image is an esophageal endoscope image, the training data set consists of esophageal endoscope images with first labels; if it is a gastric endoscope image, the training data set consists of gastric endoscope images with first labels.
In one possible implementation, the training process of the classification model may include:
The endoscope sample images in the training data set are input into the classification model in sequence. The classification model at this point is a model pre-trained on ImageNet, comprising a first input layer, 16 first convolutional layers, 3 first fully connected layers, 5 first max-pooling layers, and a first output layer. When the pre-trained model is further trained with the training data set, an endoscope sample image first passes through the first input layer into the first convolutional layers, whose convolution kernels may be 3 x 3 in size; a ReLU activation function is applied in the convolutional layers to obtain feature maps of the endoscope sample image. Next, the feature maps are input into the first max-pooling layers, which discard unimportant features and thereby reduce the number of parameters in the classification model; the window size of the first max-pooling layers may be 2 x 2. Finally, the first fully connected layers classify the endoscope sample image: their input is the feature maps extracted by the convolutional and max-pooling layers, and the output layer connected to the last fully connected layer is a classifier whose number of neurons (i.e., the number of possible lesion categories of the endoscope sample images) can be set, for example 3 neurons to represent 3 possible lesion categories.
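The effect of the 3 x 3 convolutions and 2 x 2 max pooling on spatial size can be illustrated with a small shape calculation. The kernel, stride, and padding values are the usual VGG-style conventions, assumed here for illustration; the patent does not state them.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    # Output spatial size of a convolution: floor((n + 2p - k) / s) + 1.
    return (size + 2 * padding - kernel) // stride + 1

def maxpool_out(size, window=2, stride=2):
    # Output spatial size of 2x2 max pooling with stride 2 (halves the size).
    return (size - window) // stride + 1

size = 224                      # a typical ImageNet-style input resolution
for stage in range(5):          # 5 max-pooling stages, as described above
    size = conv2d_out(size)     # a 3x3 conv with padding 1 keeps the size
    size = maxpool_out(size)    # each pooling stage halves it
print(size)                     # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```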
The classification model outputs a lesion category for each endoscope sample image; this output is compared with the lesion category indicated by each image's first label to obtain first difference information. Based on the first difference information, backpropagation is performed on the classification model using a focal loss function and the class weights in the model are adjusted; the endoscope sample images in the training data set are then classified again, and this repeats until the value of the focal loss function is optimal. The classification model at that point is the trained classification model.
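The focal loss used during backpropagation can be sketched in plain Python. This is a minimal per-sample form with the conventional alpha and gamma parameters; the exact weighting in the patent is not specified, so the values here are illustrative assumptions.

```python
import math

def focal_loss(probs, target, alpha=0.25, gamma=2.0, eps=1e-12):
    # Per-sample focal loss: -alpha * (1 - p_t)**gamma * log(p_t), where p_t
    # is the predicted probability of the true class. Confident, correctly
    # classified samples are down-weighted so training focuses on hard ones.
    p_t = max(probs[target], eps)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss([0.99, 0.005, 0.005], target=0)   # confident and correct
hard = focal_loss([0.10, 0.45, 0.45], target=0)     # badly misclassified
print(easy < hard)                                   # True
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy, which is a quick sanity check on an implementation.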
In one possible embodiment, inputting the upper gastrointestinal endoscope image to be processed into the trained classification model, and obtaining the initial lesion class of the upper gastrointestinal endoscope image to be processed comprises:
inputting the upper gastrointestinal endoscope image to be processed into a trained classification model, and classifying the upper gastrointestinal endoscope image to be processed;
calculating probability values of the upper digestive tract endoscope image to be processed belonging to each lesion category by using a Softmax function;
and determining the lesion category with the maximum probability value in the probability values of all lesion categories as the initial lesion category of the endoscope image of the upper digestive tract to be processed.
In the embodiment of the present application, fig. 2 is a schematic flow chart of the different tasks performed with the training data set and the test data set: the training data set is used for the training task and the test data set for the testing task. The test data set is used to measure the classification accuracy of the classification model in the upper digestive tract lesion area determination device. When the classification accuracy reaches a preset accuracy, the trained classification model classifies the upper gastrointestinal endoscope image to be processed. If the first output layer of the classification model has 3 neurons, a softmax function computes the probability that the to-be-processed image belongs to each of the 3 lesion categories, and the lesion category with the largest probability value is determined as the initial lesion category of the image.
Illustratively, the esophageal endoscope image to be processed is input into the trained classification model; the first output layer has 3 neurons, and the lesion categories are "esophageal cancer", "esophagitis", and "normal". The probability values computed with the Softmax function are: esophageal cancer, 99.22%; esophagitis, 0.49%; normal, 0.29%. The initial lesion category of the esophageal endoscope image to be processed is therefore determined to be esophageal cancer.
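The softmax step in the example above can be reproduced directly. The logit values below are made up for illustration; only the softmax mechanics follow the text.

```python
import math

def softmax(logits):
    # Convert raw classifier outputs into probabilities that sum to 1.
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical logits for the three categories in the example.
classes = ["esophageal cancer", "esophagitis", "normal"]
probs = softmax([5.2, -0.1, -0.6])
initial_category = classes[probs.index(max(probs))]
print(initial_category)                          # esophageal cancer
```

Subtracting the maximum logit before exponentiating avoids overflow without changing the resulting probabilities.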
And 102, searching the upper digestive tract endoscope image to be processed by using a search model, and selecting a first sub data set with similar characteristics to the upper digestive tract endoscope image to be processed from the training data set.
In the embodiment of the application, the retrieval model shares the classification model, so that a single deep learning model performs both the classification and the retrieval task. Specifically, a fully connected layer followed by a Sigmoid function is added before the first output layer of the classification model to produce binary hash codes. This fully connected layer is called the image retrieval layer, and through the Sigmoid function its neurons take activation values in [0, 1]. A coarse-to-fine search strategy is applied at the image retrieval layer to quickly select, from the training data set, a first sub data set whose features are similar to those of the upper gastrointestinal endoscope image to be processed.
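A minimal numpy sketch of the binarization at the image retrieval layer, assuming hypothetical pre-activation values for the fully connected layer:

```python
import numpy as np

def sigmoid(x):
    # squashes activations into [0, 1]
    return 1.0 / (1.0 + np.exp(-x))

def binary_hash(latent_activations, threshold=0.5):
    """Threshold sigmoid activations in [0, 1] into a binary hash code."""
    return (latent_activations >= threshold).astype(np.uint8)

# hypothetical pre-activation output of the image retrieval layer
fc_out = np.array([2.3, -1.1, 0.0, 4.0, -0.2])
code = binary_hash(sigmoid(fc_out))  # -> array([1, 0, 1, 1, 0])
```

Each image is thus reduced to a short binary code, which makes the coarse Hamming-distance filtering described below inexpensive.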
The retrieval of the upper gastrointestinal endoscope image to be processed can use image retrieval technology: a first sub data set with features similar to the upper gastrointestinal endoscope image to be processed is searched for and selected from the training data set, which has been manually labeled with the correct lesion categories. The first sub data set serves as a further basis for determining the final lesion category of the upper gastrointestinal endoscope image to be processed, and at the same time allows the initial lesion category output by the classification model to be corrected.
In one possible embodiment, the searching the upper gastrointestinal endoscope image to be processed by using the search model, and the selecting the first sub data set having similar characteristics to the upper gastrointestinal endoscope image to be processed from the training data set comprises:
determining the image features of the upper gastrointestinal endoscope image to be processed based on the features output by the image retrieval layer;
determining a coding value of the upper gastrointestinal endoscope image to be processed according to the image characteristics of the upper gastrointestinal endoscope image to be processed;
determining the coding value of each endoscope sample image in the training data set according to the image characteristics of each endoscope sample image in the training data set;
calculating the Hamming distance between the coding value of the upper digestive tract endoscope image to be processed and the coding value of each endoscope sample image in the training data set;
determining a candidate data set from the training data set according to the Hamming distance;
calculating Euclidean distances between an upper gastrointestinal endoscope image to be processed and each endoscope sample image in the candidate data set;
and according to the ranking of the Euclidean distance from small to large, selecting a first sub data set with similar characteristics to the endoscope image of the upper digestive tract to be processed from the candidate data set, wherein the first sub data set comprises endoscope sample images of K before ranking in all Euclidean distances, and K is an integer larger than zero.
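The coarse-to-fine retrieval steps above can be sketched in numpy; the codes, feature vectors, distance threshold and K below are hypothetical illustrations, not values from the application:

```python
import numpy as np

def hamming(a, b):
    # number of differing bits between two binary codes
    return int(np.count_nonzero(a != b))

def coarse_to_fine_search(query_code, query_feat, codes, feats, dist_threshold, k):
    """Coarse stage: keep sample indices whose Hamming distance to the query
    code is below the threshold.  Fine stage: rank the candidates by Euclidean
    distance between feature vectors and return the top-K indices."""
    candidates = [i for i in range(len(codes))
                  if hamming(query_code, codes[i]) < dist_threshold]
    ranked = sorted(candidates,
                    key=lambda i: np.linalg.norm(query_feat - feats[i]))
    return ranked[:k]

# hypothetical binary codes and feature vectors for three training samples
query_code = np.array([1, 0, 1, 0])
codes = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 1]])
query_feat = np.array([0.0, 0.0])
feats = np.array([[3.0, 0.0], [9.0, 9.0], [1.0, 0.0]])
top = coarse_to_fine_search(query_code, query_feat, codes, feats,
                            dist_threshold=2, k=2)
```

The cheap Hamming filter prunes most of the training set before the more expensive Euclidean ranking, which is the point of the coarse-to-fine design.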
In the embodiment of the present application, since the classification model is shared by the search models, the search models include the image features of the respective endoscopic sample images in the training data set, and the image features of the respective endoscopic sample images are feature maps output by the first largest pooling layer of the classification model.
Illustratively, the upper gastrointestinal endoscope image to be processed is input into the retrieval model. First, the image features of the upper gastrointestinal endoscope image to be processed, extracted at the first maximum pooling layer, and the image features of each endoscope sample image in the training data set are acquired. Denoting the output of the image retrieval layer for the upper gastrointestinal endoscope image to be processed as Out(H), its binary code is then obtained by setting a threshold value. For each bit i = 1, 2, 3, …, n (n being the number of nodes of the image retrieval layer), with the threshold set to 0.5, the binary code of the upper gastrointestinal endoscope image to be processed is: H^i = 1 if Out^i(H) ≥ 0.5, and H^i = 0 otherwise.
In the same manner of obtaining a binary code from image features, the binary codes of the endoscope sample images in the training data set are obtained from their image features as H_1, H_2, H_3, …, H_m (m being the number of endoscope sample images in the training data set), where the training data set may be represented as Γ = {I_1, I_2, I_3, …, I_m}.
Secondly, given an upper digestive tract endoscope image I_q to be processed and its binary code H_q, the Hamming distance between the binary code H_q of the upper digestive tract endoscope image to be processed and the binary code H_m of each endoscope sample image in the training data set is calculated, and a candidate image set P = {I_1, I_2, I_3, …, I_g} is obtained from the training data set, where the Hamming distance between the code value of each endoscope sample image in the candidate data set and the code value of the upper digestive tract endoscope image to be processed is smaller than a distance threshold.
Finally, the Euclidean distance between the upper digestive tract endoscope image I_q to be processed and each endoscope sample image in the candidate image set P = {I_1, I_2, I_3, …, I_g} can be calculated to determine their similarity: the smaller the distance, the greater the similarity. The Euclidean distance may be obtained by the following equation:

s_l = ||Λ_q − Λ_g||

where Λ_q is the feature vector of the upper digestive tract endoscope image I_q to be processed, Λ_g is the feature vector of the endoscope sample image I_g in the candidate image set, ||x|| denotes the norm of x, and s_l is the Euclidean distance between the upper digestive tract endoscope image I_q to be processed and the endoscope sample image I_g in the candidate image set P = {I_1, I_2, I_3, …, I_g}.
The smaller the Euclidean distance, the higher the similarity of the two images; after the endoscope sample images in the candidate image set are sorted by Euclidean distance from small to large, the top-K endoscope sample images can be obtained. Fig. 3 shows classification and retrieval results for upper gastrointestinal endoscope images to be processed. As can be seen from the figure, the initial lesion category of the image input in fig. 3(a) is esophageal cancer, and the endoscope sample images of the candidate image set with similar features, ranked top 5 by Euclidean distance from small to large, are output; the initial lesion category of the image input in fig. 3(b) is esophagitis, and the top-5 similar endoscope sample images are likewise output; the initial lesion category of the image input in fig. 3(c) is normal, and the top-5 similar endoscope sample images are likewise output.
And 103, determining the final lesion category of the endoscope image of the upper digestive tract to be processed according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set.
In the embodiment of the present application, the endoscope sample images in the first sub data set have been manually labeled with the correct lesion category, and they are the images most similar to the upper gastrointestinal endoscope image to be processed. The lesion categories of the endoscope sample images in the first sub data set can therefore be used as a reference to determine the final lesion category of the upper gastrointestinal endoscope image to be processed, giving the final lesion category a higher confidence.
In one possible embodiment, determining the final lesion category of the endoscope image of the upper digestive tract to be processed according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set comprises:
acquiring the number of first images corresponding to different lesion types according to the lesion types of the endoscope sample images in the first sub-data set;
if the lesion category corresponding to the maximum value in the first image quantity corresponding to different lesion categories is the same as the initial lesion category, determining the initial lesion category as a final lesion category; or
And if the lesion category corresponding to the maximum value in the first image quantity is different from the initial lesion category and the ratio of the image quantity of the maximum value in the first image quantity to the K is greater than the probability value of the initial lesion category, determining the lesion category corresponding to the maximum value in the first image quantity as the final lesion category, otherwise, determining the initial lesion category as the final lesion category.
In this embodiment of the application, a voting algorithm may be used to obtain the number of images corresponding to different lesion types of each endoscope sample image in the first sub-data set.
Taking the upper gastrointestinal endoscope image to be processed as an esophageal endoscope image to be processed: suppose the first sub data set includes 5 endoscope sample images whose lesion categories are esophageal cancer, esophageal cancer, esophageal cancer, normal, and esophagitis. The first image number corresponding to esophageal cancer is then 3, the first image number corresponding to normal is 1, and the first image number corresponding to esophagitis is 1. If the initial lesion category of the esophageal endoscope image to be processed is esophageal cancer, its final lesion category is determined to be esophageal cancer. If instead the initial lesion category is esophagitis, with a corresponding probability value of 90%, then the maximum of the first image numbers obtained from the lesion categories of the 5 endoscope sample images in the first sub data set is 3, and the ratio of this maximum to K is 3/5; since this ratio is smaller than 90%, the initial lesion category of the esophageal endoscope image to be processed, esophagitis, is determined as the final lesion category.
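The decision rule illustrated above can be sketched as follows; the function name and label strings are illustrative, not taken from the application:

```python
from collections import Counter

def final_category(retrieved_labels, initial_category, initial_prob, k):
    """Voting rule: keep the classifier's initial category unless a different
    majority label among the K retrieved samples outvotes the classifier's
    probability (vote share > initial probability)."""
    votes = Counter(retrieved_labels)
    top_label, top_count = votes.most_common(1)[0]
    if top_label == initial_category:
        return initial_category
    if top_count / k > initial_prob:
        return top_label
    return initial_category

# the 5 retrieved labels from the worked example
labels = ["esophageal cancer"] * 3 + ["normal", "esophagitis"]
```

For instance, with an initial category of esophagitis at probability 0.90, the 3/5 vote share for esophageal cancer does not overturn it, matching the example above.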
And 104, if the final lesion type belongs to the preset lesion type, performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using the trained segmentation model to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
In the embodiment of the present application, a segmentation model is trained by using a training data set, and the segmentation model is used for acquiring a lesion region in an endoscopic image of an upper gastrointestinal tract to be processed when a final lesion category belongs to a preset lesion category.
The preset lesion category may be a cancer category of the upper gastrointestinal tract, such as esophageal cancer. If the preset lesion category is set to esophageal cancer, the training data set used to train the segmentation model may consist entirely of esophageal cancer endoscope sample images, each of which carries a second label indicating the lesion area in that image; the second label is obtained by manual annotation.
In one possible implementation, the training process of the segmentation model may include:
inputting the training data set into a segmentation model, and outputting lesion areas of all endoscope sample images in the training data set through a second output layer;
comparing the lesion area of each endoscope sample image in the training data set with the lesion area indicated by the second label to obtain second difference information;
and performing model back propagation on the segmentation model by adopting a mean square error loss function based on the second difference information to obtain the trained segmentation model.
In the embodiment of the application, the training data set is input into the segmentation model, which comprises a second input layer, second convolutional layers, second maximum pooling layers, dilated convolutional layers and a second output layer. The convolution kernels of the convolutional layers are 3 × 3 and use the Relu activation function to obtain feature maps of the input image; maximum pooling layers with 2 × 2 windows then discard unimportant features in these feature maps, further reducing the number of parameters in the model. The convolution kernel of the final output layer is 1 × 1, and its output is the most important feature map (i.e., the lesion region of each endoscope sample image in the training data set).
The segmentation model is output as a lesion region of each endoscope sample image, and the lesion region of each endoscope sample image output by the segmentation model is compared with the lesion region indicated by the second label of each endoscope sample image, so that second difference information is obtained. And performing model back propagation on the segmentation model by adopting a mean square error loss function based on the second difference information, performing semantic segmentation on the endoscope sample image in the training data set again after adjusting the weight value of each layer in the segmentation model, and obtaining the segmentation model at the moment when the value of the mean square error loss function is optimal, wherein the segmentation model at the moment is a trained segmentation model.
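The mean square error loss used for back propagation can be sketched in numpy as follows; this is a minimal illustration of the loss and its gradient on small placeholder masks, not the application's implementation:

```python
import numpy as np

def mse_loss(pred_mask, label_mask):
    # mean squared error between predicted and annotated lesion masks
    return float(np.mean((pred_mask - label_mask) ** 2))

def mse_grad(pred_mask, label_mask):
    # gradient of the loss w.r.t. the prediction, used for back-propagation
    return 2.0 * (pred_mask - label_mask) / pred_mask.size

# hypothetical 2x2 predicted mask and its manually annotated label
pred = np.array([[0.8, 0.2], [0.1, 0.9]])
label = np.array([[1.0, 0.0], [0.0, 1.0]])
```

The weights of each layer are adjusted along this gradient until the loss value reaches its optimum, as described above.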
Optionally, the 6 second convolutional layers of the segmentation model preceding the second output layer can be replaced by dilated convolutional layers, which enlarge the field of view of the convolution kernel while keeping the number of parameters unchanged. The resolution of the image is thus better preserved during the upsampling process of the segmentation model, effectively preventing loss of image detail and improving the accuracy of segmentation.
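The parameter-free growth of the field of view can be illustrated by computing receptive fields of stacked stride-1 convolutions; the exponential dilation schedule below is an assumption for illustration, not a rate specified in the application:

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 conv layers, each given as
    (kernel_size, dilation).  Effective kernel = dilation*(kernel-1)+1,
    so each layer adds (kernel-1)*dilation to the receptive field."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

plain = receptive_field([(3, 1)] * 6)                       # six ordinary 3x3 convs
dilated = receptive_field([(3, 2 ** i) for i in range(6)])  # dilations 1,2,4,8,16,32
```

Both stacks use six 3 × 3 kernels and hence the same number of weights, but the dilated stack sees a far larger neighborhood, which is why dilation helps segmentation without adding parameters.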
In one possible embodiment, performing semantic segmentation on the to-be-processed upper gastrointestinal endoscope image by using the trained segmentation model, and obtaining a lesion region in the to-be-processed upper gastrointestinal endoscope image comprises:
inputting the final upper gastrointestinal endoscope image to be processed, of which the lesion category belongs to the preset lesion category, into the trained segmentation model, and performing semantic segmentation on the upper gastrointestinal endoscope image to be processed;
and determining the lesion area output by the second output layer as the lesion area in the endoscope image of the upper digestive tract to be processed.
In the embodiment of the application, because the endoscope sample images in the training data set have all been manually labeled with correct lesion areas, the upper gastrointestinal endoscope image to be processed, whose final lesion category belongs to the preset lesion category, is input into the trained segmentation model and semantically segmented, so that the lesion area can be segmented from it. Fig. 4 shows a segmentation result for an upper gastrointestinal endoscope image to be processed: fig. 4(a) is the original image, fig. 4(b) is the lesion region manually annotated by an endoscopist, and fig. 4(c) is the segmentation result output by the segmentation model. It can be seen that the segmentation model provided by the present application can segment the lesion region from the upper gastrointestinal endoscope image to be processed with high segmentation accuracy.
Fig. 5 shows a result of the method for determining an upper gastrointestinal tract lesion area based on multitask assistance according to an embodiment of the present invention, taking esophageal endoscope images as the upper gastrointestinal endoscope images to be processed. First, 3 esophageal endoscope images to be processed are input, and classification and retrieval are performed on them, yielding for each image the probability of each lesion category and the 5 images with similar features. The initial lesion category of each image is then determined from these probabilities; for the input picture 1, picture 2 and picture 3 in fig. 5, the lesion category of picture 1 is found to be esophageal cancer, which is the preset lesion category. Finally, picture 1 is input into the segmentation model, semantic segmentation is performed on it, and the lesion area is segmented from picture 1.
In the embodiment of the application, the upper gastrointestinal endoscope image to be processed is input into the trained classification model to obtain its initial lesion category, and the retrieval model is used to select, from the training data set, a first sub data set with features similar to the image. Since the endoscope sample images in the first sub data set are correctly labeled with lesion categories, the final lesion category of the upper gastrointestinal endoscope image to be processed can be determined from the lesion categories of those sample images together with the initial lesion category, which facilitates a preliminary diagnosis by the endoscopist. If the lesion category obtained by the preliminary diagnosis belongs to the preset lesion category, the trained segmentation model performs semantic segmentation on the upper gastrointestinal endoscope image to be processed to obtain its lesion area. By performing the three tasks of classification, image retrieval and segmentation on the upper gastrointestinal endoscope image to be processed, the amount of manual inspection of large numbers of endoscope images by the endoscopist is reduced, saving time and effort, so that the lesion area is determined more efficiently.
Referring to fig. 6, a flowchart of a method for determining an upper gastrointestinal lesion region based on multitask assistance according to a second embodiment of the present application is shown, where the method for determining an upper gastrointestinal lesion region as shown in fig. 6 may include the following steps:
In the embodiment of the application, the test data set comprises G endoscope test images, where G is an integer greater than zero, and each endoscope test image is manually labeled with the correct lesion category and lesion area. The test data set and the training data set are two completely non-overlapping data sets, so testing the trained classification model with the test data set yields a classification performance of high confidence. The classification performance of the classification model can be determined by obtaining a confusion matrix and an ROC (Receiver Operating Characteristic) curve of the classification model from the labeled lesion category of each endoscope test image in the test data set (i.e., the correct lesion category indicated by the third label) and the final lesion category obtained through the classification model.
In a further specific implementation, the confusion matrix of the multitask-assisted upper gastrointestinal tract lesion area determination method shown in fig. 7 can be obtained by tallying the labeled lesion category and the final lesion category of each endoscope test image. In the confusion matrix, the abscissa is the predicted value (i.e., the final lesion category of each endoscope test image), the ordinate is the true value (i.e., the labeled lesion category of each endoscope test image), and the color of each matrix block represents the corresponding probability. For example, the matrix block whose abscissa and ordinate are both 'normal' represents the probability that, when the true value is normal, the final lesion category obtained by the classification model is also normal; from the illustrated colors, the probabilities of the blocks whose abscissa and ordinate agree (normal, esophageal cancer, esophagitis) all exceed 80%. From the probabilities of the matrix blocks in the confusion matrix, the ROC curve of the method shown in fig. 8 can be drawn, where the solid line is the ROC curve. The AUC (Area Under Curve) can be obtained by calculating the area of the closed region enclosed by the ROC curve, the horizontal axis and the axis where the specificity equals 1. The AUC is an evaluation index measuring the quality of the classification model: the larger its value, the better the classification performance. The AUC calculated for the classification model of the present application is 0.9435, indicating good classification performance.
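The AUC computation described above amounts to integrating the area under the ROC curve; a small sketch using the trapezoid rule on hypothetical ROC points (the points below are illustrative, not the application's curve):

```python
def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve given as (1 - specificity, sensitivity) points
    sorted by increasing false positive rate, via the trapezoid rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area

# hypothetical ROC operating points from (0, 0) to (1, 1)
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
auc = auc_trapezoid(fpr, tpr)
```

A perfect classifier would give an area of 1.0; a random one, 0.5, which is why larger AUC indicates better classification performance.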
In the embodiment of the application, the accuracy, the precision, the sensitivity, the specificity and the negative predictive value of the classification model can be obtained according to the confusion matrix and the ROC curve of the classification model.
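These five indices follow directly from confusion matrix counts. A sketch for the binary (one-vs-rest) case, with hypothetical counts; the multi-class case of the application would apply this per category:

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard indices derived from binary confusion matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "npv":         tn / (tn + fn),   # negative predictive value
    }

# hypothetical counts for one lesion category vs. the rest
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```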
By performing calculations with the confusion matrix in fig. 7 and the data of the ROC curve in fig. 8, the accuracy, precision, sensitivity, specificity and negative predictive value of the classification model can be obtained. Comparing the classification results obtained by the classification model of the present application and the classification results produced manually by an endoscopist against the actual categories of the endoscope test images in the test data set yields the index values shown in table 1:
TABLE 1
According to the data in table 1, the accuracy, precision, sensitivity, specificity and negative predictive value of the classification result of the classification model of the present application are respectively 8.75%, 13.45%, 9.58%, 6.25% and 6.69% higher than those of the classification result manually by an endoscopist.
To verify the effect of the classification model of the present application on improving the endoscopist's manual classification efficiency, the endoscopist's classification results with and without reference to the classification results of the model were compared against the actual categories of the endoscope test images in the test data set, yielding the index values shown in table 2:
TABLE 2
According to the data in table 2, the endoscopist's classification accuracy, precision, sensitivity, specificity and negative predictive value all improved after referring to the classification results of the classification model of the present application. The method provided by the invention can therefore classify upper gastrointestinal endoscope images to be processed more efficiently.
In this embodiment of the present application, the Dice coefficient (a set similarity metric) and the average IOU (Intersection over Union, a measure of overlap) of the segmentation model may be obtained from the labeled lesion region of each endoscope test image (i.e., the lesion region indicated by the fourth label) and the lesion region obtained through the segmentation model, so as to determine the segmentation performance of the segmentation model.
In a specific implementation, the Dice coefficient and the average IOU of the segmentation model may be obtained in the following manner:

Dice = 2|A ∩ B| / (|A| + |B|)

where A is the labeled lesion area and B is the lesion area obtained for each endoscope test image through the segmentation model;

IOU = |A ∩ B| / |A ∪ B|

where the numerator is the area of the intersection of A and B and the denominator is the area of their union.
From the above equations, by testing with the test data set of the present application, the Dice coefficient and average IOU of the segmentation model can be calculated as 0.7784 and 0.6563, respectively. The larger the Dice coefficient and the average IOU, the better the segmentation performance of the segmentation model.
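The two overlap indices above can be computed directly on binary masks; a numpy sketch with small hypothetical masks (not data from the application):

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks a, b."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def iou(a, b):
    """IOU = |A ∩ B| / |A ∪ B| for binary masks a, b."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

# hypothetical annotated mask (gt) and predicted mask (pred)
gt = np.array([[1, 1, 0], [1, 0, 0]])
pred = np.array([[1, 0, 0], [1, 1, 0]])
```

Averaging these per-image values over the test data set gives the reported Dice coefficient and average IOU.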
And step 604, searching the upper gastrointestinal endoscope image to be processed by using the search model, and selecting a first sub data set with similar characteristics to the upper gastrointestinal endoscope image to be processed from the training data set.
And step 605, determining a final lesion category of the endoscope image of the upper digestive tract to be processed according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set.
And 606, if the final lesion type belongs to the preset lesion type, performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using the trained segmentation model to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
Steps 603-606 of this embodiment are the same as steps 101-104 of the previous embodiment, and reference may be made to these steps, which are not described herein again.
Compared with the first embodiment of the application, the method provided in the second embodiment tests the trained classification model and segmentation model with the test data set and evaluates their performance with different evaluation indices, so that the classification performance of the classification model and the segmentation performance of the segmentation model can be effectively assessed. When both reach the corresponding indices, the upper gastrointestinal endoscope image to be processed is input into the classification model and the segmentation model, so that the output lesion area is more accurate.
Referring to fig. 9, a schematic structural diagram of an upper gastrointestinal lesion region determining device based on multitask assistance according to a third embodiment of the present application is shown; for convenience of explanation, only the portions related to the third embodiment of the present application are shown.
The multitask-assisted upper gastrointestinal lesion region determining device specifically comprises the following modules:
a classification module 901, configured to input an upper gastrointestinal endoscope image to be processed into a trained classification model, so as to obtain an initial lesion category of the upper gastrointestinal endoscope image to be processed, where the classification model is obtained by using a training data set for training;
a retrieval module 902, configured to retrieve the upper gastrointestinal endoscope image to be processed by using a retrieval model, and to select a first sub data set having similar characteristics to the upper gastrointestinal endoscope image to be processed from the training data set;
a category determination module 903, configured to determine a final lesion category of the to-be-processed upper gastrointestinal endoscope image according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set;
and a segmentation module 904, configured to perform semantic segmentation on the to-be-processed upper gastrointestinal endoscope image by using the trained segmentation model if the final lesion category belongs to the preset lesion category, so as to obtain a lesion area in the to-be-processed upper gastrointestinal endoscope image.
In the embodiment of the present application, the classification model may specifically include the following sub-modules in the training process:
the class output submodule is used for inputting the training data set into the classification model and outputting the lesion class of each endoscope sample image in the training data set through a first output layer, wherein the output layer in the classification model is a full-connection layer containing M neurons, the M neurons represent M lesion classes, and M is an integer greater than zero;
the first comparison submodule is used for comparing the lesion type of each endoscope sample image in the training data set with the lesion type indicated by the first label to obtain first difference information;
and the first back propagation submodule is used for performing model back propagation on the classification model by adopting a focal local function based on the first difference information to obtain the trained classification model.
In this embodiment, the classification module 901 may specifically include the following sub-modules:
the classification submodule is used for inputting the upper gastrointestinal endoscope image to be processed into the trained classification model and classifying the upper gastrointestinal endoscope image to be processed;
the probability calculation submodule is used for calculating the probability value of the upper digestive tract endoscope image to be processed belonging to each lesion category by using a Softmax function;
and the maximum probability determination submodule is used for determining the lesion category with the maximum probability value in the probability values of all the lesion categories as the initial lesion category of the endoscope image of the upper digestive tract to be processed.
In this embodiment, the retrieving module 902 may specifically include the following sub-modules:
the feature determination submodule is used for determining the image features of the upper gastrointestinal endoscope image to be processed based on the features output by the image retrieval layer;
the first code determining submodule is used for determining a code value of the upper gastrointestinal endoscope image to be processed according to the image characteristics of the upper gastrointestinal endoscope image to be processed;
the second code determining submodule is used for determining the code value of each endoscope sample image in the training data set according to the image characteristics of each endoscope sample image in the training data set;
the first distance calculation submodule is used for calculating the Hamming distance between the coding value of the upper digestive tract endoscope image to be processed and the coding value of each endoscope sample image in the training data set;
the candidate determining submodule is used for determining a candidate data set from the training data set according to the Hamming distance, and the Hamming distance between the coding value of each endoscope sample image in the candidate data set and the coding value of the upper digestive tract endoscope image to be processed is smaller than a distance threshold value;
the second distance calculation submodule is used for calculating Euclidean distances between the endoscope images of the upper digestive tract to be processed and the endoscope sample images in the candidate data set;
and the sorting submodule is used for selecting, from the candidate data set, a first sub data set with features similar to those of the upper gastrointestinal endoscope image to be processed according to the ranking of the Euclidean distances from small to large, wherein the first sub data set comprises the top K endoscope sample images in the ranking of all the Euclidean distances, and K is an integer greater than zero.
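The coarse-to-fine retrieval described by the submodules above — binarize the Sigmoid-activated retrieval-layer features into codes, filter candidates by Hamming distance, then rank by Euclidean distance and keep the top K — can be sketched as follows. The function and threshold values are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def retrieve_top_k(query_feat, sample_feats, dist_threshold=2, k=3):
    # Coding value: threshold the Sigmoid-activated retrieval-layer features at 0.5.
    q_code = query_feat > 0.5
    codes = sample_feats > 0.5
    # Coarse filter: keep sample images whose Hamming distance to the query
    # code is below the distance threshold (this forms the candidate data set).
    candidates = [i for i in range(len(sample_feats))
                  if int(np.sum(q_code != codes[i])) < dist_threshold]
    # Fine ranking: sort candidates by Euclidean distance on the raw features,
    # from small to large, and keep the top K (the first sub data set).
    candidates.sort(key=lambda i: float(np.linalg.norm(query_feat - sample_feats[i])))
    return candidates[:k]

query = np.array([0.9, 0.1, 0.8, 0.2])
samples = np.array([[0.7, 0.3, 0.6, 0.4],   # same binary code, farther in Euclidean distance
                    [0.1, 0.9, 0.1, 0.9],   # opposite code, rejected by the Hamming filter
                    [0.8, 0.2, 0.9, 0.1]])  # same binary code, nearest in Euclidean distance
```

The cheap Hamming comparison prunes most of the training data set so the more expensive Euclidean ranking runs on only a few candidates.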
In this embodiment, the category determining module 903 may specifically include the following sub-modules:
the quantity obtaining submodule is used for obtaining the number of first images corresponding to different lesion categories according to the lesion categories of the endoscope sample images in the first sub data set;
the first final determining submodule is used for determining the initial lesion category as the final lesion category if the lesion category corresponding to the maximum value in the first image quantities corresponding to different lesion categories is the same as the initial lesion category;
and the second final determining submodule is used for determining the lesion category corresponding to the maximum value in the first image quantities as the final lesion category if that lesion category is different from the initial lesion category and the ratio of the maximum image quantity to K is greater than the probability value of the initial lesion category, and otherwise determining the initial lesion category as the final lesion category.
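The decision rule of the three submodules above can be sketched as follows; function and variable names are illustrative, not taken from the patent.

```python
from collections import Counter

def final_category(initial_cat, initial_prob, neighbour_cats, k):
    # neighbour_cats: lesion categories of the K retrieved sample images
    # initial_prob:   the classifier's Softmax probability for initial_cat
    majority_cat, majority_n = Counter(neighbour_cats).most_common(1)[0]
    if majority_cat == initial_cat:
        return initial_cat
    # Override the classifier only when the neighbour vote (majority_n / K)
    # strictly exceeds the classifier's own probability for the initial category.
    return majority_cat if majority_n / k > initial_prob else initial_cat
```

For example, with K = 5 retrieved neighbours of which three belong to category 2, the vote ratio 0.6 overrides an initial category held with probability 0.5 but not one held with probability 0.9.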
In the embodiment of the present application, the training process of the segmentation model may specifically involve the following sub-modules:
the region output submodule is used for inputting the training data set into the segmentation model and outputting the lesion region of each endoscope sample image in the training data set through the second output layer;
the second comparison submodule is used for comparing the lesion area of each endoscope sample image in the training data set with the lesion area indicated by the second label to obtain second difference information;
and the second back propagation submodule is used for performing model back propagation on the segmentation model by adopting a mean square error loss function based on the second difference information to obtain the trained segmentation model.
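The mean square error loss used for the segmentation model's back propagation can be sketched as a per-pixel comparison of the predicted lesion mask with the mask indicated by the second label; a minimal NumPy illustration, not the patent's implementation:

```python
import numpy as np

def mse_loss(pred_mask, label_mask):
    # Per-pixel mean squared error between the predicted lesion region and
    # the lesion region indicated by the second label; the gradient of this
    # scalar drives the segmentation model's back propagation.
    diff = np.asarray(pred_mask, dtype=float) - np.asarray(label_mask, dtype=float)
    return float(np.mean(diff ** 2))
```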
In this embodiment, the segmentation module 904 may specifically include the following sub-modules:
the semantic segmentation submodule is used for inputting the upper gastrointestinal endoscope image to be processed, of which the final lesion category belongs to the preset lesion category, into the trained segmentation model and performing semantic segmentation on the upper gastrointestinal endoscope image to be processed;
and the region determining submodule is used for determining the lesion region output by the second output layer as the lesion region in the upper digestive tract endoscope image to be processed.
In an embodiment of the present application, the multitask-assisted upper gastrointestinal lesion region determining apparatus may further include:
the classification testing module is used for testing the classification performance of the trained classification model by adopting a testing data set;
the segmentation test module is used for testing the segmentation performance of the trained segmentation model by adopting a test data set;
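The patent does not name the metrics used by the two test modules above; classification accuracy against the third label and intersection-over-union against the fourth label are common choices, shown here only as an assumed sketch.

```python
import numpy as np

def classification_accuracy(pred_cats, true_cats):
    # Fraction of test images whose predicted lesion category matches
    # the category indicated by the third label.
    return float(np.mean(np.asarray(pred_cats) == np.asarray(true_cats)))

def region_iou(pred_mask, true_mask):
    # Intersection-over-union between the predicted lesion region and the
    # lesion region indicated by the fourth label.
    pred, true = np.asarray(pred_mask, bool), np.asarray(true_mask, bool)
    union = np.logical_or(pred, true).sum()
    return float(np.logical_and(pred, true).sum() / union) if union else 1.0
```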
The multitask-assistance-based upper gastrointestinal lesion region determining apparatus provided in the embodiments of the present application can be applied to the foregoing method embodiments; for details, refer to the description of the method embodiments, which is not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (9)
1. A method for determining an upper digestive tract lesion area based on multitask assistance is characterized by comprising the following steps:
inputting an upper gastrointestinal endoscope image to be processed into a trained classification model to obtain an initial lesion category of the upper gastrointestinal endoscope image to be processed, wherein the classification model is obtained by training through a training data set;
retrieving the upper gastrointestinal endoscope image to be processed by using a retrieval model, and selecting, from the training data set, a first sub data set with features similar to those of the upper gastrointestinal endoscope image to be processed;
determining a final lesion category of the to-be-processed upper gastrointestinal endoscope image according to the lesion category and the initial lesion category of the endoscope sample image in the first sub data set;
and if the final lesion category belongs to a preset lesion category, performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using a trained segmentation model to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
2. The upper gastrointestinal lesion region determining method of claim 1, wherein the classification model includes a first input layer, 16 first convolution layers, 3 first fully-connected layers, 5 first maximum pooling layers, and a first output layer, wherein a convolution kernel size of the convolution layers is 3 × 3, a window size of the maximum pooling layers is 2 × 2, the training data set includes N endoscopic sample images having a first label indicating a lesion category of each endoscopic sample image, N is an integer greater than zero, and the training process of the classification model includes:
inputting the training data set into a classification model, and outputting lesion categories of all endoscope sample images in the training data set through the first output layer, wherein the output layer in the classification model is a fully-connected layer containing M neurons, the M neurons represent M lesion categories, and M is an integer greater than zero;
comparing the lesion type of each endoscope sample image in the training data set with the lesion type indicated by the first label to obtain first difference information;
and performing model back propagation on the classification model by adopting a focal loss function based on the first difference information to obtain the trained classification model.
3. The method of claim 2, wherein inputting the upper gastrointestinal endoscope image to be processed into the trained classification model to obtain the initial lesion category of the upper gastrointestinal endoscope image to be processed comprises:
inputting an upper gastrointestinal endoscope image to be processed into a trained classification model, and classifying the upper gastrointestinal endoscope image to be processed;
calculating probability values of the upper digestive tract endoscope image to be processed belonging to each lesion category by using a Softmax function;
and determining the lesion category with the maximum probability value in the probability values of all lesion categories as the initial lesion category of the to-be-processed upper gastrointestinal endoscope image.
4. The method of claim 1, wherein the retrieval model shares the classification model, the retrieval model adds an image retrieval layer on the basis of the classification model, the image retrieval layer is located immediately before the output layer of the classification model, and the image retrieval layer comprises a fully-connected layer and a Sigmoid function; and the retrieving the upper gastrointestinal endoscope image to be processed by using the retrieval model and selecting, from the training data set, the first sub data set with features similar to those of the upper gastrointestinal endoscope image to be processed comprises:
determining the image features of the upper gastrointestinal endoscope image to be processed based on the features output by the image retrieval layer;
determining the coding value of the upper gastrointestinal endoscope image to be processed according to the image characteristics of the upper gastrointestinal endoscope image to be processed;
determining the coding value of each endoscope sample image in a training data set according to the image characteristics of each endoscope sample image in the training data set;
calculating the Hamming distance between the coding value of the upper digestive tract endoscope image to be processed and the coding value of each endoscope sample image in the training data set;
determining a candidate data set from the training data set according to the Hamming distance, wherein the Hamming distance between the coding value of each endoscope sample image in the candidate data set and the coding value of the upper digestive tract endoscope image to be processed is smaller than a distance threshold value;
calculating Euclidean distance between the upper gastrointestinal endoscope image to be processed and each endoscope sample image in the candidate data set;
and selecting, from the candidate data set, a first sub data set with features similar to those of the upper gastrointestinal endoscope image to be processed according to the ranking of the Euclidean distances from small to large, wherein the first sub data set comprises the top K endoscope sample images in the ranking of all Euclidean distances, and K is an integer greater than zero.
5. The upper gastrointestinal lesion region determining method of claim 1, wherein determining the final lesion category of the to-be-processed upper gastrointestinal endoscope image from the lesion category and the initial lesion category of the endoscope sample image in the first sub-data set comprises:
acquiring the number of first images corresponding to different lesion categories according to the lesion categories of the endoscope sample images in the first sub data set;
if the lesion category corresponding to the maximum value in the first image quantities corresponding to different lesion categories is the same as the initial lesion category, determining the initial lesion category as the final lesion category; or
if the lesion category corresponding to the maximum value in the first image quantities is different from the initial lesion category, and the ratio of the maximum image quantity to K is greater than the probability value of the initial lesion category, determining the lesion category corresponding to the maximum value in the first image quantities as the final lesion category, and otherwise determining the initial lesion category as the final lesion category.
6. The upper gastrointestinal lesion region determining method of claim 1, wherein the segmentation model includes a second input layer, a second convolution layer, a second maximum pooling layer, a dilated convolution layer, and a second output layer, wherein a convolution kernel of the convolution layer is 3 x 3, a convolution kernel of the output layer is 1 x 1, a window size of the maximum pooling layer is 2 x 2, the training data set further includes a second label indicating a lesion region, and the training process of the segmentation model includes:
inputting the training data set into a segmentation model, and outputting lesion areas of all endoscope sample images in the training data set through the second output layer;
comparing the lesion area of each endoscope sample image in the training data set with the lesion area indicated by the second label to obtain second difference information;
and performing model back propagation on the segmentation model by adopting a mean square error loss function based on the second difference information to obtain the trained segmentation model.
7. The method of claim 6, wherein the semantic segmentation of the to-be-processed upper gastrointestinal endoscope image using the trained segmentation model if the final lesion type belongs to a preset lesion type, to obtain the lesion region in the to-be-processed upper gastrointestinal endoscope image comprises:
inputting the to-be-processed upper gastrointestinal endoscope image of which the final lesion category belongs to a preset lesion category into a trained segmentation model, and performing semantic segmentation on the to-be-processed upper gastrointestinal endoscope image;
and determining the lesion area output by the second output layer as the lesion area in the upper digestive tract endoscope image to be processed.
8. The lesion region determination method according to any one of claims 1 to 7, wherein before the inputting of the upper gastrointestinal endoscope image to be processed into the classification model, the method further comprises:
testing the classification performance of the trained classification model by adopting a test data set;
testing the segmentation performance of the trained segmentation model by using the test data set;
the test data set comprises L endoscopic test images with a third label and a fourth label, wherein L is a positive integer less than N, the third label is used for indicating the lesion category of each endoscopic test image, and the fourth label is used for indicating the lesion area of each endoscopic test image.
9. An upper gastrointestinal lesion region determining apparatus based on multitask assistance, characterized in that the lesion region determining apparatus comprises:
the classification module is used for inputting the upper gastrointestinal endoscope image to be processed into a trained classification model to obtain an initial lesion category of the upper gastrointestinal endoscope image to be processed, wherein the classification model is obtained by training through a training data set;
the retrieval module is used for retrieving the upper gastrointestinal endoscope image to be processed by using a retrieval model and selecting a first sub data set with similar characteristics to the upper gastrointestinal endoscope image to be processed from the training data set;
a category determination module, configured to determine a final lesion category of the to-be-processed upper gastrointestinal endoscope image according to the lesion category of the endoscope sample image in the first sub data set and the initial lesion category;
and the segmentation module is used for performing semantic segmentation on the upper gastrointestinal endoscope image to be processed by using the trained segmentation model if the final lesion category belongs to a preset lesion category, so as to obtain a lesion area in the upper gastrointestinal endoscope image to be processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110930193.0A CN113744203A (en) | 2021-08-13 | 2021-08-13 | Method and device for determining upper digestive tract lesion area based on multitask assistance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110930193.0A CN113744203A (en) | 2021-08-13 | 2021-08-13 | Method and device for determining upper digestive tract lesion area based on multitask assistance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113744203A true CN113744203A (en) | 2021-12-03 |
Family
ID=78731033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110930193.0A Pending CN113744203A (en) | 2021-08-13 | 2021-08-13 | Method and device for determining upper digestive tract lesion area based on multitask assistance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113744203A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407352A (en) * | 2016-09-06 | 2017-02-15 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Traffic image retrieval method based on depth learning |
CN107679250A (en) * | 2017-11-01 | 2018-02-09 | 浙江工业大学 | A kind of multitask layered image search method based on depth own coding convolutional neural networks |
CN108491528A (en) * | 2018-03-28 | 2018-09-04 | 苏州大学 | A kind of image search method, system and device |
CN109165306A (en) * | 2018-08-09 | 2019-01-08 | 长沙理工大学 | Image search method based on the study of multitask Hash |
CN110009623A (en) * | 2019-04-10 | 2019-07-12 | 腾讯科技(深圳)有限公司 | A kind of image recognition model training and image-recognizing method, apparatus and system |
CN110070125A (en) * | 2019-04-19 | 2019-07-30 | 四川大学华西医院 | A kind of liver and gall surgical department's therapeutic scheme screening technique and system based on big data analysis |
CN111295669A (en) * | 2017-06-16 | 2020-06-16 | 马克波尔公司 | Image processing system |
KR20200094565A (en) * | 2019-01-30 | 2020-08-07 | 가톨릭대학교 산학협력단 | Method and system for reading capsule endoscopy image based on artificial intelligence |
CN112559781A (en) * | 2020-12-10 | 2021-03-26 | 西北大学 | Image retrieval system and method |
WO2021102844A1 (en) * | 2019-11-28 | 2021-06-03 | 华为技术有限公司 | Method, device and system for processing image |
KR102277761B1 (en) * | 2020-04-09 | 2021-07-14 | 금오공과대학교 산학협력단 | Method for processing endoscopy image using deep learning |
- 2021-08-13 CN CN202110930193.0A patent/CN113744203A/en active Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407352A (en) * | 2016-09-06 | 2017-02-15 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Traffic image retrieval method based on depth learning |
CN111295669A (en) * | 2017-06-16 | 2020-06-16 | 马克波尔公司 | Image processing system |
US20210142097A1 (en) * | 2017-06-16 | 2021-05-13 | Markable, Inc. | Image processing system |
CN107679250A (en) * | 2017-11-01 | 2018-02-09 | 浙江工业大学 | A kind of multitask layered image search method based on depth own coding convolutional neural networks |
CN108491528A (en) * | 2018-03-28 | 2018-09-04 | 苏州大学 | A kind of image search method, system and device |
CN109165306A (en) * | 2018-08-09 | 2019-01-08 | 长沙理工大学 | Image search method based on the study of multitask Hash |
KR20200094565A (en) * | 2019-01-30 | 2020-08-07 | 가톨릭대학교 산학협력단 | Method and system for reading capsule endoscopy image based on artificial intelligence |
CN110473192A (en) * | 2019-04-10 | 2019-11-19 | 腾讯医疗健康(深圳)有限公司 | Digestive endoscope image recognition model training and recognition methods, apparatus and system |
CN110009623A (en) * | 2019-04-10 | 2019-07-12 | 腾讯科技(深圳)有限公司 | A kind of image recognition model training and image-recognizing method, apparatus and system |
CN110070125A (en) * | 2019-04-19 | 2019-07-30 | 四川大学华西医院 | A kind of liver and gall surgical department's therapeutic scheme screening technique and system based on big data analysis |
WO2021102844A1 (en) * | 2019-11-28 | 2021-06-03 | 华为技术有限公司 | Method, device and system for processing image |
KR102277761B1 (en) * | 2020-04-09 | 2021-07-14 | 금오공과대학교 산학협력단 | Method for processing endoscopy image using deep learning |
CN112559781A (en) * | 2020-12-10 | 2021-03-26 | 西北大学 | Image retrieval system and method |
Non-Patent Citations (2)
Title |
---|
LU J, ET AL: "Deep hashing for scalable image search", IEEE TRANSACTIONS ON IMAGE PROCESSING, 3 March 2017 (2017-03-03), pages 2352 - 2367 * |
CHEN Shuang: "Research on Clothing Image Classification and Retrieval Based on Deep Learning", Zhejiang Sci-Tech University, 15 February 2020 (2020-02-15), pages 1 - 69 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Majid et al. | Classification of stomach infections: A paradigm of convolutional neural network along with classical features fusion and selection | |
CN111985536B (en) | Gastroscopic pathology image classification method based on weakly supervised learning | |
Rahim et al. | A survey on contemporary computer-aided tumor, polyp, and ulcer detection methods in wireless capsule endoscopy imaging | |
Yang et al. | Colon polyp detection and segmentation based on improved MRCNN | |
Ghosh et al. | Effective deep learning for semantic segmentation based bleeding zone detection in capsule endoscopy images | |
CN111986177A (en) | Chest rib fracture detection method based on attention convolution neural network | |
CN114782760B (en) | Stomach disease picture classification system based on multitask learning | |
Sun et al. | A novel gastric ulcer differentiation system using convolutional neural networks | |
CN112598086A (en) | Deep neural network-based common colon disease classification method and auxiliary system | |
CN113222957A (en) | Multi-class focus high-speed detection method and system based on capsule lens image | |
Shamrat et al. | Analysing most efficient deep learning model to detect COVID-19 from computer tomography images | |
CN116664929A (en) | Laryngoscope image multi-attribute classification method based on multi-modal information fusion | |
Ramzan et al. | Gastrointestinal tract infections classification using deep learning | |
Ham et al. | Improvement of gastroscopy classification performance through image augmentation using a gradient-weighted class activation map | |
Latha et al. | Deep Learning based Automatic Detection of Intestinal Hemorrhage Using Wireless Capsule Endoscopy Images | |
CN113705595A (en) | Method, device and storage medium for predicting degree of abnormal cell metastasis | |
CN112837276A (en) | Brain glioma segmentation method based on cascaded deep neural network model | |
CN116091446A (en) | Method, system, medium and equipment for detecting abnormality of esophageal endoscope image | |
US20240135540A1 (en) | Automatic detection and differentiation of biliary lesions in cholangioscopy images | |
CN113744203A (en) | Method and device for determining upper digestive tract lesion area based on multitask assistance | |
WO2023018343A1 (en) | Automatic detection and differentiation of pancreatic cystic lesions in endoscopic ultrasonography | |
CN116830148A (en) | Automatic detection and differentiation of small intestine lesions in capsule endoscopy | |
WO2022108465A1 (en) | Automatic detection of colon lesions and blood in colon capsule endoscopy | |
CN113011362A (en) | Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism | |
Malviya et al. | Deep Learning Based Gastro Intestinal Disease Analysis Using Wireless Capsule Endoscopy Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||