CN115345819A - Gastric cancer image recognition system, device and application thereof

Info

Publication number: CN115345819A
Application number: CN202210461368.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 朱圣韬, 张澍田, 闵力, 陈蕾
Current assignee: Beijing Friendship Hospital
Original assignee: Beijing Friendship Hospital
Application filed by Beijing Friendship Hospital; priority to CN202210461368.2A
Legal status: Pending


Classifications

    • G06T 7/0012 Biomedical image inspection (G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06T 2207/10068 Endoscopic image (G06T 2207/10 Image acquisition modality)
    • G06T 2207/20081 Training; Learning (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084 Artificial neural networks [ANN] (G06T 2207/20 Special algorithmic details)
    • G06T 2207/30092 Stomach; Gastric (G06T 2207/30004 Biomedical image processing)


Abstract

The invention relates to a gastric cancer image recognition system, a gastric cancer image recognition device and applications thereof. The system comprises a data input module, a data preprocessing module, an image recognition model construction module and a lesion recognition module; the system is also capable of self-training, so that the lesion site in a gastric cancer image can be identified accurately.

Description

Gastric cancer image recognition system, device and application thereof
Technical Field
The invention belongs to the field of medicine, and particularly relates to the technical field of pathological image recognition by using an image recognition system.
Background
Although the incidence of gastric cancer has declined gradually since 1975, there were still nearly 1 million new cases in 2012, making it the fifth most common malignancy in the world. In terms of mortality, gastric cancer is the third leading cause of cancer death worldwide.
The prognosis of gastric cancer depends to a large extent on its stage. Studies show that the 5-year survival rate of early gastric cancer is above 90 percent, whereas the survival rate of advanced gastric cancer is below 20 percent. Therefore, early detection and regular follow-up diagnosis in high-risk populations are the most effective means of reducing the incidence of gastric cancer and improving patient survival.
Because conventional white-light endoscopy has a rather high rate of misdiagnosis and missed diagnosis of gastric cancer (especially superficial flat lesions), a variety of endoscopic diagnostic technologies have emerged. However, the use of such endoscopic devices requires not only a high level of operating skill but also considerable economic support. There is therefore an urgent need for a simple, readily available, economical, practical, safe and reliable diagnostic technique for the discovery and diagnosis of early gastric cancer and precancerous lesions.
Disclosure of Invention
In long-term medical practice, in order to reduce the various problems caused by manual endoscopic diagnosis, the inventors used machine learning and, through multiple rounds of development, repeated optimization and training, obtained a system for gastric cancer diagnosis; training efficiency was further improved by means of systematic and rigorous image screening and preprocessing. The diagnostic system of the invention can identify cancerous lesion sites in pathological images (such as gastroscopic images and real-time images) with very high accuracy, and its recognition rate even exceeds that of specialist endoscopic physicians.
A first aspect of the present invention provides a gastric cancer image recognition system, comprising:
a. the data input module is used for inputting an image containing a gastric cancer lesion part, and the image is preferably an endoscope image;
b. the data preprocessing module is used for receiving the image from the data input module, accurately framing the lesion part of the gastric cancer, defining the part inside the framing as a positive sample and defining the part outside the framing as a negative sample, and outputting coordinate information and/or lesion type information of the lesion part; preferably, before the frame selection, the module also carries out desensitization treatment on the image in advance to remove personal information of the patient;
preferably, the frame selection can generate a rectangular frame or a square frame containing the lesion site; the coordinate information is preferably coordinate information of points at the upper left corner and the lower right corner of the rectangular frame or the square frame;
further preferably, the boxed site is determined by the following method: 2n endoscopic physicians frame-select the images in a back-to-back manner, namely the 2n endoscopic physicians are randomly divided into n groups of 2 physicians each, while all the images are randomly divided into n equal shares and randomly assigned to the groups of physicians for frame selection; after the frame selection is finished, the frame-selection results of the two physicians in each group are compared, the consistency of the frame-selection results between the two physicians is evaluated, and the boxed site is finally determined, wherein n is a natural number between 1 and 100, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criteria for evaluating the consistency of the results of the frame selection between two physicians are as follows:
aiming at each lesion picture, comparing the overlapping area of the framing results of two doctors in each group, if the area (namely intersection) of the overlapping parts of the parts respectively framed and selected by the two doctors in each group is more than 50% of the area covered by the union of the two doctors, considering that the framing judgment results of the two doctors have good consistency, and storing the diagonal coordinates corresponding to the intersection, namely the coordinates of the points at the upper left corner and the lower right corner as the final positioning of the target lesion;
if the area of the overlapped part (i.e. the intersection) is less than 50% of the area covered by the union of the two, the frame selection judgment results of the two doctors are considered to have larger difference, the lesion pictures are separately selected, and all 2n doctors participating in the frame selection work discuss and determine the final position of the target lesion together;
c. the image recognition model building module can receive the image processed by the data preprocessing module and is used for building and training an image recognition model based on a neural network, and the neural network is preferably a convolutional neural network;
d. and the lesion recognition module is used for inputting the image to be detected into the trained image recognition model and judging whether a lesion and/or the position of the lesion exist in the image to be detected based on the output result of the image recognition model.
In one embodiment, the image recognition model building module comprises a feature extractor, a candidate region generator, and a target recognizer, wherein:
the feature extractor is used for extracting features of the image from the data preprocessing module to obtain a feature map, and preferably, the feature extraction is performed through a convolution operation;
the candidate region generator is used for generating a plurality of candidate regions based on the feature map;
the target recognizer is used for calculating a classification score for each candidate region, the score indicating the probability that the region belongs to the positive sample and/or the negative sample; meanwhile, the target recognizer can provide an adjustment value for the frame position of each region, so that the frame position of each region is adjusted and the position of the lesion is determined accurately; preferably, a Loss function is used in the training of the classification score and the adjustment value;
preferably, a mini-batch-based gradient descent method is adopted during the training, namely a mini-batch comprising a plurality of positive and negative candidate regions is generated for each training picture; 256 candidate regions are then randomly sampled from each picture so that the ratio of positive to negative candidate regions is close to 1, and the loss function of the corresponding mini-batch is then calculated; if the number of positive candidate regions in a picture is less than 128, the mini-batch is filled with negative candidate regions;
further preferably, the learning rate of the first 50000 mini-batches is set to 0.001 and the learning rate of the last 50000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
In another embodiment, the feature extractor can perform feature extraction on an input image of any size and/or resolution, which may be the original size and/or resolution, or may be an input image after the size and/or resolution is changed, to obtain a feature map in multiple dimensions (e.g., 256 dimensions or 512 dimensions);
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) comprises Q_i convolution kernels of size m × m × p_i, where m × m represents the pixel size of the length and width of the convolution kernel and p_i equals the number of convolution kernels Q_(i-1) of the previous convolutional layer; in the i-th convolutional layer, the convolution kernels perform the convolution operation with step length L on the data from the previous stage (such as the original image, the (i-1)-th convolutional layer or a sampling layer); each sampling layer comprises 1 convolution kernel of size 2L × 2L that moves with step length 2L and performs the convolution operation on the image input by the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
wherein X is between 1-20, e.g., 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20; y is between 1 and 10, e.g. 1, 2,3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, such as 2,3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024, Q is between 1 and 1024, and the value of p or Q is, for example, 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512, or 1024, respectively.
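By way of illustration only, a minimal sketch of such a feature extractor is given below, assuming a PyTorch implementation with example values X = 3, Y = 2, m = 3, L = 1 and example channel counts; none of these values is fixed by the invention.

```python
# Minimal sketch (assumption: PyTorch) of the generic feature extractor: X = 3
# convolutional layers with m = 3 and step length L = 1, interleaved with
# Y = 2 sampling layers implemented as 2L x 2L convolutions moving with step 2L.
# Channel counts (Q_i) are example values only.
import torch
import torch.nn as nn

L = 1  # step length of the ordinary convolutional layers
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=L, padding=1),    # 1st convolutional layer, Q_1 = 64
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=2 * L, stride=2 * L),      # sampling layer: 2L x 2L kernel, step 2L
    nn.Conv2d(64, 128, kernel_size=3, stride=L, padding=1),  # 2nd convolutional layer, Q_2 = 128
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=2 * L, stride=2 * L),    # sampling layer
    nn.Conv2d(128, 256, kernel_size=3, stride=L, padding=1), # 3rd convolutional layer, Q_X = 256
    nn.ReLU(inplace=True),
)

# An input of any size is accepted; the output is a Q_X-dimensional (here 256-channel) feature map.
feature_map = feature_extractor(torch.randn(1, 3, 600, 800))
print(feature_map.shape)  # torch.Size([1, 256, 150, 200])
```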
In another embodiment, the candidate region generator sets a sliding window in the feature map, the size of the sliding window being n × n, such as 3 × 3; the sliding window slides along the feature map, and for each position of the sliding window, the centre point of the sliding window corresponds to a position in the original image, and k candidate regions with different scales and aspect ratios are generated in the original image centred on that corresponding position; if the k candidate regions have x (e.g. 3) different scales and x different aspect ratios, then k = x^2 (e.g. k = 9).
In another embodiment, the target recognizer further comprises an intermediate layer, a classification layer and a bounding box regression layer, wherein the intermediate layer is used for mapping data of candidate regions formed by sliding window operation and is a multi-dimensional (for example, 256-dimensional or 512-dimensional) vector;
and the classification layer and the frame regression layer are respectively connected with the intermediate layer, the classification layer is used for judging whether the target candidate region is a foreground (namely a positive sample) or a background (namely a negative sample), and the frame regression layer is used for generating an x coordinate and a y coordinate of the center point of the candidate region and the width w and the height h of the candidate region.
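For illustration, the intermediate layer, classification layer and frame regression layer described above could be sketched roughly as follows, assuming PyTorch, 256-dimensional intermediate features and k = 9 candidate regions per position; these are example values only, not the invention's fixed configuration.

```python
# Minimal sketch (assumption: PyTorch) of the recognizer head described above:
# a 3 x 3 intermediate layer maps each sliding-window position to a 256-d vector,
# the classification branch outputs foreground/background scores for k candidate
# regions, and the frame regression branch outputs 4 adjustment values per region.
import torch
import torch.nn as nn

class RecognizerHead(nn.Module):
    def __init__(self, in_channels=256, mid_channels=256, k=9):
        super().__init__()
        self.intermediate = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
        self.cls_score = nn.Conv2d(mid_channels, 2 * k, kernel_size=1)  # foreground/background per region
        self.bbox_pred = nn.Conv2d(mid_channels, 4 * k, kernel_size=1)  # x, y, w, h adjustments per region

    def forward(self, feature_map):
        h = torch.relu(self.intermediate(feature_map))
        return self.cls_score(h), self.bbox_pred(h)

scores, adjustments = RecognizerHead()(torch.randn(1, 256, 38, 50))
print(scores.shape, adjustments.shape)  # torch.Size([1, 18, 38, 50]) torch.Size([1, 36, 38, 50])
```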
A second aspect of the present invention provides a gastric cancer image recognition apparatus, including a storage unit in which a gastric cancer diagnosis image, an image preprocessing program, and a trainable image recognition program are stored, and preferably further including an arithmetic unit and a display unit;
the device can be trained (preferably by supervised training) on images containing gastric cancer lesions via the image recognition program, so that the trained image recognition program can recognize gastric cancer lesion sites in an image to be detected;
preferably, the image to be detected is an endoscopic photograph or a real-time image.
In one embodiment, wherein the image preprocessing procedure precisely frames a lesion site of gastric cancer in the gastric cancer diagnostic image, the portion inside the frame is defined as a positive sample, and the portion outside the frame is defined as a negative sample, and outputs position coordinate information and/or lesion type information of the lesion; preferably, before the frame selection, desensitization treatment is carried out on the image in advance to remove personal information of the patient;
preferably, the frame selection can generate a rectangular frame or a square frame containing the lesion site; the coordinate information is preferably coordinate information of points at the upper left corner and the lower right corner;
also preferably, the boxed site is determined by the following method: 2n endoscopic physicians frame-select the images in a back-to-back manner, namely the 2n endoscopic physicians are randomly divided into n groups of 2 physicians each, while all the images are randomly divided into n equal shares and randomly assigned to the groups of physicians for frame selection; after the framing is finished, the framing results of the two physicians in each group are compared, the consistency of the framing results between the two physicians is evaluated, and the framed site is finally determined, wherein n is a natural number between 1 and 100, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criteria for evaluating the consistency of the frame-selected results between two physicians are as follows:
aiming at each lesion image, comparing the overlapping areas of the framing results of the 2 doctors in each group, and if the area (namely intersection) of the overlapping parts of the parts respectively framed and selected by the two doctors in each group is larger than 50% of the area covered by the union of the two doctors, considering that the framing judgment results of the 2 doctors have good consistency, and storing the diagonal coordinates corresponding to the intersection as the final positioning of the target lesion;
if the area of the overlapped part (i.e. the intersection) is less than 50% of the area covered by the union of the two, the framing judgment results of 2 doctors are considered to be greatly different, such lesion pictures are separately selected, and all 2n doctors participating in the framing work discuss the final position of the target lesion together.
In another embodiment, the image recognition program is a trainable neural network based image recognition program, preferably a convolutional neural network; preferably, the image recognition program comprises a feature extractor, a candidate region generator and an object recognizer, wherein:
the feature extractor is configured to perform feature extraction on the image to obtain a feature map, and preferably, the feature extraction is performed by a convolution operation;
the candidate region generator is used for generating a plurality of candidate regions based on the feature map;
the target identifier calculates a classification score for the candidate region, the score being indicative of a probability that the region belongs to the positive sample and/or the negative sample; meanwhile, the target recognizer can provide an adjusting value for the frame position of each region, so that the frame position of each region is adjusted, and the position of a focus is accurately determined; preferably, a Loss function (Loss function) is used in the training of the classification score and the adjustment value;
In another embodiment, the training is performed using a mini-batch-based gradient descent method: a mini-batch comprising a plurality of positive and negative candidate regions is generated for each training picture; 256 candidate regions are then randomly sampled from each picture so that the ratio of positive to negative candidate regions is close to 1, and the loss function of the corresponding mini-batch is calculated; if the number of positive candidate regions in a picture is less than 128, the mini-batch is filled with negative candidate regions;
preferably, the learning rate of the first 50000 mini-batches is set to 0.001 and the learning rate of the last 50000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
In another embodiment, the feature extractor can perform feature extraction on an input image of any size and/or resolution, which may be the original size and/or resolution, or may be an input image after the size and/or resolution is changed, to obtain a feature map in multiple dimensions (e.g., 256 dimensions or 512 dimensions);
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) comprises Q_i convolution kernels of size m × m × p_i, where m × m represents the pixel size of the length and width of the convolution kernel and p_i equals the number of convolution kernels Q_(i-1) of the previous convolutional layer; in the i-th convolutional layer, the convolution kernels perform the convolution operation with step length L on the data from the previous stage (such as the original image, the (i-1)-th convolutional layer or a sampling layer); each sampling layer comprises 1 convolution kernel of size 2L × 2L that moves with step length 2L and performs the convolution operation on the image input by the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
wherein X is between 1-20, e.g., 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20; y is between 1 and 10, e.g. 1, 2,3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, such as 2,3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024, Q is between 1 and 1024, and the value of p or Q is, for example, 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512, or 1024, respectively.
In another embodiment, the candidate region generator sets a sliding window in the feature map, the size of the sliding window being n × n, such as 3 × 3; the sliding window slides along the feature map, and for each position of the sliding window, the centre point of the sliding window corresponds to a position in the original image, and k candidate regions with different scales and aspect ratios are generated in the original image centred on that corresponding position; if the k candidate regions have x (e.g. 3) different scales and x different aspect ratios, then k = x^2 (e.g. k = 9).
In another embodiment, the target recognizer further comprises an intermediate layer, a classification layer and a bounding box regression layer, wherein the intermediate layer is used for mapping data of candidate regions formed by sliding window operation and is a multi-dimensional (for example, 256-dimensional or 512-dimensional) vector;
and the classification layer and the frame regression layer are respectively connected with the intermediate layer, the classification layer is used for judging whether the target candidate region is a foreground (namely a positive sample) or a background (namely a negative sample), and the frame regression layer is used for generating an x coordinate and a y coordinate of the center point of the candidate region and the width w and the height h of the candidate region.
A third aspect of the invention provides the use of the system of the first aspect or the device of the second aspect of the invention for the prediction and diagnosis of gastric cancer and/or pre-gastric lesions.
A fourth aspect of the invention provides use of the system of the first aspect or the device of the second aspect of the invention for identification of a gastric cancer image or a lesion in a gastric cancer image.
A fifth aspect of the invention provides the use of the system of the first aspect or the device of the second aspect of the invention for real-time diagnosis of gastric cancer and/or pre-gastric lesions.
A sixth aspect of the invention provides use of the system of the first aspect of the invention or the device of the second aspect of the invention for real-time identification of gastric cancer images or lesions in gastric cancer images.
The inventors found that gastric cancer lesion sites have their own characteristics: the lesion is not conspicuous and its boundary with the surrounding tissue is not clear, so training an image recognition model is more difficult than a conventional task (such as recognizing everyday objects), and a slight misstep easily causes the training to fail to converge. In the present invention, the inventors use a neural-network-based image recognition model and improve the training method (for example, precisely defining the target lesion position in the training images by frame selection, thereby improving the recognition accuracy of the image recognition model), thereby obtaining a recognition system (and/or device) that intelligently and efficiently recognizes gastric cancer lesions in endoscopic images, with a recognition rate higher than that of ordinary endoscopic physicians. After reinforcement by machine learning, the real-time diagnosis system can also monitor and identify digestive tract lesions, their positions and their probabilities in real time, which greatly improves the detection rate of gastric cancer by ordinary physicians, reduces the misdiagnosis rate, and provides a safe and reliable technology for the diagnosis of gastric cancer.
Drawings
FIG. 1 is an endoscopic image including a gastric cancer lesion site
FIG. 2 is a schematic diagram of the framing process
FIG. 3 shows the lesion site of gastric cancer identified by the image recognition system of the present invention.
Detailed description of the preferred embodiments
Unless otherwise indicated, terms used in the present disclosure have the ordinary meanings understood by those of ordinary skill in the art. Some terms are defined below; if these definitions are inconsistent with other definitions, the definitions given in the present disclosure shall prevail.
Definition of
The term "gastric cancer" refers to malignant tumors derived from epithelial cells of the gastric mucosa, including early stage gastric cancer and advanced stage gastric cancer.
The term "module" refers to a set of functions that can achieve a specific effect, and the module can be executed by a computer alone, a human, or both.
Obtaining lesion data
The key role of the step of obtaining lesion data is to obtain sample material for deep learning.
In one embodiment, the acquisition process may specifically include the steps of collection and prescreening.
The term "acquiring" refers to searching all endoscope databases, according to the criterion "diagnosed with gastric cancer", and obtaining all endoscopic diagnostic images of every gastric cancer patient, for example all images in the folder belonging to a patient diagnosed with gastric cancer, i.e. all images stored for that patient during the entire endoscopic examination. The acquired images may therefore also include endoscopic images of target sites other than the lesion; for example, a patient may be diagnosed with a benign ulcer, a polyp, etc., but the named folder also contains images stored during the examination of the esophagus, gastric fundus, gastric body, duodenum and other sites.
Preliminary screening is the step of screening the acquired pathological images of gastric cancer patients; it can be performed by experienced endoscopists with reference to the descriptions in the endoscopic examination and pathological diagnosis. The pictures used for the deep learning network must be of clear quality and accurate in their features, otherwise learning becomes more difficult or the recognition results become inaccurate. The module and/or step of preliminary screening of lesion data therefore selects, from a set of examination pictures, the pictures that contain a definite gastric cancer lesion site.
Importantly, preliminary screening can locate the lesion accurately by combining the histopathological result of the patient's biopsy, i.e. the description of the atrophic site in the pathological diagnosis, while also taking into account the clarity, shooting angle and degree of magnification of the picture, selecting endoscopic images that are sharp, moderately magnified and show the whole appearance of the lesion as far as possible.
Preliminary screening ensures that the pictures entered into the training set are all high-quality images containing a confirmed lesion site, which improves the feature accuracy of the data set entered for training, allows the artificial intelligence network to better summarize the image features of the atrophic lesions from it, and improves the diagnostic accuracy.
Lesion data preprocessing
Preprocessing completes the precise frame selection of the gastric cancer lesion site: the part inside the frame is defined as a positive sample and the part outside the frame as a negative sample, and the position coordinate information and lesion type information of the lesion are output.
In one embodiment, lesion data preprocessing is implemented in whole or in part by an "image preprocessing routine".
The term "image preprocessing program" refers to a program that enables the framing of a target area in an image, thereby indicating the type and extent of the target area.
In one embodiment, the image pre-processing program is also capable of desensitizing the image to remove patient personal information.
In one embodiment, the image pre-processing program is software written in a computer programming language capable of performing the aforementioned functions.
In another embodiment, the image pre-processing program is software capable of performing a framing function.
In a specific embodiment, the software that performs the framing function imports the picture to be processed and displays it on the operation interface; the operator (e.g. a physician) performing the framing then only needs to drag the mouse from the upper left to the lower right (or along another diagonal) across the target lesion site to be framed, forming a rectangular or square frame covering the target lesion, while the background generates and stores the exact coordinates of the upper-left and lower-right corners of the frame for unique positioning.
In order to ensure the accuracy of preprocessing (or frame selection), the invention further strengthens the control of frame selection quality, which is an important guarantee that the method/system of the invention can obtain higher accuracy, and the concrete mode is as follows: selecting 2n (such as 6, 8, 10 and the like) endoscopists to perform frame selection in a back-to-back mode, namely randomly dividing 2n people into n groups and 2 people/group, simultaneously randomly dividing all screened training images into n parts equally and randomly distributing the n parts to all groups of physicians to perform frame selection; and after the selection is finished, comparing the selection results of each group of 2 doctors, evaluating the consistency of the selection results between the two doctors, and finally determining the selection part.
In one embodiment, the criteria for assessing consistency are: for the same lesion picture, comparing the framing result of each group of 2 physicians, that is, comparing the overlapping area of the rectangular frames determined by the diagonal coordinates, and if the area of the overlapping part (i.e., intersection) of the two rectangular frames is greater than 50% of the area covered by the union of the two rectangular frames, the framing judgment result of the 2 physicians is considered to have good consistency, and the diagonal coordinates corresponding to the intersection are stored as the final positioning of the target lesion. On the contrary, if the area (i.e. intersection) of the overlapped part of the two rectangular frames is less than 50% of the area covered by the union of the two rectangular frames, the frame selection judgment results of 2 physicians are considered to be greatly different, such lesion pictures will be individually selected by the software background, and in the later stage, all the physicians participating in the frame selection work jointly discuss and determine the final position of the target lesion.
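A minimal sketch of this consistency rule follows, assuming each physician's frame is given as upper-left and lower-right corner coordinates (x1, y1, x2, y2); the function name and return convention are illustrative only.

```python
# Minimal sketch of the consistency rule: two physicians' frames are consistent
# when their intersection covers more than 50% of their union; in that case the
# intersection's diagonal coordinates are kept as the final lesion location.
# Frames are assumed to be (x1, y1, x2, y2) corner coordinates; names are illustrative.
def framing_consistency(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    if union > 0 and inter / union > 0.5:
        return True, (ix1, iy1, ix2, iy2)  # consistent: store the intersection as the final location
    return False, None                     # inconsistent: refer the picture to group discussion

consistent, final_box = framing_consistency((100, 80, 300, 260), (120, 90, 320, 280))
```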
Image recognition model
The term "image recognition model" refers to an algorithm that is built based on the principles of machine learning and/or deep learning, and may also be referred to as a "trainable image recognition model" or "image recognition program".
In one embodiment, the program is a neural network, preferably a convolutional neural network; in another embodiment, the neural network is a convolutional neural network based on the LeNet-5, RCNN, SPP, Fast-RCNN and/or Faster-RCNN architectures; Faster-RCNN can be regarded as a combination of Fast-RCNN and an RPN, and in one embodiment the program is based on a Faster-RCNN network.
The image recognition program comprises at least the following levels: the original image feature extraction layer, the candidate area selection layer and the target identification layer are used for adjusting trainable parameters through a preset algorithm.
The term "original image feature extraction layer" refers to a level or a level combination capable of performing mathematical computation on an input image to be trained so as to extract original image information in multiple dimensions. The layer may actually represent a combination of a plurality of different functional layers.
In one embodiment, the artwork feature extraction layer may be based on ZF or VGG16 networks.
The term "convolutional layer" refers to the network layer in the original image feature extraction layer that is responsible for performing the convolution operation on the original input image or on the image information processed by a sampling layer, so as to extract information. The convolution operation is performed by sliding a convolution kernel of a certain size (e.g. 3 × 3) over the input image in certain steps (e.g. 1 pixel); during the movement of the convolution kernel, the covered pixels are multiplied by the corresponding weights of the kernel and all the products are added to give one output value. In image processing an image is often represented as a vector of pixels, so a digital image can be regarded as a discrete function on a two-dimensional space, for example f(x, y); convolving it with a function C(u, v) produces an output image g(x, y) = f(x, y) * C(u, v), and image blurring and information extraction can be realized by convolution.
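A minimal illustration of this sliding-window operation, assuming NumPy and a 3 × 3 averaging kernel moved with a step of 1 pixel, is sketched below; kernel and image values are arbitrary examples.

```python
# Minimal illustration (assumption: NumPy) of the sliding-window convolution
# described above: a 3 x 3 kernel moves over the image in steps of 1 pixel, the
# covered pixels are multiplied by the kernel weights and summed into one output value.
import numpy as np

def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # multiply covered pixels by weights and add the products
    return out

blur_kernel = np.ones((3, 3)) / 9.0                    # averaging kernel: realizes image blurring
smoothed = conv2d(np.random.rand(8, 8), blur_kernel)   # 8 x 8 input gives a 6 x 6 output
```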
The term "training" refers to repeatedly self-adjusting parameters of a trainable image recognition program by inputting a large number of manually labeled samples, so as to achieve the intended purpose, i.e., recognizing a lesion in a gastric cancer image.
In one embodiment, the present invention is based on a faster-rcnn network and employs the following end-to-end training method in step S4:
(1) Initializing parameters of a target candidate region generation network (RPN) by using a model pre-trained on ImageNet, and finely adjusting the network;
(2) Initializing the Fast R-CNN network parameters with a model pre-trained on ImageNet, and then training with the region proposals extracted by the RPN network in (1);
(3) Re-initializing the RPN with the Fast R-CNN network in (2) and fine-tuning the RPN network with the convolutional layers fixed, wherein only the cls and/or reg layers of the RPN are adjusted during the fine-tuning;
(4) Fixing the convolutional layers of the Fast R-CNN in (2) and fine-tuning the Fast R-CNN network with the region proposals extracted by the RPN in (3), wherein only the fully connected layers of the Fast R-CNN are fine-tuned.
The term "candidate region selection layer" refers to a level, or combination of levels, that selects specific regions on the original image through an algorithm for classification recognition and bounding-box regression; similar to the original image feature extraction layer, this layer may also represent a combination of several different layers.
In one embodiment the candidate region selection layers are directly connected with respect to the original input layer.
In one embodiment, the candidate area selection layer is directly connected to the last layer of the artwork feature extraction layer.
In one embodiment, the "candidate region selection layer" may be based on the RPN.
The term "target recognition layer": see the classification layer and window regression layer described below. The term "sampling layer", which may sometimes be called a pooling layer, operates similarly to a convolutional layer except that the kernel of the sampling layer only takes the maximum or average value at the corresponding positions (max pooling, average pooling).
The term "feature map" refers to a small-area high-dimensional multi-channel image obtained by performing convolution operation on an original image through an original image feature extraction layer, and the feature map may be a 256-channel image with a scale of 51 × 39, for example.
The term "sliding window" refers to a window of small size (e.g. 2 × 2 or 3 × 3) generated on the feature map and moved across every position of the feature map; although the feature map itself is not large, because it has already undergone multiple layers of data extraction (e.g. convolution), a small sliding window on the feature map corresponds to a large field of view on the original image.
The term "candidate region" may also be referred to as a candidate window, a target candidate region, a reference box, a bounding box, and may also be used interchangeably with an anchor or an anchor box herein.
In one embodiment, the sliding window is first positioned at a location of the feature map; for that location, k rectangular or square windows of different areas and different proportions, for example 9 windows, are generated and anchored to the centre of the corresponding position in the original image, and are therefore called anchors or anchor boxes. Based on the relationship between each sliding-window position in the feature map and the corresponding centre position in the original image, the candidate regions are formed; a candidate region can essentially be regarded as the original-image region corresponding to the (3 × 3) sliding window moved over the last convolutional layer.
In one embodiment of the present invention, k =9, the generating of the candidate region comprises the steps of:
(1) Firstly, generating 9 types of anchor boxes according to different areas and aspect ratios, wherein the anchor boxes do not change according to the size of a feature map or an original input image;
(2) For each input image, calculating the central point of the original image corresponding to each sliding window according to the size of the image;
(3) And establishing a mapping relation between the position of the sliding window and the position of the original image based on the calculation.
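A rough sketch of this candidate-region generation follows; the stride of 16 pixels, the 3 scales and the 3 aspect ratios are assumed example values, giving k = 9 anchors per sliding-window position.

```python
# Rough sketch of candidate-region (anchor) generation: for every sliding-window
# position on the feature map, k = 3 scales x 3 aspect ratios = 9 boxes are
# centred on the corresponding point of the original image. Stride, scales and
# ratios are assumed example values.
import itertools

def generate_anchors(feature_h, feature_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for fy in range(feature_h):
        for fx in range(feature_w):
            cx = fx * stride + stride // 2  # centre of this position in the original image
            cy = fy * stride + stride // 2
            for scale, ratio in itertools.product(scales, ratios):
                w = scale * ratio ** 0.5
                h = scale / ratio ** 0.5
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors  # feature_h * feature_w * 9 boxes as (x1, y1, x2, y2)

boxes = generate_anchors(38, 50)  # 38 * 50 * 9 = 17100 candidate regions
```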
The term "intermediate layer" refers to a new level, which is referred to as an intermediate layer in the present invention, by further mapping the feature map into a multi-dimensional (e.g. 256-dimensional or 512-dimensional) vector after the target candidate region is formed by using the sliding window. And the middle layer is connected with the classification layer and the window regression layer.
The term "classification layer" (cls_score) denotes a branch connected to the output of the intermediate layer. The branch outputs 2k scores, corresponding to two scores for each of the k target candidate regions: a foreground (i.e. positive sample) score and a background (i.e. negative sample) score; this score can determine whether a target candidate region is a true target or background. Thus, for each sliding-window position, the classification layer can output the probabilities of belonging to the foreground (i.e. positive sample) and to the background (i.e. negative sample) from the high-dimensional (e.g. 256-dimensional) features.
Specifically, in one embodiment, a candidate region whose IOU (intersection-over-union) with any ground-truth box (the true sample boundary, i.e. the boundary of the object to be identified in the original image) is greater than 0.7 may be regarded as a positive sample, or given a positive label; a candidate region whose IOU with every ground-truth box is less than 0.3 may be regarded as background (i.e. a negative sample); each anchor is thus assigned a class label. The IOU mathematically represents the degree of overlap between the candidate region and the ground-truth box and is calculated as follows:
IOU=(A∩B)/(A∪B)
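A minimal sketch of this IOU computation and of the positive/negative labelling rule (IOU > 0.7 positive, IOU < 0.3 negative) is given below; boxes are assumed to be corner-coordinate tuples, and the helper names are illustrative.

```python
# Minimal sketch of the IOU rule above: an anchor with IOU > 0.7 against any
# ground-truth box receives a positive label, an anchor with IOU < 0.3 against
# every ground-truth box receives a negative label. Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def anchor_label(anchor, gt_boxes):
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best > 0.7:
        return 1   # positive sample (foreground)
    if best < 0.3:
        return 0   # negative sample (background)
    return -1      # neither: ignored when assigning labels

label = anchor_label((50, 60, 200, 220), [(70, 80, 210, 230), (400, 400, 480, 470)])
```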
The classification layer may output a (k+1)-dimensional array p representing the probabilities of belonging to the k classes and to the background. For each RoI (Region of Interest) a discrete probability distribution is output, and p is computed with a softmax over the (k+1)-way fully connected layer. The mathematical expression is:
p = (p_0, p_1, ..., p_k)
the term "window regression layer" (bbox _ pred), another branch of the connection to the intermediate layer output, is juxtaposed to the classification layer. This layer can output parameters that at each position, 9 anchors correspond to the window should be scaled by translation. Corresponding to k target candidate regions respectively, each target candidate region having 4 frame position adjustment values, wherein the 4 frame position adjustment values refer to x of the upper left corner of the target candidate region a Coordinate, y a Coordinates and height h of target candidate area a And width w a The adjustment value of (2). The branch circuit is used for finely adjusting the position of the target candidate region, so that the position of the finally obtained result is more accurate.
The window regression layer outputs the bounding-box regression displacements as a 4 × k dimensional array t, representing the translation and scaling parameters for each of the k classes. The mathematical expression is:
t^k = (t_x^k, t_y^k, t_w^k, t_h^k)
where k denotes the index of the category, t_x^k and t_y^k specify a translation that is invariant with respect to the scale of the object proposal, and t_w^k and t_h^k specify the height and width of the object proposal in logarithmic space.
In one embodiment, the present invention trains the classification layer and the window regression layer simultaneously through a Loss function, which is composed of the classification loss (i.e. the softmax loss of the classification layer) and the regression loss (i.e. an L1 loss) combined with a certain weight.
Computing the softmax loss requires the ground-truth calibration corresponding to the candidate region and the predicted result; computing the regression loss requires three sets of information:
(1) The predicted centre coordinates x, y and the width and height w, h of the candidate region;
(2) The centre coordinates x_a, y_a and the width and height w_a, h_a of each of the 9 anchor reference boxes around the candidate region;
(3) The centre coordinates x*, y* and the width and height w*, h* of the corresponding real calibration frame (ground truth).
The regression Loss and total Loss calculation modes are as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,
t_w = log(w/w_a),  t_h = log(h/h_a),
t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a,
t*_w = log(w*/w_a),  t*_h = log(h*/h_a),
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
where p_i is the predicted probability that anchor i is the target. p_i* takes two values: p_i* equal to 0 is a negative label, and p_i* equal to 1 is a positive label. t_i represents the 4 parameterized coordinates of the predicted candidate region, and t_i* represents the coordinate vector of the ground-truth bounding box corresponding to a positive anchor.
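The parameterization and an L1-type regression penalty over it could be sketched as follows; boxes are assumed to be given as centre coordinates with width and height, and the function names are illustrative only.

```python
# Sketch of the parameterization above: (x, y, w, h) is the predicted box
# (centre, width, height), (x_a, y_a, w_a, h_a) the anchor, and the ground-truth
# box is encoded the same way to give t*. The regression penalty below is a
# plain L1 comparison of the two parameterizations, per the text; names are illustrative.
import math

def encode(box, anchor):
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def regression_loss(pred_box, gt_box, anchor):
    t = encode(pred_box, anchor)      # t_x, t_y, t_w, t_h
    t_star = encode(gt_box, anchor)   # t*_x, t*_y, t*_w, t*_h
    return sum(abs(p - g) for p, g in zip(t, t_star))

loss = regression_loss((110, 95, 200, 180), (105, 100, 210, 175), (100, 100, 190, 190))
```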
In one embodiment, in the training of the loss function, a mini-batch-based gradient descent method is adopted, i.e. a mini-batch comprising a plurality of positive and negative candidate regions (anchors) is generated for each training picture. Subsequently, 256 anchors were randomly sampled from each picture until the ratio of positive and negative anchors was close to 1, and then the corresponding mini-batch Loss function (Loss function) was calculated. If the number of positive anchors in a picture is less than 128, then negative anchors are used to fill in the mini-batch.
In a specific embodiment, the learning rate of the first 50000 mini-batches is set to 0.001 and the learning rate of the last 50000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
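A minimal sketch of this mini-batch sampling rule, with illustrative function and variable names, might look as follows.

```python
# Minimal sketch of the mini-batch rule: 256 anchors are sampled per training
# picture with a positive:negative ratio close to 1; when a picture has fewer
# than 128 positive anchors, the mini-batch is filled with negative anchors.
import random

def sample_mini_batch(positive_anchors, negative_anchors, batch_size=256):
    n_pos = min(len(positive_anchors), batch_size // 2)
    pos = random.sample(positive_anchors, n_pos)
    neg = random.sample(negative_anchors, batch_size - n_pos)  # fill the remainder with negatives
    return pos + neg

mini_batch = sample_mini_batch(list(range(40)), list(range(1000, 20000)))
print(len(mini_batch))  # 256
```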
After the training, the trained deep learning network is used to recognize endoscopic pictures of target lesions. In one embodiment, the classification score threshold is set to 0.85, i.e. lesions for which the deep learning network gives a probability exceeding 85% are marked, and such a picture is judged positive; conversely, if no suspicious lesion region is detected in a picture, the picture is judged negative.
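The decision rule could be sketched as follows; the detection format (box, probability) and the function name are assumptions for illustration.

```python
# Sketch of the decision rule: regions whose predicted lesion probability
# exceeds 0.85 are marked, and a picture with at least one such region is
# judged positive; the detection format (box, probability) is an assumption.
def judge_picture(detections, threshold=0.85):
    lesions = [(box, p) for box, p in detections if p > threshold]
    return ("positive", lesions) if lesions else ("negative", [])

verdict, marked = judge_picture([((30, 40, 120, 160), 0.91), ((200, 50, 260, 110), 0.42)])
```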
Examples
1. Exemption from informed consent statement:
(1) The research only utilizes the endoscope picture and related clinical data obtained by the endoscope center of the digestive department of the Beijing friendship hospital in the past clinical diagnosis and treatment to carry out retrospective observation research, and does not cause any influence on the disease condition, treatment, prognosis and even life safety of patients;
(2) A single principal investigator completed all data acquisition independently and, immediately after the picture data were acquired, erased the personal information from all pictures with dedicated software, so that no patient privacy information was leaked during the subsequent screening, framing, and entry into the artificial intelligence program for training, debugging and testing;
(3) The electronic medical record query system of the gastroenterology endoscopy centre does not display items such as contact information or home address, i.e. the system does not record patients' contact information, so the study cannot trace patients back to obtain informed consent.
2. Pathological image acquisition
Inclusion criteria
(1) Patients who underwent endoscopy (including electronic gastroscopy, electronic colonoscopy, endoscopic ultrasonography, electronic staining endoscopy, magnifying endoscopy and chromoendoscopy) at the digestive endoscopy centre of Beijing Friendship Hospital from January 1, 2013 to June 10, 2017;
(2) Patients with an endoscopic diagnosis of "gastric cancer" (including, without distinction, early and advanced gastric cancer);
exclusion criteria
(1) Patients in whom the digestive tract malignancy involved an extensive or undefined site;
(2) Patients with malignant tumors of the pancreaticobiliary system only;
(3) Patients with concurrent malignant tumors of other systems;
(4) Endoscopic pictures that were unclear and/or taken from an unsatisfactory angle.
3. Experimental procedures and results
(1) Data acquisition: a researcher retrieved from the electronic medical record system the endoscopic pictures and related clinical data of patients who underwent endoscopy (including electronic gastroscopy, electronic colonoscopy, endoscopic ultrasonography, electronic staining endoscopy, magnifying endoscopy and chromoendoscopy) at the gastroenterology department of Beijing Friendship Hospital from January 1, 2013 to June 10, 2017 and who had an endoscopic diagnosis of gastric cancer (including, without distinction, early and advanced gastric cancer);
(2) Erasing personal information: and immediately erasing personal information of all pictures after collection.
(3) Picture screening: all processed pictures were sorted, the endoscopic pictures corresponding to cases with a definite pathological result of gastric cancer were screened out, and finally, for each case, clear pictures with little background interference were selected according to the biopsy pathology site, giving 3774 pictures in total;
(4) Constructing a test data set: the total number of test pictures is 100, and the test pictures comprise 50 gastric cancers (both early gastric cancers and advanced gastric cancers) confirmed by pathological results, and 50 endoscopic pictures of non-tumor lesions (including benign ulcers, polyps, interstitial tumors, lipomas and ectopic pancreas) of the stomach confirmed by pathological results are randomly collected in a database. The specific operation comprises the following steps:
First, 50 pictures were randomly selected from all the gastric cancer pictures screened in step (3);
Second, 50 endoscopic pictures of gastric "non-tumor lesions" confirmed by pathological results (including benign gastric ulcers, polyps, interstitial tumors, lipomas and ectopic pancreas) were randomly collected from the database, and personal information was immediately erased from these 50 pictures;
(5) Constructing a training data set: randomly selecting pictures for constructing a test data set in the step (4) from the stomach cancer pictures screened in the step (3), and using the rest 3724 pictures for deep learning network training to form a training data set;
(6) Framing of target lesions: 6 endoscopists worked in a back-to-back manner, being randomly divided into 3 groups of 2; all screened training pictures were randomly divided into 3 equal shares and assigned to the groups of physicians for frame selection. The lesion framing step was implemented with self-written software: after a picture is imported, the software displays it on the operation interface, and the physician drags the mouse from the upper left to the lower right across the target lesion site, forming a rectangular frame covering the target lesion, while the background generates and stores the exact coordinates of the upper-left and lower-right corners of the rectangle for unique positioning.
After the framing is finished, the framing results of each group of 2 doctors are compared, for the same pathological change picture, the overlapping area of the rectangular frames determined by the diagonal coordinates is compared, after testing, if the area (namely intersection) of the overlapping part of the two rectangular frames is larger than 50% of the area covered by the union of the two rectangular frames, the framing judgment results of the 2 doctors are considered to be good in consistency, and the diagonal coordinates corresponding to the intersection are stored as the final positioning of the target pathological change. On the contrary, if the area of the overlapping portion of the two rectangular frames (i.e. the intersection) is less than 50% of the area covered by the union of the two rectangular frames, the frame selection judgment results of the 2 physicians are considered to be greatly different, then such lesion pictures will be individually selected by the software background (or manually marked), and in the later period, all the physicians participating in the frame selection work jointly discuss and determine the final position of the target lesion.
(7) Input and training: all frame-selected pictures were entered into a Faster-RCNN convolutional neural network for training, and the two network structures ZF and VGG16 were tested; training was performed in an end-to-end manner;
the ZF network comprises 5 convolutional layers, 3 fully-connected layers and a softmax classified output layer, the VGG16 network comprises 13 convolutional layers, 3 fully-connected layers and a softmax classified output layer, and both ZF and VGG16 models are basic CNN used for extracting the characteristics of the training images under the framework of fast-RCNN.
During training, a mini-batch gradient descent method based on the mini-batch is adopted, namely, a mini-batch comprising a plurality of positive candidate regions and negative candidate regions (anchors) is generated for each training picture. Subsequently, 256 anchors were randomly sampled from each picture until the ratio of positive and negative anchors was close to 1, and then the Loss function (Loss function) of the corresponding mini-batch was calculated. If the number of positive anchors in a picture is less than 128, then negative anchors are used to fill in the mini-batch.
The learning rate of the first 50000 mini-batches was set to 0.001 and the learning rate of the last 50000 mini-batches to 0.0001; the momentum term was set to 0.9 and the weight decay to 0.0005.
The Loss Function used in training is as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
In the above formula, i denotes the index of an anchor in each mini-batch and p_i the probability that anchor i is a target (Object); p_i* is the true label of the anchor: when the anchor is an object the label is 1, otherwise the label is 0. t_i is a 4-dimensional vector representing the parameterized coordinates of the predicted bounding box, and t_i* represents the parameterized coordinates of the ground-truth bounding box used in the bounding-box regression.
(8) Testing and result statistics: the test data set (comprising 50 gastric cancer pictures and 50 pictures of gastric non-tumor lesions) was used to test the artificial intelligence system and gastroenterologists of different seniorities, and their diagnostic sensitivity, specificity, accuracy, consistency and other indexes were compared, evaluated and statistically analysed. In the test, the classification score threshold of the trained deep learning network for recognizing endoscopic pictures of target lesions was set to 0.85, i.e. lesions for which the network gives a probability above 85% are marked and the picture is judged positive; conversely, if no suspicious lesion region is detected in a picture, the picture is judged negative.
The results are as follows:
based on a platform of a national digestive disease clinical research center, in a lesion diagnosis test under a gastric cancer endoscope, the overall sensitivity fluctuation of 89 doctors is in a range of 48-100%, wherein the number of bits is 88%, and the average sensitivity is 87%; the specificity varied from 10% to 98% (78% of the median, the average specificity was 74%) and the accuracy varied from 51% to 91% (82% of the median, the average accuracy was 80%). The recognition sensitivity of deep learning network model diagnosis is 90%, the specificity is 50%, and the accuracy is 70%. Therefore, in the aspect of stomach cancer diagnosis based on gastroscope pictures, the sensitivity of artificial intelligence is higher than the level of general doctors, the specificity is lower than the median level, the accuracy is slightly lower than the median level of the doctors, but the deep learning network model diagnosis model has excellent stability in recognition, and different doctors have great fluctuation and instability in the aspects of specificity and accuracy, so that the diagnosis deviation caused by the individual difference of the doctors can be effectively eliminated by using the artificial intelligence to recognize the focus, and the application prospect is good.
Sensitivity is also called Sensitivity (SEN), also called True Positive Rate (TPR), which is the percentage of actual disease that is correctly diagnosed by a diagnostic standard.
Specificity, also known as Specificity (SPE), also known as True Negative Rate (TNR), reflects the ability of screening tests to identify non-patients.
Accuracy = total number of correctly identified individuals/total number of identified individuals.
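A minimal sketch of these three metrics follows; the counts in the usage example are illustrative values chosen to be consistent with the model figures reported above (90% sensitivity, 50% specificity, 70% accuracy on 50 cancer and 50 non-tumor pictures).

```python
# Minimal sketch of the evaluation metrics defined above, computed from counts
# of true positives (tp), false negatives (fn), true negatives (tn) and false
# positives (fp). The example counts below are illustrative values consistent
# with the reported model results (90% sensitivity, 50% specificity, 70% accuracy).
def diagnostic_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # correctly identified / total identified
    return sensitivity, specificity, accuracy

print(diagnostic_metrics(tp=45, fn=5, tn=25, fp=25))  # (0.9, 0.5, 0.7)
```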
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A gastric cancer image recognition system, comprising:
a. the data input module is used for inputting an image containing a gastric cancer lesion part, wherein the image is preferably an endoscope image;
b. the data preprocessing module is used for receiving the image from the data input module, precisely framing the lesion part of the gastric cancer, defining the part inside the framing as a positive sample and defining the part outside the framing as a negative sample, and outputting coordinate information and/or lesion type information of the lesion part; preferably, before the frame selection, the module also carries out desensitization treatment on the image in advance to remove personal information of the patient;
preferably, the frame selection can generate a rectangular frame or a square frame containing the lesion; the coordinate information is preferably coordinate information of points at the upper left corner and the lower right corner of the rectangular frame or the square frame;
further preferably, the framed location is determined by the following method: 2n endoscopists frame the images back-to-back, namely the 2n endoscopists are randomly divided into n groups of 2 endoscopists each, while all the images are randomly divided into n portions and randomly assigned to the groups for frame selection; when frame selection is completed, the framing results of the two physicians in each group are compared and their consistency is evaluated, and the framed location is finally determined, wherein n is a natural number between 1 and 100, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criterion for evaluating the consistency of the results of the frame selection between two physicians is as follows:
for each lesion picture, the overlap between the regions framed by the two physicians in a group is compared; if the area of the overlapping portion (i.e. the intersection) of the two framed regions exceeds 50% of the area covered by their union, the two physicians' framing results are considered to be in good agreement, and the diagonal coordinates corresponding to the intersection, i.e. the coordinates of its upper-left and lower-right corner points, are stored as the final localization of the target lesion;
if the area of the overlapping portion (i.e. the intersection) is less than 50% of the area covered by the union, the two physicians' framing results are considered to differ substantially; such lesion pictures are set aside, and all 2n physicians participating in the framing work jointly discuss and determine the final position of the target lesion;
c. the image recognition model building module can receive the image processed by the data preprocessing module and is used for building and training an image recognition model based on a neural network, and the neural network is preferably a convolutional neural network;
d. the lesion recognition module, which is used for inputting an image to be detected into the trained image recognition model and judging, based on the output result of the image recognition model, whether a lesion is present in the image to be detected and/or the position of the lesion.
2. The system of claim 1, wherein the image recognition model building module comprises a feature extractor, a candidate region generator and a target identifier, wherein:
the feature extractor is used for performing feature extraction on the image from the data preprocessing module to obtain a feature map, and preferably, the feature extraction is performed through a convolution operation;
the candidate region generator is used for generating a plurality of candidate regions based on the feature map;
the target identifier is used for calculating a classification score for each candidate region, the score indicating the probability that the region belongs to the positive sample and/or the negative sample; meanwhile, the target identifier can provide an adjustment value for the frame position of each region, so that the frame position of each region is adjusted and the position of the lesion is accurately determined; preferably, a loss function is used in training the classification score and the adjustment value;
it is also preferable that the training is performed by a mini-batch based gradient descent method, that is, a mini-batch containing a plurality of positive and negative candidate regions is generated for each training picture; 256 candidate regions are then randomly sampled from each picture so that the ratio of positive to negative candidate regions is close to 1, and the loss function of the corresponding mini-batch is then calculated; if the number of positive candidate regions in a picture is less than 128, the mini-batch is padded with negative candidate regions;
further preferably, the learning rate of the first 50000 mini-batches is set to 0.001, and the learning rate of the last 50000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
3. The system of claim 2, wherein the feature extractor can perform feature extraction on an input image of any size and/or resolution, the image being either the original image size and/or resolution or an input image whose size and/or resolution has been changed, to obtain a feature map of multiple dimensions (e.g., 256 dimensions or 512 dimensions);
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) contains Q_i convolution kernels of size m x m x p_i, where m x m represents the pixel dimensions (length and width) of the convolution kernel and p_i equals the number of convolution kernels Q_(i-1) in the previous convolutional layer; in the i-th convolutional layer, the convolution kernels perform a convolution operation with stride L on the data from the previous stage (such as the original image, the (i-1)-th convolutional layer or a sampling layer); each sampling layer comprises 1 convolution kernel of size 2L x 2L moving with stride 2L, which performs a convolution operation on the image input from the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
wherein X is between 1-20, e.g., 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20; y is between 1 and 10, e.g. 1, 2,3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, such as 2,3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024, Q is between 1 and 1024, and the value of p or Q is, for example, 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512, or 1024, respectively.
4. The system according to claim 2 or 3, wherein the candidate region generator sets a sliding window in the feature map, the sliding window having a size of n x n, such as 3 x 3; the sliding window slides along the feature map, the center point of each position of the sliding window corresponds to a position in the original image, and k candidate regions of different scales and aspect ratios are generated in the original image centered on that corresponding position; wherein, if the k candidate regions have x (e.g. 3) different scales and x different aspect ratios, then k = x² (e.g., k = 9).
5. The system according to any one of claims 2-4, wherein the target identifier further comprises an intermediate layer, a classification layer and a bounding box regression layer, wherein the intermediate layer is used for mapping data of candidate regions formed by sliding window operations, and is a multi-dimensional (e.g. 256-dimensional or 512-dimensional) vector;
the classification layer and the frame regression layer are respectively connected with the intermediate layer, the classification layer is used for judging whether the target candidate region is a foreground (positive sample) or a background (negative sample), and the frame regression layer is used for generating an x coordinate and a y coordinate of a central point of the candidate region and a width w and a height h of the candidate region.
6. A gastric cancer image recognition device, comprising a storage unit for storing gastric cancer diagnostic images, an image preprocessing program and a trainable image recognition program, and preferably further comprising an arithmetic unit and a display unit;
the device can train the image recognition program (preferably by supervised training) using images containing gastric cancer lesions, so that the trained image recognition program can recognize the gastric cancer lesion site in an image to be detected;
preferably, the image to be detected is an endoscopic photograph or a real-time image.
7. The apparatus according to claim 6, wherein the image preprocessing program precisely frames a lesion site of gastric cancer in the gastric cancer diagnostic image, the portion inside the frame is defined as a positive sample, and the portion outside the frame is defined as a negative sample, and outputs position coordinate information and/or lesion type information of the lesion; preferably, before the frame selection, desensitization treatment is carried out on the image in advance to remove personal information of the patient;
preferably, the frame selection can generate a rectangular frame or a square frame containing the lesion site; the coordinate information is preferably coordinate information of points at the upper left corner and the lower right corner;
also preferably, the framed site is determined by the following method: 2n endoscopists frame the images back-to-back, namely the 2n endoscopists are randomly divided into n groups of 2 endoscopists each, while all the images are randomly divided into n portions and randomly assigned to the groups for frame selection; when frame selection is completed, the framing results of the two physicians in each group are compared and their consistency is evaluated, and the framed site is finally determined, wherein n is a natural number between 1 and 100, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criterion for evaluating the consistency of the results of the frame selection between two physicians is as follows:
for each lesion picture, the overlap between the regions framed by the two physicians in a group is compared; if the area of the overlapping portion (i.e. the intersection) of the two framed regions exceeds 50% of the area covered by their union, the two physicians' framing results are considered to be in good agreement, and the diagonal coordinates corresponding to the intersection are stored as the final localization of the target lesion;
if the area of the overlapping portion (i.e. the intersection) is less than 50% of the area covered by the union, the two physicians' framing results are considered to differ substantially; such lesion pictures are set aside, and all 2n physicians participating in the framing work jointly discuss and determine the final position of the target lesion.
8. The apparatus of claim 6 or 7, the image recognition program being a trainable neural network based image recognition program, the neural network preferably being a convolutional neural network; preferably, the image recognition program comprises a feature extractor, a candidate region generator and an object recognizer, wherein:
the feature extractor is configured to perform feature extraction on the image to obtain a feature map, and preferably, the feature extraction is performed by a convolution operation;
the candidate region generator is used for generating a plurality of candidate regions based on the feature map;
the target identifier is used for calculating a classification score for each candidate region, the score indicating the probability that the region belongs to the positive sample and/or the negative sample; meanwhile, the target identifier can provide an adjustment value for the frame position of each region, so that the frame position of each region is adjusted and the position of the lesion is accurately determined; preferably, a loss function is used in training the classification score and the adjustment value.
9. The apparatus according to any one of claims 6 to 8, wherein the training is performed by a mini-batch based gradient descent method, that is, a mini-batch containing a plurality of positive and negative candidate regions is generated for each training picture; 256 candidate regions are then randomly sampled from each picture so that the ratio of positive to negative candidate regions is close to 1, and the loss function of the corresponding mini-batch is then calculated; if the number of positive candidate regions in a picture is less than 128, the mini-batch is padded with negative candidate regions;
preferably, the learning rate of the first 50000 mini-batches is set to 0.001, and the learning rate of the last 50000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
10. The apparatus according to claim 8 or 9, wherein the feature extractor is capable of performing feature extraction on an input image of any size and/or resolution, the image being either the original image size and/or resolution or an input image whose size and/or resolution has been changed, so as to obtain a feature map of multiple dimensions (for example, 256 dimensions or 512 dimensions);
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) comprises Q_i convolution kernels of size m x m x p_i, where m x m represents the pixel dimensions (length and width) of the convolution kernel and p_i equals the number of convolution kernels Q_(i-1) in the previous convolutional layer; in the i-th convolutional layer, the convolution kernels perform a convolution operation with stride L on the data from the previous stage (such as the original image, the (i-1)-th convolutional layer or a sampling layer); each sampling layer comprises 1 convolution kernel of size 2L x 2L moving with stride 2L, which performs a convolution operation on the image input from the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
wherein X is between 1-20, e.g., 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20; y is between 1 and 10, e.g. 1, 2,3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, such as 2,3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024, Q is between 1 and 1024, and the value of p or Q is, for example, 1, 2,3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512, or 1024, respectively.
11. The apparatus according to any one of claims 8 to 10, wherein the candidate region generator sets a sliding window in the feature map, the sliding window having a size of n x n, such as 3 x 3; the sliding window slides along the feature map, the center point of each position of the sliding window corresponds to a position in the original image, and k candidate regions of different scales and aspect ratios are generated in the original image centered on that corresponding position; wherein, if the k candidate regions have x (e.g. 3) different scales and x different aspect ratios, then k = x² (e.g., k = 9).
12. The apparatus of any one of claims 8 to 11, wherein the target identifier further comprises an intermediate layer, a classification layer and a bounding box regression layer, wherein the intermediate layer is used for mapping data of candidate regions formed by sliding window operations, and is a multi-dimensional (e.g. 256-dimensional or 512-dimensional) vector;
and the classification layer and the frame regression layer are respectively connected with the intermediate layer, the classification layer is used for judging whether the target candidate region is a foreground (namely a positive sample) or a background (namely a negative sample), and the frame regression layer is used for generating an x coordinate and a y coordinate of a central point of the candidate region and a width w and a height h of the candidate region.
13. Use of the system according to any one of claims 1 to 5 or the device according to any one of claims 6 to 12 for the prediction and diagnosis of gastric and/or pre-gastric cancer lesions.
14. Use of the system according to any one of claims 1 to 5 or the device according to any one of claims 6 to 12 in the identification of gastric cancer images or of lesion sites in gastric cancer images.
15. Use of the system according to any one of claims 1 to 5 or the device according to any one of claims 6 to 12 for real-time diagnosis of gastric and/or pre-gastric cancer lesions.
16. Use of the system according to any one of claims 1 to 5 or the device according to any one of claims 6 to 12 for real-time identification of gastric cancer images or lesion sites in gastric cancer images.
CN202210461368.2A 2018-11-15 2018-11-15 Gastric cancer image recognition system, device and application thereof Pending CN115345819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461368.2A CN115345819A (en) 2018-11-15 2018-11-15 Gastric cancer image recognition system, device and application thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811371788.1A CN109671053A (en) 2018-11-15 2018-11-15 A kind of gastric cancer image identification system, device and its application
CN202210461368.2A CN115345819A (en) 2018-11-15 2018-11-15 Gastric cancer image recognition system, device and application thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811371788.1A Division CN109671053A (en) 2018-11-15 2018-11-15 A kind of gastric cancer image identification system, device and its application

Publications (1)

Publication Number Publication Date
CN115345819A true CN115345819A (en) 2022-11-15

Family

ID=66141781

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811371788.1A Pending CN109671053A (en) 2018-11-15 2018-11-15 A kind of gastric cancer image identification system, device and its application
CN202210461368.2A Pending CN115345819A (en) 2018-11-15 2018-11-15 Gastric cancer image recognition system, device and application thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811371788.1A Pending CN109671053A (en) 2018-11-15 2018-11-15 A kind of gastric cancer image identification system, device and its application

Country Status (1)

Country Link
CN (2) CN109671053A (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111839428A (en) * 2019-04-25 2020-10-30 天津御锦人工智能医疗科技有限公司 Method for improving detection rate of colonoscope adenomatous polyps based on deep learning
CN110176002B (en) * 2019-06-05 2022-04-01 深圳大学 Focus detection method of X-ray image and terminal device
CN110490262B (en) * 2019-08-22 2022-06-03 京东方科技集团股份有限公司 Image processing model generation method, image processing device and electronic equipment
CN110866897B (en) * 2019-10-30 2022-10-14 上海联影智能医疗科技有限公司 Image detection method and computer readable storage medium
CN110880177A (en) * 2019-11-26 2020-03-13 北京推想科技有限公司 Image identification method and device
CN110974306B (en) * 2019-12-17 2021-02-05 山东大学齐鲁医院 System for discernment and location pancreas neuroendocrine tumour under ultrasonic endoscope
CN114463246A (en) 2020-11-06 2022-05-10 广达电脑股份有限公司 Circle selection system and circle selection method
CN112568864B (en) * 2020-12-03 2022-03-18 牡丹江医学院 Uterine cavity operation monitoring system
CN113642679B (en) * 2021-10-13 2021-12-28 山东凤和凰城市科技有限公司 Multi-type data identification method
TWI825643B (en) * 2022-03-30 2023-12-11 緯創資通股份有限公司 Medical auxiliary information generation method and medical auxiliary information generation system
CN115517682B (en) * 2022-11-25 2023-01-31 四川大学华西医院 Cognitive dysfunction prediction system based on gastrointestinal electric signals and construction method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398511A (en) * 2015-12-17 2022-04-26 北京安和福祉科技有限公司 Medical system visualization equipment and label labeling method thereof
CN107368859A (en) * 2017-07-18 2017-11-21 北京华信佳音医疗科技发展有限责任公司 Training method, verification method and the lesion pattern recognition device of lesion identification model
CN107563123A (en) * 2017-09-27 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for marking medical image
CN108734694A (en) * 2018-04-09 2018-11-02 华南农业大学 Thyroid tumors ultrasonoscopy automatic identifying method based on faster r-cnn
CN108665456B (en) * 2018-05-15 2022-01-28 广州尚医网信息技术有限公司 Method and system for real-time marking of breast ultrasound lesion region based on artificial intelligence

Also Published As

Publication number Publication date
CN109671053A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
CN109523535B (en) Pretreatment method of lesion image
CN109544526B (en) Image recognition system, device and method for chronic atrophic gastritis
CN115345819A (en) Gastric cancer image recognition system, device and application thereof
Yang et al. Clinical skin lesion diagnosis using representations inspired by dermatologist criteria
US20220309653A1 (en) System and method for attention-based classification of high-resolution microscopy images
CN109584218A (en) A kind of construction method of gastric cancer image recognition model and its application
CN111524137B (en) Cell identification counting method and device based on image identification and computer equipment
CN112041912A (en) Systems and methods for diagnosing gastrointestinal tumors
CN111932520A (en) Medical image display method, viewing device and computer device
CN110473186B (en) Detection method based on medical image, model training method and device
CN112088394A (en) Computerized classification of biological tissue
CN110600122A (en) Digestive tract image processing method and device and medical system
EP3998579B1 (en) Medical image processing method, apparatus and device, medium and endoscope
CN112263217B (en) Improved convolutional neural network-based non-melanoma skin cancer pathological image lesion area detection method
CN106023151A (en) Traditional Chinese medicine tongue manifestation object detection method in open environment
CN109670530A (en) A kind of construction method of atrophic gastritis image recognition model and its application
Klyuchko On the mathematical methods in biology and medicine
CN115019049B (en) Bone imaging bone lesion segmentation method, system and equipment based on deep neural network
CN113129293A (en) Medical image classification method, medical image classification device, computer equipment and storage medium
CN112508884A (en) Comprehensive detection device and method for cancerous region
CN111656393A (en) Histological image analysis
CN114332132A (en) Image segmentation method and device and computer equipment
CN110570425B (en) Pulmonary nodule analysis method and device based on deep reinforcement learning algorithm
Shamrat et al. Analysing most efficient deep learning model to detect COVID-19 from computer tomography images
CN110459303B (en) Medical image abnormity detection device based on depth migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination